The Immutability Fad

Introduction

The programming community goes through fads regularly, and they start repeating after a while. It’s very similar to the fashion industry, where at any moment there is a force to be both not wildly different than what everyone else is doing (out of fashion) while at the same time being slightly different (fashion forward), which sets up an oscillator on variables like tie width, tight vs. loose fit, etc.

I suspect the programming community is going to see-saw back and forth between the object-oriented (OOP) fad and the functional (FP) fad in perpetuity. OOP was “in” in the 90s and 00s, and FP started taking over in the 10s and is now very much “in”. Just look at all the new GUI frameworks these days. Given that programming is aimed at solving objective problems in a way that fashion is not, it may seem strange that an engineering discipline would experience the same aesthetics-driven phenomenon. But the same forces are at play: a balance of conservatism (this is the official way everyone does it, and if you do it differently other coders won’t be familiar with it) and progressivism (programming is hard, and if it’s at all the fault of the patterns we’re following that it’s this hard, doing things differently is the way to make it not hard).

They don’t balance, so a gradual evolution occurs. But it’s not the Whig theory of history: the progressivism doesn’t just plainly make things better over time. The source of today’s difficulty gets misdiagnosed, and a lot of the proposed cure is simply wishful thinking (trying to make programming Simply Not Hard™ anymore), so it really does come down to “different for the sake of different”. And because the new patterns don’t really make things any less hard, once they become the norm and it turns out things are still hard, people are going to want to be different again, heading back in the opposite direction.

The primary mechanism by which this occurs in programming is that old patterns get followed for years, giving birth to large, complex years-old codebases that follow those patterns, implementing the behavior of very complex, sophisticated programs (because entire “product” departments of businesses are paid to do nothing but invent more complexity to add), and gradually accumulating cruft, tech debt and design warts due mostly to a combination of being rushed by the business and being built by novice or low-skill programmers.

The revolutionaries present these complex codebases alongside tiny, trivial “sample apps” contrived solely for the purpose of demonstrating their patterns as better, and say “behold! Look how much simpler this tiny sample app is compared to the production codebase you’ve been working on!” People buy this argument, spend years building real production apps with the new patterns, they degrade for the same reasons, and people learn the hard way that they, too, are complex and messy. Rinse and repeat.

This is obviously what’s going on because the propaganda of “Paradigm X is dead, long live Paradigm Y” is laced with wild promises (precisely that the new paradigm will make programming Simply Not Hard™). As time goes on, these sales pitches become easier to believe because they’re mostly refuting the unrealistic sales pitch of the previous paradigm shift. OOP promised to make programming Simply Not Hard™, it certainly didn’t do that, so FP is now here demonstrating (correctly) that OOP failed its promise, giving the discussion an air of authority, and making it easier to believe that FP is the true way to make programming Simply Not Hard™.

I’m not suggesting genuine progress doesn’t happen. It obviously does. No one’s writing articles titled “Compilers are Dead, Long Live Assembly”. There’s little controversy over genuine progress. The controversy is over stuff that’s either largely a matter of personal taste, or a battle of mutual false promises driven by people who don’t understand why programming is hard and that it never won’t be (it’s hard because it’s logical problem solving, where all tiny details need to be worked out and accounted for. This is never going away).

Mutability: Ontological Evil Manifest

One major, if not the central, feature of the FP fad is immutability. Apparently, mutability in computer programs is the source of pretty much anything bad. Mutability causes logic errors, destroys thread safety, makes programs harder to reason about, contributes to global warming and probably harbors extremist political views.

You know this belongs to the “oscillating fashion trend” category because of these ridiculous promises, like that if you program in an immutable functional style you’ll basically never have logic errors or thread safety issues. I’ll explain later the specific fallacy people use to justify this claim.

I’m not saying this because I’m an OOP fanatic. I think both of them are fads.

I’ve always found “mutability is bad” to be a strange assertion. Taken to its logical conclusion, yes a program working with only immutable state would surely be much, much easier to reason about and not suffer from any logic errors or race conditions… because it wouldn’t do anything. A completely immutable GUI app is otherwise known as a PDF. It just sits there. It displays state that doesn’t and can’t change while the program is running.

But this has to be a straw man. The FP crowd isn’t telling people to stop writing, well… programs. They’re claiming that somehow behavior, dynamism, change, or what I ask you to please recognize is a synonym for all of those: mutation, can be achieved in a paradigm of total immutability. It sounds like a naked contradiction. What’s going on here?

Well, if I have some state, S1, even if it’s immutable, I can create new state, S2, by calling a function that takes S1 as an input and returns S2 as an output. I didn’t mutate S1, I just created a new state S2 that’s similar to S1 but differs in a few ways. S2 is also immutable. If I want something in my program to change, I create another new app state, S3, from S2.
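Concretely, it looks something like this (a minimal sketch with a made-up AppState shape, nothing more):

// A hypothetical immutable app state and a pure “transition” function.
interface AppState {
  readonly counter: number
}

// Takes S1, returns S2. S1 is never touched.
function bumpCounter(state: AppState): AppState {
  return { counter: state.counter + 1 }
}

const s1: AppState = { counter: 0 }
const s2 = bumpCounter(s1) // { counter: 1 }
const s3 = bumpCounter(s2) // { counter: 2 }
// s1 is still { counter: 0 }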

Okay fine, each instance of this app state is immutable, but if what I’m seeing on screen is supposed to change, then certainly the state of the UI is changing. Somewhere the UI has to track the current app state, and that is going to mutate from S1 to S2 to S3, right? I haven’t eliminated mutability, I’ve just concentrated it in the reference to a “current” value instead of in the individual fields of the value (we’ll explore the implications of this difference more below).

Ah, but we continue the argument into the UI! The UI state isn’t mutating; there’s another function that takes an app state S and returns a UI state U. Each of the app states S1, S2 and S3 is put into this function to create a corresponding UI state U1, U2 and U3. Just as the app state is not mutating but simply new, slightly different copies are being created, so too is the UI state not mutating, new copies are just getting created.

Um… okay (I’m starting to get slightly annoyed now, you’ll see why soon). So you make new immutable copies of the UI state every time the app changes (*cough* mutates *cough*) what it displays. But you’ve just shifted the problem again, not eliminated it. Only one of those states is going to be sent to the frame buffer, which is what the computer hardware is actually drawing on my computer monitor. So the frame buffer is still going to have a current UI state, and that is going to be mutated by setting it to U1, then U2, then U3, etc.

But you don’t understand! We simply keep pushing forward in the same way. The frame buffer isn’t mutable! There’s a function that takes a UI state as an input and returns a frame buffer as an output. Each time a new copy of the UI state is created we just use that function to make a new copy of the frame buffer.

Oh for f*ck’s sake…

Alright, so you’ve got copies of the frame buffer being created. You still have to decide which one is read by the computer hardware and sent to the monitor at any given moment. You have to have a current frame buffer, and that will mutate!

This is where it’s going to get plainly ridiculous. This isn’t a straw man. Hold onto your hats. Here we go:

There isn’t one monitor! As every moment in time passes, a new copy of your computer monitor is created, each one potentially being slightly different than the last copy. New copies of your monitor are being created on every tick of time in the physical universe in which we live. Those copies are created by calling a function that takes the copy of the frame buffer created from the copy of the UI state created from the copy of the app state, and since each copy can be slightly different than the previous one, each copy of the monitor for each moment in time will be displaying something slightly different.

Reality is, in fact, entirely immutable! Mutation is just a pedestrian way of modeling things. The brilliant FP philosophers have realized that nothing changes, instead for every instant in time there is a copy of the entire universe, created from a function that takes the previous copy as an input. “Mutability” would mean you could go back in time and edit the past. But you can’t! The past, meaning already constructed copies of the universe, is immutable. You can only move forward in time, which means creating more slightly different copies of the universe that existed in those past moments.

I have a phrase that I do not hesitate to use when I start hearing stuff like this, and I spent four years in college, so I’ve heard plenty of it:

“This is so stupid only a smart person could have come up with it.”

I mean, seriously, does anyone really conceive of the world like this? If you paint your car, you aren’t changing the color of your car, you’re just causing the copies of your car that get produced every Planck time unit to be slightly different from the previous one by having one extra microscopic stroke of new paint? And you believe that’s anything but an extremely convoluted and confused way to say “you changed the f*cking color of your car”?

But let’s be fair, I’m just having an emotional reaction to this. One that I think is entirely warranted (who’s more foolish, the fool or the fool who debates him?), but nevertheless my “reeee”ing doesn’t count as a refutation. So let’s formally refute this once and for all.

Even if it is true that the universe creates copies of the entire state of all physical objects and sticks them in a dictionary mapping instants in time to states of all physical objects, this still fundamentally cannot eliminate mutability. First of all, does this dictionary mutate each time the clock ticks and a new moment in time is created? Well, maybe you’re an ultra-Calvinist and you think everything is perfectly predetermined, the dictionary is already fully built up from the beginning to the end of time itself. If not, you’d have to concede that on every tick, new copies of this dictionary are created, discarding the previous one that contained all time-state pairs up to the previously latest moment in time, with the new one containing all the same time-state pairs as the previous, plus a new one for the now latest moment in time.

Regardless, we now come to the crux of the matter. Whether new dictionaries are being created constantly, or there’s one ultra-predetermined dictionary, there’s still one thing in this ultra-immutable universe that must be mutable:

Now

Whatever instant in time is considered to be “now” has to mutate. The clock has to tick. Trying to model this as immutable gets stuck in an infinite regression. “No, the moment that is now is copied on every new moment”, but you have to pick which one is the actual “now”, instead of a historical “now”. If all the “now”s are stored in an array where each new copy is one larger than the last, you still have to say that the true “now” is the last element of the array… but which array (what if you hold all the arrays in another array, and say the current one is the last element? Again, that array changes when a new array is added, and we’re back to the same problem)? All the historical ones are still out there. You have to assign a “latest”, and that’s a mutating variable.

Honestly we’re doing rigorous super-abstract philosophizing to remind ourselves of what everyone already knows: things change. You aren’t experiencing all moments in time. You experience a particular moment at a particular moment (this is already becoming unworkably circular), and that particular moment changes. That’s what time means. Time is the parameterization of mutation (our minds recording snapshots of the changing state of physical objects and storing them in order). Modeling the universe as copies being keyed by moments in time is failing to recognize what time is. At best those keys are proxies for time, not time itself.

Yes you store all the frames of a movie sequentially in memory somewhere, where the index of a frame indicates the time of the frame. But that data is static. Those characters aren’t alive, they aren’t moving, they don’t do anything. The index isn’t literally time, it’s literally an index in an array, an offset (meaning a location in space) in memory. The only way to make it come alive is to play the movie, which means to create a connection between that index and literal time: you use actual time, what an actual clock tells you, to pick a frame to show. Because clocks tick, because the time they read changes, the displayed frame changes. Suddenly the movie becomes dynamic, a living simulation of whatever was filmed, not a dead one that sits still in memory banks.
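To make that concrete, here’s a minimal sketch (frames, frameDuration and startTime are made up) of what “playing” amounts to: using a real, ticking clock to pick which frame to show:

// Static data: the frames just sit there; nothing about this array ever changes.
const frames: readonly string[] = ["frame 0", "frame 1", "frame 2"]
const frameDuration = 1000 / 24 // milliseconds per frame
const startTime = Date.now()

// “Playing” means binding the index to real, ticking time.
// Date.now() returns something different on each call: that’s the mutation.
function currentFrame(): string {
  const elapsed = Date.now() - startTime
  const index = Math.min(frames.length - 1, Math.floor(elapsed / frameDuration))
  return frames[index]
}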

You can never eliminate the mutability of “now”. If you did, the universe would simply be the frames of a movie sitting in memory banks somewhere, but with no external time (there’s nothing external, this array of frames is everything that exists), there’s no way, it’s not even meaningful, to “play” the movie. This describes a dead universe where nothing happens, nothing changes, nothing mutates.

Of course it does. Of course immutability implies nothing mutates! Come on!!

What Does “Immutable” Actually Mean?

Now, what was the point of that exercise? I thought we were trying to write software, not get stoned and talk about the meaning of existence.

Strained sophistry notwithstanding, it’s plainly not true that there are multiple frame buffers. There’s one (well, probably two or three, to prevent visual tearing), it has an address in memory, and its content is changed (mutated) by your running app. The “interpretations” to make that immutable only prove that “viewing things as immutable” in this way is actually no different from viewing things as mutable. The whole distinction becomes almost meaningless. Where and how it remains meaningful is what we need to talk about now.

It’s not hard to spot the obvious contradictions in articles about this stuff. Look at this article, not just at the title but within:

However, if you’re using Redux, you need to know how to modify your state immutably (without changing it).

Does anyone else not find this sentence hilarious? How to change something unchangingly without changing it!?

This plainly contradictory phrase appears in none other than the Redux docs themselves:

It’s also not obvious just by looking at the code that this function is actually safe and updates the state immutably

Update something immutably!? What are these guys smoking?

Yes I know what it actually means, but they’re obviously using the wrong words for it. None of this has anything to do with immutability. To say that in Redux you can only mutate the app state through reducers is saying something meaningful, and in fact very important; it just isn’t about immutability. The app state can obviously mutate. But no one has unfettered permission to do so.

See, that’s the point. Mutability is controlled. You don’t just give naked write access to all the fields inside your app state to everyone who wants to read the app state. You make them go through functions that limit how mutations occur. You protect your app state by making sure it mutates only in a limited set of ways that you allow.

Now someone’s going to tell me I’m wrong, that in Redux the app state really is immutable, and that reducers don’t mutate state, they emit new values of state. Yes there’s a real distinction there, we’ll talk about that below, but I’m not wrong to say the state that comes out of the reducer gets reassigned back to the mutable reference-semantics state variable inside the Store, and thereby mutates what the app considers to be the state.

The difference in Redux isn’t the mutability of the app state. It’s who is allowed to do what mutations on the app state. We have a word in programming for this concept, and it’s not “immutable”. It’s “encapsulation”.

All this is telling us is that the app state is encapsulated. It’s a specific type of encapsulation in redux patterns: reading is not encapsulated, but writing is. You could choose to encapsulate reads as well, but the overarching point of this really seems to be that encapsulating mutation is far far more important than encapsulating reading.

I absolutely agree.

But see, “encapsulation” is one of the “four principles of object-oriented programming” (whether it’s legitimate for OOP to lay claim to it is another matter), so FP fanboys can’t be caught using that word.

That’s what’s kind of funny, and at times confusing, about these discussions. They get framed as big philosophical defenses of paradigms that, taken at face value, are totally wrong, but the framing itself is also wrong, and these two wrongs end up producing rights: it’s wrong to say that immutability solves these problems, and it’s wrong to frame redux’s limitation of mutation as a form of immutability (obviously), and these combine to a correct statement “it’s good to protect writing to your app state much more strongly than reading from it”.

Of course there are plenty of ways to do this that don’t require reducers. Make your fields public get and private set, and define your “reducers” to just be public functions on your app state type. The only difference between this and Redux is that you don’t save copies of your state before updating it. But you easily could: just make it observable, subscribe to it, and save every published update to an array somewhere.
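For example, something like this sketch (“public read, encapsulated write”, with made-up names, and no Redux machinery anywhere):

// A hypothetical app state with publicly readable, privately writable data.
class AppState {
  #balance = 0

  get balance(): number {
    return this.#balance
  }

  // The “reducers” are just public methods that perform controlled mutations.
  deposit(amount: number): void {
    this.#balance += amount
  }
}

const state = new AppState()
state.deposit(10)          // allowed: goes through the controlled method
console.log(state.balance) // allowed: reads are public
// state.balance = 1000    // not allowed: there is no setter
// state.#balance = 1000   // not allowed: the field is private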

I’m not saying you should or shouldn’t pick one or the other, I’m just saying the raison d’être of Redux is not at all an essential part of that pattern; it’s something that’s achievable in a myriad of ways in all widely used programming languages in the industry. Once that’s clear, it really makes you wonder: what’s the point of all of it? If reducers and actions are just a (very convoluted, if you ask me) way to create data that’s publicly readable but not publicly writable, with some public methods for modifying it, why not just make a class?

Well, there’s a frequent assertion I hear from the “OOP sucks, let’s all write enterprise software in Haskell” people: that the hallmark of OOP, and the principal reason it sucks so hard, is that it “pairs data and functions together”. The article by Joe Armstrong is a canonical example.

The complaint isn’t about the irrelevant syntax difference between x.doThis(y) and doThis(x, y), or the fact that functions for a class are written inside the class, next to data. No, functions and data are “paired” in a more significant way than the mere mechanics of OOP languages: the methods are the only functions that pierce the encapsulation boundary of the object. Any language that supports encapsulating data will support pairing functions with that data (otherwise, that data becomes literally inaccessible to every part of the program, and might as well not exist).

Now, the weird part about this is that the functional languages they praise often support this. Both Haskell and OCaml call them “modules”, and they exist to create encapsulation, which is the fundamental mechanism of creating abstractions, i.e. an X that is simply an X and not the sequence of bytes that happen to be used, today, to implement an X. This is literally pairing data with functions, in the exact same way that an OOP class closes over its fields (data) and members (functions). I don’t think it has ever been stated by FP language designers that encapsulation, which involves pairing data and functions, is somehow incompatible with a paradigm of immutable values and pure functions. The two seem quite unrelated.

In that article Joe Armstrong, after declaring that data and functions “should not be bound together” (because they aren’t identical concepts… I’m not sure how “X should not be bound with Y” logically follows from “X is not Y”), goes on to directly attack encapsulation. This section is honestly headache inducing. The argument seems to be: FP languages eliminate all state from a computer program by exposing all state as input parameters and return values of every function, which hides all the state by giving you full access to the state, which is good because hiding state is bad.

It was reading stuff like this that made me start to think all the Redux immutable functional stuff I keep encountering, and consistently not knowing what the hell the point is, is just a bunch of deeply confused gibberish from people who’ve managed to convince themselves they write computer programs that do something other than mutate state, and defend this by saying everyone through all of history has fundamentally misunderstood how the universe works and really there’s no such thing as state or change at all (if that’s true, surely OOP programs aren’t mutating state either, because state and mutation don’t exist, they’re just emitting new copies of the heap and stack on every clock cycle).

The Real Problem: Reference Semantics

Anyways, I can’t help but conclude that the insistence on doing Redux, instead of just writing a type with private fields and controlled methods for modifying those fields, is due only to this confusion: pairing mutable state and functions into a class is deemed an anti-pattern that causes unnecessary bugs, but we need encapsulated data that can change over time, so we still need a way to do this, but in a way where we can, with a liberal dose of squinting, tell ourselves that everything is immutable and we’ve kept data and functions completely separate.

Look at this diagram:

This is literally just a class. The “state” is a private field, and the “actions” are methods:

class State {
  #account = { ... };

  deposit(amount) {
    this.#account.balance += amount;
  }

  ...
}

Wrap it in an Rx Observable (or some comparable equivalent) and now you’ve got your subscribeTo capability, which you can use to store a growing array of the historical values.
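The subscribe-and-record part is mechanically trivial; here’s a sketch assuming an RxJS-style BehaviorSubject (using a plain number as the state, because wrapping the actual State class is exactly where the wrinkle appears, as we’re about to see):

import { BehaviorSubject } from "rxjs"

// Record every published value of the state in a history array.
const state = new BehaviorSubject(0)
const history: number[] = []

state.subscribe(value => history.push(value)) // BehaviorSubject replays the current value on subscribe

state.next(1)
state.next(2)
// history is now [0, 1, 2]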

There’s a problem though.

Recall that Redux originated in JavaScript. Well, because of the way objects in JavaScript (and, in fact, most OOP languages) work, simply wrapping a State instance in an RxJS Observable won’t correctly publish an update event when we call one of the methods:

const state = new BehaviorSubject(new State());

state.subscribe(state => console.log(`New state: ${state}`));

state.value.deposit(10); // Doesn't trigger the log statement

Why not? Well, because of “mutability”… but that’s not the whole story, and saying it like that is confusing because we’re trying to mutate the state, how could mutability be the problem!?

No, the problem is that the mutability is on the internal #account field. Actually that’s not even right. What we’re mutating is the balance member inside the #account. The State instance itself doesn’t change; not even #account changes. Notice that we’re not calling .next anywhere to publish a new value. We’re reaching inside the BehaviorSubject and getting its value, which is “read-only”, but that’s fine, we really are only reading it. Then we call deposit, which doesn’t mutate this, it mutates stuff inside of this.

Really, making value read-only is almost useless, at least when it’s holding an array or object. Claiming that the value is read-only, but that you can reach inside of it and change whatever you want (as far as the type’s encapsulation allows) is borderline fraudulent. It’s not necessarily bad modelling that I can go through an object’s member and make modifications to it while claiming I didn’t modify the object, but in many cases it most certainly is bad modelling.

For example, consider me as a value. I have two members: my left arm, and my best friend. If you replace my original left arm with a cyborg arm, have you mutated me? Absolutely! If you go to my best friend and replace his arm with a cyborg arm, have you mutated me? No. What’s the difference? My leftArm member is a value, the arm itself, but my bestFriend member is not the person himself (including all his members, like his left arm), it’s a reference to the person. Changing something about the person who happens to be my best friend doesn’t change me. What would change me is to change who my best friend is: that is, to change the reference itself stored in bestFriend.

This is the well-known problem of “value vs. reference types”. Objective-C, Java, C#, Swift and JavaScript treat the distinction on the metatype level (they have kinds of types that are values and kinds that are references, i.e. struct and class), and only C# and Swift let you define custom value types (unless you count custom C structs in Objective-C, which you usually don’t because they don’t interop well with Objective-C). For the rest, the only value types are the handful of primitives that come with the language. C/C++ treat the distinction on the type level (all types have value semantics, but some primitive types, namely pointers, provide referential access to other values; C++ references are a sort of unique oddball). Kotlin defines a value type as a class that obeys certain conventions (strictly speaking they always have reference semantics, but if you make them immutable enough you can’t tell that they have reference semantics… interesting, there’s that word “immutable” showing up in a discussion of references vs. values).

In JavaScript, an object has to be a “reference type”, which means you can’t actually work directly with object values. When you assign a variable to an object, you’re really assigning the variable to a reference to an object instance. This means you have no choice but to model members of one object that are themselves objects as being references. If you model an arm in JavaScript as an object, then my leftArm can’t be the arm itself, it has to be a reference to an arm, in the same way bestFriend is a reference to my best friend. Then, just as modifying my best friend doesn’t modify me, neither does modifying my arm modify me. Sure, replacing my arm, as in assigning my leftArm to refer to another arm instance modifies me, but I can go inside the existing arm reference and change stuff without changing me.
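A quick illustration of what that means in practice (the Person/Arm shapes are made up, and the TypeScript annotations are just for clarity; the runtime behavior is plain JavaScript):

// In JavaScript/TypeScript, object-typed members are always references.
interface Arm { paint: string }
interface Person { leftArm: Arm }

const originalArm: Arm = { paint: "none" }
const me: Person = { leftArm: originalArm }

// “Reaching inside” the member mutates the shared Arm instance...
me.leftArm.paint = "red"

// ...and the language considers `me` untouched: nothing was assigned to `me`
// or to `me.leftArm`, yet what `me` means has clearly changed.
console.log(originalArm.paint) // "red" — the same instance was mutated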

The problem repeats on every level. If an arm is an object made up of members, if those members are also objects, you can once again go inside the referenced instances and change stuff without changing the arm, and repeat if those objects’ members are once again objects.

Being forced to model all complex objects, on every level, as references, is a major restriction in a language’s ability to model things. And it is a direct consequence of OOP fanaticism in the 90s: if “everything is an Object“, well what exactly is an Object? It’s not just a synonym for “thing”, that would make it meaningless. It means a lot of things, including that literally every type you can define inherits an Object type… and this implies one key capability of Objects is inheritance, which is assumed to imply polymorphism (that if I have a variable of type BaseType, I can assign a DerivedType to it), and the straightforward way to implement variables that work this way is as references (literally as pointers, which is how C++ and Objective-C did it, and Java imitated C++ in this way).

So, there’s a legitimate point buried under all the pseudo-philosophical gobbledygook about “stateless immutable programs”: a misplaced insistence that “everything must be an object”, which specifically means every variable must really be a polymorphic reference, caused mutability to be shattered and spread across the model when in many cases this is incorrect and mutability of parts should be equivalent to mutability of the whole, which also implies those parts should have value semantics: copying the whole should also copy the parts.

The problem is not the existence of “mutable state”: that’s called a computer program. The problem is an overly restricted, and therefore often wrong, model of mutability, wherein the parts of something can always be mutated without considering it a mutation of the thing itself, and there’s no way in these languages to express that no, that’s not right, in this case they are one and the same.

Since we can’t express this directly in these languages, what do we do instead? We enforce it in a more ad-hoc manner. We simply can’t make object members have value semantics. But what problems does that cause? It makes it possible to mutate those objects even on “immutable” (read-only) instances of the containing object, and it means when we copy the containing object the member objects don’t also get copied.

The second problem isn’t really a problem because it’s reference semantics all the way: the containing object never gets copied in the first place, unless we specifically call a copy function in our code. Since we must define this copy function ourselves, we can decide there that it will copy the member objects too.
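A sketch of that decision point (the copy method and the member shapes here are hypothetical):

class Account { // a member type we've decided should behave like a value
  balance = 0
}

class State {
  account = new Account()

  // Since we write the copy function ourselves, we decide right here that
  // “copying the State” also means copying the Account it contains.
  copy(): State {
    const result = new State()
    result.account = { ...this.account } // copy the member, don't share it
    return result
  }
}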

The first problem is about mutability of the members. We fix this with encapsulation: make it impossible, from the outside, to mutate the members without also mutating the containing object, and then just avoid doing such mutations inside the class itself. What does that take? In our example above, the trouble comes from the deposit function that mutates the #account field. We must not write any methods that mutate fields. We instead replace the “mutating” method with an “immutable” flavor of it, which creates a new value:

class State {
  #account = { ... };

  deposit(amount) {
    var result = new State();
    result.#account = this.#account;
    result.#account.balance += amount;
    return result;
  }

  ...
}

But this isn’t correct yet. We’re assigning the #account of the copy to our #account, which means we’re assigning the reference. That means we’re sharing the value: when we mutate balance, it affects both the copy and us. We have to also copy the value of #account first. Well, how do you do that? If you do a memberwise copy with a spread ({ ...this.#account }) you copy the top-level object, but not any members that are themselves objects. You’d have to do the same to those, and so on.

Is there a way to do this in general for any object in JavaScript, so that a deep copy is made on all levels of members, ensuring you have a fully independent instance? Yes: serialize it to JSON, then deserialize it back to an object. But that also might not be correct. Some of those members might genuinely be references (this doesn’t make the containing object mutable as long as the object being referred to follows the immutability rules itself), and you don’t want to copy those. Ultimately, you might have to make a bespoke copy function for your structure that understands at each level whether an object is a reference or a value. This is because the language has no idea about the distinction and can’t discover it.
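For what it’s worth, here’s a sketch of the JSON round-trip and its blind spot (the auditLog member is a made-up stand-in for something that genuinely should stay shared):

// Deep copy via JSON round-trip: every level becomes an independent copy.
const state = {
  account: { balance: 100 },
  auditLog: { entries: [] as string[] }, // hypothetical: meant to be a shared reference
}

const copy = JSON.parse(JSON.stringify(state)) as typeof state

copy.account.balance += 10
console.log(state.account.balance) // 100 — the copy really is independent

copy.auditLog.entries.push("deposit")
console.log(state.auditLog.entries.length) // 0 — oops, we wanted both to point at the same log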

This is already turning into a major pain in the a**. So I’m starting to have more sympathy for JavaScript developers inventing the Redux stuff instead of dealing with all of this.

Anyways, once we are properly copying the #account object, we have to actually assign this value, which for the BehaviorSubject means calling .next:

const state = new BehaviorSubject(new State());

state.subscribe(state => console.log(`New state: ${state}`));

state.next(state.value.deposit(10)); // Triggers the log statement!

What if we want to be able to read part of the state, like the value of #account, from the outside, but not be able to write to it? If #account is an object, returning it in a getter returns a reference to it, and then we can fiddle with its insides. Instead we need to return a copy. Luckily we already had to figure out how to copy the thing, so we just use that:

class State {
  #account = { ... };

  get account() {
    return this.#copyAccount();
  }

  deposit(amount) {
    var result = new State();
    result.#account = #copyAccount();
    result.#account.balance += amount;
    return result;
  }

  ...

  #copyAccount() {
    ... // Do whatever it takes here
  }
}

This way, the object that comes out is guaranteed to be independent of the one inside the State object, so no fiddling with it will affect that object.

Alternatively, and I think this is a better approach, we can define classes on every level and design them all to be immutable. Then the custom copying logic, i.e. the decision of what has value vs. reference semantics, instead manifests in which members are of immutable types and which aren’t. If you want to alter an immutable-type member inside of an immutable type, you have to make a new instance of the member type, and then a new instance of the parent type. That means you have to write a similar function on the member type that creates a near-copy, or “mutator” (it returns a new value that is identical except for one part that you specify to be different), and this repeats until you get down to primitives.

This shifts the custom copying logic from deciding which members need to be deep-copied when a copy is made (with those members then determining what deep copying means for them), to letting different copies share instances of members: since those instances cannot be modified in any way, they act as semantic copies (you can’t tell they’re being shared), and you only make an actual copy when you need a modified instance. Both approaches achieve the same result of distinguishing members that are the values themselves from members that are really references.

Here’s an example with the “mutator” methods that produce near-copies:

class State {
  #account;
  ...
  
  constructor(
    account, 
    ...
  ) {
    this.#account = account;
    ...
  }

  get account() {
    return this.#account; // No need to copy, because Account is immutable
  }

  deposit(amount) {
    return new State(
      this.#account.deposit(amount),
      ...
    );
  }

  ...
}

class Account {
  #balance;

  constructor(balance) {
    this.#balance = balance;
  }

  get balance() {
    return this.#balance;
  }

  deposit(amount) {
    return new Account(this.#balance + amount);
  }
}

If we upgrade to TypeScript (which of course you’ve done, right? Right?), it’s even better, because we can actually (sort of) enforce some of this “immutability”, or at least express that that’s the intention. That also eliminates the need for getters; we can expose fields directly:

class State {
  readonly account: Account

  constructor(account: Account) {
    this.account = account
  }

  deposit(amount: number): State {
    return new State(this.account.deposit(amount))
  }
}

class Account {
  readonly balance: number // Oh yeah, storing a bank balance in floating point format, what could go wrong!?

  constructor(balance: number) {
    this.balance = balance
  }

  deposit(amount: number): Account {
    return new Account(this.balance + amount)
  }
}

You can be sure that any class you write is immutable as long as it is composed only of primitives and readonly class/array members… but keep in mind that those class/array members will have reference semantics unless you ensure the type of that member is an immutable type. For example, the array type you get out of the box in JavaScript is not an immutable type, so if you have an array member of your class, even marked as readonly, it is a reference to a particular array. If you can obtain the member through a getter, you can then start modifying the array.
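For example (readonly stops reassignment of the member, but not mutation of the array it refers to):

class Tags {
  // readonly only prevents `this.values = ...` reassignment...
  readonly values: string[] = []
}

const tags = new Tags()
// tags.values = []        // compile error: values is a read-only property
tags.values.push("oops")   // ...but mutating the referenced array is perfectly legal
console.log(tags.values)   // ["oops"]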

If you want an array (or tuple) value type, TypeScript gives you one and even supports nice syntax for it. They also give you read-only (immutable) versions of other collections like Map and Set, and a Readonly<T> wrapper for any type.
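A quick illustration (this is compile-time enforcement only, but that’s enough to get the value-like behavior):

const xs: readonly number[] = [1, 2, 3]
// xs.push(4)   // compile error: push does not exist on readonly number[]
// xs[0] = 10   // compile error: the index signature only permits reading

const pair: readonly [string, number] = ["balance", 100]
// pair[1] = 200 // compile error

const names: ReadonlySet<string> = new Set(["a", "b"])
// names.add("c") // compile error: add does not exist on ReadonlySet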

If you aren’t using TypeScript you have to build a read-only array yourself:

class ReadonlyArray {
  #values;

  constructor(values) {
    this.#values = values;
  }

  *[Symbol.iterator]() {
    yield* this.#values;
  }

  get(index) {
    return this.#values[index];
  }

  set(index, value) {
    let values = [...this.#values];
    values[index] = value;
    return new ReadonlyArray(values);
  }

  append(values) {
    return new ReadonlyArray([...this.#values, ...values]);
  }

  remove(index) {
    let values = [...this.#values];
    values.splice(index, 1);
    return new ReadonlyArray(values);
  }
}

Before you get any crazy ideas (and despite what the FP fanatics say), you can’t, and wouldn’t want to try to, make everything an immutable value type. Your program is going to need to work with identifiable things. You need to choose judiciously which members of which types are values (they obey the immutability rules) and which are references (they are mutable arrays/objects), which you do by choosing the types of those members to be immutable types or mutable types. But what if you want to define a particular type and use it as a value in some places and a reference in others? Following this approach you’d have to write two versions of the type, an immutable flavor and a mutable one (much like the two flavors of arrays in TypeScript).

Another approach is to define all of your types as immutable, making them represent values not references, then define a dedicated type for a reference:

class Reference<T> {
  protected _value: T

  constructor(value: T) {
    this._value = value
  }

  get value(): T {
    return this._value
  }
}

class MutableReference<T> extends Reference<T> {
  // The getter has to be redeclared: defining only a setter here would shadow the inherited getter.
  get value(): T {
    return this._value
  }

  set value(value: T) {
    this._value = value
  }
}

This allows you to more finely control encapsulation. You can define a MutableReference<T> instance somewhere, and hand that out to whoever should have write access to that identifiable object. Then you can hand it out upcasted to a Reference<T> to anyone that should only be allowed to read it (but will see modifications those with write access make to it). Done with inheritance this is easy to defeat by simply downcasting, but if that’s a concern to you, you can do this with composition as well (get rid of the inheritance and make Reference<T> hold a private MutableReference<T> member that it forwards the value getter to).
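A sketch of the composition variant (same made-up Reference naming as above, just without the inheritance to downcast through):

class MutableReference<T> {
  constructor(public value: T) {}
}

// Read-only view that forwards to a privately held mutable reference.
// There's no subtype relationship, so there's nothing to downcast to.
class Reference<T> {
  constructor(private readonly source: MutableReference<T>) {}

  get value(): T {
    return this.source.value
  }
}

const writable = new MutableReference(0)
const readable = new Reference(writable)

writable.value = 42
console.log(readable.value) // 42 — readers see writers' changes, but can't write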

This approach is really an attempt to emulate what C++ gives you out of the box: define all your types like MyType to have value semantics then use MyType* (or a smart pointer like shared_ptr<MyType>) as a mutable reference and const MyType* (or a smart pointer like shared_ptr<const MyType>) as an immutable reference.

Well, if the solution to the problem is to make State (and sub-objects, and so on) an “immutable” type, then was the problem actually “mutability”? Well, not the existence of mutability, but its distribution. We aren’t eliminating mutability, we’re bringing several little islands of mutability together into a big island nation of mutability.

Since the language makes it so difficult to model deep trees of aggregates as a single value whose parts cannot change without that constituting a change to the whole, yeah… Redux starts to make a little more sense. But it’s ultimately not solving the problem. In Redux you get a “copy” of the current state passed into the reducer. Well, that’s whatever type of object you initially specified, and if you fiddle with its insides, you’re fiddling with the internals of the store’s private state instance. And then stuff just breaks. It still comes down to “just don’t do that”. If you want to prevent this, you have to go through the trouble of defining classes that follow the immutability rules.

So… this doesn’t really make things any better, and you have to do silly stuff like define enums with a 1-1 correspondence to functions, and then write out the switch statement confirming that the two are, in fact, 1-1 (why define a new concept just to make it equivalent to an existing concept and then spell out the identity mapping between them?). Maybe this is because people got a vague notion that the original problem has something to do with mutability, they heard about these “functional languages” where everything is immutable and it’s impossible to have mutability bugs, and the way you model the state (which mutates as the inputs come into the program) in a functional program is as a list built out of a reducer over the sequence of program inputs. So then the solution to this mutability issue in JavaScript must be to model application state with reducers the same way you’d model it in a Haskell program.

Blaming mutability means not thinking about this carefully: the problem is not mutability itself but reference semantics, and the implication that has for how mutability is distributed and linked (or not linked) together. Understanding that this is the root cause makes it clear the proper solution is figuring out how to model aggregates as value types. The fundamental mechanism of value semantics is copying, so you have to either define the copy function for your state object, or use immutability to ensure that shared references act as semantic copies (you can’t tell they’re not literal copies, because you can’t modify one “copy” and see the modification in the other) and write “near-copy” (mutator) functions for mutations, which require any mutation to something deep inside your app state to cascade all the way up to a mutation of the entire app state.

Is It Just JavaScript?

How do you deal with this in other languages? In Objective-C and Java, it’s the same as JavaScript: you make “immutable” classes with hand-written copy-mutate methods and compose them. Objective-C makes this pattern more canonical, as its core collection types (NSArray, NSDictionary and NSSet) are immutable (and have those hand-written copy methods). In C#, you get true value type aggregates called structs, which relieves you of having to write the copy function yourself, because the compiler can tell (by your choice of struct vs. class) whether a variable is the value itself or a reference and can therefore write the correct copy function for you.

In all these cases, syntactically this means replacing in-place mutation style code with whole-value replacement through immutable methods. This:

state.account.balance += 10

Becomes this:

state = state.withAccount(state.account.withBalance(state.account.balance + 10))

This is pretty close to the TypeScript example above. You’ll probably write a lot of helper functions in places where you have to reach down through multiple levels. That’s what the deposit methods are.
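If you’d rather have generic helpers than task-specific ones like deposit, the withAccount/withBalance functions used above would look something like this (hypothetical names, not from any library):

class Account {
  constructor(readonly balance: number) {}

  withBalance(balance: number): Account {
    return new Account(balance)
  }
}

class State {
  constructor(readonly account: Account) {}

  withAccount(account: Account): State {
    return new State(account)
  }
}

let state = new State(new Account(0))
// The “in-place mutation” state.account.balance += 10 becomes whole-value replacement:
state = state.withAccount(state.account.withBalance(state.account.balance + 10))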

C# has this awkward problem where by default you can write the code in the first example, but it ends up creating a copy, mutating it, and throwing it away. More specifically, when you call .account, assuming the result is a struct, you get a copy, not the one inside state. You change its balance, but it’s an unnamed temporary that just gets immediately thrown away. I’m pretty sure the compiler at least warns you about this. You can prevent it by marking the structs as readonly, and then this line becomes fully banned and you have to write the second line or some equivalent. It’s unfortunate this isn’t the default, so the conventional wisdom now is to just mark all your structs as readonly.

C++ is on par with C# here: you pick values vs. references by making the types of your members either T itself or T* (or some pointer-like type such as unique_ptr<T> or shared_ptr<T>). But it’s more powerful in two ways: first, it writes the default copy constructor for you, but you’re allowed to customize it too. You need to make sure you follow const correctness, i.e. if you return references to private fields, make sure they’re const references. Second, you immediately get to declare any field of any type to be a value or a reference, by choosing between T itself as the type or one of those above-listed “reference to T” types you get for free. Since it’s trivial to give a type reference semantics, a good rule of thumb in C++ is to make all your types have value semantics (if it doesn’t make sense to copy it, delete the copy constructor), instead of writing classes that have reference semantics you can’t strip off.

As I mentioned, you can emulate this in C# (and Swift, which has the same struct vs. class dichotomy), by making everything a struct and writing a generic Reference<T> class that just holds a struct member, to use wherever you need reference semantics. You can’t change the fact other code doesn’t do this, but you can make your code follow this pattern. It’s an idea I’ve been increasingly exploring, particularly in Swift. The main downside is you can’t suppress copying… but Swift 5.9 introduced non-copyable structs!

The language that I think best solves this problem is Swift. The language designers realized that people want to write the in-place mutation style of code, but have that actually mean whole-value replacement that doesn’t sneak around the const-ness of the top-level value. So they defined mutating functions on structs (value types, like in C#) to work this way via a special inout function parameter. Since the “immutable style” so often involves reading a value, calling a function to produce a new related value, then assigning it back to where it came from, they formalized this pattern directly into the language. This way, you can write the in-place mutation line and it will, in fact, replace the whole value and call whatever setter the top-level read-write variable defines… which implies if the top-level variable is read-only (doesn’t have a setter) you aren’t allowed to write this line.

This:

import Combine
import Foundation

struct State {
  private(set) var account: Account = .init()

  mutating func deposit(amount: Decimal) {
    account.deposit(amount: amount)
  }
}

struct Account {
  private(set) var balance: Decimal = .init() // Ahh, that's better.  No more mystery pennies.

  mutating func deposit(amount: Decimal) {
    balance += amount
  }
}

let state = CurrentValueSubject<State, Never>(State())
let subscription = state.sink { state in print("New State: \(state)") }
state.value.deposit(amount: 10) // Line is printed!

Is equivalent to this:

import Combine
import Foundation

struct State {
  let account: Account

  init(account: Account = .init()) {
    self.account = account
  }

  func deposit(amount: Decimal) -> State {
    .init(account: account.deposit(amount: amount))
  }
}

struct Account {
  let balance: Decimal

  init(balance: Decimal = .init()) {
    self.balance = balance
  }

  func deposit(amount: Decimal) -> Account {
    .init(balance: balance + amount)
  }
}

let state = CurrentValueSubject<State, Never>(State())
let subscription = state.sink { state in print("New State: \(state)") }
state.value = state.value.deposit(amount: 10) // Line is printed!

Really, the compiler turns the former into the latter. In the latter example, all the members of the structs are lets so they are formally “read-only” structs. But structs in Swift (unlike in C#!) are always read-only. Making “mutable lookalike” structs with var members just generates all these helper copy-mutate methods automatically and lets you call them with mutation-style syntax.

Notice that the fact the CurrentValueSubject subscription fires is pretty obvious in the second example, because it’s explicit: we’re assigning value. But in the first example (without helpers we might have written state.value.account.balance += 10) it’s perhaps surprising, especially if you’re familiar with other languages and the fact the syntactically equivalent code in other languages would not fire the subscriber. That’s what is cool about this approach. You get this behavior you probably want for free, which is fundamentally because Swift lets you write the direct mutation syntax you want but ensures the side effects of mutating the whole occur even if you (apparently) mutate the part. No sneaking around invariants and triggers by reaching inside of stuff.

What’s funny is I’ve seen Swift devs scream “mutability, nooooo!!!” when they see the first code, after reading about how evil “mutable state” is, and then demand this be replaced with the immutable style of creating new values and assigning them back to the read-write value at the top… not realizing apparently they’re the exact same thing: Swift’s “mutation” of value types is just a more clear and straightforward syntax for achieving the exact behavior that the immutable “assign back” code spells out more explicitly.

“Eliminating” Bugs?

Now, here’s the fallacy that almost everyone uses when they make the “my pattern eliminates this whole class of bugs from your programs” arguments: it is usually true by definition, because that classification of bugs requires that you not follow their pattern. When you do follow their pattern, it doesn’t eliminate bugs, it just reclassifies them.

If we follow the pattern of making our state immutable, which we now understand doesn’t really mean immutable, it means particular instances of that state are immutable but the values meaningful to the program are emitted as copies-with-mutation by special functions like reducers, then yes, this will eliminate the “improper mutation” class of bugs in a formal sense. And it will replace them with “improper emission of an incorrect copy by a broken reducer” bugs.

Redux (even ignoring that it doesn’t stop you from reaching inside the state instances handed to you in the reducers) doesn’t stop you from writing the wrong reducer or dispatching the wrong action. So any idea that it somehow prevents you from incorrectly mutating the application state is obviously mistaken. Since the mechanism of achieving what is needed changes from X to Y, the bugs get reclassified from improper X bugs to improper Y bugs.
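For example, here’s a sketch of a reducer (made-up state and action shapes) that follows every immutability rule, never mutates anything, and is still just plain wrong:

interface AppState { readonly balance: number }
type Action = { type: "deposit"; amount: number } | { type: "withdraw"; amount: number }

// Pure function, new object out, nothing mutated anywhere...
function reducer(state: AppState, action: Action): AppState {
  switch (action.type) {
    case "deposit":
      return { balance: state.balance + action.amount }
    case "withdraw":
      return { balance: state.balance + action.amount } // ...and the bug is right here
  }
}

// The “improper mutation” bug is gone; a “broken reducer” bug has taken its place.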

This same claim is made about reference semantics. If you force everything to be value semantics, well I’ve seen claims as fantastical as “you’ll never have concurrency issues again”. Plus reference semantics is “hard to reason about” and it creates a bunch of surprising and implicit coupling between far away parts of the program, and that means side effects (another naughty term in FP circles, synonymous with bad) that are not clear when reading part of a program (you see something being changed, who knows who else is referring to that thing and will be affected by the change).

The solution is value semantics, i.e. giving each part of the program its own copy, which ensures they aren’t coupled to each other (no side effects).

Well, okay, but what if they should be coupled to each other? Again, a program with no coupling between any parts is a program that doesn’t do anything interesting. Why are we calling everything a “side” effect? Aren’t some of those just effects? If I take ibuprofen, I don’t call the headache relief a side effect!

And that’s the problem. Value semantics doesn’t automatically eliminate bugs, it reclassifies bugs. After all, if you need coupling, i.e. if it’s correct for A to affect B by mutating X, then how would you do this in FP? The answer is by nesting a bunch of monads inside each other and zipping or otherwise merging together multiple lists to express that one list (representing what one part of the program is doing) is in fact a function of another list (representing what another far away part of the program is doing).

Once you’ve got that mindf*ckery in every place where you need coupling, I’ll bet it’s pretty damn easy to get it wrong. Then guess what: incorrect coupling. You’ve turned “A shouldn’t be coupled to B” into “I flatMapped over the wrong higher order function that produces the wrong list and zipped it into the wrong other list”. Yes, this makes the coupling explicit: you just read the program there and see that A and B are coupled… if you can read any of that in the first place, that is.

What’s more likely is that using value semantics will in fact reduce coupling in your program as intended, and do so far beyond what is correct. You’ll turn “too much coupling” bugs into “not enough coupling” bugs. Specifically, now that you’re passing copies of data around everywhere (that’s the whole point of value semantics, isn’t it?), your “side effects” bugs become “stale data” bugs. You end up with copies you don’t want, because there really should be one source of truth for some data, and you forget to update everyone’s copy.

These kinds of bugs caused by way too much caching (holding copies) of data in GUI apps are pervasive. All the React-ish UI frameworks are designed to prevent them (i.e. forgetting to update your view when state changes, which is exactly what these new frameworks are supposed to handle automagically). They don’t accomplish that with value semantics, they accomplish it with observability (i.e. reactivity) on reference-semantics “component state”.

The idea that all coupling is locally explicit might seem nice, but I’ll bet it doesn’t scale at all. There’s tons of (appropriate) coupling in very large, sophisticated applications. I don’t want to have to change tons of places to add a new explicit expression of coupling every time more coupling is added, and then be blasted with all that information every time I’m reading any part of it. That’s actually bad for the level of abstraction. Explicitness is another way of saying “implementation details spelled out”, which is the exact opposite of the direction you want to go in when you scale up software.

Then again, these guys explicitly don’t like encapsulation, so they probably think this is good.

Inappropriate coupling isn’t something I think you can solve with some blanket programming paradigm. It requires careful analysis of the system you’re building, and I think it’s primarily addressed by naming things well. If you’re coupling to something you shouldn’t be, it should hopefully be self-evident from the name of what you’re coupling to. A big part of this is the tendency of coders to name variables after their type instead of what they represent. That’s confusing because it doesn’t tell you what the variable means to the program, and it’s especially bad for reference types, because the name isn’t telling you what that specific reference, a specific instance used in other parts of the program, means to the entire program. Instead of calling it theScreenCoords, you should call it primaryCursorScreenCoords, to make it clear it’s a reference to a special instance that means something special. If you follow that rule, hopefully you’ll notice when a class is either inappropriately holding a reference to that instance when it shouldn’t be, or inappropriately mutating it.

Both OOP and FP are Turing complete language paradigms. Any OOP program, including all of its bugs caused by mutability and reference semantics, can be translated perfectly into an equivalent FP program that behaves exactly the same, including all those bugs. How is this possible? How would that program have mutability and reference semantics bugs when it’s written in a language that does not have mutability or reference semantics? Because it translates those things into different constructs. Mutable state is transformed into a reducer over a list of inputs that produces a list of the different values that state takes on as the program runs. So mutation bugs get transformed into improper reduction bugs. Reference semantics is transformed into more deeply nested monads of lists that eventually get flatMapped and zipped together. So reference semantics bugs get transformed into improper flatMapping of monads and zipping of lists.

Now, maybe it’s the case that the buggy parts of the OOP program, which look totally innocuous in OOP code, are translated into super convoluted and plainly wrong FP code that no one would write. That’s the hope: that while it isn’t impossible to recreate the undesired behavior of the OOP program, it’s very easy to create it in the OOP language but very hard to create it in the FP language, and when you do, it’s pretty self-explanatory that you did so.

That’s possible in principle, but I don’t buy for a second that it ever works that way in practice.

Every useful program mutates state and requires two distant places to communicate with each other by sharing a reference to something. If the FP way of modeling this is convoluted, then every nontrivial program will be extremely convoluted if written in a FP language. That’s not going to eliminate bugs, it’s going to generate them.

I’m not saying I’m opposed to the FP paradigm. That’s the funny part: I probably bring in functional style constructs (gratuitous use of monadic transforms like map and flatMap on all sorts of types, from collections to optionals to observables, and so on) more than any industry developer I’ve worked with… but I see right through nonsense like “state is the root of all evil” (what could that possibly even mean?).

I just don’t think FP is the One True Way and I think it’s a terrible idea to restrict a language to only work in that paradigm. I have the same opinion of the OOP paradigm. The problem is actually the “One True Way” part, so it’s funny that I’m reading people telling me that the solution is to declare FP the actual One True Way. Why won’t these guys just let go of this search for a silver bullet? Programming Is Hard. You’re not going to get anywhere declaring that every computer program that can ever be created ever can be written with a single tool or concept or approach. We already know what that universal tool is: it’s called a computer, and the language for it is machine language. Everything the software industry has done for the last near-century is a desperate attempt to avoid having to write code with this universal tool.

I very much want a Swiss Army Knife programming language that supports as many paradigms as possible and allows them to interact with each other. This is why I’m such a fan of C++, whose designers share this philosophy. And of course I see endless whining about it for being “too complex” and hear that teams at major companies like Google declare large parts of the language as off limits, fracturing it into endless dialects because having to learn multiple dialects is apparently “easier” than just learning the language (I still have to learn everything because I’m not going to spend my entire career on one team, and additionally I have to learn how each team restricts things and how I’m supposed to do something that I would normally do with some part of the language they ban).

Plus, I mean, anyone suggesting that programming would get easier if we all just wrote Haskell… LOL. Have these guys ever seen Haskell? It’s a fascinating language, meant for language research, not building Microsoft Word, and for that purpose it’s been extremely fruitful, producing several concepts of type theory that eventually made their way into industry languages. Since it’s an extremely abstract research vehicle that requires you to be familiar with a bunch of super abstract math like category theory, the last word in the entire English language I would ever use to describe it is “easy” (for who, the most Asperger’s-y idiot savant who’s ever lived!?).

I should note that there seems to be some overlap between two groups here: one, like Joe Armstrong, that makes these nonsensical philosophical claims and says FP is the One True Way, and another that is angry programming languages evolved past the 1970s and thinks everything should be written in C. They agree that OOP is a “disaster”, but I suspect they’d hate each other’s paradigms even more. Addressing the people who want me to write GUI apps in C (which will, I guarantee, result in building an ad-hoc object model with abstraction, inheritance and dynamic dispatch… why do I have to build all that myself when C++ ships with it and its compiler understands it?) is a whole other topic.

Conclusion

These are the key takeaways:

  • When someone says a tool will eliminate an entire class of bugs, that doesn’t imply you’ll have fewer bugs, they’ll just be classified differently.
  • There is no One True Way to write computer software. Carpenters don’t look for one tool to bring to the job, they bring an entire truck full of different tools, each with high specialization, and walk around with an entire toolbelt strapped to their waist.
  • Anyone who genuinely thinks they’re writing computer programs that don’t have state and don’t mutate it is smoking some sh*t I wish I could have found in college.
  • OOP obsession and the languages influenced by it did create a problem of improper modeling of mutability. The solution is to make judicious selection between value and reference types, and in languages where you can’t make custom aggregate value types, use a combination of encapsulation and hand-written copy constructors or copy-mutators to simulate aggregate value types (a minimal sketch of this follows the list).
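For that last point, here’s a minimal sketch of what I mean, with a hypothetical ScreenCoords type. In Swift you’d just write a struct; the point is what you have to hand-write in a language whose only aggregate types are reference types:

// A reference type pretending to be a value type: state is encapsulated,
// and "mutation" is done by producing a modified copy instead of writing
// through a shared reference.
final class ScreenCoords {
  private(set) var x: Double
  private(set) var y: Double

  init(x: Double, y: Double) {
    self.x = x
    self.y = y
  }

  // Hand-written copy constructor.
  convenience init(copying other: ScreenCoords) {
    self.init(x: other.x, y: other.y)
  }

  // Hand-written copy-mutator: returns a new instance rather than mutating this one.
  func with(x: Double? = nil, y: Double? = nil) -> ScreenCoords {
    ScreenCoords(x: x ?? self.x, y: y ?? self.y)
  }
}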

I promise at some point, people are going to look at giant apps written with stuff like Redux, freak out, and start writing articles saying “FP is Dead, Long Live OOP” (this has already sort of happened with ReactiveX: a highly functional paradigm that got massively overused and misused, and now lots of people hate it). It’s going to go right back to where it started, no one’s going to learn the real lesson (stop looking for silver bullets!), and while history may not repeat, it’s going to rhyme.

That cynicism (it’s not so bad if you take the correct approach of just laughing at it) notwithstanding, I think programming languages have made genuine progress recently. Supporting a whole plethora of approaches, including value and reference types, imperative and declarative, procedural, object-oriented and functional, threads and coroutines, etc., lots of advanced work in generics, higher-order type systems and metaprogramming… this is all genuine progress, and this is happening in several languages. I’m sure at every point a group of people are going to be too proud or lazy to learn the new tools and complain about it… don’t be one of them and don’t let them drag you down with them.

At the same time, be prepared to look at a new trendy tool (especially one that sells itself as “simple” and works by banning large swaths of techniques that are available elsewhere) skeptically, recognize the pattern of a programming community suddenly latching onto a new concept and getting outright drunk on its wild promises, and anticipate the epic hangover that’s about to strike. In my opinion it’s very rare that a tool or technique gets added to a language or library that’s outright useless. Rather, tools and techniques simply get used inappropriately, beyond their range of utility (i.e. hammers aren’t useless, but they’re not good screwdrivers). When entire communities start declaring that a widely used tool ought to never have been invented and needs to be removed, that’s when the bender is starting, and that’s when you can bow out so you don’t wake up 3 years later with a pounding headache, no memory of what you did, and a codebase you can’t believe you thought was a good idea to write.

On Protocol Witnesses

Introduction

So, it’s just a normal day at work, you’re writing some Swift code, and the need arises to model an abstract type: there will be multiple implementations, and you want users to be able to use those implementations without knowing which implementation they are using.

So, of course, you reach for the tool that Swift provides for modeling abstract types: a protocol. If you were writing Kotlin, C#, Java or TypeScript, the comparable tool would be an interface.

As an example, let’s say you’re writing the interface to your backend, which exposes a few endpoints for fetching data. You are going to want to swap out the real or “live” backend for a fake backend during testing, so you’re at least going to need two variations. We’ll therefore write it as a protocol:

protocol NetworkInterfaceProtocol {
  func fetchJobListings() async throws -> [JobListing]

  func fetchApplicants(for listing: JobListing) async throws -> [Applicant]

  func fetchJobListings(for applicant: Applicant) async throws -> [JobListing]
}

We’ll write each flavor as a conforming concrete type. The “live” one:

struct NetworkInterfaceLive: NetworkInterfaceProtocol {
  func fetchJobListings() async throws -> [JobListing] {
    let (data, response) = try await urlSession.data(from: host.appendingPathComponent("listings"))

    guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
      throw BadResponse(response)
    }

    return try JSONDecoder().decode([JobListing].self, from: data)
  }

  func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
    let (data, response) = try await urlSession.data(from: host.appendingPathComponent("listings/\(listing.id)/applicants"))

    guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
      throw BadResponse(response)
    }

    return try JSONDecoder().decode([Applicant].self, from: data)
  }

  func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
    let (data, response) = try await urlSession.data(from: host.appendingPathComponent("applicants/\(applicant.id)/listings"))

    guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
      throw BadResponse(response)
    }

    return try JSONDecoder().decode([JobListing].self, from: data)
  }

  init(
    urlSession: URLSession = .shared,
    host: URL
  ) {
    self.urlSession = urlSession
    self.host = host
  }

  let urlSession: URLSession
  let host: URL
}

And then a fake one that just returns canned successful responses:

struct NetworkInterfaceHappyPathFake: NetworkInterfaceProtocol {
  func fetchJobListings() async throws -> [JobListing] {
    StubData.jobListings
  }

  func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
    StubData.applicants[listing.id]!
  }

  func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
    StubData.applicantListings[applicant.id]!
  }
}

Simple enough, right?

But wait! You start hearing that this isn’t the only way to implement this design. You can, in fact, build abstract types in Swift without using protocols at all! How could this be possible!?

We use what are called “protocol witnesses”. We replace the protocol with a struct, replace function declarations with closure members, and replace the concrete implementations with factories that assign the closures to the implementations provided by that concrete type. First the abstract type:

struct NetworkInterface {
  var fetchJobListings: () async throws -> [JobListing]

  var fetchApplicants: (_ listing: JobListing) async throws -> [Applicant]

  var fetchJobListingsForApplicant: (_ applicant: Applicant) async throws -> [JobListing]
}

Then the live implementation:

extension NetworkInterface {
  static func live(
    urlSession: URLSession = .shared,
    host: URL
  ) -> Self {
    func fetchJobListings() async throws -> [JobListing] {
      let (data, response) = try await urlSession.data(from: host.appendingPathComponent("listings"))

      guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
        throw BadResponse(response)
      }

      return try JSONDecoder().decode([JobListing].self, from: data)
    }

    func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
      let (data, response) = try await urlSession.data(from: host.appendingPathComponent("listings/\(listing.id)/applicants"))

      guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
        throw BadResponse(response)
      }

      return try JSONDecoder().decode([Applicant].self, from: data)
    }

    func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
      let (data, response) = try await urlSession.data(from: host.appendingPathComponent("applicants/\(applicant.id)/listings"))

      guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
        throw BadResponse(response)
      }

      return try JSONDecoder().decode([JobListing].self, from: data)
    }

    return .init(
      fetchJobListings: fetchJobListings,
      fetchApplicants: fetchApplicants(for:),
      fetchJobListingsForApplicant: fetchJobListings(for:)
    )
  }
}

(If you haven’t seen this before, yes you can write “local functions” inside other functions, and they’re automatically closures with default implicit capture, which is how they have access to urlSession and host. If you need to explicitly capture anything you have to switch to writing them as closures, as in let fetchJobListings: () async throws -> [JobListing] = { [...] in ... }).
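(To illustrate that with a toy example, hypothetical and unrelated to the network code: a local function captures surrounding variables implicitly, while the closure-literal form is what you switch to when you need an explicit capture list.)

func makeCounter(startingAt start: Int) -> () -> Int {
  var count = start

  // Local function: implicitly captures `count`, and keeps it alive after return.
  func next() -> Int {
    count += 1
    return count
  }

  // The closure-literal form of the same thing, where a capture list is allowed:
  // let next: () -> Int = { [someCaptureList] in ... }

  return next
}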

And the happy path fake:

extension NetworkInterface {
  static var happyPathFake: Self {
    .init(
      fetchJobListings: { StubData.jobListings },
      fetchApplicants: { listing in StubData.applicants[listing.id]! },
      fetchJobListingsForApplicant: { applicant in StubData.applicantListings[applicant.id]! }
    )
  }
}

Take a moment to study those to understand that they are, in fact, equivalent. The only difference is that when you want to instantiate a live instance, instead of writing let network: NetworkInterfaceProtocol = NetworkInterfaceLive(host: prodHost), we write let network: NetworkInterface = .live(host: prodHost). And similarly when instantiating the happy path fake. Other than that (and the small caveat that you can’t have named function arguments), it’s equivalent.

And, in fact it is more powerful: I can, for example, mix and match the individual functions from different implementations (i.e. make an instance whose fetchJobListings is the happyPathFake one, but the other two functions are live). I can even change the implementation of one function after creating an instance, substituting new implementations inline in other code, like test code, and those functions will close over the context, so I can capture stuff like state defined on my test case file, to make a test double that, say, returns job listings that depends on how I’ve configured the test.
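For example, something like this (a sketch; StubData is the same hypothetical stub store used in the fakes above):

// Mix and match: a live interface whose job listings come from the fake.
func makeHybrid(host: URL) -> NetworkInterface {
  var network = NetworkInterface.live(host: host)
  network.fetchJobListings = { StubData.jobListings }   // swap out just this one
  return network
}

// Hot-swap after creation, capturing state defined in the test:
func makeCountingFake() -> (network: NetworkInterface, callCount: () -> Int) {
  var calls = 0
  var network = NetworkInterface.happyPathFake
  network.fetchJobListings = {
    calls += 1
    return StubData.jobListings
  }
  return (network, { calls })
}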

This technique is discussed by PointFree here. I don’t subscribe to their content, so I can only see the free portion of that article, and similarly for the other articles they link to from there. From this limited vantage point, it appears their position is that “protocol oriented programming”, a popular paradigm in the Swift community, is not a panacea; that there are situations where it is not the right tool, and the protocol witness approach can be a better and more powerful alternative. They specifically say that some things are harder to do with protocols than with protocol witnesses, and some things that can be done with protocol witnesses are not possible to do with protocols at all.

Now, the fundamental starting point of this discussion seems to be that these two options (the protocol with implementing structs, and the struct with settable closure members) are different implementations of the same design. That is, we’re attempting to model the same system, with the same requirements and behavior. And we’re not even debating two different conceptual models of the system. We agree there is an abstract type of network interface (and this, not somewhere else, is where the abstraction lies… this is important) and it has multiple concrete implementations. Rather, we are only comparing two mechanisms of bringing this conceptual model into reality.

As an example of what I’m talking about, here are two different models. This:

struct JobListing {
  let id: String
  let role: String
  let applicants: [Applicant]
  let candidates: [Candidate]
  ...
}

struct Applicant {
  let id: String
  let person: Person
  ...
}

struct Candidate {
  let id: String
  let person: Person
  ...
}

struct Person {
  let id: String
  let firstName: String
  let lastName: String
  ...
}

vs. this:

struct JobListing {
  let id: String
  let role: String
  let applicants: [Applicant]
  let candidates: [Candidate]
  ...
}

struct Applicant {
  let id: String
  let isCandidate: Bool
  let firstName: String
  let lastName: String
  ...
}

These are both attempts to conceptually model the same requirements, but they are different models (i.e. the former allows applicants and candidates to have different attributes, the latter does not). Contrast that with the following, which shows two implementations of the same model using competing language features. This:

struct JobListing {
  let id: String
  let role: String
  var applicants: [Applicant] { ... }
  var candidates: [Candidate] { ... }
  ...
}

vs. this:

struct JobListing {
  let id: String
  let role: String
  func getApplicants() -> [Applicant] { ... }
  func getCandidates() -> [Candidate] { ... }
  ...
}

These both model the domain in the exact same way. The difference is in whether we use the language feature of computed read-only variables vs. functions. That’s purely mechanical.

I believe PointFree are framing this choice of modeling our problem as a protocol with implementing structs, vs. a struct with closure members, as the latter: the same conceptual model but with different tools to construct the model. This is why the discussion focuses on what the language can and can’t do, not the exact nature of the requirements or system, since it is fundamentally about choosing language constructs, not correctly modeling things.

I think this framing is mistaken.

I rather consider the choice over these two implementations to be a discussion over two different conceptual models. This will become clear as I analyze the differences, and I will point out exactly where the differences have implications for the conceptual model of the system we’re building, which goes beyond mere selection of language mechanisms. In the process, we need to ask and satisfactorily answer:

  • What is the nature of this “extra power”, and what are the implications of being able to wield it?
  • What exactly are the things you “can’t” do (or can’t do as easily) with protocols, and why is it you can’t do them? Is it a language defect, something a future language feature might fix?
  • Does the protocol witness design actually eliminate protocols?
  • Why is this struct called a “protocol witness”? (that’s very significant and reveals a lot about what’s going on)

About That Extra Power

One of, if not the, fundamental arguments used to justify protocol witnesses is that they are “more powerful” than protocols.  Now, based on the part of the article I can see, I have a suspicion this is at least partially based on mistaken beliefs about what you can do with protocols. But there’s at least some truth to it, simply considering the examples they show (which I recreated above in my own version): you can reconfigure individual pieces of functionality in the object, and even compose functionality together in different combinations, where in the protocol -> implementer approach you can’t (well, sort of… we’ll get there). With protocols, the behaviors always come as a set, can’t be mixed and matched, and can’t be changed later.

This is, indeed, much more “powerful” in the sense you are less restricted in what you can do with these objects.

And that’s bad. Very bad.

“More powerful”, which we can also phrase as “more flexible”, always balances with “less safe”.  Being able to do more in code can very well mean, and usually does mean, you can do more wrong things.

After all, why are these restrictions in place to begin with?  Is it a language defect?  Have we just not figured out how to improve protocols to make them this flexible? Can you imagine a Swift update in which protocols enable you to do this swapping out of individual requirements after initializing a concrete implementing type?

No, it’s the opposite.  Unfettered flexibility is the “stone age” of software programming.  What’s the most flexible code, “powerful” in the sense of “you can implement more behaviors”, you could possibly write?

Assembly.

There, you can do lots of things higher level languages (where even BASIC or C is considered “high level”) won’t let you do.  Some examples are: perform addition on two values in memory that are execution instructions and store the result in another place that is later treated as an instruction, not decrement the stack before jumping to the return address, decrement the stack by an amount that’s not equal to the amount it was incremented originally, and have a for loop increment the index in the middle of an iteration instead of at the end.

These are all things you can’t do in higher level languages.  And thank God for that!  They’re all bugs.  That’s the point of higher level languages being more restricted: they stop you from writing incorrect code, which is the vast majority of code.

See, the main challenge of software engineering is not the inability to write code easily.  It’s the ability to easily write incorrect code.  The precise goal of raising the abstraction level with higher level languages is to restrict you from writing code that’s wrong.  The restrictions are not accidents or unsolved problems.  They are intentional guardrails installed to stop you from flying off the side of the mountain.

This is the point of strongly typed languages.  In JavaScript you can add an integer to a string.  You can’t do this in C.  Is C “missing” a capability?  Is JavaScript “more powerful” than C?  I guess in this sense yes.  But if adding an integer to a string is a programmer error, what is that extra power except a liability?

This is what the “footgun” analogy is about.  Some footguns are very very powerful.  They’re like fully automatic heat-seeking (or maybe foot-seeking?) assault footguns.  And that’s even worse than a single action revolver footgun because, after all, my goal is to not shoot myself in the foot.

While a looser type system or lower level language is “more powerful” at runtime, these languages are indeed less powerful at design time. Assembly is the most powerful in terms of what behavior you can create in the running program, but it is the least powerful in terms of what you can express, at design time, is definitely wrong and should be reported at design time as an error.

There’s no way to express in JavaScript that adding a network client to a list of usernames is nonsense.  There’s no way to express that calling .length on a DOM node is nonsense.  You can express this in Swift, in a way the compiler understands so that it prevents you from doing nonsensical things.  This is more powerful.

So more power at runtime is less power at design time, and vice versa. Computer code is just a sequence of instructions, and the vast vast majority of such sequences are useless or worse. Our goal is not to generate lots of those sequences quickly, it’s to sift through that incomprehensibly giant space of instruction sequences and filter out the garbage ones. And because of the halting problem, running the program to exercise every pathway is not a viable way of doing so.

Is this relevant to this discussion about protocol witnesses?  Absolutely.  All this is doing is turning Swift into JavaScript.  In JavaScript, there are no static classes with fixed functions.  “Methods” are just members of an object that can be called.  “Objects” are just string -> value dictionaries that can hold any values at any “key” (member name).

The only difference between this and the structs PointFree is showing us here is that JavaScript objects are even more powerful still: you can add or remove any functions, at any name, you want during runtime.  How can we make Swift do this?

@dynamicCallable
struct AnyClosure {
  // This will be much better with parameter packs
  init<R>(_ body: @escaping () -> R) {
    _call = { _ in body() }
  }

  init<T, R>(_ body: @escaping (T) -> R) {
    _call = { args in body(args[0] as! T) }
  }

  init<T1, T2, R>(_ body: @escaping (T1, T2) -> R) {
    _call = { args in body(args[0] as! T1, args[1] as! T2) }
  }

  init<T1, T2, T3, R>(_ body: @escaping (T1, T2, T3) -> R) {
    _call = { args in body(args[0] as! T1, args[1] as! T2, args[2] as! T3) }
  }

  ...

  @discardableResult
  func dynamicallyCall(withArguments args: [Any]) -> Any {
    _call(args)
  }

  private let _call: ([Any]) -> Any
}

@dynamicMemberLookup
struct NetworkInterface {
  // Starts empty so it can be constructed and then populated at runtime.
  private var methods: [String: AnyClosure] = [:]

  subscript(dynamicMember member: String) -> AnyClosure {
    get { methods[member]! }
    set { methods[member] = newValue }
  }
}

In a language that doesn’t have stuff like @dynamicCallable and @dynamicMemberLookup, you can still do this, but you have to settle for ugly syntax: network.fetchApplicants(listing.id) will have to be written as network.method("fetchApplicants").invoke([listing.id]).

With this, we can not only change the implementation of fetchJobListings or fetchApplicants, we can add more methods! Or delete one of those methods. Or change the parameters those methods take, or the type of their return value. Talk about being more powerful!

So that’s even better!  Well, if you consider this added “power” to be a good thing.  I don’t. What’s the point of adding a new method, or even worse changing the parameters of an existing one? It’s not like production code is going to call that new method, or send those new types of parameters in. Well, you might misspell fetchJobListings as fechJobListings, or forget or reorder parameters, and now that’s a runtime error instead of a compile time error.
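To spell out what that looks like in use (a sketch; StubData is the hypothetical stub store from earlier, and archiveListing is a made-up method name):

var network = NetworkInterface()

// Install whatever "methods" we like, at whatever names we like, at runtime.
network.fetchJobListings = AnyClosure { StubData.jobListings }
network.fetchApplicants = AnyClosure { (listing: JobListing) in StubData.applicants[listing.id]! }
network.archiveListing = AnyClosure { (listing: JobListing) in print("archiving \(listing.id)") }

// Call them dynamically. The compiler has no idea what exists or what comes back.
let listings = network.fetchJobListings() as! [JobListing]
print(listings.count)

// This also compiles without complaint, and crashes at runtime:
// _ = network.fechJobListings()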

I like statically typed languages, and I like them because they restrict me from writing the bugs I can easily write in JavaScript, like calling a method that doesn’t exist, or changing the definition of sqrt to something that doesn’t return the square root of the input.

And this is very controversial.  I’ve met devs who strongly dislike statically typed languages and prefer JavaScript or Ruby because the static type system keeps stopping them from doing what they want to do.  I don’t want to be completely unfair to these devs: it is more work to design types that allow everything you need while also designing bugs out of the system.  Or rather, it’s more work at first, and then much, much less work later.  It’s tempting to pick the “easier now and harder later” option because humans have time preference.

(Again to be fair to them, no static type system in existence today is expressive enough to do everything at compile time that dynamic languages can do at runtime. Macros, and more generally metaprogramming, will hopefully bridge the gap).

What bugs are no longer impossible to write when we use protocol witnesses?  Exercising any of these new capabilities that are just for tests in production code.  You can now swap an individual function out.  You should never do that in production code.  If you do it’s a bug.  If your response to this is “why would anyone ever do that?”, my counter-response is “LOL“.

Indeed this design, by being “more powerful” at runtime, is less powerful at compile time.  I simply can’t express that a flavor of the object exists where the definitions of the two methods can’t vary independently.  With the protocol I can still express that a flavor exists where they can vary independently: make the same witness struct but make it also conform to the protocol.  So I actually have more expressiveness at compile time with the protocol.  I can even say, for example, that one particular function or type only works with one concrete type (like a test that needs the happy path type, I can express that: func runTest(network: NetworkInterfaceHappyPathFake)), or that two parts of code must work with the same type (make it a generic parameter).
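Here’s roughly what those compile-time statements look like (runTest and runScenario are hypothetical):

// Only the happy-path fake is acceptable here; anything else won't compile.
func runTest(network: NetworkInterfaceHappyPathFake) async throws {
  _ = try await network.fetchJobListings()
}

// These two parts of the code must be handed the *same* concrete implementation.
func runScenario<Network: NetworkInterfaceProtocol>(reader: Network, writer: Network) {
  // ...
}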

I can’t do any of this with protocol witnesses because the compiler has no idea about my types.  It only knows about the abstract type (represented by the witness struct), instead of knowing not only about all the types (the abstract type, as the protocol, and the live concrete type, and the happy path concrete type, etc.) but also their relationship (the latter are all subtypes of the abstract type, expressed by the conformance). So, as it turns out, and specifically because the protocol witness can do things the protocol design can’t do at runtime, the protocol design can do things at compile time the protocol witness design cannot.

“More” or “less” powerful depends on whether you’re looking at compile time or runtime, and one will be the opposite of the other.

These Are Different Models

As I said at the beginning, the very framing here is that these are alternative language mechanisms for the same model. The abstract type and concrete implementing types still exist in the protocol witness code. You just haven’t told the compiler about them, so it can’t enforce any rules related to them.

But whether or not you should be able to mix and match, and by extension swap out, individual functions in the NetworkInterface, is a matter of what system we’re modeling: what exactly the NetworkInterface concept represents, and what relationship, if any, its functions have to each other. Why, after all, are we even putting these three functions in the same type? Why did we decide the correct conceptual model of this system is for fetchJobListings and fetchApplicants(for:) to be instance methods on one type? Why didn’t we originally make three separate protocols, each of which defines only one of these functions?

Well, because they should vary together! Presumably the jobListing you pass into fetchApplicants(for:) is going to be, in fact needs to be, one you previously retrieved from fetchJobListings on the same NetworkInterface instance… or at least an instance of the same type of NetworkInterface. If you grabbed a jobListing from one implementation of NetworkInterface then asked another implementation of NetworkInterface what its applicants are, do you think it’s going to be able to tell you?

This means we really should try to express that a particular JobListing type is tied back to a particular NetworkInterface type, and that it is a programmer error that we ideally want to catch at compile time to send a JobListing retrieved from one type of NetworkInterface into another type of NetworkInterface. How would we do that? Definitely not with protocol witnesses. We not only need protocols, we need to exercise more of their capabilities:

protocol NetworkInterfaceProtocol {
  associatedtype JobListing
  associatedtype Applicant

  func fetchJobListings() async throws -> [JobListing]

  func fetchApplicants(for listing: JobListing) async throws -> [Applicant]

  func fetchJobListings(for applicant: Applicant) async throws -> [JobListing]
}

This way, each NetworkInterface type has to specify its own JobListing and Applicant types, and (as long as we pick different types, which we should if we want strong typing) it will become a compile time error to try to pass one type of NetworkInterface‘s job listing into another type of NetworkInterface, which we know is not going to produce a sensible answer anyways (it might crash or throw an exception due to not finding a matching job listing, or even worse it might successfully return a wrong result because both backend systems happen to contain a job listing with the same id).
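Here’s a compact sketch of what that buys us, using the associatedtype version of the protocol above (the stub types and the two conformances are hypothetical, kept trivial just to show the compile error):

struct LiveJobListing { let id: String }
struct LiveApplicant { let id: String }
struct FakeJobListing { let id: String }
struct FakeApplicant { let id: String }

struct LiveNetwork: NetworkInterfaceProtocol {
  func fetchJobListings() async throws -> [LiveJobListing] { [] }
  func fetchApplicants(for listing: LiveJobListing) async throws -> [LiveApplicant] { [] }
  func fetchJobListings(for applicant: LiveApplicant) async throws -> [LiveJobListing] { [] }
}

struct FakeNetwork: NetworkInterfaceProtocol {
  func fetchJobListings() async throws -> [FakeJobListing] { [FakeJobListing(id: "stub")] }
  func fetchApplicants(for listing: FakeJobListing) async throws -> [FakeApplicant] { [] }
  func fetchJobListings(for applicant: FakeApplicant) async throws -> [FakeJobListing] { [] }
}

func crossWire(live: LiveNetwork, fake: FakeNetwork) async throws {
  let listing = try await fake.fetchJobListings()[0]
  // Does not compile: 'FakeJobListing' cannot be converted to 'LiveJobListing'.
  // _ = try await live.fetchApplicants(for: listing)
  _ = listing
}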

Obviously, then, it’s nonsense to mix and match the individual functions. The whole point of grouping these functions together is that they form a cohesive unit, and the implementation of one needs to “match” the implementation of another. Modeling this in a way where we are allowed to compose different individual functions is just plain wrong.

But if these three functions happened to be totally separate, like let’s say we have separate backends for different resources, they have no relationship to each other (no foreign keys from one to the other, for example), and it should be possible, and might eventually be a production use case, to mix and match different backends, then it would be just plain wrong to model all those functions as being a cohesive unit, incapable of being individually composed.

See, these are different models of a system. Which one is correct depends on what system we’re trying to build. This is not about language mechanics.

Even with the system as it is, where these functions clearly are a cohesive unit… why should NetworkInterface be a protocol? Why should we be able to define two interfaces that have entirely different implementations? Look at what’s defined in the Live implementation: we implement each one by doing an HTTP GET from a particular path added to a shared base endpoint (each one can’t have a separate base URL, they all have to go to the same backend system). Why would we want to replace this entire implementation? Are we going to have a backend that doesn’t use HTTP? If so, yes it makes sense this needs to be abstract. If not… well then what do we expect to vary?

The base URL? Because you have a Development, Staging and Production backend hosted at different URLs?

Well then just make that a parameter of a concrete type. If you want to be really fancy and make it a compile time error to send a Development JobListing into a Staging NetworkInterface, you can make NetworkInterface generic and use that to strongly type the response entities by their environment:

protocol Environment {
  static var host: URL { get }
}

enum Environments {
  enum Development: Environment { static let host: URL = ... }
  enum Staging: Environment { static let host: URL = ... }
  enum Production: Environment { static let host: URL = ... }
}

struct JobListing<E: Environment> {
  ...
}

struct Applicant<E: Environment> {
 ...
}

struct NetworkInterface<E: Environment> {
  func fetchJobListings() async throws -> [JobListing<E>] {
    let (data, response) = try await urlSession.data(from: E.host.appendingPathComponent("listings"))

    guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
      throw BadResponse(response)
    }

    return try JSONDecoder().decode([JobListing<E>].self, from: data)
  }

  func fetchApplicants(for listing: JobListing<E>) async throws -> [Applicant<E>] {
    let (data, response) = try await urlSession.data(from: E.host.appendingPathComponent("listings/\(listing.id)/applicants"))

    guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
      throw BadResponse(response)
    }

    return try JSONDecoder().decode([Applicant<E>].self, from: data)
  }

  func fetchJobListings(for applicant: Applicant<E>) async throws -> [JobListing<E>] {
    let (data, response) = try await urlSession.data(from: E.host.appendingPathComponent("applicants/\(applicant.id)/listings"))

    guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
      throw BadResponse(response)
    }

    return try JSONDecoder().decode([JobListing<E>].self, from: data)
  }

  init(
    urlSession: URLSession = .shared
  ) {
    self.urlSession = urlSession
  }

  let urlSession: URLSession
}

So here, we use protocols, but in a different place: we’re expressing that the variation is not in the network interface itself. There’s only one network interface. The variation is the environment. Making NetworkInterface abstract is not correct because we don’t want that to vary. We’re not going to build a system that connects to anything other than one of these environments, who all have an HTTP backend with identical RESTful APIs. The web protocol, the resource paths, the shape of the responses, none of that varies. Only the environment varies. Not only can we not swap out individual function implementations, we can’t even swap out a network interface. There’s only one concrete type (parameterized by its environment).
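Usage then looks like this (a sketch; it assumes the Environments above have real URLs filled in):

func demo() async throws {
  let prod = NetworkInterface<Environments.Production>()
  let listings = try await prod.fetchJobListings()   // [JobListing<Environments.Production>]

  let staging = NetworkInterface<Environments.Staging>()
  // Does not compile: a Production listing can't be fed to a Staging interface.
  // _ = try await staging.fetchApplicants(for: listings[0])
  _ = (listings, staging)
}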

In fact… why did we even think to make NetworkInterface abstract to begin with? We don’t even have multiple environments yet!

Ohh, right… testing.

The Interaction of Testing and Design

That’s a weird use case for this. We’re making things abstract specifically so we can break them… if we consider faking to be a breakage, which we should, as our customers certainly would if we accidentally shipped that to them. So then all these extra capabilities, like individually hot-swapping functions… sure, that creates network interfaces that are obviously broken, as any mix-and-matched one would have to be. But that’s the goal: we want to write a test where we can inject fake (a type of broken) behavior into specific parts of the code.

Testing is uniquely challenging in Swift, compared to other industry languages. In other languages the runtime system is dynamic enough that you can mess with objects at runtime by default. This is how you’re able to build mocking frameworks that can take any method on any class and rewrite it while the test is running. Back in Objective-C days we had OCMock. A major pain point in converting ObjC code with tests to Swift is that you can’t do this anymore.

Why is Swift uniquely difficult in this way (it is comparable to C++)? Because it is so much more strongly typed and compile time safe, which also affects how it is compiled. In other languages basically everything is dynamic dispatch, but in Swift a lot of stuff is static dispatch, so there is literally no place for tests to hook into and redirect that dispatch (at runtime at least). To allow a test to redirect something at runtime, you have to write Swift code specifically to abandon some of that compile time safety… or write Swift code that is compile time polymorphic instead of run time polymorphic, a.k.a. generics.

It’s very unfortunate, then, that we’ve introduced the ability to construct broken network interfaces into the production code just so we can do these things in tests. Wouldn’t it be much better if we could isolate this God mode cheat where we can break the invariants to just when the tests run? How might we do that?

Well, first let’s back up a moment here. What are we writing tests for? To prove the system works? To test-drive the features? It’s a popular idea in TDD circles that “testability” should dictate design. Now this evolved out of a belief that “testable design” is really just synonymous with “good design”… that a design that is untestable is untestable because it has design flaws. Testing is therefore doubly valuable because it additionally forces design improvements. Basically, testability reveals a bunch of places where you hardcoded stuff you shouldn’t have (like dependencies, which should be injected because that makes all your code far more reusable anyways).

But that concept can get carried away, to the point that your design becomes a slave to “testability”. You introduce abstractions, the capability for variation, customization, swapping out dependencies, all over the place just so you can finely probe the running system with test doubles. Even though we not only don’t need that variation but it would be fundamentally incorrect for those things to vary, we introduce the capability for variation anyway, just so we can construct intentionally broken (not shippable to production) configurations of our code to run tests against.

Again, this wasn’t really a concern in the industry languages TDD evolved in, because it’s not even possible to not have the capability for variation literally everywhere (people eventually figured out how to mock even ostensibly static calls in Java). C++ devs might have known about the issue, and Swift brought it to the surface for a different audience.

There is a lot of debate over different testing strategies, including whether it’s a good idea in the first place for tests to “see” things customers would never be able to see, or for tests to run against anything other than the exact code that customers are going to use. The argument is simple: the more you screw with the configuration of code just to test it, the less accurate that test will be because no one in the wild is using that configuration. On the other hand, black box testing the exact production build with no special test capabilities or probes inserted is really hard and tends to produce brittle, slow, shallow tests that can’t even reliably put the system into a lot of the initial conditions we want to test.

So it really is worth asking ourselves: what exactly do we need to test that requires replacing implementations of NetworkInterface functions with test doubles? After all, this is the interface into a different system that we communicate with over the wire. Can’t we do the interception and substitution at the wire? Can’t we take full control over what responses NetworkInterface is going to produce by controlling the HTTP server at the base URL? Is that maybe a better way to test here, because then we don’t have to make NetworkInterface abstract, because arguably it shouldn’t be, there should not be a NetworkInterface that does anything other than make those specific HTTP calls?

You can do all the mixing, matching and hot swapping you want this way. As long as you can control your HTTP server in your tests, you’re good to go. If you run the server in the test process, you can supply closures that capture state in your test functions.
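One common way to do that without standing up a separate server process (just a sketch of one option, not necessarily what anyone else has in mind) is Foundation’s URLProtocol hook: register a stub protocol class on the URLSessionConfiguration you hand to the one real NetworkInterface, and answer every request from a closure defined in your test.

import Foundation

// A URLProtocol subclass that intercepts requests "at the wire" and answers them
// from a test-supplied handler. StubURLProtocol and its handler are hypothetical.
final class StubURLProtocol: URLProtocol {
  // The test installs this; the closure can capture state from the test function.
  static var handler: ((URLRequest) throws -> (HTTPURLResponse, Data))?

  override class func canInit(with request: URLRequest) -> Bool { true }
  override class func canonicalRequest(for request: URLRequest) -> URLRequest { request }

  override func startLoading() {
    do {
      guard let handler = Self.handler else {
        throw URLError(.unsupportedURL)
      }
      let (response, data) = try handler(request)
      client?.urlProtocol(self, didReceive: response, cacheStoragePolicy: .notAllowed)
      client?.urlProtocol(self, didLoad: data)
      client?.urlProtocolDidFinishLoading(self)
    } catch {
      client?.urlProtocol(self, didFailWithError: error)
    }
  }

  override func stopLoading() {}
}

// The tests build a session whose traffic all goes through the stub,
// and inject it into the one real, concrete NetworkInterface.
func makeStubbedSession() -> URLSession {
  let configuration = URLSessionConfiguration.ephemeral
  configuration.protocolClasses = [StubURLProtocol.self]
  return URLSession(configuration: configuration)
}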

Okay so this is a pretty specific issue with this specific example. But then again, this is the specific example used to motivate the desire to mix and match and hot swap methods in our network interface. I think if we ignored the testing use case, it would become clear that this is a capability we very much do not want, and then protocol witnesses are plainly the wrong design specifically because they allow this.

But let’s say you really don’t want to spin up a local HTTP server to power your tests. I get it: it’s a pain in the a** to do that, and a heck of a lot easier to just hot-swap the functions. On the other hand, you aren’t exercising your production NetworkInterface, even though it would be nice for your tests to uncover problems in there too. And then it’s possible for your faked implementations to do things the real one simply wouldn’t be able to do. For example, the live implementation can throw a BadRequest error, whatever URLSession.data(for:) can throw, and whatever JSONDecoder.decode can throw. If you swap in a fake function that throws something else, and an explosion occurs… so what? Why does the rest of the code need to prepare for that eventuality even though it would never (and perhaps should never) happen?

Well, if NetworkInterface is a protocol, you can’t be sure that would never happen. After all, by having production code couple to a protocol, it’s coupling to implementations that can do anything. If NetworkInterface were concrete, we’d know that isn’t possible, and we’d know we’re wasting time making production code deal with it. Correspondingly, we wouldn’t be able to write a test that throws something else. We could only control the response by controlling what the server returns, which can only trigger one of the errors the live implementation can throw… and we’d be sure whatever scenario we created is one that can really happen and that we really need to deal with.

And so on… but anyways, you’re just not going to spin up an HTTP server, so can we somehow isolate these special hot-swapping capabilities to just the tests? Yes. To do so we still need to make NetworkInterface a protocol, which makes the production code potentially way more open-ended than we want it to be… but we can try to express that:

  • The entire production code should all use one implementation of NetworkInterface. That is, we shouldn’t be able to switch from one type to another in the middle of the app
  • The only implementation that should be available in production code is the live implementation

The second part is easy: we’re going to use the protocol model, the Live implementation will be in production code, and a Test implementation will be defined in the test suite. The Live implementation will look just like it did originally. The Test one will look like this:

struct NetworkInterfaceTest: NetworkInterfaceProtocol {
  var fetchJobListingsImp: () async throws -> [JobListing]
  func fetchJobListings() async throws -> [JobListing] {
    try await fetchJobListingsImp()
  }

  var fetchApplicantsImp: (_ listing: JobListing) async throws -> [Applicant]
  func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
    try await fetchApplicantsImp(listing)
  }

  var fetchJobListingsForApplicantImp: (_ applicant: Applicant) async throws -> [JobListing]
  func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
    try await fetchJobListingsForApplicantImp(applicant)
  }
}

extension NetworkInterfaceTest {
  static var happyPathFake: Self {
    .init(
      fetchJobListingsImp: { StubData.jobListings },
      fetchApplicantsImp: { listing in StubData.applicants[listing.id]! },
      fetchJobListingsForApplicantImp: { applicant in StubData.applicantListings[applicant.id]! }
    )
  }
}

See, we’re still using the protocol witness approach, and we’re still building factories for creating specific common configurations, which we can always reconfigure or customize later. But we’re concentrating it in a specific implementation of the protocol. This expresses that in general a NetworkInterface has cohesion among its functions, and they therefore cannot be individually selected. But this individual configuration capability exists specifically in one implementation of NetworkInterface. By defining NetworkInterfaceTest in the test target instead of the prod target, we’re keeping this, and all its added capabilities, out of the production code.
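In a test, that looks something like this (XCTest is real; the test itself and its assertions are a hypothetical sketch):

import XCTest

final class ListingsTests: XCTestCase {
  func testRequestsApplicantsForTheSelectedListing() async throws {
    var requestedListingIDs: [String] = []   // state captured by the test double

    var network = NetworkInterfaceTest.happyPathFake
    network.fetchApplicantsImp = { listing in
      requestedListingIDs.append(listing.id)
      return StubData.applicants[listing.id] ?? []
    }

    // ...hand `network` to the code under test (it only sees NetworkInterfaceProtocol)...
    _ = try await network.fetchApplicants(for: StubData.jobListings[0])

    XCTAssertEqual(requestedListingIDs, [StubData.jobListings[0].id])
  }
}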

For the first point, since NetworkInterface is a protocol, there’s nothing stopping you from creating new implementations and using them in specific places in production code. Maybe it’s enough to define only the Live one in prod code, but substituting a different (and definitely wrong) one at one specific place (like in a Model for one screen, or component on a screen, buried deep in the structure of the app) would be easy to miss because you’d have to go drill down into that specific place to see the mistake.

Is it not a requirement of the production code that everything uses the same type of network interface? It would always be incorrect to switch from one type to another for one part of the app. Even in tests, aren’t we going to swap in the fake server for everything? Can we somehow express this in the design, to make it a compiler error to mix multiple types? If we could, then picking the wrong type would affect the app everywhere, and that would probably be immediately obvious upon the most trivial testing.

The reason it would be possible to switch network types mid-way is because every UI model stores an instance of the protocol existential, like let network: NetworkInterfaceProtocol, which is really shorthand for let network: any NetworkInterfaceProtocol. That any box is able to store, well, any implementing type, and two different boxes can store two different types. How do we make them store the same type?

With generics.

Instead of this:

final class SomeComponentModel {
  ...

  let otherComponentModel: OtherComponentModel

  private let network: any NetworkInterfaceProtocol
}

final class OtherComponentModel {
  ...

  private let network: any NetworkInterfaceProtocol
}

We do this:

final class SomeComponentModel<Network: NetworkInterfaceProtocol> {
  ...

  let otherComponentModel: OtherComponentModel<Network>

  private let network: Network
}

final class OtherComponentModel<Network: NetworkInterfaceProtocol> {
  ...

  private let network: Network
}

See, the generic parameter links the two network members of the two component models together: they can still vary, we can make component models for any network interface type, but when we select one for SomeComponentModel, we thereby select the same one for OtherComponentModel. Do this throughout all the models in your app, and you express that the entire app structure works with a single type of network interface. You select the type for the entire app, not for each individual component model.

This is an example of “fail big or not at all”.  If you can’t eliminate an error from the design, try to make it a nuclear explosion in your face so you immediately catch and correct it.  The worst kinds of failures are subtle ones that fly under the radar. If your response to this is “just cover everything with tests”, well… I’ve yet to see any GUI app come anywhere close to doing that. Plus if you aren’t categorizing in your mind compilation errors as failed tests, and the type system you create as a test suite, you’re not getting the point of static typing.

Whether this is worth it to you, well that depends. How determined are you to prevent mistakes like creating new network interface types and using them in tucked away parts of the app (is this something you think is appreciably risky?), and how much does making all the types in your app generic bother you (maybe you don’t like the bracket syntax, or you have other aesthetic or design philosophy reasons to object to a proliferation of generics in a codebase)?

I have been steadily evolving for the past 3 years toward embracing generics, and I mean really embracing them: I have no problem with every single type in my app being generic, and having multiple type parameters (if you want to see code that works this way, just look at SwiftUI). I used to find the brackets, extra syntactic boilerplate, and cognitive load of working out constraints (where clauses) alarming or tedious. Then I got used to it, it became second nature to me, and now I see no reason not to. It leads to much more type safe code, and greatly expands the scope of rules I can tell the compiler about, that the compiler then enforces for me. I don’t necessarily think inventing a new network interface for an individual component is a likely bug to come up, but the cost of preventing it is essentially zero, and I additionally document through the design the requirement that the backend selection is app-wide, not per-component.

(The same evolution is happening in my C++ codebases: everything is becoming templates, especially once C++20 concepts showed up).

And this all involves using the two most powerful, expressive, and, in my opinion, important, parts of Swift: protocols and generics.

It’s Not a Choice Over Protocols

Is this “protocol oriented programming”? And if it is, does this mean I believe protocol oriented programming is a panacea? I don’t know, maybe. But the protocol witness design hasn’t removed protocols, it’s just moved them around. The issue isn’t whether to use protocols or not, it’s where they properly belong, which is determined by the correct conceptual model of your system, specifically in what should really be abstract.

Where are the protocols in the protocol witness version? After all, the keyword protocol is nowhere to be found in that code. How can I claim it’s still using protocols?

Because closures are protocols.

What else could they be? They’re abstract, aren’t they? I can’t write let wtf = (() async throws -> [JobListing]).init(). They express a variation: a closure variable holds a closure body, but it doesn’t specify which one. Even the underlying mechanisms are the same. When you call a closure, the compiler has to insert a dynamic dispatch mechanism. That means a witness table (there’s that word “witness”, we’ll get to that later), which lists out the ways a particular instance fulfills the requirements of a protocol.

This is more obvious if we think about how we would implement closures if the language didn’t directly support them. We’d use protocols:

protocol AsyncThrowingCallable0<R> {
  associatedtype R

  func callAsFunction() async throws -> R
}

protocol AsyncThrowingCallable1<T, R> {
  associatedtype T
  associatedtype R

  func callAsFunction(_ arg0: T) async throws -> R
}

struct NetworkInterface {
  var fetchJobListings: any AsyncThrowingCallable0<[JobListing]>

  var fetchApplicants: any AsyncThrowingCallable1<JobListing, [Applicant]>

  var fetchJobListingsForApplicant: any AsyncThrowingCallable1<Applicant, [JobListing]>
}

extension NetworkInterface {
  static func live(
    urlSession: URLSession = .shared,
    host: URL
  ) -> Self {
    struct FetchJobListings: AsyncThrowingCallable0 {
      let urlSession: URLSession
      let host: URL

      func callAsFunction() async throws -> [JobListing] {
        let (data, response) = try await urlSession.data(from: host.appendingPathComponent("listings"))

        guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
          throw BadResponse(response)
        }

        return try JSONDecoder().decode([JobListing].self, from: data)
      }
    }

    struct FetchApplicants: AsyncThrowingCallable1 {
      let urlSession: URLSession
      let host: URL

      func callAsFunction(_ listing: JobListing) async throws -> [Applicant] {
        let (data, response) = try await urlSession.data(from: host.appendingPathComponent("listings/\(listing.id)/applicants"))

        guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
          throw BadResponse(response)
        }

        return try JSONDecoder().decode([Applicant].self, from: data)
      }
    }

    struct FetchJobListingsForApplicant: AsyncThrowingCallable1 {
      let urlSession: URLSession
      let host: URL

      func callAsFunction(_ applicant: Applicant) async throws -> [JobListing] {
        let (data, response) = try await urlSession.data(from: host.appendingPathComponent("applicants/\(applicant.id)/listings"))

        guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
          throw BadResponse(response)
        }

        return try JSONDecoder().decode([JobListing].self, from: data)
      }
    }

    return .init(
      fetchJobListings: FetchJobListings(urlSession: urlSession, host: host),
      fetchApplicants: FetchApplicants(urlSession: urlSession, host: host),
      fetchJobListingsForApplicant: FetchJobListingsForApplicant(urlSession: urlSession, host: host)
    )
  }
}

This is something I think every Swift developer should see at least once, because it shows the underlying mechanism of closures and helps demystify them. We have to implement callability, which we do as a protocol requirement (even if we didn’t have callAsFunction to improve the syntax, we could name the function something like invoke and it still works, we just have to write out invoke to call a closure). We also have to implement capturing, which is fundamentally why these are protocols (whose implementations can be any struct with any amount of instance storage) and not just function pointers.
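Calling it looks exactly like calling closures, precisely because callAsFunction preserves the call syntax (a sketch, assuming a NetworkInterface built by the .live factory above):

func demo(network: NetworkInterface) async throws {
  // These calls dispatch through the hand-rolled "closure" protocols.
  let listings = try await network.fetchJobListings()
  guard let first = listings.first else { return }
  let applicants = try await network.fetchApplicants(first)
  print(applicants.count)
}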

Now, am I just being pedantic? Can’t I just as well conceive of closures as being function pointers, instead of protocols? No, the specific reason a closure is not just a function pointer is because of capture. I could not keep urlSession and host around and accessible in the live implementation’s methods if they were just function pointers. And what exactly gets captured varies across the implementations.

The key capability of the protocol design is that each implementing type can have its own set of (possibly private) fields, giving each one a different size and layout in memory. The protocol witness itself can’t do this. It’s a struct and its only fields are the methods. Every instance is going to have the exact same layout in memory. How, then, can we possibly give the live version more instance fields? By stuffing them inside the captures of the methods, which requires the methods to support variation in their memory layout. The protocoliness of closures is, in fact, crucial to the protocol witness approach working at all.

So, with this clear, we can plainly see that we aren’t choosing protocols vs. no protocols. Rather, we’re choosing one protocol with three requirements at the top vs. three protocols with one requirement (being callable) on the inside. It’s no different than any other modeling problem where you try to work out what the correct way to conceptualize the system you’re building. What parts of it are abstract? Do we have a single abstraction here, or three separate abstractions placed into a concrete aggregate?

If this is your way to avoid being “protocol oriented”, well… I have some bad news for you!

Since we’re on the subject, could we implement either one of these conceptual models (the single abstraction with multiple requirements or the multiple abstractions, each with a single requirement) without any protocols? That might seem plainly impossible. How can we model abstraction without using the mechanism of abstraction that Swift gives us? Well, you can do it, because of course abstraction itself has an implementation, and we can always do that by hand instead of using the language’s version of it.

I really, really hope no one actually suggests this would ever be a good idea. And I’m more nervous about that than you might think because I’ve seen a lot of developers who seem to have forgotten that dynamic dispatch exists and insist on writing their own dispatch code with procedural flow, usually by switching over enums. I hesitate to even show this because I might just be giving them ideas. But if you promise me you’ll take this as educational only and not go start replacing protocols in your code with what you’re about to see… let’s proceed.

This is, of course, how you would model this if you were using a language that simply didn’t have an abstract type system… like C. How would you implement this system in C? Well, that would be a syntactic translation of what you’re about to see in Swift:

enum NetworkInterfaceType {
  case live(urlSession: URLSession, host: URL)
  case happyPathFake
}

struct NetworkInterface {
  let type: NetworkInterfaceType

  func fetchJobListings() async throws -> [JobListing] {
    switch type {
      case let .live(urlSession, host):
        let (data, response) = try await urlSession.data(from: host.appendingPathComponent("listings"))

        guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
          throw BadResponse(response)
        }

        return try JSONDecoder().decode([JobListing].self, from: data)

      case .happyPathFake:
        return StubData.jobListings
    }
  }

  func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
    switch type {
      case let .live(urlSession, host):
        let (data, response) = try await urlSession.data(from: host.appendingPathComponent("listings/\(listing.id)/applicants"))

        guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
          throw BadResponse(response)
        }

        return try JSONDecoder().decode([Applicant].self, from: data)

      case .happyPathFake:
        return StubData.applicants[listing.id]!
    }
  }

  func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
    switch type {
      case let .live(urlSession, host):
        let (data, response) = try await urlSession.data(from: host.appendingPathComponent("applicants/\(applicant.id)/listings"))

        guard let response = response as? HTTPURLResponse, response.statusCode == 200 else {
          throw BadResponse(response)
        }

        return try JSONDecoder().decode([JobListing].self, from: data)

      case .happyPathFake:
        return StubData.applicantListings[applicant.id]!
    }
  }
}

For those curious: how would you do this in C, particularly the enum with associated values, something that does not translate directly to C? You would use a union of all the associated value tuples (structs) for each case, and have a plain enum tag for tracking which value is currently stored in the union (this is basically how enums with associated values are implemented under the hood in Swift).

How would you even add a new test double to this? You’d have to define a new enum case, and add the appropriate switch cases to each method. Yes, you’d have to do this for each and every “concrete implementation” you come up with. The code for every possible implementation of the network interface would all coexist in this giant struct, and in giant methods that cram all those different varying implementations next to each other.

How would you capture state from where you define these other implementations? Ohh… manually, by adding the captured values as associated values of that particular enum case, that you’d then have to pass in by hand. Every variation of state capture would need its own enum case!
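Here’s a hedged sketch of what that costs; the new case and its name are purely illustrative:

enum NetworkInterfaceType {
  case live(urlSession: URLSession, host: URL)
  case happyPathFake

  // A test double that always fails means a new case, with its hand-captured "state" as associated values:
  case alwaysFails(error: Error)
}

// ...and every one of the three methods grows a matching branch:
//
//   case let .alwaysFails(error):
//     throw error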

Please, please, please don’t start doing this, please!

Again, my anxiety here is well founded. I’ve seen plenty of Swift code where an enum is defined whose name ends with Type, it’s added as a member to another type X, and several methods on X are implemented by switching over the enum. If you’ve ever done this… I mean if you’ve ever written an enum whose name is WhateverType and made one a member of the type Whatever, then you’ve invented your own type system. After all, wouldn’t it make more sense to embed the enum inside the other type and remove the redundant prefix?

struct Whatever {
  enum Type {
    ...
  }

  let type: Type
  ...
}

Try it. See what the compiler tells you. Whatever already reserves the member name Type for its metatype: literally Whatever.Type, which Swift defines for every type you create.
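If you want to see that metatype on its own, here’s a minimal illustration (with a fresh type name to avoid the clash):

struct Widget {}

// Every type you create already comes with a metatype, spelled Widget.Type,
// whose value for this type is Widget.self:
let meta: Widget.Type = Widget.self
print(meta) // Widget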

You’re inventing your own type system!

Why you’re doing this instead of using the type system the language ships with… I just don’t know.

There are other ways you could do this that might be slightly less ridiculous than cramming all the bodies of all the implementations of the abstract type next to each other in a single source file, and even in single functions. You could, for example, use function pointers, and pass Any parameters in to handle captures. By that point you’re literally just implementing dynamic dispatch and abstract types.

To be clear, this implements the first design of a single protocol with three requirements. If we wanted to implement the protocol witness this way, we’d need three separate structs with their own Type enums, so they can vary independently.

So… yeah. Good thing we decided to use protocols. I don’t know about you, but I’m very happy we don’t write iOS apps in C.

I do see this as being peripherally related to the protocol witness stuff, particularly the way it gets framed as a choice about language mechanics, because that is also inventing your own type system. As I pointed out earlier, the abstract type and multiple concrete types still exist. The compiler just doesn’t know about them. You’re still typing things. You’re just doing it without making your conceptual “types” literally types as defined in Swift.

I don’t know why you’d want to invent your own type system, and I doubt there is ever a good reason to. Doing so has very surprising implications; it’s an expert-level technique that rapidly gets complicated beyond all but the most trivial use cases, and it shuts off a large part of compile time verification, forcing you to do type validation at runtime (and probably fatalError or something on validation failures). If you’re doing this to circumvent the rules of a static type system, for example the rule that you can’t replace the definition of a type’s function at runtime or construct instances stitched together from multiple different types (in other words, static types are static)… well, you absolutely should not be doing that. The type system works the way it does for a reason, and you should stop trying to break it.

What’s a “Witness” Anyways?

Speaking of inventing your own version of what Swift already gives you… why is this technique called a “protocol witness”? If you Google “Swift” and “witness” you’ll likely encounter discussions about how Swift is implemented, something involving a “witness table”. You may have even seen a mention of such a thing in certain crashes, either the debug message that gets printed or somewhere in the stack trace.

To understand that, let’s think about what happens when we declare a variable like let network: any NetworkInterfaceProtocol. Now, today, you may or may not have to write any here. Whether you do is based on some rather esoteric history (albeit recent history) about Swift. Basically, if you were allowed to write this at all before Swift 5.7, you don’t have to write any, but you can. If something about the protocol prevented you from using it as a variable type before Swift 5.7, specifically before this proposal was implemented, then you have to write any.

I’m harping on this because it’s of critical importance here. This is the correct way to think about it: the any is always there, and always has been, wherever you use a protocol as a variable type. This just was, and still is in many places, implied by the compiler, so writing it out is optional. After all, the compiler can already tell that a protocol is being used as a variable type, so there’s no syntactic ambiguity by omitting it (similar to how you can omit the type of a variable if it gets initialized inline).

How else can you use a protocol besides as the type of a variable? To declare conformance (i.e. to say NetworkInterfaceLive conforms to NetworkInterfaceProtocol) and in a generic constraint. Those two places involve the protocol itself. That is completely different from the protocol name with any before it. If P is a protocol, then any P is an existential box that holds an instance of any concrete type that implements P.
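To see those three roles side by side, here’s a minimal sketch with a toy protocol (the names are purely illustrative):

protocol Greeter {
  func greet() -> String
}

// 1. The protocol itself: declaring conformance
struct EnglishGreeter: Greeter {
  func greet() -> String { "Hello" }
}

// 2. The protocol itself: a generic constraint
func greetTwice<G: Greeter>(_ greeter: G) -> String {
  greeter.greet() + " " + greeter.greet()
}

// 3. The existential: an `any Greeter` box holding some instance of a conforming type
let someGreeter: any Greeter = EnglishGreeter()
print(greetTwice(EnglishGreeter())) // "Hello Hello"
print(someGreeter.greet())          // "Hello"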

What is an “existential box”? I don’t want to get too deep into the implementation details of Swift’s type system (I’m writing another article series for that), but just think of it this way: P is abstract, which literally means no variable can have a type P. But there has to be some concrete type, and it has to be static: the type of a variable var theBox: any P can’t change as you put different concrete implementations of P inside it. So the type can’t be whatever concrete type you put in it.

That’s why we have to have a special concrete type, any P, for holding literally any P. Imagine for a moment the compiler just didn’t let you do this. You just aren’t allowed to declare a variable whose type is a protocol. But you need to be able to store any instance that implements a protocol in a variable. What would you do? Let’s say we have the protocol:

protocol MyProtocol {
  var readOnlyMember: Int { get }
  var readWriteMember: String { get set }

  func method1() -> Int
  func method2(param: Int) -> String
}

Now, the point of trying to declare a variable var theBox: MyProtocol is so we can call the requirements, like theBox.readOnlyMember, or theBox.method1(). We need to create a type that lets us do this, but those calls actually dispatch to any implementing instance we want. Let’s try this:

struct AnyMyProtocol {
  var readOnlyMember: Int { readOnlyMember_get() }
  var readWriteMember: String { 
    get { readWriteMember_get() }
    set { readWriteMember_set(newValue) }
  }

  func method1() -> Int { method1_imp() } 
  func method2(param: Int) -> String { method2_imp(param) }  

  init<P: MyProtocol>(_ value: P) {
    readOnlyMember_get = { value.readOnlyMember }
   
    readWriteMember_get = { value.readWriteMember }
    readWriteMember_set = { newValue in value.readWriteMember = newValue }

    method1_imp = { value.method1() } 
    method2_imp = { param in value.method2(param: param) } 
  }

  private let readOnlyMember_get: () -> Int

  private let readWriteMember_get: () -> String
  private let readWriteMember_set: (String) -> Void

  private let method1_imp: () -> Int
  private let method2_imp: (Int) -> String
}

What’s happening here is we store individual closures for each of the requirements defined by the protocol (all getters and setters of vars and all funcs). We define a generic initializer that takes an instance of some concrete implementation of MyProtocol (the generic parameter P), and we assign all these closures to forward the calls to that instance.

This is called a type eraser, because that’s exactly what it’s doing. It’s dropping knowledge of the concrete type (whatever the generic parameter P was bound to when the init was called) but preserving the capabilities of the instance. That way, whoever is using the AnyMyProtocol doesn’t know which concrete implementation of MyProtocol those calls are being forwarded to.

But hold on… this isn’t correct. If you try compiling this code, you’ll see it fails. Specifically, we’re trying to assign readWriteMember on the value coming into the initializer. But value is a function parameter, and therefore read-only. We need to store our own mutable copy first, and make sure to send all our calls to that copy. We can do that by simply shadowing the variable coming in:

  init<P: MyProtocol>(_ value: P) {
    var value = value

    readOnlyMember_get = { value.readOnlyMember }
   
    readWriteMember_get = { value.readWriteMember }
    readWriteMember_set = { newValue in value.readWriteMember = newValue }

    method1_imp = { value.method1() } 
    method2_imp = { param in value.method2(param: param) } 
  }

What exactly is happening here? Well, when the init is called, we create the local variable, which makes a copy of value. All those closures capture value by reference (because we didn’t explicitly add [value] in the capture list to capture a copy), so we are able to write back to it. Since those closures are escaping, Swift sees that this local variable is captured by reference in escaping closures, which means that variable has to escape too. So it actually allocates that variable on the heap, allowing it to live past the init call. It gets put in a reference counted box, retained by each closure, and the closures are then retained by the AnyMyProtocol instance being initialized. When this instance is discarded, all the closures are discarded, the reference count of this value instance goes to 0, and it gets deallocated.

In a rather convoluted way, this effectively sneaks value into the storage for the AnyMyProtocol instance. It’s not literally inside the instance, it just gets attached to it and has the same lifetime.
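If you want to see that sharing mechanism in isolation, here’s a minimal sketch:

func makeCounter() -> (increment: () -> Void, read: () -> Int) {
  var count = 0 // captured by reference in escaping closures, so it gets promoted to a heap box
  return (
    increment: { count += 1 },
    read: { count }
  )
}

let counter = makeCounter()
counter.increment()
counter.increment()
print(counter.read()) // 2: both closures share the same boxed variable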

Now think about what happens here:

let box1 = AnyMyProtocol(SomeConcreteStruct())
var box2 = box1

box2.readWriteMember = "Uh oh..."

print(box1.readWriteMember)

What gets printed? This: "Uh oh...".

We assigned readWriteMember on box2, and it affected the same member on box1. Evidently AnyMyProtocol has reference semantics, even though the value it’s erasing, a SomeConcreteStruct, has value semantics. This is wrong. When we assign box1 to box2, it’s supposed to make a copy of the SomeConcreteStruct instance inside of box1. But above I explained that the actual boxed value is the var value inside the init, which is placed in a reference counted heap-allocated box to keep it alive as long as the closures that capture it are alive. This has to happen, this value must have reference semantics, because the closures have to share the value. When the readWriteMember_set closure is called and writes to value, that has to be “seen” by a subsequent call to the readWriteMember_get closure.

But while we need the value to be shared among the various closures in a single instance of AnyMyProtocol, we don’t want the value to be shared across instances. How do we fix this?

We have to put the erased value in the AnyMyProtocol instance as a member, and we need that member to be the one the closures operate on, which means it needs to be passed into the closures:

struct AnyMyProtocol {
  var readOnlyMember: Int { readOnlyMember_get(erased) }
  var readWriteMember: String { 
    get { readWriteMember_get(erased) }
    set { readWriteMember_set(&erased, newValue) }
  }

  func method1() -> Int { method1_imp(erased) } 
  func method2(param: Int) -> String { method2_imp(erased, param) }  

  init<P: MyProtocol>(_ value: P) {
    erased = value

    readOnlyMember_get = { erased in (erased as! P).readOnlyMember }
   
    readWriteMember_get = { erased in (erased as! P).readWriteMember }
    readWriteMember_set = { erased, newValue in 
      var value = erased as! P
      value.readWriteMember = newValue
      erased = value
   }

    method1_imp = { erased in (erased as! P).method1() } 
    method2_imp = { erased, param in (erased as! P).method2(param: param) } 
  }

  private var erased: Any

  private let readOnlyMember_get: (Any) -> Int

  private let readWriteMember_get: (Any) -> String
  private let readWriteMember_set: (inout Any, String) -> Void

  private let method1_imp: (Any) -> Int
  private let method2_imp: (Any, Int) -> String
}

Here we see the appearance of Any. What’s that? It’s Swift’s type erasing box for literally anything. It is what implements the functionality of keeping track of what’s inside the box and properly copying it when assigning one Any to another. The closures now take what is effectively a self parameter. The closures are going to be shared among copies of the AnyMyProtocol, we can’t change that. So if they’re shared, and we don’t want them to operate on a shared P instance, we have to pass in the particular P instance we want them to operate on.
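With this version, the earlier box1/box2 experiment behaves the way a value type should (assuming the same SomeConcreteStruct as before):

let box1 = AnyMyProtocol(SomeConcreteStruct())
var box2 = box1

box2.readWriteMember = "Uh oh..."

print(box1.readWriteMember) // prints whatever SomeConcreteStruct started with, not "Uh oh...": copies no longer share state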

In fact, these aren’t closures anymore because they aren’t capturing anything. And that’s fortunate, because we’re trying to solve the “Swift doesn’t let you use abstract types as variable types” hypothetical, which would ban closures too. By eliminating capture, these are now just plain old function pointers, which aren’t abstract types.

Now, we can collect these function pointers into a struct:

struct AnyMyProtocol {
  var readOnlyMember: Int { witnessTable.readOnlyMember_get(erased) }
  var readWriteMember: String { 
    get { witnessTable.readWriteMember_get(erased) }
    set { witnessTable.readWriteMember_set(&erased, newValue) }
  }

  func method1() -> Int { witnessTable.method1_imp(erased) } 
  func method2(param: Int) -> String { witnessTable.method2_imp(erased, param) }  

  init<P: MyProtocol>(_ value: P) {
    erased = value

    witnessTable = .init(
      readOnlyMember_get: { erased in (erased as! P).readOnlyMember },
      readWriteMember_get: { erased in (erased as! P).readWriteMember },
      readWriteMember_set: { erased, newValue in 
        var value = erased as! P
        value.readWriteMember = newValue
        erased = value
      },
      method1_imp: { erased in (erased as! P).method1() },
      method2_imp: { erased, param in (erased as! P).method2(param: param) }
    )
  }

  struct WitnessTable {
    let readOnlyMember_get: (Any) -> Int

    let readWriteMember_get: (Any) -> String
    let readWriteMember_set: (inout Any, String) -> Void

    let method1_imp: (Any) -> Int
    let method2_imp: (Any, Int) -> String
  }

  private var erased: Any
  private let witnessTable: WitnessTable
}

The specific WitnessTable instance being created doesn’t depend on anything except the type P, so we can move it into an extension on MyProtocol:

extension MyProtocol {
  static var witnessTable: AnyMyProtocol.WitnessTable {
    .init(
      readOnlyMember_get: { erased in (erased as! Self).readOnlyMember },
      readWriteMember_get: { erased in (erased as! Self).readWriteMember },
      readWriteMember_set: { erased, newValue in 
        var value = erased as! Self
        value.readWriteMember = newValue
        erased = value
      },
      method1_imp: { erased in (erased as! Self).method1() },
      method2_imp: { erased, param in (erased as! Self).method2(param: param) }
    )
  }
}

Then the init is just this:

  init<P: MyProtocol>(_ value: P) {
    erased = value
    witnessTable = P.witnessTable
  }

Does this idea of a table of function pointer members, one for each requirement of a protocol, sound familiar to you?

Hey! This is a protocol witness! “Witness” refers to the fact that the table “sees” the concrete P and its concrete implementations of all those requirements, recording the “proof”, so-to-speak, of those implementations in the form of the function pointers. When someone else comes along and asks, “hey, does your value implement this requirement?”, the witness answers, “yes! I saw that earlier. Here, look at this closure, this is what its implementation is”.

Well, this is exactly what the Swift compiler writes for you, for every protocol that you write. The only difference is that theirs is called any MyProtocol instead of AnyMyProtocol. The compiler creates a witness table for every protocol and uses it in its existential boxes to figure out where to jump to.

At any point an existential box will hold a pointer to a particular witness table, and when you call a method on it, the compiled code goes to the witness table, grabs the function pointer at the right index (depending on what requirement you’re invoking), and then jumps to that function pointer.

The box is, in fact, implementing dynamic dispatch, which always requires some type of runtime lookup mechanism. The members of the witness table are function pointers, which are just the addresses of the first line of the compiled code for the bodies. Calling one just jumps the execution to that address. Every function has a known compile time constant address, so if you want dynamic dispatch, you have to store a function pointer in a variable. That’s what the witness table is.
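Here’s the tiniest version of that idea in Swift (these are closure values rather than raw C function pointers, but the principle is the same):

func liveFetch() -> String { "live" }
func fakeFetch() -> String { "fake" }

// Which body runs is decided by what happens to be stored in the variable at the call site.
var fetch: () -> String = liveFetch
print(fetch()) // "live"

fetch = fakeFetch
print(fetch()) // "fake"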

There are things the compiler can make its box do that we can’t make our box do, and vice versa. The compiler makes its box transparent with respect to casting: we can directly downcast an instance of any P to P1 (where P1 is a concrete implementer of P), and the compiler checks what it has in the box and, if it matches, pulls it out and gives it to us. We can’t make our own box do that, at least not transparently. On the other hand, the compiler never conforms its box to any protocols, not even the protocol being erased: any P does not conform to P. You may have seen compiler errors about this; they used to be much worse, with messages like “P as a type cannot conform to itself”, which is pretty freaking confusing! You can conform your box to whatever protocols you want.

If we wrote our own existential box for NetworkInterfaceProtocol, it would look like this:

struct AnyNetworkInterface: NetworkInterfaceProtocol {
  func fetchJobListings() async throws -> [JobListing] {
    try await fetchJobListings_imp()
  }

  func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
    try await fetchApplicants_imp(listing)
  }

  func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
    try await fetchJobListingsForApplicant_imp(applicant)
  }

  init<Erasing: NetworkInterfaceProtocol>(erasing: Erasing) {
    fetchJobListings_imp = { try await erasing.fetchJobListings() } 
    fetchApplicants_imp = { listing in try await erasing.fetchApplicants(for: listing) } 
    fetchJobListingsForApplicant_imp = { applicant in try await erasing.fetchJobListings(for: applicant) } 
  }

  private let fetchJobListings_imp: () async throws -> [JobListing]
  private let fetchApplicants_imp: (JobListing) async throws -> [Applicant]
  private let fetchJobListingsForApplicant_imp: (Applicant) async throws -> [JobListing]
}

// Convenience function, so we can box an instance with `value.erase()` instead of `AnyNetworkInterface(erasing: value)`
extension NetworkInterfaceProtocol {
  func erase() -> AnyNetworkInterface {
    .init(erasing: self)
  }
}

Well this is almost exactly like the Test implementation I showed earlier, isn’t it!?

“Protocol witness” is a reference to the fact that this is, for all intents and purposes, a type erasing box. The only difference is that the one PointFree shows us is made to be configurable later. We can do that too:

struct ConfigurableAnyNetworkInterface: NetworkInterfaceProtocol {
  func fetchJobListings() async throws -> [JobListing] {
    try await fetchJobListings_imp()
  }

  func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
    try await fetchApplicants_imp(listing)
  }

  func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
    try await fetchJobListingsForApplicant_imp(applicant)
  }

  init<Erasing: NetworkInterfaceProtocol>(erasing: Erasing) {
    fetchJobListings_imp = { try await erasing.fetchJobListings() } 
    fetchApplicants_imp = { listing in try await erasing.fetchApplicants(for: listing) }
    fetchJobListingsForApplicant_imp = { applicant in try await erasing.fetchJobListings(for: applicant) } 
  }

  var fetchJobListings_imp: () async throws -> [JobListing]
  var fetchApplicants_imp: (JobListing) async throws -> [Applicant]
  var fetchJobListingsForApplicant_imp: (Applicant) async throws -> [JobListing]
}

extension NetworkInterfaceProtocol {
  func makeConfigurable() -> ConfigurableAnyNetworkInterface {
    .init(erasing: self)
  }
}

With this, we can do something like start off with the Live implementation, put it in a configurable box, then start customizing it:

var network = NetworkInterfaceLive(host: prodHost)
  .makeConfigurable()

network.fetchJobListings_imp = { 
  ...
}

What we’re really doing here is writing our own existential box. That’s what I mean when I say this technique is getting into the territory of creating our own versions of stuff the compiler already creates for us. It’s just a less absurd form of building our own type system with Type enums.

Now, there are reasons why you might need to build your own type erasing box. Even Apple does this in several of their frameworks (AnySequence, AnyPublisher, AnyView, etc.). It usually comes down to making the box conform to the protocol it’s abstracting over (AnySequence conforms to Sequence, AnyPublisher conforms to Publisher, AnyView conforms to View, etc.). This is a language limitation, something we expect, or at least hope, will be alleviated in future versions (e.g. by allowing extensions of the compiler-provided box: extension any Sequence: Sequence), and sometimes we just need to work around language limitations by doing stuff ourselves that the compiler normally does.

(…well, not necessarily. Whether the type erasing box should ever conform to the protocol it abstracts over is not so obvious. Remember how we used a generic to ensure two component models use the same concrete network interface type? If any NetworkInterfaceProtocol automatically counted as a concrete network interface, you could pick that as the type parameter and then you’re able to mix and match, or switch mid-execution. Then what’s the point of making it generic? Maybe what we really need is a way to specify that a generic parameter can be either a concrete type or the existential).
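To sketch the concern in that aside (the component model names here are purely illustrative):

struct ListingsModel<Network: NetworkInterfaceProtocol> {
  let network: Network
}

struct ApplicantsModel<Network: NetworkInterfaceProtocol> {
  let network: Network
}

// Because our hand-written AnyNetworkInterface does conform to the protocol, nothing stops you
// from binding the generic parameter to the box and quietly mixing concrete interfaces:
//
//   let listings = ListingsModel(network: NetworkInterfaceLive(host: prodHost).erase())
//   let applicants = ApplicantsModel(network: NetworkInterfaceHappyPathFake().erase())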

However, writing a type erasing box that you can then start screwing with is not an example of this. This is not a missing language feature. Allowing methods of an any P to be hot-swapped or mix-and-matched would break the existential box. You’d never be sure, when you have an any P, if you have a well-formed instance conforming to P, and everyone has the keys to reach in and break it.

This is why I would only want to expose an implementation of NetworkInterfaceProtocol that allows such violations to tests, and call it Test to make it clear it’s only for use as a test double. Now that’s a fine approach. I’ve actually gone one step further and turned such a Test implementation into a Mock implementation by having it record invocations:

typealias Invocation<Params> = (time: Date, params: Params)

final class NetworkInterfaceMock: NetworkInterfaceProtocol {
  private(set) var fetchJobListings_invocations: [Invocation<()>] = []
  var fetchJobListings_setup: (() async throws -> [JobListing])! // configured by each test before use
  func fetchJobListings() async throws -> [JobListing] {
    fetchJobListings_invocations.append((time: .now, params: ()))
    return try await fetchJobListings_setup()
  }

  private(set) var fetchApplicants_invocations: [Invocation<JobListing>] = []
  var fetchApplicants_setup: ((_ listing: JobListing) async throws -> [Applicant])!
  func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
    fetchApplicants_invocations.append((time: .now, params: listing))
    return try await fetchApplicants_setup(listing)
  }

  private(set) var fetchJobListingsForApplicant_invocations: [Invocation<Applicant>] = []
  var fetchJobListingsForApplicant_setup: ((_ applicant: Applicant) async throws -> [JobListing])!
  func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
    fetchJobListingsForApplicant_invocations.append((time: .now, params: applicant))
    return try await fetchJobListingsForApplicant_setup(applicant)
  }

  func reset() {
    fetchJobListings_invocations.removeAll()
    fetchApplicants_invocations.removeAll()
    fetchJobListingsForApplicant_invocations.removeAll()
  }
}

This kind of boilerplate is a perfect candidate for macros. Make an @Mock macro and you could produce a mock implementation that records invocations and is fully configurable from the outside, with as little as @Mock final class MockMyProtocol: MyProtocol {}. You can probably do the same with the type erasing boxes.

Remember I said earlier that Swift’s compile time safety shut off a lot of the easy test configurability that all executed at runtime (i.e. rewriting methods)? Well, macros are the solution, where that same level of expressiveness is recovered at compile time.

What Can’t You Do with Protocols?

But let’s rewind a little bit. Do you actually need to create this kind of open-ended type whose behavior you can change after the fact (a strange thing to be able to do to any type: changing the meaning of its methods) in order to cover the examples shown?

No. For example, in the case where we want a test to be able to swap in its own implementation of fetchJobListings, we don’t have to make any concrete NetworkInterfaceProtocol whose behavior can change ex post facto. Instead, we can build transforms that take one type of network interface and create a new type of network interface with new behavior. The key is that we stick to the usual paradigm of a static type having static behavior. We don’t make a particular instance (whose type, of course, can’t change) dynamic to this degree. We create a new instance of a new type:

struct NetworkInterfaceReplaceJobListings<Base: NetworkInterfaceProtocol>: NetworkInterfaceProtocol {
  func fetchJobListings() async throws -> [JobListing] {
    try await fetchJobListingsImp()
  }

  func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
    try await base.fetchApplicants(for: listing)
  }

  func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
    try await base.fetchJobListings(for: applicant)
  }

  init(
    base: Base,
    fetchJobListingsImp: @escaping () async throws -> [JobListing]
  ) {
    self.base = base
    self.fetchJobListingsImp = fetchJobListingsImp
  }

  private let base: Base
  private let fetchJobListingsImp: () async throws -> [JobListing]
}

extension NetworkInterfaceProtocol {
  func reimplementJobListings(`as` fetchJobListingsImp: @escaping () async throws -> [JobListing]) -> NetworkInterfaceReplaceJobListings<Self> {
    .init(base: self, fetchJobListingsImp: fetchJobListingsImp)
  }
}

...

let network = NetworkInterfaceHappyPathFake()
  .reimplementJobListings { 
    throw TestError()
  }

The key difference here is that the network interface with the swapped out implementation is a different type than the original one. This can interact in interesting ways with the generics system. For example, if you implemented your component models to be generics in order to constrain all component models in your app to always use the same network interface type, then this is ensuring that either the entire app uses the network interface with the swapped out implementation, or no one does.

This is more strongly typed. It expresses something I think is very reasonable: a network interface with a swapped out implementation of fetchJobListings is a different type of network interface. However, there’s still an element of dynamism. Each time we create a new NetworkInterfaceReplaceJobListings instance, we can supply a different reimplementation. Everything else is static, but the fetchJobListings implementation still varies per instance. This is clear from the fact the type has a closure member. Closures are abstract, so that’s dynamic dispatch. Can we get rid of that exception and make our types fully static? Can we instead make it so that each specific reimplementation of fetchJobListings produces a distinct type of network interface?

Yes, but it requires a lot of boilerplate, and we unfortunately lose automatic capture, so we have to implement it ourselves. Let’s say we’re calling reimplementJobListings inside a method, either in a test itself or in test setup:

final class SomeTests: XCTestCase {
  var jobListingsCallCount = 0

  func testJobListingsGetsCalledOnlyOnce() {
    let network = NetworkInterfaceHappyPathFake()
      .reimplementJobListings { [weak self] in
        self?.jobListingsCallCount += 1
        return []
      }

    ...

    XCTAssertEqual(jobListingsCallCount, 1)
  }
}

We can replace this with defining a local type of network interface:

final class SomeTests: XCTestCase {
  var jobListingsCallCount = 0

  func testJobListingsGetsCalledOnlyOnce() {
    struct NetworkInterfaceForThisTest: NetworkInterfaceProtocol {
      func fetchJobListings() async throws -> [JobListing] {
        parent?.jobListingsCallCount += 1
        return []
      }

      func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
        try await base.fetchApplicants(for: listing)
      }

      func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
        try await base.fetchJobListings(for: applicant)
      }

      init(
        parent: SomeTests?
      ) {
        self.parent = parent
      }

      private let base = NetworkInterfaceHappyPathFake()
      private weak var parent: SomeTests?
    }

    let network = NetworkInterfaceForThisTest(parent: self)

    ...

    XCTAssertEqual(jobListingsCallCount, 1)
  }
}

Here, we’re going all the way with “static types are static”, meaning a type’s behavior is statically bound to that type. If we want different behavior in a different test, we make a different type. But having to type all of this out, and deal with capturing (in this case weak self manually) is a lot of tedium. Especially in tests, I probably wouldn’t bother, and would bend the rules of a static type having fixed behavior to avoid all this boilerplate.

What I would love to be able to do, though, is take this fully static approach but avoid having to invent a throwaway name for the local type, and have it close over its context using the same capture syntax as always:

final class SomeTests: XCTestCase {
  var jobListingsCallCount = 0

  func testJobListingsGetsCalledOnlyOnce() {
    let network = NetworkInterfaceProtocol { [weak self] in
      func fetchJobListings() async throws -> [JobListing] {
        self?.jobListingsCallCount += 1
        return []
      }

      func fetchApplicants(for listing: JobListing) async throws -> [Applicant] {
        try await base.fetchApplicants(for: listing)
      }

      func fetchJobListings(for applicant: Applicant) async throws -> [JobListing] {
        try await base.fetchJobListings(for: applicant)
      }

      private let base = NetworkInterfaceHappyPathFake()
    }

    ...

    XCTAssertEqual(jobListingsCallCount, 1)
  }
}

This isn’t a very odd language capability, either. Java and TypeScript both support this very thing.

Another minor issue here is performance. I’ll be very frank: I would be shocked if this ever actually matters for something like an iOS app. But it’s interesting to point it out.

The protocol witness pays various runtime costs, in terms of both memory and CPU usage, because it is fundamentally, and entirely, a runtime solution to the problem of swapping out implementations. First, what is the representation of the protocol witness NetworkInterface in memory? Well, it’s a struct, and it looks to be a pretty small one (just three members, which are closures), so it’s likely to be placed on the stack where it’s a local variable. However, the struct itself isn’t doing anything interesting. The real work happens inside the closure members. Those are all existential boxes for a closure, which as we saw above is a concrete type that holds all the captured state and implements a callAsFunction requirement. Depending on how much gets captured, the instances of those concrete types may or may not fit directly into their type erasing boxes. If they don’t, they’ll be allocated on the heap and the box will store a pointer to its instance. This will result in runtime costs of pointer indirection, heap allocation and cache invalidation.

When methods are called, the closure members act as a thunk to the body of the closure, so that’s going to be another runtime cost of pointer indirection.

Contrast that with the first approach shown above (which is really an example of the Proxy Pattern), where the static typing is much stronger. The type that we end up with is a NetworkInterfaceReplaceJobListings<NetworkInterfaceHappyPathFake>. Notice the way the proxy type retains information about the original type that it is proxying. In particular, it does not erase this source type. Furthermore, NetworkInterfaceHappyPathFake is a static type, not a factory that constructs particular instances.

The implication of this is that the network variable can never (even if it was made var) store another type of network interface. We can’t change the definitions of any of the functions on this variable at any point in its life. That means it is known at compile time what the precise behavior is, except for the closure provided as the reimplementation of fetchJobListings. The fact that, for example, the other two calls just call the happy path fake version, is known at compile time. It is known at compile time that the base member is a NetworkInterfaceHappyPathFake. The size of this member, and the closure member, is known at compile time. If it’s small enough it can be put on the stack. There is no runtime dispatch of any of the calls, except for the closure.

If we go to the fully static types where we define new ones locally, we eliminate the one remaining runtime dispatch of the closure, and literally everything is hardwired at compile time.

The cost is all paid at compile time. The compiler looks at that more complex nested/generic type and figures out, from that, how to wire everything up hardcoded. This is the difference, in terms of runtime performance, between resolving the calls through runtime mechanisms, as in the protocol witness, and resolving the calls through the compile time mechanisms of generics.

(Caveat: when you call generic code in Swift, supplying specific type parameters, the body of the generic code can be compiled into a dedicated copy for that type parameter, with everything hardwired at compile time, and there will be no runtime cost, but only if the compiler has access to that source code during compilation of your code that calls it, which is generally true only if you aren’t crossing module boundaries. If you do cross a module boundary, when the compiler compiled the module with the generic code, it compiled an “erased” copy of the generic code that replaces the generic parameter with an existential and dynamic-dispatches everything. That’s the only compiled code your module will be able to invoke, and you’ll end up paying runtime costs).

Again, I’m sure this will never matter in an iOS app. We used to write apps in Objective-C, where everything is dynamic dispatch (specifically message passing, the slowest type), on phones half the speed (or less) of what we write them on now. I rather think it’s interesting to point out the trivial improvement in runtime performance (and corresponding, probably also trivial, degradation of compile time performance) of the strongly typed implementation because it further illustrates that the behavior of the program is being worked out at compile time… and it therefore does more validation, and rejection of invalid code, at compile time.

These kinds of techniques, where you build up more sophisticated types out of protocols, typically with some kind of type composition (through generic parameters), with the goal of eliminating runtime value-level (existential) variation and expressing the variation at the type level, are where the power of protocols can really yield fruit, especially in terms of expanding your thinking about what they can do. It’s easy to just give up and say “this can’t be done with protocols or at the type level”, but you should always try. You might be surprised what you can do, and even if you revert to existentials later, you’ll probably learn something useful.

Conclusion

So, do I think you should ever employ this struct witness pattern?

Well, that depends on what it means exactly.

As a refactor of a design with protocols on the basis that it works around language limitations? No, that’s fundamentally confused: those are not language limitations, that’s called a static type system, and if you’re going to throw that away for dynamic typing at least frame it honestly and accurately. Then do your best to convince me dynamic typing is better than static typing (good luck).

What about if we realize those individual functions are where the abstractions should be, because it is correct modeling of the system we’re building to let them vary independently? Well, in that case, the question is: should those members be just closures, or should you define protocols with just one function as their requirement? For example, if you define fetchJobListings to simply be any () async throws -> [JobListing] closure, to me this communicates that any function with that signature is acceptable. I could have two such functions, both defined as mere closures, that mean different things, and it’s a programmer error to assign one to the other, but nothing stops me. If I instead introduce two protocols, each with a single function requirement, then first, I can name that function, thereby indicating what it means; and second, it’s strongly typed: the compiler won’t let me use one where the other is expected.
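To make the contrast concrete, here’s a sketch; the protocol and type names are illustrative only:

// Option A: a bare closure member; anything with the right shape is accepted
struct NetworkInterfaceA {
  let fetchJobListings: () async throws -> [JobListing]
}

// Option B: a named, single-requirement protocol; the compiler distinguishes it from
// other closures that merely happen to share the same signature
protocol FetchJobListingsFunction {
  func callAsFunction() async throws -> [JobListing]
}

struct NetworkInterfaceB<Fetch: FetchJobListingsFunction> {
  let fetchJobListings: Fetch
}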

So even in this case, I would want to think carefully about whether closures are sufficient, or if I want to have stronger typing than this and instead define a protocol that implements callAsFunction. Being as biased toward strong typing as I am, it’s likely I’ll choose to write out protocols. That extra boilerplate? The names of those protocols, and possibly the names of their one function requirement? I want that. Terseness has never been my goal. If it were I’d name my variables things like x or y1.

As a hand-written type eraser, intending to be a replacement for the language-provided existential where I need more flexible behavior? Never, I would consider that to be breaking the existential by enabling it to circumvent and violate the static type system.

To implement test doubles? Yes I’d do something that’s effectively the same as a protocol witness, but I wouldn’t replace the protocol with this, I would add to the protocol based design by having this witness implement the protocol.

So, ultimately, as it is presented: no, I never would. I would consider it to be broadly in the same category as the style of coding where you define your own TheThingType enums that appear as an instance member in a type TheThing: a bad style of Swift that tries to replace the language’s type system with a homegrown type system which, however good or even better it may be (which I doubt), will never be verified by the compiler. That’s a hard dealbreaker for me.

Where might you see me writing code that involves simple structs with a closure member where I then create a handful of different de-facto “types” by initializing the closure in a specific way for each one? In prototyping. Like I’m scaffolding out some code, it’s early and I’m not sure about the design, and because it’s less boilerplate, whipping up that struct is just a little bit faster than writing out the protocol and conforming type hierarchy. Once I’m more sure this abstract type is here to stay, I’m almost certainly going to refactor it to a protocol with the closure member becoming a proper function.

If I can tie together all my decisions regarding this into a single theme, it is: I completely, 100% favor compile time safety over runtime ease or being able to quickly write code without restriction. Because perhaps the central goal of my designs is to elevate as many programmer errors as possible into compilation failures, and because generally the way to do that is to use statically typed languages and develop a strong (meaning lots of fine-grained) type system, I immediately dislike the protocol witness approach, because it degrades the type system that the compiler sees. My goal is not, and has never been, to avoid or minimize the boilerplate that defining more types entails.

If you’re interested in pursuing this style of programming, where you are trying to reject as much invalid code as possible at compile time, my #1 advice, if you’re working in Swift, is to embrace generics and spend time honing your skills with protocols, especially the more advanced features like associated types. If there’s a single concrete goal to drive this, try your best to eliminate as many protocol existentials in your code base as possible (this will almost always involve replacing them with generics). As part of this, you have to treat closures as existentials (which, fundamentally, they are).
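As a small before/after sketch of that exercise (the function names are illustrative):

// Before: an existential parameter; every call goes through the box at runtime
func jobListingCountDynamic(network: any NetworkInterfaceProtocol) async throws -> Int {
  try await network.fetchJobListings().count
}

// After: a generic parameter; the concrete type is known at compile time and can be specialized
func jobListingCountGeneric<Network: NetworkInterfaceProtocol>(network: Network) async throws -> Int {
  try await network.fetchJobListings().count
}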

You will get frustrated at times, and probably overwhelmed. Run into the challenge, not away. You’ll be happy you did later.

In a follow-up article I will explore a more nontrivial conceptual model of an abstract type with concrete implementations, and what happens (specifically what breaks down, especially related to compile time safety) if you try to build it with protocol witnesses.

Testing Async Code

Introduction

By now, pretty much every major modern OOP language has gained async/await. Java is the odd man out (they have a very different way of handling lightweight concurrency).

So, you go on a rampage, async-ifying all the things, and basking in the warm rays of the much more readable and naturally flowing fruits of your labor. Everything is fine, everything is good.

Then you get to your tests.

And suddenly you’re having lots of problems.

What’s the issue, exactly? Well, it isn’t async code per se. In fact, usually the test frameworks that are used with each language support test methods that are themselves async. If you want to test an async method, simply make the test method async, call the async method you’re testing, and await the result. You end up with what is functionally the same test as if everything were synchronous.

But what if the async code you want to test isn’t awaitable from the test?

Async vs. Await

This is a good time to review the paradox that async and await are almost antonyms. Asynchrony allows for concurrency: by not synchronizing two events, other stuff is able to happen between the two events. But adding async to code just allows you to await it. “Waiting” means to synchronize with the result: don’t continue executing below the await until the result is ready. What exactly is “asynchronous” about that?

Let’s look at an example async method:

func doSomeStuff() async -> String {

  let inputs = Inputs(5, "hello!", getSomeMoreInputs())

  let intermediateResult = await processTheInputs(inputs)

  let anotherIntermediateResult = await checkTheResultForProblems(intermediateResult)

  logTheResults(anotherIntermediateResult)

  await save(result: anotherIntermediateResult)

  return "All done!"
}

Asynchrony appears in three places (where the awaits are). And yet this code is utterly sequential. That’s the whole point of async-await language features. Writing this code with callbacks, while functionally equivalent, obscures the fact that this code all runs in a strict sequence, with the output of one step being used as the input to later steps. Async-await restores the natural return-value flow of code while also allowing it to be executed in several asynchronous chunks.

So then why bother making stuff async and awaiting it? Why not just make the functions like processTheInputs synchronous? After all, when you call a regular old function, the caller waits for the function to return a result. What’s different?

The answer is how the threads work. If the function were synchronous, one thread would execute this from beginning to end, and not do anything else in the process. Now if the functions like processTheInputs are just crunching a bunch of CPU instructions, this makes sense, because the thread keeps the CPU busy with something throughout the entire process. But if the function is asking some other thread, possibly in some other process or even on another machine, to do something for us, our thread has nothing to do, so it has to sit there and wait. You typically do that, ultimately, by using a condition variable: that tells the operating system this thread is waiting for something and not to bother giving it a slice of CPU time.

This doesn’t waste CPU resources (you aren’t busy waiting), but you do waste other resources, like memory for the thread’s stack. Using asynchrony lets multiple threads divide and conquer the work so that none of them have to sit idle. The await lets the thread go jump to other little chunks of scheduled work (anything between two awaits in any async function) while the result is prepared somewhere else.

Okay, but this is a rather esoteric implementation detail about utilizing resources more efficiently. The code is logically identical to the same method with the async and awaits stripped out… that is, the entirely synchronous flavor of it.

So then why does simply marking a method as async, and replacing blocking operations that force a thread to go dormant with awaits that allow the thread to move onto something else, and a (potentially) different thread to eventually resume the work, suddenly make testing a problem?

It doesn’t. Just add async to the test method, and add await to the function call. Now the test is logically equivalent to the version where everything is synchronous.
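For the doSomeStuff example above, that’s all it takes (a sketch, assuming XCTest):

final class DoSomeStuffTests: XCTestCase {
  func testDoSomeStuff() async {
    let result = await doSomeStuff()
    XCTAssertEqual(result, "All done!")
  }
}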

The problem is when we introduce concurrency.

The Problem

How do we call async code? Well, if we’re already in an async method, we can just await it. If we’re not in an async method, what do we do?

It depends on the language, but in some way or another, you start a background task. Here, “background” means concurrency: something that happens “in the background”, meaning it hums along without interfering with whatever else you want to do in the meantime. In .NET, this means either calling an async function that returns a Task but not awaiting it (which causes the part of the task before the first suspension to run synchronously), or calling Task.Run(...) (which causes the entire task to run concurrently). In Swift, it means calling Task {...}. In Kotlin, it means calling launch { ... }. In JavaScript, similar to .NET, you call an async function that returns a Promise but don’t await it, or construct a new Promise that awaits the async function then resolves itself (and, of course, don’t await this Promise).

That’s where the trouble happens.

This is how you kick off async code from a sync method. The sync method continues along, while the async code executes concurrently. The sync method does not wait for the async code to finish. The sync code can, and often will, finish before the async code does. We can also kick off concurrent async work in an async method, the exact same way. In this case we’re allowed to await the result of this concurrent task. If we do, then the outer async function will only complete after that concurrent work completes, and so awaiting the outer function in a test will ensure both streams of work are finished. But if the outer function only spawns the inner task and doesn’t await it, the same problem exists: the outer task can complete before the inner task does.
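Here’s a hedged sketch of that distinction in Swift (the work function is just a placeholder):

func expensiveWork() async -> Int { 42 } // placeholder

// Awaiting the spawned task: the outer function cannot finish before the inner work does.
func outerAwaits() async -> Int {
  let inner = Task { await expensiveWork() }
  return await inner.value
}

// Fire-and-forget: the outer function can, and usually will, return before the inner work finishes.
func outerForgets() {
  Task { _ = await expensiveWork() }
}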

Let’s look at an example. You create a ViewModel for a screen in your app. As soon as it gets created, in the init, it spawns a concurrent task to download data from your server. When the response comes in, it is saved to a variable that the UI can read to build up what it will display to the user. Before it’s ready, we show a spinner:

@MainActor
final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init() {
    Task {
      let (data, response) = try await urlSession.data(for: dataRequest)

      try await MainActor.run {
        self.results = try JSONDecoder().decode([String].self, from: data)
      }
    }
  }
}

struct MyView: View {
  @ObservedObject var viewModel: MyViewModel = .init()

  var body: some View {
    if let results = viewModel.results {
      ForEach(results, id: \.self) { result in 
        Text(result)
      }
    } else {
      ProgressView()
    }
  }
}

You want to test that the view model loads the results from the server and stores them in the right place:

final class MyViewModelTests: XCTestCase {
  func testLoadData() {
    let mockResults = setupMockServerResponse()

    let viewModel = MyViewModel()

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

You run this test, and it passes… sometimes. Sometimes it fails, complaining that viewModel.results is nil. It’s actually kind of surprising that it ever passes. The mock server response you set up can be fetched almost instantaneously (it doesn’t have to actually call out to a remote server), so the Task you spin off in the init completes in a few microseconds. It also takes a few microseconds for the thread running testLoadData to get from the let viewModel = ... line to the XCTAssertEqual line. The two threads are racing against each other: if the Task inside the init wins, the test passes. If not, it fails, and viewModel.results will be nil because that’s the initial value and it didn’t get set by the Task yet.

Do we fix this by making things async? Let’s do this:

@MainActor
final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init() async throws {
    let (data, response) = try await urlSession.data(for: dataRequest)

    try await MainActor.run {
      self.results = try JSONDecoder().decode([String].self, from: data)
    }
  }
}

...

final class MyViewModelTests: XCTestCase {
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let viewModel = try await MyViewModel()

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

Now it passes every time, and it’s clear why: it’s now the init itself that awaits the work to set the results. And the test is able to (in fact it must) await the init, so we’re sure this all completes before we can get past the let viewModel = ... line.

But the view no longer compiles. We’re supposed to be able to create a MyView(), which creates the default view model without us specifying it. But that init is now async. We would have to make an init() async throws for MyView as well. But MyView is part of the body of another view somewhere, and that can’t be async and so can’t await this init.

Plus, this defeats the purpose of the ProgressView. In fact, since results is set in the init, we can make it a non-optional let, never assigning it to nil. Then the View will always have results and never need to show a spinner. That’s not what we want. Even if we push the “show a spinner until the results are ready” logic outside of MyView, we have to solve that problem somewhere.

This is a problem of concurrency. We want the spinner to show on screen concurrent with the results being fetched. The problem isn’t the init being async per se, it’s that the init awaits the results being fetched. We can keep the async on the init but we need to make the fetch concurrent again:

@MainActor
final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init() async throws {
    Task {
      let (data, response) = try await urlSession.data(for: dataRequest)

      try await MainActor.run {
        self.results = try JSONDecoder().decode([String].self, from: data)
      }
    }
  }
}

...

final class MyViewModelTests: XCTestCase {
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let viewModel = try await MyViewModel()

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

Now we’re back to the original problem. The async on the init isn’t helping. Sure we await that in the test, but the thing we actually need to wait for is concurrent again, and we get the same flaky result as before.

This really has nothing to do with the async-await language feature per se. The exact same problem would have arisen had we achieved our desired result in a more old school way:

@MainActor
final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init() {
    DispatchQueue.global().async { // This really isn't necessary, but it makes this version more directly analogous to the async-await one
            
      URLSession.shared.dataTask(with: .init(url: .init(string: "")!)) { data, response, _ in
        guard let data, let response else {
          return
        }
                
        DispatchQueue.main.async {
          self.results = try? JSONDecoder().decode([String].self, from: data)
        }
      }.resume() // don't forget resume(), or the task never even starts
    }
  }
}

There’s still no way for the test to wait for that work thrown onto a background DispatchQueue to finish, or for the work to store the result thrown back onto the main DispatchQueue to finish either.

So what do we do?

The Hacky “Solution”

Well, the most common way to deal with this is ye old random delay.

We need to wait for the spun off task to finish, and the method we’re calling can finish slightly earlier. So after the method we call finishes, we just wait. How long? Well, for a… bit. Who knows. Half a second? A tenth of a second? It depends on the nature of the task too. If the task tends to take a couple seconds, we need to wait a couple seconds.

final class MyViewModelTests: XCTestCase {
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let viewModel = MyViewModel()

    try await Task.sleep(nanoseconds: 200_000_000) // 0.2 seconds. That's probably long enough!

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

(Note that Task.yield() is just another flavor of this: it causes execution to pause for some indeterminate, brief amount of time.)
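The yield flavor of the hack usually looks something like this in the body of a test (a sketch only; the loop count is every bit as arbitrary as the sleep duration was):

// Spin the cooperative executor a few times and hope the spun-off task
// got a chance to run. 10 iterations is just as made up as 0.2 seconds was.
for _ in 0..<10 {
  await Task.yield()
}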

To be clear, this “solves” the problem no matter what mechanism of concurrency we decided to use: async-await, dispatch queues, run loops, threads… doesn’t matter.

Whatever the delay is, you typically discover it by just running the test and seeing what kind of delay makes it usually pass. And that works… until random fluctuations in the machine running the tests make it not work, and you have to bump the delay up slightly.

That, ladies and gentlemen, is how you get brittle tests.

Is This a Solution?

The problem is that by adding a delay and then going ahead with our assertions as though the task completed in time, we’re encoding into our tests a requirement that the task complete within the time we chose. The time we chose was utterly random and tuned to make the test pass, so it’s certainly not a valid requirement. You don’t want tests that inadvertently enforce nonexistent requirements.

I’ve heard some interesting takes on this. The reasoning goes: well, let’s think about the requirements. Really, there is a requirement that mentions a time window. After all, the product owner wouldn’t be happy if this task completed 30 minutes after the sync method (the “when” of the scenario) got triggered. The solution, according to this perspective, is to sit down with the product owner, work out the nonfunctional requirements regarding timing constraints (that this ought to finish in no more than some amount of time), and voilà: there’s your delay amount, and now your tests are enforcing real requirements, not made up ones.

But hold on… why must there be a nonfunctional requirement about timing? This is about a very technical detail in code that concerns how threads execute work on the machine, and whether it’s possible for the test method to know exactly when some task that got spun off has finished. Why does this implementation detail imply that a NFR about timing exists? Do timing NFRs exist for synchronous code? After all, nothing a computer does is instantaneous. If this were true, then all requirements, being fundamentally about state changes in a computer system, would have to mention something about the allowed time constraints for how long this state change can take.

Try asking the product owner what the max allowed time should be. Really, ask him. I’ll tell you what his answer will be 99% of the time:

“Uhh, I don’t know… I mean, as short as it can be?”

Exactly. NFRs are about tuning a system and deciding when to stop optimizing. Sure, we can make the server respond in <10ms, but it will take a month of aggressive optimization. Not worth it? Then you’ll get <100ms. The reason to state the NFR is to determine how much effort needs to be spent obtaining realistic but nontrivial performance.

In the examples we’re talking about with async methods, there’s no question of optimization. What would that even mean? Let’s say the product owner decides the results should be ready in no more than 10 seconds. Okay, first of all, that means this test has to wait 10 seconds every time it runs! The results will actually be ready in a few microseconds, but instead every test run costs 10 seconds just to take this supposed NFR into account. It would be great if the test could detect that the results came in sooner than the maximum time and go ahead and complete early. But if we could solve that problem, we’d solve the original problem too (the test knowing when the results came in).

Even worse, what do we do with that information? The product owner wants the results in 10ms, but the response is large and it takes longer for the fastest iPhone to JSON decode it. What do we do with this requirement? Invent a faster iPhone?

Fine, then the product owner will have to take the limitations of the system into account when he picks the NFR. Well now we’re just back to “it should be as fast as it reasonably can”. So, like, make the call as early as possible, then store the results as soon as they come in.

The Real Requirements

The requirement, in terms of timing, is quite simply that the task get started immediately, and that it finish as soon as it can, which means that the task only performs what it needs to, and nothing else.

These are the requirements we should be expressing in our tests. We shouldn’t be saying “X should finish in <10 s”, we should be saying “X should start now, and X should only do A, B and C, and nothing else”.

The second part, that code should only do something, and not do anything else, is tricky because it’s open-ended. How do you test that code only does certain things? Generally you can’t, and don’t try to. But that’s not the thing that’s likely to “break”, or what a test in a TDD process needs to prove got implemented.

The first part… that’s what our tests should be looking for… not that this concurrent task finishes within a certain time, but that it is started immediately (synchronously!). We of course need to test that the task does whatever it needs to do. That’s a separate test. So, really, what starts off as one test:

Given some results on the server
When we create a view model
Then the view model’s results should be the server’s results

Ends up as two tests:

When we create the view model
Then the “fetch results” task should be started with the view model

Given some results on the server
And the “fetch results” task is running with the view model
When the “fetch results” finishes
Then the view model’s results should be the server’s results

I should rather say this ended up as two requirements. Writing two tests that correspond exactly to these two requirements is a more advanced topic that I will talk about another time so it doesn’t distract too much from the issue here.

For now let’s just try to write a test that indeed tests both of these requirements together, but still confirms that the task does start and eventually finish, doing what it is supposed to do in the process, without having to put in a delay.

Solution 1: Scheduler Abstraction

The fundamental problem is that an async task is spun off concurrently, and the test doesn’t have access to it. How does that happen? By initializing a Task. This is, effectively, how you schedule some work to run concurrently. By writing Task.init(...), we are hardcoding this schedule call to the “real” scheduling of creating a Task. If we can abstract this call, then we can substitute test schedulers that allow us more capabilities, like being able to see what was scheduled and await all of it.

Let’s look at the interface for Task.init:

struct Task<Success, Failure: Error> {

}

extension Task where Failure == Never {

  @discardableResult
  init(
    priority: TaskPriority? = nil, 
    operation: @Sendable @escaping () async -> Success
  )
}

extension Task where Failure == Error {

  @discardableResult
  init(
    priority: TaskPriority? = nil, 
    operation: @Sendable @escaping () async throws -> Success
  )
}

There are actually some hidden attributes that you can see if you look at the source code for this interface. I managed to catch Xcode admitting to me once that these attributes are there, but I can’t remember how I did it and can no longer reproduce it. So really this is the interface:

extension Task where Failure == Never {

  @discardableResult
  @_alwaysEmitIntoClient
  init(
    priority: TaskPriority? = nil, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async -> Success
  )
}

extension Task where Failure == Error {

  @discardableResult
  @_alwaysEmitIntoClient
  init(
    priority: TaskPriority? = nil, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async throws -> Success
  )
}

Okay, so let’s write an abstract TaskScheduler type that presents this same interface. Since protocols don’t support default parameters we need to deal with that in an extension:

protocol TaskScheduler {
  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async -> Success
  ) -> Task<Success, Never>

  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async throws -> Success
  ) -> Task<Success, Error>
}

extension TaskScheduler {
  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async -> Success
  ) -> Task<Success, Never> {
    schedule(priority: nil, operation: operation)
  }

  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async throws -> Success
  ) -> Task<Success, Error> {
    schedule(priority: nil, operation: operation)
  }
}

Now we can write a DefaultTaskScheduler that just creates the Task and nothing else:

struct DefaultTaskScheduler: TaskScheduler {
  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async -> Success
  ) -> Task<Success, Never> {
    .init(priority: priority, operation: operation)
  }

  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async throws -> Success
  ) -> Task<Success, Error> {
    .init(priority: priority, operation: operation)
  }
}

And we can introduce this abstraction into MyViewModel:

@MainActor
final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init(taskScheduler: some TaskScheduler = DefaultTaskScheduler()) {
    taskScheduler.schedule {
      let (data, response) = try await urlSession.data(for: dataRequest)

      self.results = try JSONDecoder().decode([String].self, from: data)
    }
  }
}

Now in tests, we can write a RecordingTaskScheduler decorator that wraps another scheduler, records all the tasks scheduled on it, and lets us await them all later. In order to do that, we need to be able to store tasks with any Success and Failure type. Then we need to be able to await them all. How do we await a Task? By awaiting its value:

extension Task {
  public var value: Success {
    get async throws
  }
}

extension Task where Failure == Never {
  public var value: Success {
    get async
  }
}

In order to store all running Tasks of any success type, throwing and non-throwing, and to be able to await their values, we need a protocol that covers all those cases:

protocol TaskProtocol {
  associatedtype Success

  var value: Success {
    get async throws
  }
}

extension Task: TaskProtocol {}
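Just to illustrate what the erased protocol buys us: tasks with different Success and Failure types can now sit in one collection and be awaited uniformly (a quick sketch; awaitAllErased is a name made up here):

func awaitAllErased(_ tasks: [any TaskProtocol]) async throws {
  // The concrete Success types are erased; all we care about is that
  // each task finishes (or throws).
  for task in tasks {
    _ = try await task.value
  }
}

// e.g. try await awaitAllErased([Task<Int, Never> { 42 }, Task<String, Error> { "done" }])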

Now we can use this in RecordingTaskScheduler:

final class RecordingTaskScheduler<Scheduler: TaskScheduler>: TaskScheduler {
  private(set) var runningTasks: [any TaskProtocol] = []

  func awaitAll() async throws {
    // Be careful here.  While tasks are running, they may schedule more tasks themselves.  So instead of looping over runningTasks, we keep pulling the next one off for as long as there is one.
    while !runningTasks.isEmpty {
      let task = runningTasks.removeFirst()
      _ = try await task.value
    }
  }

  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async -> Success
  ) -> Task<Success, Never> {
    let task = scheduler.schedule(priority: priority, operation: operation)
    runningTasks.append(task)
    return task
  }

  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async throws -> Success
  ) -> Task<Success, Error> {
    let task = scheduler.schedule(priority: priority, operation: operation)
    runningTasks.append(task)
    return task
  }

  init(scheduler: Scheduler) {
    self.scheduler = scheduler
  }

  let scheduler: Scheduler
}

extension TaskScheduler {
  var recorded: RecordingTaskScheduler<Self> {
    .init(scheduler: self)
  }
}

You probably want to make that runningTasks state thread safe.
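One simple way to do that (a sketch only; the type name here is made up) is to push the array behind a lock and have RecordingTaskScheduler hold one of these instead of a bare array:

import Foundation

// A minimal thread-safe container for the recorded tasks (illustrative only).
final class RecordedTasks {
  private let lock = NSLock()
  private var tasks: [any TaskProtocol] = []

  func append(_ task: any TaskProtocol) {
    lock.lock()
    defer { lock.unlock() }
    tasks.append(task)
  }

  func removeFirst() -> (any TaskProtocol)? {
    lock.lock()
    defer { lock.unlock() }
    return tasks.isEmpty ? nil : tasks.removeFirst()
  }
}

awaitAll would then loop on removeFirst() until it returns nil.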

Now we can use this in the test:

final class MyViewModelTests: XCTestCase {
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let taskScheduler = DefaultTaskScheduler().recorded
    let viewModel = MyViewModel(taskScheduler: taskScheduler)

    try await taskScheduler.awaitAll()

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

We’ve replaced the sleep for an arbitrary time with a precise await for all scheduled tasks to complete. Much better!

Now, there are other ways to schedule concurrent work in Swift. Initializing a new Task is unstructured concurrency: creating a new top-level task that runs independently of any other Task, even if it was spawned from an async function being run by another Task. The other ways to spawn concurrent work are the structured concurrency APIs: async let and with(Throwing)TaskGroup. Do we need to worry about these causing problems in tests?

No, we don’t. The consequence of the concurrency being structured is that these tasks spawned inside another Task are owned by the outer Task (they are “child tasks” of that “parent task”). This primarily means parent tasks cannot complete until all of their child tasks complete. It doesn’t matter if you explicitly await these child tasks or not. The outer task will implicitly await all that concurrent work at the very end (before the return) if you didn’t explicitly await it earlier than that. Thus, as long as the top-level Task that owns all these child tasks is awaitable in the tests, then awaiting it will await all those concurrent child tasks as well.
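For example, in this sketch (refresh(id:) is a hypothetical stand-in for the per-item work), nothing in the group is explicitly awaited, yet refreshAll still can’t return until every child task has finished:

func refresh(id: Int) async { /* hypothetical per-item work */ }

func refreshAll(ids: [Int]) async {
  await withTaskGroup(of: Void.self) { group in
    for id in ids {
      group.addTask { await refresh(id: id) }
    }
    // No explicit `for await _ in group` here: the group still implicitly
    // awaits every child task before this closure can return.
  }
}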

It’s only the unstructured part of concurrency we need to worry about. That is handled by Task.init, and our TaskScheduler abstraction covers it.

(It’s becoming popular to claim that “unstructured concurrency is bad” and that you should replace all instances of it with structured concurrency, but this doesn’t make sense. Structured concurrency might very well be called structured awaiting. When you actually don’t want one thing to await another, i.e. genuine concurrency, unstructured concurrency is exactly what you need. The view model where we made init async throws is an example: it’s not correct to use async let to kick off the fetch work, because that causes init to await that child task, and destroys the very concurrency we’re seeking to create.)

Things look pretty similar on other platforms/frameworks, with some small caveats. In Kotlin, the way to spawn concurrent work is by calling CoroutineScope.launch. There’s a global one available, but many times you need to launch coroutines from other scopes. This is nice because the scope is already basically the abstraction we need. Just make it configurable in tests, and make a decorator for CoroutineScope that additionally records the launched coroutines and lets you await them all. You might even be able to do this by installing a spy with mockk.

In .NET, the equivalent to Task.init in Swift is Task.Run:

void SpawnATask() {
  Task.Run(async () => { await DoSomeWork(); });
}

async Task DoSomeWork() {
  ...
}

Task.Run is really Task.Factory.StartNew with parameters set to typical defaults. Whichever one you need, you can wrap it in a TaskScheduler interface. Let’s assume Task.Run is good enough for our needs:

interface ITaskScheduler {
  Task Schedule(Func<Task> work);
  Task<TResult> Schedule<TResult>(Func<TResult> work);
  Task<TResult> Schedule<TResult>(Func<Task<TResult>> work);
}

struct DefaultTaskScheduler: ITaskScheduler {
  public Task Schedule(Func<Task> work) {
    return Task.Run(work);
  }

  public Task<TResult> Schedule<TResult>(Func<TResult> work) {
    return Task.Run(work);
  }

  public Task<TResult> Schedule<TResult>(Func<Task<TResult>> work) {
    return Task.Run(work);
  }
}

Then we replace naked Task.Run with this abstraction:

void SpawnATask(ITaskScheduler scheduler) {
  scheduler.Schedule(async () => { await DoSomeWork(); });
}

And similarly we can make a RecordingTaskScheduler to allow tests to await all scheduled work:

sealed class RecordingTaskScheduler: ITaskScheduler {
  public IImmutableQueue<Task> RunningTasks { get; private set; } = ImmutableQueue<Task>.Empty;

  public async Task AwaitAll() {
    while (!RunningTasks.IsEmpty) {
      Task task = RunningTasks.Peek();
      RunningTasks = RunningTasks.Dequeue();
      await task;
    }
  }

  public Task Schedule(Func<Task> work) {
    var task = _scheduler.Schedule(work);
    RunningTasks = RunningTasks.Enqueue(task);
    return task;
  }

  public Task<TResult> Schedule<TResult>(Func<TResult> work) {
    var task = _scheduler.Schedule(work);
    RunningTasks = RunningTasks.Enqueue(task);
    return task;
  }

  public Task<TResult> Schedule<TResult>(Func<Task<TResult>> work) {
    var task = _scheduler.Schedule(work);
    RunningTasks = RunningTasks.Enqueue(task);
    return task;
  }

  public RecordingTaskScheduler(ITaskScheduler scheduler) {
    _scheduler = scheduler;
  }

  private readonly ITaskScheduler _scheduler;
}

// Extension methods have to live in a static class in C#
static class TaskSchedulerExtensions {
  public static RecordingTaskScheduler Recorded(this ITaskScheduler scheduler) {
    return new(scheduler);
  }
}

Because in C#, generic classes are generally created as subclasses of a non-generic flavor (Task<TResult> is a subclass of Task), we don’t have to do any extra work to abstract over tasks of all result types.

There’s a shark in the water, though.

Async/await works a little differently in C# than in Swift. Kotlin is Swift-like, while TypeScript and C++ are C#-like in this regard.

In C# (and TypeScript and C++), there aren’t really two types of functions (sync and async). The async keyword is just a request to the compiler to rewrite your function to deal with the awaits and return a special type, like Task or Promise (you have to write your own in C++). And correspondingly, an async function can’t just return anything, it has to return one of these special types. But that’s all that’s different. Specifically, you can call async functions from anywhere in these languages. What you can’t do in non-async functions is await. You can call that async function from a non-async function, you just can’t await the result, which is always going to be some awaitable type like Task or Promise.

Long story short, you can do this:

void SpawnATask() {
  DoSomeWork();
}

async Task DoSomeWork() {
  ...
}

There’s a slight difference in how this executes, compared to wrapping it in Task.Run, but I certainly hope you aren’t writing code that depends in any way on that difference. So you should be able to wrap it in Task.Run, and then change that to scheduler.Schedule.

But before you make that change, know that this is a sneaky little ninja lurking around in your codebase. It’s really easy to miss. If a test is failing, clearly because it isn’t waiting long enough, and you’re going crazy because you’ve searched for every last Task.Run (or Task. in general) in your code base, the culprit can be one of these covert task constructors that you’d never even notice is spawning concurrent work. Just keep that in mind. Again, it should be fine to wrap it in scheduler.Schedule.

This isn’t a thing in Swift/Kotlin because they do async/await differently. In those languages there are two types of functions, and you simply aren’t allowed to call async functions from non-async ones. You have to explicitly call something like Task.init to jump from one to another.
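In Swift, for example, the jump has to be explicit (a minimal sketch):

func doSomeWork() async {
  // ...
}

func spawnATask() {
  // doSomeWork() // Won't compile: a synchronous function can't call an async one directly
  Task {
    await doSomeWork() // The explicit jump into the async world
  }
}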

A Not So Good Solution

This is not a new problem. I showed earlier that we can handle the concurrency with DispatchQueue. Similarly, and I’ve done this plenty of times, you would write an abstraction that captures work scheduled on global queues so the test can synchronize with it…

…well, no that’s not exactly what I did. I did something a little different. First I made a protocol so that I can customize what happens when I dispatch something to a queue:

protocol DispatchQueueProtocol {
  func async(
    group: DispatchGroup?, 
    qos: DispatchQoS, 
    flags: DispatchWorkItemFlags, 
    execute work: @escaping @convention(block) () -> Void
  )
}

extension DispatchQueueProtocol {
  // Deal with default parameters
  func async(
    group: DispatchGroup? = nil, 
    qos: DispatchQoS = .unspecified, 
    flags: DispatchWorkItemFlags = [], 
    execute work: @escaping @convention(block) () -> Void
  ) {
    // Beware of missing conformance and infinite loops!
    self.async(
      group: group,
      qos: qos,
      flags: flags,
      execute: work
    )
  }
}

extension DispatchQueue: DispatchQueueProtocol {}

Just as with the scheduler, the view model takes a dispatch queue as an init parameter:

final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init(
    backgroundQueue: some DispatchQueueProtocol = DispatchQueue.global(),
    mainQueue: some DispatchQueueProtocol = DispatchQueue.main
  ) {
    backgroundQueue.async {
            
      URLSession.shared.dataTask(with: .init(url: .init(string: "")!)) { data, response, _ in
        guard let data, let response else {
          return
        }
                
        mainQueue.async {
          self.results = try? JSONDecoder().decode([String].self, from: data)
        }
      }.resume() // Again, the data task has to be resumed or the request never starts
    }
  }
}

Then I defined a test queue that didn’t actually queue anything, it just ran it outright:

struct ImmediateDispatchQueue: DispatchQueueProtocol {
  func async(
    group: DispatchGroup?, 
    qos: DispatchQoS, 
    flags: DispatchWorkItemFlags, 
    execute work: @escaping @convention(block) () -> Void
  ) {
    work()
  }
}

And if I supply this queue for both queue parameters in the test, then it doesn’t need to wait for anything. I have, in fact, removed concurrency from the code entirely. That certainly solves the problem the test was having!
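In the test, the substitution is just this (a sketch):

// Both queues now run scheduled work inline instead of dispatching it.
let viewModel = MyViewModel(
  backgroundQueue: ImmediateDispatchQueue(),
  mainQueue: ImmediateDispatchQueue()
)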

This is often the go-to solution for these kinds of problems: if tests are intermittently failing because of race conditions, just remove concurrency from the code while it is being tested. How would you do this with the async-await version? You need to be able to take control of the “executor”: the underlying object that’s responsible for running the synchronous slices of async functions between the awaits. The default executor is comparable to DispatchQueue.global, it uses a shared thread pool to run everything. You would replace this with something like an ImmediateExecutor, which just runs the slice in-line. That causes “async” functions to become synchronous.

Substituting your own executor is possible in C# and Kotlin. It’s not in Swift. They made one step in this direction in 5.9, but they’re still working on it.

However, even if it were possible, I don’t think it’s a good idea.

Changing the underlying execution model from asynchronous to synchronous significantly changes what you’re testing. Your test is running something that is quite different than what happens in prod code. For example, by making everything synchronous and running in a single thread, everything becomes thread safe by default (not necessarily reentrant). If there are any thread safety issues with the “real” version that runs concurrently, your test will be blind to that. On the opposite side, you might encounter deadlocks by running everything synchronously that don’t happen when things run concurrently.

It’s just not as accurate a test as I’d like. I want to exercise that concurrent execution.

That’s why I prefer to not mess with how things execute, and just record what work has been scheduled so that the test can wait on it. This is a little more convoluted to get working in the DispatchQueue version, but we can do it:

final class RecordingDispatchQueue<Queue: DispatchQueueProtocol>: DispatchQueueProtocol {
  private(set) var runningTasks: [DispatchGroup] = []

  func waitForAll(onComplete: @escaping () -> Void) {
    let outerGroup = DispatchGroup() 
    while let innerGroup = runningTasks.first {
      runningTasks.removeFirst()
      outerGroup.enter()
      innerGroup.notify(queue: .global(), execute: outerGroup.leave)
    }
    
    outerGroup.notify(queue: .global(), execute: onComplete)
  }

  func async(
    group: DispatchGroup?, 
    qos: DispatchQoS, 
    flags: DispatchWorkItemFlags, 
    execute work: @escaping @convention(block) () -> Void
  ) {
    let group = DispatchGroup()
    group.enter()

    queue.async {
      work()
      group.leave()
    }

    runningTasks.append(group)
  }

  init(queue: Queue) {
    self.queue = queue
  }

  private let queue: Queue
}

extension DispatchQueueProtocol {
  var recorded: RecordingDispatchQueue<Self> {
    .init(queue: self)
  }
}

Then in the tests:

final class MyViewModelTests: XCTestCase {
  func testLoadData() {
    let mockResults = setupMockServerResponse()

    let backgroundQueue = DispatchQueue.global().recorded
    let mainQueue = DispatchQueue.main.recorded

    let viewModel = MyViewModel(
      backgroundQueue: backgroundQueue,
      mainQueue: mainQueue
    )

    let backgroundWorkComplete = self.expectation(description: "backgroundWorkComplete")
    let mainWorkComplete = self.expectation(description: "mainWorkComplete")

    backgroundQueue.waitForAll(onComplete: backgroundWorkComplete.fulfill)
    mainQueue.waitForAll(onComplete: mainWorkComplete.fulfill)

    waitForExpectations(timeout: 5.0, handler: nil)

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

A lot less pretty than async-await, but functionally equivalent.

Solution 2: Rethink the Problem

Abstracting task scheduling gives us a way to add a hook for tests to record scheduled work and wait for all of it to complete before making their assertions. Instead of just randomly waiting and hoping it’s long enough for everything to finish, we expose the things we’re waiting for so we can know when they’re done. This solves the problem we had of the test needing to know how long to wait… but are we thinking about this correctly?

Why does the test need to be aware that tasks are being spawned and concurrent work is happening? Does the production code that uses the object we’re testing need to know all of that? It didn’t look that way. After all, we started with working production code where no scheduling abstraction was present, the default scheduling mechanism (like Task.init) was hardcoded inside the private implementation details of MyViewModel, and yet… everything worked. Specifically, the object interacting with the MyViewModel, MyView, didn’t know and didn’t care about any of this.

Why, then, do the tests need to care? After all, why in general do tests need to know about private implementation details? And it is (or at least was, before we exposed the scheduler parameter) a totally private implementation detail that any asynchronous scheduling was happening at all.

What were we trying to test? We wanted to test, basically, that our view shows a spinner until results become ready, that those results will eventually become ready, and that they will be displayed once they are ready. We don’t want to involve the view in the test so we don’t literally look for spinners or text rows, instead we test that the view model instructs the view appropriately. The key is asking ourselves: why is this “we don’t know exactly when the results are ready” problem not a problem for the view? How does the view get notified that results are ready?

Aha!

The view model’s results property is @Published. It is publishing changes to its results to the outside world. See, we’ve already solved the problem we have in tests. We had to, because it’s a problem for production code too. It’s perhaps obscured a bit inside the utility types in SwiftUI, but the view is notified of updates by being bound to an ObservableObject, which has an objectWillChange publisher that fires any time any of its @Published members are about to change (specifically in willSet). This triggers an evaluation of the view’s body in the next main run loop cycle, where it reads from viewModel.results.

So, we just need to simulate this in the test:

final class MyViewModelTests: XCTestCase {
  @MainActor
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let viewModel = MyViewModel()

    let updates = viewModel
      .objectWillChange
      .prepend(Just(())) // We want to inspect the view model immediately in case the change already occurred before we start observing
      .receive(on: RunLoop.main) // We want to inspect the view model after updating on the next main run loop cycle, just like the view does

    for await _ in updates.values {
      guard let results = viewModel.results else {
        continue
      }

      XCTAssertEqual(results, mockResults)
      break
    }
  }
}

Now this test is faithfully testing what the view does, and how a manual tester would react to it: the view model’s results are reevaluated each time the view model announces it updated, and we wait until results appear. Then we check that they equal the results we expect.

With this, a stalling test is now a concern. If the prod code is broken, the results may never get set, and then this will wait forever. So we should throw it in some sort of timeout check. Usually test frameworks come with timeouts already. Unfortunately, XCTest’s per-test timeout (executionTimeAllowance) only has a resolution of 1 minute. It would be nice to specify something like 5 seconds, so we can write our own withTimeout function (I won’t show the implementation here):

final class MyViewModelTests: XCTestCase {
  @MainActor
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let viewModel = MyViewModel()

    let updates = viewModel
      .objectWillChange
      .prepend(Just(())) // We want to inspect the view model immediately in case the change already occurred before we start observing
      .receive(on: RunLoop.main) // We want to inspect the view model after updating on the next main run loop cycle, just like the view does
 
    try await withTimeout(seconds: 5.0) {
      for await _ in updates.values {
        guard let results = viewModel.results else {
          continue
        }

        XCTAssertEqual(results, mockResults)
        break
      }
    }
  }
}
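For reference, one possible shape such a helper could take (a sketch only, not necessarily the implementation alluded to above) is to race the operation against a sleep in a throwing task group and cancel whichever loses; TimedOutError is a name invented here:

struct TimedOutError: Error {}

func withTimeout<T: Sendable>(
  seconds: Double,
  operation: @escaping @Sendable () async throws -> T
) async throws -> T {
  try await withThrowingTaskGroup(of: T.self) { group in
    group.addTask { try await operation() }
    group.addTask {
      try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
      throw TimedOutError()
    }
    // Whichever child finishes first wins; cancel the other.
    let result = try await group.next()!
    group.cancelAll()
    return result
  }
}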

The mindset here is to think about why anyone cares that this concurrent work is started to begin with. The only reason why anyone else would care is that they have access to some kind of state or notification that is changed or triggered by the concurrent task. Whatever that is, it’s obviously public (otherwise no one else could couple to it, and we’re back to the original question), so have your test observe that instead of trying to hook into scheduling of concurrent work.

In this example, it was fairly obvious what the observable side effect of the task running was. It may not always be obvious, but it has to exist somewhere. Otherwise you’re trying to test a process that no one can possibly notice actually happened, in which case why is it a requirement (why are you even running that task)? Whether this is setting some shared observable state, triggering some kind of event that can be subscribed to, or sending a message out to an external system, all of those can be asserted on in tests. You shouldn’t need to be concerned about tasks finishing, that’s an implementation detail.

Conclusion

We found a way to solve the immediate problem of a test not knowing how long to wait for asynchronous work to complete. As always, introducing an abstraction is all that’s needed to be able to insert a test hook that provides the missing information.

But the more important lesson is that inserting probes like this into objects to test them raises questions: why would you need to probe an object in a way production code can’t to test the way that object behaves from the perspective of the production objects that interact with it (that is, after all, all that matters)? I’m not necessarily saying there’s never a good answer for this. At the very least it may replace a more faithful (in terms of recreating what users do) but extremely convoluted, slow and unreliable test with a much more straightforward, fast and reliable one that “cheats” slightly (is the risk from the convolutedness, from skipping runs because it’s slow, and from ignoring its results because it’s unreliable, more or less than the risk of producing invalid results by cheating?).

But you should always ask this question, and settle for probing the internals of behavior only after you have exhaustively concluded that it really is necessary, and for good reason. In the case of testing concurrent tasks, the point of waiting for tasks to “complete” is that the tasks did something, and it’s this side effect that you care about. You should assert on the side effect, which is the only real meaningful test that code executed (and it’s all that matters).

In general, if you write concurrent code, you will already solve the notification problem for the sake of production code. You don’t need to insert more hooks to observe the status of jobs in tests. The only reason the status of that job could matter is because it affects something that is already visible, and it is possible to listen to and be immediately notified of when that change happens. Whether it’s an @Published, an Observable/Publisher, a notification, event, C# delegate, or whatever else, you have surely already introduced one of these things into your code after introducing a concurrent task. Either that, or you’re calling out to an external system, and that can be mocked.

Just find that observable state that the job is affecting, and then you have something to listen to in tests. Introducing the scheduling interface is a way to quickly cover areas with tests and get reliable passing, but you should always revisit this and figure out what the proper replacement is.

On Code Readability

Introduction

I have many times encountered the statement from developers that “90% of time is spent reading code”, and therefore readability is a very important quality of code. So important, in fact, that it can overtake other qualities. This is almost always used to justify less abstract code. Highly abstracted code can be “hard to read”, so much so that its positive qualities (less duplication, more scalable and maintainable, etc.) may be invalidated by the fact it is just so hard to read. More “straightforward” code, perhaps with the lower-level details spelled out explicitly inline, may suffer problems of maintainability, but since we spend most of our time reading it, not making changes to it, the less abstract but easier-to-read code may still be better.

Since you hear this exact phrase (the 90% line), it’s obviously coming from some influential source. That would be Uncle Bob’s book, Clean Code: A Handbook of Agile Software Craftsmanship:

Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. …[Therefore,] making it easy to read makes it easier to write.

Now, he’s saying improving readability improves modifiability. But these discussions end up becoming about readability over modifiability. I’ll dig into why that matters.

Excuses for poorly abstracted code are abundant, particularly in “agile” communities. The first one I encountered was spoken by the first scrum master I ever had. Basically, since agile development is about rapid iteration, thinking design-wise more than two weeks ahead is wasteful, and you instead just want to focus on getting a barely working design out quickly, to complete the iteration before starting the next one.

Then, I heard more excuses later, ranging from YAGNI to “junior developers can’t understand this” to arguments about emergent design and “up-front design is bad” (the implication being that writing highly abstract code always counts as upfront design). There are a lot of misconceptions around this topic, especially the idea that building up a runway of library tools (i.e. abstractions) that can be quickly and easily composed into new features somehow hurts your ability to quickly implement new features with short turnaround (in other words, agility), but that’s not what I want to talk about here.

I just want to talk about this argument that “code should be easy to read”, implying that some other quality of it should be sacrificed. The reason I bring this other stuff up is to put it in context: developers are constantly looking for excuses to write less abstract code, so the “readability” argument isn’t unique at all. It’s just another of many avenues to reach the same conclusion. You should view these arguments with a bit of suspicion, as all of these unrelated premises land at the same place. Perhaps the conclusion was foregone, and people are working backward from it to various starting points.

(The simple, banal, reason developers look for excuses to not write abstract code is because abstracting is a form of low time preference: investing into the future instead of blowing all your money today, or running your credit card up. Investment requires delaying gratification, which is unpleasant here and now. It’s the same reason everyone wants to be fit but no one wants to go to the gym.)

For transparency’s sake, I’m the one who thinks abstraction, lots of it, is literally the only way any of us can begin to comprehend modern software. If you don’t believe me, remove all the abstractions and read the compiled code in assembly. This is why whenever I see a principle like YAGNI or readability used to conclude “un-abstracted code is better, actually”, I become extremely suspicious of the principle. At best it’s being completely misunderstood. Sometimes on closer inspection the principles turn out to be nonsensical.

I think that’s the case with “readability”. It doesn’t make sense as a principle of good software design.

Readability Is Largely Subjective

Let’s start by assuming the statement, “90% of time is spent reading code” is correct (I’m not sure how this could ever be measured… it feels accurate enough, but giving it a number like it was scientifically measured is just making it sound more authoritative), and let’s assume that this straightforwardly implies readability is 10 times more important than other qualities of the code.

I’m going to say something, the exact same thing, in two different ways. The only difference will be the phrasing. But the meanings of these two sentences are exactly equivalent:

  1. This code is hard to read
  2. I have a hard time reading this code

A skill I think everyone should learn is to pick up on the habit of people using existential phrasing (sentences where the verb is “is”) to take themselves out of the picture. People will share their opinions, but instead of saying “I like Led Zeppelin more than the Beatles”, they say, “Led Zeppelin is better than the Beatles”. See the difference? By removing themselves as the subject, they are declaring their subjective judgements to be objective traits of the things they are judging.

It’s not that you find this code hard to read, which is a description of you and your reading abilities, it’s that the code is just hard to read, like it’s an innate quality of the code itself.

The implication is that all developers will always agree on which of two versions of code is more readable. Hopefully I don’t have to emphasize or prove with examples that this is nowhere close to the truth. This is what leads to advice from developers that’s basically tautological: “you should write code that’s easy to read”. Who’s intentionally writing code that’s hard to read? This is like telling someone, “you should be less lazy”, as if anyone gets up in the morning and yells, “WOO, bring on the laziness!” Yeah, I know being lazy is bad, do you have any techniques for overcoming it, or just empty platitudes?

With the “readability” argument, developers are claiming, whether they realize it or not, that certain developers are consciously eschewing readability in favor of something else, like the DRY principle. Maybe some of them do, but that’s rather presumptuous. Why do you assume those developers don’t actually find the DRY code easier to read and understand? Maybe for them, having to see nearly identical blocks of code in multiple places, realize the only difference is the name of a local variable, and reverse engineer in their head that it’s the same code doing the same thing (and only after staring at it for a moment see that the thing is X), makes code harder to read than seeing a single line that’s a call to the function doX(someLocalVariable). Maybe they introduced the abstraction because, in their opinion, that improves readability.
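To make that concrete with a deliberately contrived example (purely illustrative):

import Foundation

// Version A: the duplicated blocks. The reader reverse engineers, twice,
// that this is "normalize a string", with only the variable names differing.
func displayLineA(name: String, title: String) -> String {
  let normalizedName = name.trimmingCharacters(in: .whitespaces).lowercased()
  let normalizedTitle = title.trimmingCharacters(in: .whitespaces).lowercased()
  return "\(normalizedName): \(normalizedTitle)"
}

// Version B: one named abstraction, called twice.
func normalized(_ string: String) -> String {
  string.trimmingCharacters(in: .whitespaces).lowercased()
}

func displayLineB(name: String, title: String) -> String {
  "\(normalized(name)): \(normalized(title))"
}

Which of these two versions is “easier to read” is exactly the kind of thing reasonable developers disagree on.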

Are you assuming they consciously rejected readability in favor of something else simply because you find it harder to read? Why are you assuming if you find it harder to read, everyone else must too?

Example 1: I have a book in my hand that’s written in Spanish. Embarrassing as this is, I’m not a polyglot. I only speak and read English. The Spanish classes I took in college didn’t take. How hard is it for me to read the Spanish book? Well, straight up impossible. I just can’t read it, at all. A native Spanish speaker, on the other hand, might find it quite easy to read. Or, if I bothered to learn the language, I could read it too.

Example 2: I spend more time than I’d like to admit in public reading physics textbooks. For me, an undergrad or grad level textbook on classical mechanics is an easy read. I haven’t read fiction since middle school. Any moderately sophisticated adult fiction book would probably leave me lost trying to figure out the metaphors and recall tiny plot hints that were dropped chapters ago that explain what I’m reading now. Maybe I’d figure it out, but I wouldn’t call it easy. Contrast this with, well, normal people, and I’m sure it’s the exact opposite: they breeze through the fiction, picking up on all the literary devices, and can’t get through the introductory math section of the physics textbook.

Now, some of this is difference in personality. I’m the kind of person who gets and likes math, and doesn’t really care about fictional stories, some people are the opposite. But a lot of it is practice. I find physics literature easy to read because I’ve spent a lot of time practicing. That grad level textbook was not easy the first time I read it, not even close. But it got easier each time I read it. The next one I read was easier because I already had experience. I would probably get better at reading fiction if I ever practiced it.

You see my point, right?

You can’t read highly abstracted code because you aren’t used to reading it.

It’s not an innate quality of the code. It’s a quality of you. The code doesn’t read itself.

I am not implying there are no objective qualities of code that would affect anyone’s ability to read it. The author could have scrambled the words into a random order. Yeah, that makes it harder to read for everyone. This is irrelevant because no one does this. No one sets out to write something that’s hard to read. If you suggest a change that everyone agrees, including its author, makes it easier to read, it’s going to be accepted with no resistance (the author probably just didn’t think of it), and “more readable code is good” didn’t help reveal that change (it’s an empty tautology). If the suggested change is controversial, it’s quite likely some people think the change degrades readability.

This is why “readability” can’t be a guideline for development teams. Either team members are going to disagree on which code is actually more readable, or they agree code is hard to read but can’t think of how to make it better. Telling them “make it more readable” doesn’t help in either case.

This applies not just to the level of abstraction, but the style and tools. Someone who’s never heard of ReactiveX would probably be totally bewildered by code written by someone who thinks “everything is an Observable”. But it’s easy to read for the author, and to anyone else who has years of experience reading that style of code.

My point is not that there’s no such thing as bad code. Of course not. It’s just that whether code is high or low quality has nothing to do with whether any particular developer is good at reading it or not.

We have the fortune of not having to learn to read assembly code. Coders in the 1960s, or even 90s, weren’t so lucky. Super Mario World was written in assembly. The authors were probably so good at reading assembly their brains would recognize control flow structures like loops and subroutines. They didn’t just see unrelated blocks of code, they saw Mario jumping around and hitting scenery and launching fireballs.

You’ll find highly abstracted code easy to read if you spend more time reading it, as long as you don’t approach it with a cynical attitude of “this is stupid, I shouldn’t have to put up with code like this”. Again, I’m not saying you have to like that code. It may be the worst code you’ve ever read. And it can still become easy for you to read.

It’s somewhat ironic that I’ve spent enough of my career reading code I think was pretty poorly designed, from having to work on “legacy” apps, that I’ve gotten pretty damn good at reading bad code. It’s bad, but there’s a lot of it out there so it’s necessary for my job to be good at reading it.

How to Read Abstracted Code

I’m aware of a few differences in the way I read code compared to how I think developers, who especially don’t like highly abstracted code, read it. They are probably relevant to why I have an easier time reading such code.

The absolute biggest one is: I don’t feel a need to “click into” an implementation, or class, or other abstraction and read its implementation details simply because I encountered a use of it in code. If I ever do feel that urge, it immediately turns into an urge to rename something. The name should make it self-evident enough what role it’s playing in the code that’s calling it.

I should not need to open up a coffee machine and look at how all its gears and pumps are hooked together to know that it’s a coffee machine, and pressing the “brew” button is going to make coffee come out of it. But if it’s just an unlabeled machine with a blank button on it, okay, maybe I’d want to open it up to figure out what the hell it is. I could also just press the button and see what happens (for code, that means find the test that calls this method or instantiates this class, and if it doesn’t exist, write it). Once I figure out what it is, I make sure to label it properly, so no one has to do that again.

The theme here is: when you organize a large codebase into modules, and you’re still clicking from one module to another while reading, or even worse stepping from one module to another during debugging, you’re doing it wrong. Want to challenge yourself? Put your library modules in different repos and consume them in your app through a package manager, in binary format. No source code available. You shouldn’t need a library’s source code while you’re working on the app that uses the library, and vice versa. This will also more or less force you to test the library with tests for the library instead of testing it indirectly through testing the app.
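With Swift Package Manager, that challenge could look something like this (the target names, URL, and checksum are placeholders invented for illustration):

// swift-tools-version:5.9
import PackageDescription

let package = Package(
  name: "App",
  targets: [
    // The library ships as a prebuilt binary; there is no source to click into.
    .binaryTarget(
      name: "FeatureKit",
      url: "https://example.com/FeatureKit.xcframework.zip", // placeholder
      checksum: "placeholder-checksum"
    ),
    .target(name: "App", dependencies: ["FeatureKit"])
  ]
)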

This is the single biggest complaint I’ve gotten from code I wrote for large applications with dozens of very similar screens. No, I’m not f**cking copy-pasting it 25 times. That’s a maintenance nightmare. But the complaint is, “I have to click through dozens of files just to get through a single behavior”. No, you don’t have to, you just do. Have you considered not clicking through? Have you done some self-reflection on what drives your need to dig through implementation details of implementation details to understand something? You usually can’t do that once you hit the platform SDKs. It’s not like there aren’t implementation details inside those classes, it’s just that they’re not publicly accessible. If you can stop there, why can’t you stop on abstractions in your source code?

The answer might be, “because they’re bad abstractions”. They’re badly named, or worse yet leaky, un-cohesive, slice an actual abstraction into pieces and scatter it around multiple files, combine two unrelated functionalities and select one with a parameter, etc. Yes, developers can absolutely do a bad job of abstracting. The solution is not to destroy the abstractions, it’s to fix them. This is why you need to ask yourself why you want to click into everything. There may be a good answer, and it may lead to an improvement in the code that’s a lot better than just inlining implementations everywhere.

Clicking into stuff isn’t always bad. But you need to treat it as a context switch. You unload everything from your brain you were reading before you clicked in. It’s irrelevant now. How you got to a method or class or whatever should have nothing to do with what that method or class is doing. If the meaning isn’t clear from just looking at the method or class, it needs a better name, or it’s a bad abstraction if it really requires context from the call site to understand.

How did I learn to read code like this? Practicing. The most reliable way to get better at X is to practice X. Do you want running a marathon to be easy? Then run some marathons. The first few will be really, really hard. They’ll get easier. Want to learn to play a musical instrument? Pick up the instrument and start playing it. It will be really hard at first. It will get easier.

Just because you can get better at reading a certain style of code doesn’t mean you should. I’m sure it’s a waste of time for any of us to get as good at reading assembly as the Super Mario World programmers were. It’s not a waste of time to get good at reading highly abstract code. Change it as soon as you can if you want, but you’ll only get faster at that if it’s easy to read.

Does Optimizing for Readability Even Make Sense?

Why cite the actual number for how much time we spend reading code? It’s 90%, apparently. The reason is because we humans really like simple relationships. Ideally, linear and proportional ones. We like to believe that if we can measure the proportions of time allocation on different activities, we can straightforwardly calculate the weights for different optimizations. If we spend X1% amount of time doing Y1, and X2% of time doing Y2, then optimizing Y2 will be X2/X1 times as important as optimizing Y1. We love these kinds of simple calculations.

This is why I never hesitate to remind people that physics problems become literally impossible to solve analytically as soon as you introduce… wait for it… three bodies. The number of problems in the world that are actually solvable with these kinds of basic algebra techniques is minuscule. We are heavily biased toward simplifying real problems down until they become solvable like this, destroying in the process any relevance the now toy model of the problem has to the real problem.

There are a lot of unspoken assumptions between the statement “we spend 90% of our time reading code and 10% of the time modifying it” and “optimizing readability is 10x more important than optimizing modifiability”, or even that optimizing readability is important at all. I’m going to try to bring out some of them, and we’re going to find they aren’t justifiable.

First, let’s clarify what exactly we’re trying to “optimize”, overall. Hopefully, as professional developers, we’re not merely trying to optimize for our own comfort. Of course making our jobs more pleasant is good for the profession, because we’re more eager to do our jobs, and we’re people who work to live, not faceless cogs in a corporate machine. But they’re still jobs (not, as Red Foreman says, “Super Happy Fun Time”). At the end of the day we’re being paid to deliver software.

It doesn’t matter if we spend 99.99999% of our time reading code.

We aren’t paid to read code.

We are literally paid to modify code. The only reason the reading is valuable at all is because it is a necessary precondition to the eventual modification.

Thus, optimizing for being happy during the 90% of our jobs when we aren’t actually producing deliverables is extremely dubious.

This is literally in Uncle Bob’s quote: that the reason improving readability is important is insofar as it improves writability. He was not pitting the two against each other.

But this problem is being framed as a “readability vs. modifiability” problem (some sort of tradeoff). Any adjustment of code that makes it simultaneously easier to read but harder to modify is absolutely detrimental to productivity, because the only actually productive phase is during modification. The only possible way readability could help productivity is because improving readability improves modifiability.

If you find it way easier to read less abstracted code, but directly at the expense of making it more complicated to modify it appropriately, how is this improving productivity? I totally get that it’s making your day more pleasant, since you’re spending 90% of that day reading instead of modifying the code. But is that what we’re optimizing for? You not tearing your hair out because you hate your job so much? I’m not being totally facetious. It is important for productivity that you not want to go postal every time you open your IDE. But this can’t come at the expense of you getting stuff done. You need a way to be happier and more productive.

A way to do that is get used to reading the code that’s driving you crazy. Over time it won’t drive you crazy. At worst you’ll start finding it funny.

Honestly I don’t think readability and modifiability are in competition very often, if ever. I think it’s a false dilemma, as I explained above, born out of developers being presumptuous that all other developers share their opinion about what code is easiest to read. Developers assume the code they’re staring at and finding hard to read was written by someone who agrees it’s hard to read but believed readability and some other quality (which is going to be some aspect of modifiability) were in competition, and he let the other quality win. I think it’s way more likely the author believes his version of the code is both easier to read and easier to modify, and the current reader simply disagrees with him about that.

Is It Good We Read Code This Much?

Another unspoken assumption here is that the ratio of reading to modifying is constant. Not just constant, but specifically independent of the ease of reading and modifying code. No matter how much the ease of either of these activities changes, the assumption is made that the reading/modifying ratio will stay at 10/1. I mentioned above that this number was probably pulled out of someone’s nether regions, and is completely meaningless (87.3% of all statistics are made up on the spot). Given that we’re just making up numbers, the only thing we could do if we wanted to model how this made-up number is coupled to the other variables we’re thinking about modifying is to make up more facts about how this coupling works. But that’s still better than just assuming there’s no coupling at all.

It seems pretty natural to me that there should be some rather strong coupling between the reading/modifying allocation ratio and how easy/pleasant both of these activities are. I would assume that making code much, much easier to modify is likely to lower the read/modify time ratio. Now that’s interesting. Remember again that, if we apply productivity multipliers to both activities, modifying gets a full 100% (actually that’s too simple, sometimes it can be negative, more on that below), and reading gets a big fat goose egg (again, not implying you can optimize reading out of the equation, it’s indirectly productive). In terms of productivity then, anything that decreases the read/modify time ratio is literally increasing the amount of actual productive time. That cannot be ignored. It could also disproportionately reduce the rate of productivity during modification (“velocity”), so this doesn’t necessarily imply increased productivity.

So, perhaps it’s true that today developers typically spend 90% of their time reading code, and only 10% of the time modifying it. And maybe that’s a problem. Maybe that’s because most of the code bases we’re working on today are huge piles of ten-year-old spaghetti following a design that could barely scale to a 9 month old app, and we can’t help but stare at them in disbelief for hours before finally mustering up the courage to edit one line and see if it actually accomplishes what we’re trying to do. Maybe we’re in this situation because so much code out there is a total maintenance nightmare that we can’t figure out how to do a 10 minute modification without rereading a chunk of code for an hour and a half first. Maybe if we all wrote code with ease of modifiability as the primary goal, we wouldn’t need to preamble the next modification with such a massive amount of preparatory study.

Or maybe not. Maybe this is just how it goes, and it’s unreasonable to think it could be any different. But there’s a wide range of options here. It could be that spending any less than 90% of time reading code is just unreasonable, or it could mean that spending less than 50% of time reading code is unreasonable, but we’ve got a solid 40% of potential improvement to work with. Or somewhere in between.

A large amount of my time is spent neither reading nor writing code but simply thinking about the problem. I really believe this is the most crucial phase of my job. Software engineering is more creative than formulaic; it’s much more like architecture than construction. The most valuable moments for the businesses I work for are random epiphanies I have in the middle of a meal when I realize a certain design is perfectly suited for the new requirements, and this is followed by a three day bender where I’m almost exclusively writing code. The time allocation for this is kind of nonsensical, because the thinking often just happens in the background while I’m doing other stuff.

Does time allocation really even matter for productivity? Recall that if you’re optimizing a software algorithm, when you find that 90% of time is spent on one part, your goal is to reduce that number.

Quality vs. Quantity

Readability is subjective. But there are objective qualities of writing that virtually everyone will subjectively agree make it harder to read.

What’s easier for me to read? A press release or a Top Secret classified document? Okay, sure, the former is easier to read, by simple virtue of it being available to me. That’s not an accident. I’m not supposed to read the classified document.

Now, imagine your friend complains that a novel is hard to read. You inquire further, and determine she is frustrated that the novel doesn’t come with the author’s notes and early drafts added as an appendix. She exclaims, “how can I know what the author meant here if he’s not going to share his notes with me!?”

Okay… if you’re having trouble reading a novel because you can’t read the author’s notes about it, there’s something wrong with the way you’re approaching reading the novel. That extra stuff is intentionally not part of the book.

Now imagine your other friend is complaining a novel is hard to read. You inquire further, and find he is jumping around, reading all the odd chapters and then reading all the even chapters. He reads Chapter 1, then 3, then 5, and so on, then circles back and reads Chapter 2, then 4, etc.

After lamenting to yourself how you managed to find all of these friends, you ask him in bewilderment why he isn’t just reading the damn thing in order. The order is very intentional. Maybe the novel isn’t chronological, that’s very important to its structure, but your friend thinks all stories should be presented in chronological order and is jumping around trying to reorganize it in that way.

If your teammate wrote code you find hard to read, it’s quite possible your teammate finds it easy to read. It’s also possible that however you’re trying to read it, the author made that harder on purpose.

Reading code is only indirectly productive, and only insofar as it aids future modification, which is directly productive. It is also the case that not all modification is productive. Correct modification is productive. Incorrect modification (introducing bugs) is counterproductive. That’s even worse than reading, which is merely (directly) unproductive.

Another tendency I see among some developers is to be focused on how overall easy it is to make changes to code. This is definitely a big part of the arguments I hear in favor of non-compiled (or, more precisely, not statically typed) languages. Having to declare types is hard, it makes every little change harder.

And changing is what we get paid for, so making changing code harder is bad, right?

F*** no it isn’t.

It isn’t if most changes are actually harmful.

It’s way, way easier to break code than it is to fix it or correctly add or modify functionality. Thus, most changes to code are actually harmful.

If you’re into XP, you love automated tests, right? TDD all the way right!? All a test does is stop you from modifying code. It does nothing but make editing production code harder. That’s why some developers hate test suites.

In my experience, high quality code is high quality not so much because it makes correct modification easier (this is the #2 aspect of high quality code) but because it makes incorrect modification harder (this is the #1 aspect). Thus, when I hear developers complaining that working on this code base, given its design, is too hard, I’m suspicious that they’re complaining their bugs are being found as they’re being typed instead of later by QA or customers.

Is it “easier” to read the internal details of a library class that are exposed to you than it is to read those that are encapsulated? Well, yeah. But that doesn’t mean you should expose everything as public to make it “easier” to find. Hiding stuff is a fundamental tool of improving software quality.

If “easier to read” means less abstraction, with all the implementation details spelled out right there so you don’t have to click around to find them, then this literally means less encapsulation. Less intentional hiding of stuff you might want to read but damn well shouldn’t be reading, because all that reading it will do is make you want to couple to it.

Some parts of code need to be easy to read and understood by certain people in certain roles. Other parts of code, those people in those roles have no business reading, and doing so is likely to just confuse them or give them dangerous ideas. This is directly relevant to the way “readability” is so often tied to abstraction.

Abstraction introduces boundaries, and intentionally moves one thing (an implementation detail) away from another thing (an invocation of the thing being implemented elsewhere) specifically because it is not helpful to read both at the same time. Doing so only feeds confusing and misleading ideas: that they are relevant to each other, that they should be colocated, that they are likely to change together, that they form a cohesive unit and have no independent meaning.

Highly abstract code does intentionally make certain approaches to reading code harder, as a direct corollary to it making certain approaches to modifying code harder. Properly abstracted code makes the modifications that are not helpful and that degrade code quality harder, along with the reading that typically precedes them. This is a feature, not a bug.

The structure may be stopping you from even reading the code in a way that leads up to you making a damaging modification, especially if the main problem you’re having is that you’d have to punch through a bunch of encapsulation boundaries to do whatever it is you’re trying to do.

Conclusion

The overall thesis here is simple: readability has no simple relation to code quality. Readability has a large subjective component, and to the extent authors intentionally frustrate certain approaches to reading code, there can be valid reasons for doing so.

Even with examples I try to come up with that I think are about as objectively readable as they can be, like naming a variable thisSpecificThing instead of x, I know someone is gonna say, “no, x is better because it’s more concise; the more concise, the easier it is to read”. I can’t argue with that. I can’t tell someone they don’t like reading something they like reading. If I want to convince him to please name his variables descriptively, “more readable” isn’t a way to do it.

If you have trouble reading highly abstract code, you just need to practice reading it, which will help you understand why it discourages certain reading strategies. It’s also really helpful to practice writing it. If you always tell yourself you don’t have time or it’s not agile enough, when will you ever get this practice?

There’s nothing intrinsically unreadable about even extremely DRY’d heavily templated and layered code. It may even be excessive or inappropriate abstraction (I don’t like saying “excessive”, what matters is quality, not quantity), but even that code can be easy to read if you just practice doing so. Then you’ll get really good at improving it.

The rules of thumb for achieving high code quality need to be objective, not subjective. Examples of subjective (or at best tautological, they’re all fancy synonyms for “good”) qualities are:

  • Readable
  • “Clean” (what does this mean? You wiped your code down with Lysol?)
  • Simple
  • Understandable

Examples of qualities that are objective but so vague that they aren’t functional rules of thumb on their own (they are rather guiding principles for producing the rules of thumb) are:

  • Scalable
  • Maintainable
  • Agile
  • Modular
  • Reusable
  • Resilient
  • Stable

Examples of objective rules of thumb are (you may agree or disagree about whether these are good rules, I’m just giving examples of rules that are objective; a short sketch of a few of them in practice follows the list):

  • Everything is private by default
  • Use value semantics until you need reference semantics
  • Always couple to the highest possible level of abstraction (start with types at the top of the hierarchy, and cast down only once needed)
  • Law of Demeter (never access a dependency’s dependencies directly)
  • Use inheritance for polymorphism, use composition for sharing code
  • Make all required dependencies of a class constructor parameters (don’t allow constructed but nonfunctional instances)
  • Use a single source of truth for state, and only introduce caching when performance measurements prove it necessary
  • Don’t use dynamic dispatch for build-time variations
  • Use dynamic dispatch over procedural control flow
  • Views should only react to model changes and never modify themselves directly
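
To make the contrast concrete, here is a minimal Swift sketch (with hypothetical names like OrderStore and OrderSummaryViewModel, invented for illustration) of what following a few of these rules looks like in practice: everything private by default, required dependencies as constructor parameters, and a single source of truth for state.

import Foundation

struct Order {
    let id: Int
    let total: Decimal
}

protocol OrderStore {
    func orders() -> [Order]
}

final class OrderSummaryViewModel {

    // Required dependency is a constructor parameter: you can't construct
    // an instance that isn't fully functional and wire it up later.
    init(store: OrderStore) {
        self.store = store
    }

    // Derived on demand from the single source of truth (the store); no cached
    // copy to drift out of sync until measurements prove caching is needed.
    var totalOfAllOrders: Decimal {
        store.orders().reduce(0) { $0 + $1.total }
    }

    // Private by default: the dependency is an implementation detail.
    private let store: OrderStore
}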

You see the difference, right? Good or bad, these rules actually tell you something concrete. It’s a matter of fact whether you are following them or not, rather than a matter of subjective preference.

The first category, in my opinion, is useless. The second category is useful only as a means of generating statements in the last category. It’s the last category that should go in the style guide of your README.

Since it sits in the first category, I think developers should stop talking about readability. I mean, if they just want to complain, confiding in their fellow team members about how frustrating their job can be, that’s fine. But citing “improve readability” as any kind of functional advice or guiding principle… it’s not. And if you have trouble reading code, reframe that as an opportunity for you to grow. After all, we’re software engineers; our job is to be good at reading and understanding code.

What Should Your Entities Be?

Introduction

So, you’re writing an application in an object-oriented programming language, and you need to work with data in a relational database. The problem is, data in a relational database is not in the same form as data in OOP objects. You have to solve this “impedance mismatch” in some way or another.

What exactly is the mismatch? Well, plain old data (POD) objects in OOP are compositions: one object is composed of several other objects, each of which is composed of several more objects, and so on. Meanwhile, relational databases are structured as flat rows in tables with relations to each other.

Let’s say we have the following JSON object:

{
  "aNumber": 5,
  "aString": "Hello",
  "anInnerObject": {
    "anotherNumber": 10,
    "aDate": "12-25-2015",
    "evenMoreInnerObject": {
      "yetAnotherNumber": 30,
      "andOneMoreNumber": 35,
      "oneLastString": "Hello Again"
    }
  }
}

In OOP code, we would represent this with the following composition of classes:

class TopLevelData {

  let aNumber: Int
  let aString: String
  let anInnerObject: InnerData 
}

class InnerData {

  let anotherNumber: Int
  let aDate: Date
  let evenMoreInnerObject: InnerInnerData
}

class InnerInnerData {
  
  let yetAnotherNumber: Int
  let andOneMoreNumber: Int
  let oneLastString: String
}

This is effectively equivalent to the JSON representation, with one important caveat: in most OOP languages, unless you use structs, the objects have reference semantics: the InnerData instance is not literally embedded inside the memory of the TopLevelData instance, it exists somewhere else in memory, and anInnerObject is really, under the hood, a pointer to that other memory. In the JSON, each sub-object is literally embedded. This means that, in the JSON, we can’t, for example, refer to the same sub-object twice without having to duplicate it (and by extension all its related objects), and circular references are just plain impossible.
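
Here is a small Swift sketch of that distinction, reusing the classes above (and assuming, purely for brevity, that they have initializers that set every field):

import Foundation

// With reference semantics, two parents can point at the same sub-object.
let shared = InnerInnerData(yetAnotherNumber: 30, andOneMoreNumber: 35, oneLastString: "Hello Again")

let first = InnerData(anotherNumber: 10, aDate: Date(), evenMoreInnerObject: shared)
let second = InnerData(anotherNumber: 11, aDate: Date(), evenMoreInnerObject: shared)

// One object, two pointers: both parents see the very same instance.
print(first.evenMoreInnerObject === second.evenMoreInnerObject)   // true

// Serializing `first` and `second` to JSON would embed a full copy of `shared` in each,
// because JSON has no way to say "the same sub-object again".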

This value vs. reference distinction is another impedance mismatch between OOP objects and JSON, which is what standards like json-api are designed to solve.

In the database, this would be represented with three tables with foreign key relations:

TopLevelData

  id               Int    Primary Key
  aNumber          Int
  aString          Text
  anInnerObjectId  Int    Foreign Key → InnerData.id

InnerData

  id                     Int    Primary Key
  anotherNumber          Int
  aDate                  Date
  evenMoreInnerObjectId  Int    Foreign Key → InnerInnerData.id

InnerInnerData

  id                Int    Primary Key
  yetAnotherNumber  Int
  andOneMoreNumber  Int
  oneLastString     Text

This representation is more like how OOP objects are represented in memory, where foreign keys are equivalent to pointers. Despite this “under the hood” similarity, on the visible level they’re completely different. OOP compilers “assemble” the memory into hierarchical structures we work with, but SQL libraries don’t do the same for the result of database queries.

The problem you have to solve, if you want to work with data stored in such tables but represented as such OOP classes, is to convert between foreign key relations in tables and nesting in objects…

…or is it?

It may seem “obvious” that this impedance mismatch needs to be bridged. After all, this is the same data, with different representations. Don’t we need an adapter that converts one to the other?

Well, not necessarily. Why do we need to represent the structure of the data in the database in our code?

ORMs to the Rescue

Assuming that yes, we do need that, the tools that solve this problem are called object-relational mapping, or ORM, libraries. The purpose of an ORM is to automate the conversion between compositional objects and database tables. At minimum, this means we get a method to query the TopLevelData table and get back a collection of TopLevelData instances, where the implementation knows how to do the necessary SELECTs and JOINs to get all the necessary data, build each object out of its relevant parts, then assign them to each other’s reference variables.

If we want to modify data, instead of hand-crafting the INSERTs or UPDATEs, we simply hand the ORM a collection of these data objects, and it figures out what records to insert or update. The more clever ones can track whether an object was originally created from a database query, and if so, which fields have been modified, so that it doesn’t have to write the entire object back to the database, only what needs updating.
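
As a hedged sketch of the shape of this (the ORM type, fetchAll and save below are invented stand-ins for illustration, not any real library’s API, and pretend for a moment that the entity fields are mutable):

// Hypothetical ORM usage; ORM, fetchAll and save are invented names.
let orm = ORM(connectionString: "postgres://…")

// One call issues the SELECTs and JOINs, builds TopLevelData, InnerData and
// InnerInnerData instances, and wires up the references between them.
let everything: [TopLevelData] = try orm.fetchAll(TopLevelData.self)

// Mutate the object graph in memory (pretend the fields are declared var)…
everything[0].anInnerObject.anotherNumber = 11

// …then hand it back; the ORM works out the UPDATE statements.
// A clever ORM only writes the fields that actually changed.
try orm.save(everything[0])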

We still have to design the database schema, connect to it, and query it in some way, but the queries are abstracted from raw SQL, and we don’t have to bother with forming the data the database returns into the objects we want, or breaking those objects down into update statements.

The fancier ORMs go further than this and allow you to use your class definitions to build your database schema. They can analyze the source code for the three classes, inspect their fields, and work out what tables, and what columns in those tables, are needed. When an ORM sees a reference-type field, with one object containing another, that’s a cue to create a foreign key relationship. With this, we no longer need to design the schema; we get it “for free” by simply coding our POD objects.

This is fancy and clever. It’s also, the way we’ve stated it, unworkably inefficient.

Inefficiency is a problem with any of these ORM solutions because of their tendency to work on the granularity of entire objects, which correspond to entire rows in the database. This is a big problem because of foreign key relations. The examples we’ve seen so far only have one-to-one relations. But we can also have one-to-many, which would look like TopLevelData having a field let someInnerObjects: [InnerData] whose type is a collection of objects, and many-to-many, which would add to this a “backward” field let theTopLevelObjects: [TopLevelData] on InnerData.

The last one is interesting because it is unworkable in languages that use reference counting for memory management. That’s a circular reference, which means you need to weaken one of them, but weakening one (say, the reference from InnerData back to TopLevelData) means you must hold onto the TopLevelData separately. If you, for example, query the database for an InnerData, and want to follow it to its related TopLevelData, that TopLevelData will already be gone by the time you get your InnerData back, because nothing else was holding a strong reference to it.
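
Here is a hedged Swift sketch of exactly that failure mode under ARC, using hypothetical Author/Book classes in place of our entities:

class Author {
    var books: [Book] = []
}

class Book {
    // Must be weak (or unowned) to break the Author <-> Book reference cycle.
    weak var author: Author?
}

func fetchBook() -> Book {
    // Imagine these were just assembled from a database query.
    let author = Author()
    let book = Book()
    book.author = author
    author.books = [book]
    return book
    // Nothing holds a strong reference to `author` anymore, so ARC deallocates it here…
}

let book = fetchBook()
print(book.author as Any)   // …and this prints nil: the related object is already gone.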

This is, of course, not a problem in garbage collected languages. You just have to deal with all the other problems of garbage collection.

With x-to-many relations, constructing a single object of any of our classes might end up pulling hundreds or thousands of rows out of the database. The promise we’re making in our class, however implicit, is that when we have a TopLevelData instance, we can follow it through references to any of its related objects, and again, and eventually end up on any instance that is, through an arbitrarily long chain of references, related back to that TopLevelData instance. In any nontrivial production database, that’s an immediate showstopper.

A less severe form of this same problem is that when I grab a TopLevelData instance, I might only need to read one field. But I end up getting the entire row back. Even in the absence of relations, this is still wasteful, and can become unworkably so if I’m doing something like a search that returns 10,000 records, where I only need one column from each, but the table has 50 columns in it, so I end up querying 50,000 cells of data. That 50x cost, in memory and CPU, is a real big deal.

By avoiding the laborious task of crafting SQL queries, where I worry about SELECTing and JOINing only as is strictly necessary, I lose the ability to optimize. Is that premature optimization? In any nontrivial system, eventually no.

Every ORM has to deal with this problem. You can’t just “query for everything in the object” in the general case, because referenced objects are “in”, and that cascades.

And this is where ORMs start to break down. We’ll soon realize that the very intent of ORMs, to make database records look like OOP objects, is fundamentally flawed.

The Fundamental Flaw of ORMs

We have to expose a way to SELECT only part of an object, and JOIN only on some of its relations. That’s easy enough. Entity Framework has a fancy way of letting you craft SQL queries that looks like you’re doing functional transformations on a collection. But the ability to make the queries isn’t the problem. Okay, so you make a query for only part of an object.

What do you get back?

An instance of TopLevelData? If so, bad. Very bad.

TopLevelData has everything. If I query only the aNumber field, what happens when I access aString? I mean, it’s there! It’s not like this particular TopLevelData instance doesn’t have an aString. If that were the case, it wouldn’t be a TopLevelData. A class in an OOP language is literally a contract guaranteeing the declared fields exist!

So, what do the other fields equal? Come on, you know what the answer’s gonna be, and it’s perfectly understandable that you’re starting to cry slightly (I’d be more concerned if you weren’t):

null

I won’t belabor this here, but the programming industry collectively learned somewhere in the last 10 years or so that the ancient decision of C, from which so many of our languages descend in some way, to make NULL not just a valid value but the conventional default value of any pointer type, is one of the most expensive bug-breeding decisions that’s ever been made. It wasn’t wrong for C to do this; NULL (just sugar for 0) is a valid pointer. The convention to treat this as “lacking a value” or “not set” is the problem, but again, in C, there’s really no better option.

But carrying this forward into C++, Objective-C, C# and Java was a bad idea. Well, okay, Objective-C doesn’t really have a better option either. C++ has pointers, but also more than enough tools to forever hide them from anyone except low-level library developers. C# and Java essentially don’t have raw pointers at all, and it’s their decision to make null a valid value of any declared variable type (reference types at least) that’s really regrettable. It’s a hole in their type systems.

This is one of the greatest improvements that Swift and Kotlin (and Typescript if you configure it properly) made over these languages. null is not a String, so if I tell my type system this variable is a String, assigning null to it should be a type error! If I want to signal that a variable is either a String or null, I need a different type, like String?, or Optional<String>, or std::optional<String>, or String | null, which is not identical to String and can’t be cast to one.
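
For example, in Swift (a minimal, hedged illustration):

// let name: String = nil     // does not compile: 'nil' cannot initialize a 'String'
let nickname: String? = nil   // fine: the type explicitly says "a String or nothing"

// The compiler then forces the "no value" case to be handled before use:
if let nickname = nickname {
    print(nickname.uppercased())
} else {
    print("no nickname")
}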

I said I wouldn’t belabor this, so back to the subject: the ability of ORMs to do their necessary optimization in C# and Java is literally built on the biggest hole in their type systems. And of course this doesn’t work with primitives, so you either have to make everything boxed types, or, God forbid, decide that false and 0 are what anything you didn’t ask for will equal.

It really strikes to the heart of this issue that in Swift, which patched that hole, an ORM literally can’t do what it wants to do in this situation. You’d have to declare every field in your POD objects to be optional. But then what if a particular field is genuinely nullable in the database, and you want to be able to tell that it’s actually null, and not just something you didn’t query for? Make it an optional optional? For fu…

Either way, making every field optional would throw huge red flags up, as it should.

In Java and C#, there’s no way for me to know, just by looking at the TopLevelData instance I have, if the null or false or 0 I’m staring at came from the database or just wasn’t queried. All the information about what was actually SELECTed is lost in the type system.

We could try to at least restrict this problem to the inefficiency of loading an entire row (without relations) by making relations lazy loaded: the necessary data is only queried from the database when it is accessed in code. This tries to solve the problem of ensuring the field has valid data whenever accessed, while also avoiding the waste of loading a potentially very expensive (such as x-to-many) relation that is never accessed.

This comes with a host of its own problems, and in my experience it’s never actually a workable solution. Database connections are typically managed with some type of “context” object that, among other things, is how you control concurrency, since database access is generally not thread safe. You usually create a context in order to make a query, get all the data you need, then throw the context away once the data is safely stored in POD objects.

If you try to lazy-load relations, you’re trying to hit the database after the first query is over, and you’ve thrown the context away. Either it will fail because the context is gone, or it’s going to fail because the object with lazy-loading capabilities keeps the context alive, and when someone else creates a context it throws a concurrency exception.

You can try to solve this by storing some object capable of creating a new context in order to make the query on accessing a field. But even if you can get this to work, you’ll end up potentially hitting the database while using an object you returned to something like your UI. To avoid UI freezes you’d have to be aware that some data is lazy-loaded, keep track of whether it’s been loaded or not, and if not, make sure to do it asynchronously and call back to update the UI when it’s ready. By that point you’re just reinventing an actual database query in a much more convoluted way.

The Proper Solution

What we’re trying to do is simply not a good idea. Returning a partial object of some class, but having it declared as a full instance of that class, violates the basic rules of object-oriented programming. The whole point of a type system is to signal that a particular variable has particular members with valid values. Returning partials throws that out the window.

We can do much better in Typescript, whose type system is robust enough to let us define Partial<T> for any T, which maps every member of type M in T to a member of type M | undefined. That way, we’re at least signaling in the type system that we don’t have a full TopLevelData. But we still can’t signal which part of TopLevelData we have. The stuff we queried for becomes nullable even when it shouldn’t be, and we have to do null checks on everything.

Updating objects is equally painful with ORMs. We have to supply a TopLevelData instance to the database, which means we need to create a full one somehow. But we only want to update one or a few fields. How does the framework know what parts we’re trying to update? Combine this with the fact that part of the object may be missing because we didn’t query for all of it, and what should the framework do? Does it interpret those empty fields as instructions to clear the data from the database, or just ignore them?

I know Entity Framework tries to handle this by having the generated subclasses of your POD objects track what was done to them in code. But it’s way more complicated than just setting fields on instances and expecting it to work. And it’s a disaster with relations, especially x-to-many relations. I’ve never been able to get update statements to work without loading the entire relation, which it needs just so it can tell exactly how what I’m saving back is different from what’s already there. That’s ridiculous. I want to set a single cell on a row, and I end up having to load an entire set of records from another table just so the framework can confirm that I didn’t change any of those relations?

Well, of course I do. If I’m adding three new records to a one-to-many relation, and removing one, then how do I tell EF this? For the additions, I can’t just make an entity where the property for this relationship is an array that contains the three added records. That’s telling EF those are now the only three related entities, and it would try to delete the rest. And I couldn’t tell it to remove any this way. The only thing I can do is load the current relations, then apply those changes (delete one, add three) to the loaded array and save it back. There’s no way to do this in an acceptably optimized fashion.

The conclusion is inescapable:

It is not correct to represent rows with relations in a database as objects in OOP

It should be fairly obvious, then, what we should be representing as objects in OOP:

We should represent queries as objects in OOP

Instead of writing a POD object to mirror the table structure, we should write POD objects to mirror query structures: objects that contain exactly and only the fields that a query SELECTed. Whether those fields came from a single table or were JOINed together doesn’t matter. The point is that whatever set of fields each database result has, we write a class that contains exactly those fields.

For example, if I need to grab the aNumber from a TopLevelData, the aDate from its related InnerData, and both yetAnotherNumber and oneLastString from its related InnerInnerData, I write the following class:

struct ThisOneQuery {

  let aNumber: Int
  let aDate: Date
  let yetAnotherNumber: Int
  let oneLastString: String
}

This means we may have a lot more classes than if we just wrote one for each table. We might have dozens of carefully crafted queries, each returning slightly different combinations of data. Each one gets a class. That may sound like extra work, but it’s upfront work that saves work later, as is always the case with properly designing a type system. No more accidentally accessing or misinterpreting nulls because they weren’t part of the query.
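
To sketch how this might look in practice (hedged: Database, runQuery, and the row accessor below are invented stand-ins for whatever SQL layer you’re actually using):

class Store {

    init(database: Database) {
        self.database = database
    }

    // SELECTs and JOINs exactly what ThisOneQuery needs, and nothing more.
    func fetchThisOneQuery() throws -> [ThisOneQuery] {
        try database.runQuery(
            """
            SELECT t.aNumber, i.aDate, ii.yetAnotherNumber, ii.oneLastString
            FROM TopLevelData t
            JOIN InnerData i ON i.id = t.anInnerObjectId
            JOIN InnerInnerData ii ON ii.id = i.evenMoreInnerObjectId
            """
        ) { row in
            ThisOneQuery(
                aNumber: row["aNumber"],
                aDate: row["aDate"],
                yetAnotherNumber: row["yetAnotherNumber"],
                oneLastString: row["oneLastString"]
            )
        }
    }

    private let database: Database
}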

We apply the same principle to modifying data. Whatever exact set of values a particular update statement needs, we make a class that contains all and only those fields, regardless of whether they come from a single table or get distributed to multiple tables. Again, we use the type system to signal to users of an update method on our Store class exactly what they need to provide, and what is going to be updated.
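
For example, a hedged sketch of one such update class (the names are hypothetical):

// All and only the values one particular update statement needs, regardless of
// how many tables they end up touching.
struct UpdateDisplayStrings {
    let topLevelDataId: Int        // identifies the row(s) to touch
    let newAString: String         // written to TopLevelData.aString
    let newOneLastString: String   // written to the related InnerInnerData.oneLastString
}

// A Store method can then signal exactly what callers must provide:
// func apply(_ update: UpdateDisplayStrings) throws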

These query objects don’t need to be flat. They can be hierarchical and use reference semantics wherever it is helpful. We can shape them however we want, in whatever way makes it easiest to work with them. The rule is that every field is assigned a meaningful value, and nulls can only ever mean that something is null in the database.

Entity Framework does something interesting that approximates what I’m talking about here: when you do a Select on specific fields, the result is an anonymous type that contains only the fields you selected. This is exactly what we want. However, since the type is anonymous (it doesn’t have a name), you can’t return them as is. We still need to write those query result classes and give them a name, but this feature of Entity Framework will make it a lot easier to construct those objects out of database queries.

We can get similar functionality in C++ by using variadic templates to write a database query method that returns a tuple<...> containing exactly the fields we asked for. In that case, it’s a named type and we can return it as is, but the type only indicates which types of fields, in what order, we asked for. The fields aren’t named. So we’d still want to explicitly define a class, presumably one we can construct from that tuple<...> (a straight reinterpret_cast is tempting, but not guaranteed to be safe, since tuple layout is unspecified).

The payoff of carefully crafting these query classes is that we get stronger coupling between what our model layers work with to drive the UI and what the UI actually needs, and looser coupling between the model layer and the exact details of how data is stored in a database. It’s always a good idea to let requirements drive design, including the database schema. Why create a schema that doesn’t correspond in a straightforward way to what the business logic in your models actually needs? But even if we do this, some decisions about how to split data across tables, whether to create intermediate objects (as is required for many-to-many relations), and so on may arise purely out of the mechanics of relational databases, and constitute essentially “implementation details” of efficiently and effectively storing the data.

Writing classes that mirror the table structure of a database needlessly forces the rest of the application to work with data in this same shape. You can instead start by writing the exact POD objects you’d most prefer to use to drive your business logic. Once you’ve crafted them, they signal what needs to be queried from the database. You have to write your SELECTs and JOINs so as to populate every field on these objects, and no more.

If, later, the UI or business logic requirements change, and this necessitates adding or removing a field to/from these query classes, your query method will no longer compile, guiding you toward updating the query appropriately. You get a nice, compiler-driven pipeline from business requirements to database queries, optimized out of the box to supply exactly what the business requirements need, wasting no time on unnecessary fetches.

This also guides how queries and statements are batched. Another problem with ORMs is that you can’t bundle fetches of multiple unrelated entities into a single query, because there’s no type representing the result. It would be a tuple, but only C++ has the machinery necessary to write a generic function that returns arbitrary tuples (variadic generics). You’re stuck having to make multiple separate trips to the database. This may be okay, or even preferred, in clients working with embedded databases, but wherever the database lives on another machine, each database statement is a network trip, and you want to batch those where possible.

By writing classes for queries, the class can be a struct that contains the structs for each part of the data, however unrelated each part is to the others. With this we can hit the database once, retrieving all we need and nothing extra, even if that constitutes multiple fetches of completely unrelated data. We can do the same with updates, although we could achieve that in an ORM with transactions.

Queries as classes also integrate very well with web APIs, especially ones that follow a standard like json-api that supports partial objects. Anyone who’s tried writing the network glue for updating a few fields in an object whose class represents an entire backend database entity knows the awkwardness of having to decide either to inefficiently send the entire object every time, or to come up with some way to represent partial objects. This could be straightforward in Typescript, where a Partial<T> would contain only what needs to be updated, but even there we can improve the situation with transaction objects, because they signal exactly what data is going to be updated. With queries, requesting specifically the fields needed translates straightforwardly into parsing the responses into query objects, which contain the same fields as what was requested.

Conclusion

It turns out that representing your database tables as classes in OOP code is not only unnecessary but wholly misguided. That set of classes exists conceptually, as that’s exactly what the database is ultimately storing, but just because those classes conceptually exist doesn’t mean you need to code them. You may find it useful to write them purely to take advantage of the schema-specifying features of ORMs, but their usage should not go beyond this.

The actual interactions with the database, with data going in and out, don’t work in terms of entire rows with all their relations, but with carefully selected subsets. The solution we’re yearning for, that made us think an ORM might help, is in fact a rather different solution of representing individual queries as classes. Perhaps eventually a tool can be written that automates this with some type of code generation. Until then, I promise you’ll be much happier handwriting those query classes than you ever were working with entity classes.

On ReactiveX – Part IV

In the previous parts, first we tried to write a simple app with pure Rx concepts and were consumed by demons, then we disentangled the Frankenstein Observable into its genuinely cohesive components, then organized the zoo of operators by associating them with their applicable targets. Now it’s time to put it all together and fix the problems with our Rx app.

Remember, our requirements are simple: we have a screen that needs to download some data from a server, display parts of the downloaded data in two labels, and have those labels display “Loading…” until the data is downloaded. Let’s recall the major headaches that arose when we tried to do this with Rx Observables:

  • We started off with no control over when the download request was triggered, causing it to be triggered slightly too late
  • We had trouble sharing the results of one Observable without inadvertently changing significant behavior we didn’t want to change. Notice that both this and the previous issue arose from the trigger for making the request being implicit
  • We struggled to find a stream pipeline that correctly mixed the concepts of “hot” and “cold” in such a way that our labels displayed “Loading…” only when necessary.

With our new belt of more precise tools, let’s do this right.

The first Observable we created was one to represent the download of data from the server. The root Observable was a general web call Observable that we create with an HTTPRequest, and whose type parameter is an HTTPResponse. So, which of the four abstractions really is this? It’s a Task. Representing this as a “stream” makes no sense because there is no stream… no multiple values. One request, one response. It’s just a piece of work that takes time, that we want to execute asynchronously, and may produce a result or fail.

We then transformed the HTTPResponse using parsers to get an object representing the DataStructure our server responds with. This is a transformation on the Task. This is just some work we need to do to get a Task with the result we actually need. So, we apply transformations to the HTTP Task, until we end up with a Task that gives us the DataStructure.

Then, what do we do with it? Well, multiple things. What matters is at some point we have the DataStructure we need from the server. This DataStructure is a value that, at any time, we either have or don’t have, and we’re interested in when it changes. This is an ObservableValue, particularly of a nullable DataStructure. It starts off null, indicating we haven’t retrieved it yet. Once the Task completes, we assign the retrieved result to this ObservableValue.

That last part… having the result of a Task get saved in an ObservableValue… that’s probably a common need. We can write a convenience function for that.

We then need to pull out two different strings from this data. These are what will be displayed by labels once the data is loaded. We get this by applying a map to the ObservableValue for the DataStructure, resulting in two ObservableValue Strings. But wait… the source ObservableValue DataStructure is nullable. A straight map would produce a nullable String. But we need a non-null String to tell the label what to display. Well, what does a null DataStructure represent? That the data isn’t available yet. What should the labels display in that case? The “Loading…” text! So we null-coalesce the nullable String with the loading text. Since we need to do that multiple times, we can define a reusable operator to do that.

Finally we end up with two public ObservableValue Strings in our ViewModel. The View wires these up to the labels by subscribing and assigning the label text on each update. Remember that ObservableValues give the option to have the subscriber be immediately notified with the current value. That’s exactly what we want! We want the labels to immediately display whatever value is already assigned to those ObservableValues, and then update whenever those values change. This only makes sense for ObservableValues, not for any kind of “stream”, which doesn’t have a “current” value.

This is precisely that “not quite hot, not quite cold” behavior we were looking for. Almost all the pain we experienced with our Rx-based attempt was due to us taking an Observable, which comes with endless transformations, many of which are geared specifically toward streams, subscribing to it, and writing the most recently emitted item to a UI widget. What is that effectively doing? It’s caching the latest item, which as we saw is exactly what a conversion from an EventStream to an ObservableValue does. Rx Observables don’t have a “current value”, but the labels on the screen certainly do! It turned out that the streams we were constructing were very sensitive to timing in ways we didn’t want, and remembering whatever the latest emitted item was is exactly what exposed that sensitivity. By using the correct abstraction, ObservableValue, we simply don’t have all these non-applicable transformations like merge, prepend or replay.

Gone is the need to carefully balance an Observable that gets made hot so it can begin its work early, then caches its values to make it cold again, but only caches one to avoid repeating stale data (remember that caching the latest value from a stream is really a conversion from an EventStream to a… ObservableValue!). All along, we just needed to express exactly what a reactive user interface needs: a value, whose changes over time can be reacted to.

Let’s see it:

class DataStructure
{
    String title;
    int version;
    String displayInfo;
    ...
}

extension Task<Result>
{
    // Runs the task and stores its result in the (nullable) observable value when done
    public void assignTo(ObservableValue<Result?> destination)
    {
        Task.start(async () ->
        {
            destination.value = await self;
        });
    }
}

extension ObservableValue<String?>
{
    // Maps a nullable string to a displayable string, showing "Loading..." while null
    public ObservableValue<String> loadedValue()
    {
        String loadingText = "Loading...";

        return self
            .map(valueIn -> valueIn ?? loadingText);
    }
}

class AppScreenViewModel
{
    ...

    public final ObservableValue<String> dataLabelText;
    public final ObservableValue<String> versionLabelText;

    private final ObservableValue<DataStructure?> data = new ObservableValue<DataStructure?>(null);
    ...

    public AppScreenViewModel()
    {
        ...

        Task<DataStructure> fetchDataStructure = getDataStructureRequest(_httpClient)
            .map(response -> parseResponse<DataStructure>(response));

        fetchDataStructure.assignTo(data);

        dataLabelText = data
            .map(dataStructure -> dataStructure?.displayInfo)
            .loadedValue();

        versionLabelText = data
            .map(dataStructure -> dataStructure?.version?.toString())
            .loadedValue();
    }
}

...

class AppScreen : View
{
    private TextView dataLabel;
    private TextView versionLabel;

    ...

    private void bindToViewModel(AppScreenViewModel viewModel)
    {
        subscriptions.add(viewModel.dataLabelText
            .subscribeWithInitial(text -> dataLabel.setText(text)));

        subscriptions.add(viewModel.versionLabelText
            .subscribeWithInitial(text -> versionLabel.setText(text)));
    }
}

Voila.

(By the way, I’m only calling ObservableValues “ObservableValue“s to avoid confusing them with Rx Observables. I believe they are what should be properly named Observable, and that’s what I would call them in a codebase that doesn’t import Rx)

This, I believe, achieves the declarative UI style we’re seeking, that avoids the need to manually trigger UI refreshes and ensures rendered data is never stale, and also avoids the pitfalls of Rx that are the result of improperly hiding multiple incompatible abstractions behind a single interface.

Where can you find an implementation of these concepts? Well, I’m working on it (I’m doing the first pass in Swift, and will follow with C#, Kotlin, C++ and maybe Java implementations), and maybe someone reading this will also start working on it. For the time being you can just build pieces you need when you need them. If you’re building UI, you can do what I’ve done several times and write a quick and dirty ObservableValue abstraction with map, flatMap and combine. You can even be lazy and make them all eagerly stored (it probably isn’t that inefficient to just compute them all eagerly, unless your app is really crazy sophisticated). You’ll get a lot of mileage out of that alone.

You can also continue to use Rx, as long as you’re strict about never using Observables to represent DataStreams or Tasks. They can work well enough as EventStreams, and Observables that derive from BehaviorSubjects work reasonably well as ObservableValues (until you need to read the value imperatively). But don’t use Rx as a replacement for asynchrony. Remember that you can always block threads you create, and yes you can create your own threads and block them as you please, and I promise the world isn’t going to end. If you have async/await, remember that it was always the right way to handle DataStreams and Tasks, but don’t try to force it to handle EventStreams or ObservableValues… producer-driven callbacks really are the right tool for that.

Follow these rules and you can even continue to use Rx and never tear your hair out trying to figure out what the “temperature” of your pipelines is.

On ReactiveX – Part III

Introduction

In the last part, we took a knife to Rx’s one-size-fits-all “Observable” abstraction, and carved out four distinct abstractions, each with unique properties that should not be hidden from whoever is using them.

The true value, I believe, of Rx is in its transformation operators… the almost endless zoo of options for how to turn one Observable into another one. The programming style Rx encourages is to build every stream you need through operators, instead of by creating Subjects and publishing to them “imperatively”.

So this begs the question… what happens to this rich (perhaps too rich, as the sheer volume of them is headache-inducing) language of operators when we cleave the Observable into the EventStream, the DataStream, the ObservableValue and the Task?

What happens is that operators also get divided, this time into two distinct categories. The first category is those operators that take one of those four abstractions and produce the same type of abstraction. They are, therefore, still transformations. They produce an EventStream from an EventStream, or a DataStream from a DataStream, etc. The second category is those operators that take one of those four abstractions and produce a different type of abstraction. They are not transformers but converters. We can, for example, convert an EventStream to a DataStream.

Transformations

First let’s talk about transformations. After dividing up the abstractions, we now need to divvy up the transformations that were originally available on Observable, by determining which ones apply to which of these more focused abstractions. We’ll find that some still apply in all cases, while others simply don’t make sense in all contexts.

We should first note that, like Observable itself, all four abstractions I identified retain the property of being generics with a single type parameter. Now, let’s consider the simplest transformation, map: a 1-1 conversion transformation that can change that type parameter. This transformation continues to apply to all four abstractions.

We can map all four (a short code sketch follows this list):

  • EventStream to EventStream: this creates a new EventStream, which subscribes to its source EventStream, and for each received event, applies a transformation to produce a new event of a new type, and then publishes it.
  • DataStream to DataStream: this creates a new DataStream, where every time we consume one of its values, it first consumes a value of its source DataStream, then applies a transformation and returns the result to us.
  • ObservableValue to ObservableValue: this creates a new ObservableValue whose value is, at any time, the provided transformation of the source ObservableValue’s value (which means it must be read-only: we can’t manually set the derived ObservableValue’s value without breaking this relationship). It therefore updates every time the source ObservableValue updates.
  • Task to Task: this is a Task that performs its source Task, then takes the result, transforms it, and returns it as its own result.
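
As a hedged sketch of how map stays within each abstraction, here are minimal Swift stand-ins for the four (toy types invented purely for illustration, not a real library; AsyncTask is named to avoid colliding with Swift’s built-in Task):

final class EventStream<Event> {
    private var subscribers: [(Event) -> Void] = []
    func subscribe(_ handler: @escaping (Event) -> Void) { subscribers.append(handler) }
    func publish(_ event: Event) { subscribers.forEach { $0(event) } }

    // EventStream -> EventStream: republish each event, transformed.
    func map<U>(_ transform: @escaping (Event) -> U) -> EventStream<U> {
        let derived = EventStream<U>()
        subscribe { derived.publish(transform($0)) }
        return derived
    }
}

final class DataStream<Value> {
    private let next: () -> Value?
    init(next: @escaping () -> Value?) { self.next = next }
    func consume() -> Value? { next() }

    // DataStream -> DataStream: consume one source value per consumption, transformed.
    func map<U>(_ transform: @escaping (Value) -> U) -> DataStream<U> {
        DataStream<U> { self.consume().map(transform) }
    }
}

final class ObservableValue<Value> {
    private let read: () -> Value
    init(read: @escaping () -> Value) { self.read = read }
    var value: Value { read() }

    // ObservableValue -> ObservableValue: a read-only value that is always transform(source.value).
    func map<U>(_ transform: @escaping (Value) -> U) -> ObservableValue<U> {
        ObservableValue<U> { transform(self.value) }
    }
}

final class AsyncTask<Success> {
    private let work: () async -> Success
    init(work: @escaping () async -> Success) { self.work = work }
    func run() async -> Success { await work() }

    // Task -> Task: run the source task, then transform its result.
    func map<U>(_ transform: @escaping (Success) -> U) -> AsyncTask<U> {
        AsyncTask<U> { transform(await self.run()) }
    }
}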

We also have the flatMap operator. The name is confusing, and derives from collections: a flatMap of a Collection maps each element to a Collection, then takes the resulting Collection of Collections and flattens it into a single Collection. Really, this is a compound operation that first does a map, then a flatten. The generalization of this transformation is that it takes an X of X of T, and turns it into an X of T.

The flatten operator, and therefore flatMap, also continues to apply to all of our abstractions:

  • How do we turn an EventStream of EventStreams of T into an EventStream of T? By subscribing to each inner EventStream as it is published by the outer EventStream, and publishing its events to a single EventStream. The resulting EventStream receives all the events from all the inner EventStreams as they become available.
  • How do we turn a DataStream of DataStreams of T into a DataStream of T? When we consume a value, we consume a single DataStream from the outer DataStream, store it, and then supply each value from it on each consumption until it runs out; then we consume the next DataStream from the outer DataStream, and repeat.
  • How do we turn an ObservableValue of an ObservableValue of T into an ObservableValue of T? By making the current value the current value of the current ObservableValue of the outer ObservableValue (which then updates every time either the outer or inner ObservableValue updates).
  • How do we turn a Task that produces a Task that produces a T into a Task that produces a T? We run the outer Task, then take the resulting inner Task, run it, and return its result.

In Rx land, it was realized that flatten actually has another variation that didn’t come up with ordinary Collections. Each time a new inner Observable is published, we could continue to observe the older inner Observables, or we can stop observing the old one and switch to the new one. This is a slightly different operator called switch, and it leads to the combined operator switchMap. For us, this continues to apply only to EventStream, because the concept depends on being producer-driven: streams publish values of their own accord, and we must decide whether to keep listening for them. DataStreams are consumer-driven, so flatMap must get to the end of one inner DataStream before moving to the next. ObservableValue and Task don’t involve multiple values, so the concept doesn’t apply there.

Now let’s look at another basic transformation: filter. Does this apply to all the abstractions? No... because filtering is inherently about multiple values: some get through, some don’t. But only two of our four abstractions involve multiple values: EventStream and DataStream. We can therefore meaningfully filter those. But ObservableValue and Task? Filtering makes no sense there, because there’s only one value or result. Any other transformations that inherently involve multiple values (filter is just one, others include buffer or accumulate) therefore only apply to EventStream and DataStream, but not ObservableValue or Task.

Another basic operator is combining: taking multiple abstractions and combining them into a single one. If we have an X of T1 and an X of T2, we may be able to combine them into a single X of a combined T1 and T2 (i.e. a (T1, T2) tuple). Or, if we have a Collection of Xs of Ts, we can combine them into a single X of a Collection of Ts, or possibly a single X of T. Can this apply to all four abstractions? Yes, but we’ll see that for abstractions that involve multiple values, there are multiple ways to “combine” their values, while for the ones that involve only a single value, there’s only one way to “combine” their values.

That means for ObservableValue and Task, there’s one simple combine transformation. A combined ObservableValue is one whose value is the tuple/collection made up of its sources’ values, and therefore it changes when any one of its source values changes. A combined Task is one that runs each one of its source Tasks in parallel, waits for them all to finish, then returns the combined results of all as its own result (notice that, since Task is fundamentally concerned with execution order, this becomes a key feature of one of its transformations).
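
As an analogy (not the library itself), Swift’s built-in structured concurrency shows the same combine-for-tasks shape: start the sources in parallel, then complete with all of their results. The fetchTitle/fetchVersion functions below are stand-ins for two independent Tasks.

func fetchTitle() async -> String { "Hello" }
func fetchVersion() async -> Int { 42 }

// The "combined Task": both sources start immediately and run in parallel,
// and the combined result is only available once both have finished.
func fetchCombined() async -> (title: String, version: Int) {
    async let title = fetchTitle()
    async let version = fetchVersion()
    return (title: await title, version: await version)
}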

With EventStream and DataStream, there are multiple ways in which we can combine their values. With EventStream, we can wait for all sources to publish one value, at which point we publish the first combined value; we then store the latest value from each source, and each time any source publishes, we update only that source’s slot, keep all the rest the same, and publish the new combination. This is the combineLatest operator: each published event represents the most recently published events from each source stream. We can alternatively wait for each source to publish once, at which point we publish the combination, then discard the stored values and wait for all sources to publish again before combining and publishing again. This is the zip operator.

combineLatest doesn’t make sense for DataStream, though, because it is based on when each source stream publishes. The “latest” in combineLatest refers to the timing of the source stream events. Since DataStreams are consumer-driven, there is no timing. The DataStream is simply asked to produce a value by a consumer. Therefore, there’s only one way to combine: when a consumer consumes a value, the combined DataStream consumes a value from each of its sources, combines them, and returns the result. This is the zip operator, which continues to apply to DataStream.

Both EventStream and DataStream also have ways to combine multiple streams into a single stream of the same type. With EventStream, this is simply the stream that subscribes to multiple sources and publishes when it receives a value from any of them. This is the merge operator. The order in which a merged EventStream publishes is dictated by the order in which it receives events from its source streams. We can do something similar with DataStream, but since DataStreams are consumer-driven, the transformation has to decide which source to consume from. A merge would be for the DataStream to first consume all the values of the first source stream, then all the values of the second one, and so on (thus making it equivalent to a merge on a Collection… we could also call this concatenate to avoid ambiguity). We can also do a roundRobin: each time we consume a value, the combined stream consumes one from a particular source, then one from the next one, and so on, wrapping back around after it reaches the end. There are all sorts of ways we can decide how to pick the order of consumption, and a custom algorithm can probably be plugged in as a Strategy to a transformation.

Somewhat surprisingly, I believe that covers it for ObservableValue and Task, with one exception (see below): map, flatten and combine are really the only transformations we can meaningfully do with them, because all other transformations involve either timing or value streams. Most of the remaining transformations from Rx we haven’t talked about will still apply to both EventStream and DataStream, but there are some important ones that only apply to one or the other. Any transformations that involve order apply only to DataStream, for example append or prepend. Any transformations that are driven by timing of the source streams apply only to EventStream, for example debounce or delay. And some transformations are really not transformations but conversions.

The exception I mentioned is for ObservableValue. EventStreams are “hot”, and their transformations are “eager”, and it never makes sense for them to not be (in the realist interpretation of events and “observing”). Derived ObservableValues, however, can be “eager” or “lazy”, and both are perfectly compatible with the abstraction. If we produce one ObservableValue from a sequence of transformations (say, maps and combines) on other ObservableValues, then we can choose to either perform those transformations every time someone reads the value, or we can choose to store the value, and simply serve it up when asked.

I believe the best way to implement this is to have derived ObservableValues be lazy by default: their values get computed from their sources on each read. This also means when there are subscribers to updates, they must subscribe to the source values’ updates, then apply the transformations each time new values are received by the sources. But sometimes this isn’t the performance we want. We might need one of those derived values to be fast and cheap to read. To do that, we can provide the cache operator. This takes an ObservableValue, and creates a new one that stores its value directly. This also requires that it eagerly subscribe to its source value’s updates and use them to update the stored value accordingly. There is also an issue of thread safety: what if we want to read a cached ObservableValue from one thread but it’s being written to on another thread? To handle this we can allow the cache operator to specify how (using a Scheduler) the stored value is updated. These issues of caching and thread safety are unique to observable values.
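
Here is a hedged sketch of the lazy-by-default idea plus a cache operator, on a toy ObservableValue invented for illustration (a real implementation would also need unsubscription, and the Scheduler parameter mentioned above):

final class ObservableValue<Value> {
    private let read: () -> Value
    private var subscribers: [(Value) -> Void] = []

    init(read: @escaping () -> Value) { self.read = read }

    // Lazy by default: every read walks back through the transformation chain.
    var value: Value { read() }

    func subscribe(_ handler: @escaping (Value) -> Void) { subscribers.append(handler) }
    func notifySubscribers() { subscribers.forEach { $0(value) } }

    // A derived value: lazy read, with update notifications forwarded from the source.
    func map<U>(_ transform: @escaping (Value) -> U) -> ObservableValue<U> {
        let derived = ObservableValue<U> { transform(self.value) }
        subscribe { _ in derived.notifySubscribers() }
        return derived
    }

    // cache: store the value so reads are cheap. This requires eagerly subscribing
    // to the source; a real version would take a Scheduler to control which thread
    // the stored value is updated on.
    func cache() -> ObservableValue<Value> {
        var stored = value
        let cached = ObservableValue<Value> { stored }
        subscribe { newValue in
            stored = newValue
            cached.notifySubscribers()
        }
        return cached
    }
}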

Converters

Now let’s talk about how we can turn one of these four abstractions into another one of the four. In total that would be 12 possible converters, assuming that there’s a useful or sensible way to convert each abstraction into each other one.

Let’s start with EventStream as the source.

What does it mean to convert an EventStream into a DataStream? This means we’re taking a producer-driven stream and converting it to a consumer-driven stream. Remember the key distinction is that EventStreams are defined by timing: the events happen when they do, and subscribers are either subscribed at the time or they miss them. DataStreams are defined by order: the data are returned in a specific sequence, and it’s not possible to “miss” one (you can skip one but that’s a conscious choice of the consumer). Thus, turning an EventStream into a DataStream is fundamentally about storing events as they are received until they are later consumed, ensuring that none can be missed. It is, therefore, buffering. For this reason, this conversion operator is called buffer. It internally builds up a Collection of events received from the EventStream, and when a consumer consumes a value, the first element in the collection is returned immediately. If the Collection is empty, the consumer will be blocked until the next event is received.
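
A rough sketch of the idea, assuming a minimal EventStream whose subscribe merely registers a callback (the names are illustrative, not a real API):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

interface EventStream<T>
{
    // Passive: just adds a callback to be invoked when an event is published.
    void subscribe(Consumer<T> onEvent);
}

final class BufferedStream<T>
{
    private final BlockingQueue<T> buffer = new LinkedBlockingQueue<>();

    BufferedStream(EventStream<T> source)
    {
        // Eagerly subscribe so that no event can be missed once the buffer exists.
        source.subscribe(buffer::add);
    }

    // Consumer-driven: return the oldest unconsumed event, blocking if there is none yet.
    public T read() throws InterruptedException
    {
        return buffer.take();
    }
}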

What does it mean to convert an EventStream to an ObservableValue? It would mean we’re storing the latest event emitted by the stream, so we can query what it is at any time. We call this converter cacheLatest. Note that the latest event must be cached, or else we wouldn’t be able to read it on demand. That’s fundamentally what this converter is doing: taking transient events that are gone right after they occur, and making them persistent values that can be queried as needed. This can be combined with other transformations on EventStream to produce some useful derived converters. For example, if we apply the accumulate operator to the EventStream, then use cacheLatest to produce an ObservableValue, the result is an accumulateTo operator, which stores a running accumulation (perhaps a sum) of incoming values over time.
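
A sketch of cacheLatest, reusing the hypothetical EventStream interface from the previous sketch. I require an initial value so that there is always something to read before the first event arrives; that is a design choice of the sketch, not something the abstraction dictates.

final class LatestValue<T>
{
    private volatile T latest;

    LatestValue(EventStream<T> source, T initialValue)
    {
        latest = initialValue;                    // something must be readable before the first event
        source.subscribe(event -> latest = event);
    }

    // Query the most recently published event at any time.
    public T get()
    {
        return latest;
    }
}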

What does it mean to convert an EventStream to a Task? Well, basically it would mean we create a Task that waits for one or more events to be published, then returns them as the result. But as we will see soon, it makes more sense to create “wait for the next value” Tasks out of DataStreams, and we can already convert EventStreams to DataStreams with buffer. Therefore, this converter would really be a compound conversion of first buffering and then fetching the next value. We can certainly write it as a convenience function, but under the hood it’s just composing other converters.

Now let’s move to DataStream as the source.

What does it mean to convert a DataStream to an EventStream? Well, an EventStream publishes its own events, but a DataStream only returns values when a consumer consumes them. Thus, turning a DataStream into an EventStream involves setting up a consumer to immediately start consuming values, and publish them as soon as they are available. The result is that multiple observers can now listen for those values as they get published, with the caveat that they’ll miss any values if they don’t subscribe at the right time. We can call this conversion broadcast.
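
A rough sketch of broadcast, reusing the PullStream shape from the round-robin sketch earlier: a dedicated consumer drains the stream on its own thread and publishes each value to whoever happens to be subscribed at that moment. Channel here is just a minimal callback list, not a library type.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

final class Channel<T>
{
    private final List<Consumer<T>> subscribers = new CopyOnWriteArrayList<>();

    public void subscribe(Consumer<T> onEvent)
    {
        subscribers.add(onEvent);
    }

    public void publish(T event)
    {
        for (Consumer<T> subscriber : subscribers)
            subscriber.accept(event);
    }
}

final class Broadcaster<T>
{
    private final Channel<T> channel = new Channel<>();

    Broadcaster(PullStream<T> source)
    {
        // Consume on a private thread and publish each value as soon as it is available.
        // Late subscribers simply miss whatever was published before they showed up.
        Thread pump = new Thread(() ->
        {
            while (source.hasMore())
                channel.publish(source.read());
        });
        pump.setDaemon(true);
        pump.start();
    }

    public void subscribe(Consumer<T> onEvent)
    {
        channel.subscribe(onEvent);
    }
}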

What does it mean to convert a DataStream to an ObservableValue? Nothing useful or meaningful, as far as I can tell. Remember, the meaning of converting an EventStream to an ObservableValue was to cache the latest value. That’s a reference to timing. But timing in a DataStream is controlled by the consumer, so all that could mean is a consumer powers through all the values and saves them to an ObservableValue. The result is a rapidly changing value that then gets stuck on the last value in the DataStream. That doesn’t appear to be a valid concept.

What does it mean to convert a DataStream to a Task? Simple: read the next value! In fact, in .NET, where Task is the return value of all async functions, the return value of the async read method would then have to be a Task. There can, of course, be other related functions to read more than one value. We can also connect an input DataStream to an output DataStream (which, remember, is a valid abstraction but not one carved out from Observable, which only represents sources and not sinks), which results in a Task whose work is to consume values from the input and send them to the output.
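
In Java terms (no async/await), the closest stand-in for a Task is a CompletableFuture; a sketch of the “read the next value” converter might look like this, again reusing the hypothetical PullStream:

import java.util.concurrent.CompletableFuture;

final class DataStreamTasks
{
    // The returned future completes with the next value once the blocking read returns;
    // if read throws, the future completes exceptionally instead.
    public static <T> CompletableFuture<T> readNext(PullStream<T> source)
    {
        return CompletableFuture.supplyAsync(source::read);
    }
}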

Now let’s move to ObservableValue as the source.

What does it mean to convert an ObservableValue to an EventStream? Simple: publish the updates! Now, part of an ObservableValue‘s features is being able to subscribe to updates. How is this related to publishing updates? When we subscribe to an ObservableValue, the subscription is lazy (for derived ObservableValues, the transformations are applied to source values as they arrive), and we have the option of notifying our subscriber immediately with the current value. But when we produce an EventStream from an ObservableValue, remember EventStreams are always hot! It must eagerly subscribe to the ObservableValue and then publish each update it receives. This is significant for derived lazy ObservableValues, because as long as someone holds onto an EventStream produced from it, it has an active subscription and therefore its value is being calculated, which it wouldn’t be if no one was subscribed to it. We can call this converter publishUpdates: it is basically ensuring that updates are eagerly computed and broadcasted so that anyone can observe them as any other EventStream.
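
A sketch of publishUpdates, reusing the hypothetical ObservableValue, Subscription and Channel types from the earlier sketches: it eagerly subscribes to the value’s updates and republishes each one as an ordinary event.

import java.util.function.Consumer;

final class UpdatePublisher<T>
{
    private final Channel<T> channel = new Channel<>();
    private final Subscription upstream;   // holding this keeps the derived value being computed

    UpdatePublisher(ObservableValue<T> source)
    {
        // Eager: every update is published whether or not anyone is listening yet.
        upstream = source.subscribe(channel::publish);
    }

    public void subscribe(Consumer<T> onEvent)
    {
        channel.subscribe(onEvent);
    }

    public void dispose()
    {
        upstream.cancel();
    }
}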

What does it mean to convert an ObservableValue to a DataStream? Nothing useful or meaningful that I can think of. At best it would be a stream that lets us read the updates, but that’s just publishUpdates followed by buffer.

What does it mean to convert an ObservableValue to a Task? Again, I can’t think of anything useful or meaningful, that wouldn’t be a composition of other converters.

Now let’s move to Task as the source.

Tasks don’t convert to any of the other abstractions in any meaningful way, because they have only a single return value. Even ObservableValues, which fundamentally represent single values, still have an associated stream of updates. Tasks don’t even have this. For this reason, we can’t derive any kind of meaningful stream from a Task, which means there’s also nothing to observe.

The converters are summarized in the following table (the row is the input, the column is the output):

                  EventStream       DataStream   ObservableValue   Task
EventStream       N/A               buffer       cacheLatest       none
DataStream        broadcast         N/A          none              read
ObservableValue   publishUpdates    none         N/A               none
Task              none              none         none              N/A

There are, in total, only five valid converters.

Conclusion

After separating out the abstractions, we find that the humongous zoo of operators attached to Observable is tidied up into more focused groups of transformations on each of the four abstractions, plus (where applicable) ways to convert from one to the other. What this reveals is that places where an Rx-driven app creates deep confusion over “hotness/coldness” and side effects are areas where an Observable really represents one of these four abstractions but is combined with a transformation operation that does not apply to that abstraction. For example, one true event stream (say, of mouse clicks or other user gestures) appended to another one makes no sense. Nor does trying to merge two observable values into a stream based on which one changes first.

In the final part, we’ll revisit the example app from the first part, rewrite it with our new abstractions and escape the clutches of Rx Hell.

On ReactiveX – Part II

In the last part, we explored the bizarre world of extreme observer-dependence that gets created in a ReactiveX (Rx)-driven app, and how that world rapidly descends into hell, especially when it is applied as a blanket solution to every problem.

Is the correct reaction to all of this to say “Screw Rx” and be done with it? Well, not entirely. As for the part where we try to cram every shaped peg into a square hole, we should absolutely say “to hell” with that. Whenever you see library tutorials say any variant of “everything is an X”, you should back away slowly, clutching whatever instrument of self-defense you carry. The only time that statement is true is if X = thing. Yes, everything is a thing… and that’s not very insightful, is it? The reason “everything is an X” with some more specific X seems profound is because it’s plainly false, and you have to imagine some massive change in perception for it to possibly be true.

Rx’s cult of personality cut its way through the Android world a few years ago, and now most of its victims have sobered up and moved on. In what is a quintessentially Apple move, Apple invented their own, completely proprietary and incompatible version of Rx, called Combine, a couple of years ago, and correspondingly the “everything is a stream” drug is making its rounds through the iOS world. It, too, will come to pass. A large part of what caused RxJava to wane is Kotlin coroutines, and with Swift finally gaining async/await, Combine will subside as well. Why do these “async” language features replace Rx? Because Rx was touted as the blanket solution to concurrency.

Everything is not an event stream, or an observable, period. Some things are. Additionally, the Rx Observable is a concept with far too much attached to it. It is trying to be so many things at once, owing to the fact it’s trying to live up to the “everything is a me” expectation, which will only result in Observable becoming a synonym for Any, except instead of it doing what the most general, highest category should do (namely, nothing), it endeavors instead to do the opposite: everything. It’s a God object in the making. That’s why it ends up everywhere in your code, and gradually erodes all the information a robust type system is supposed to communicate.

But is an event stream, endeavoring only to be an event stream, with higher-order quasi-functional transformations, a useful abstraction? I believe it’s a massively useful one. I still use it for user interfaces, but I reluctantly do so with Rx’s version of one, mostly because it’s the best one available.

The biggest problem with Rx is that its central abstraction is really several different abstractions, all crammed together. After thinking about this for a while, I have identified four distinct concepts that have been merged under the umbrella of the Observable interface. By disentangling these from each other, we can start to rebuild more focused, well-formed libraries that aren’t infected with scope creep.

These are the four abstractions I have identified:

  • Event streams
  • Data streams
  • Observable values
  • Tasks

Let’s talk about each one, what they are (and just as importantly, what they aren’t), and how they are similar to and different from Rx’s Observable.

Event Streams

Let us step back from the mind-boggling solipsism of the extreme Copenhagen interpretation, where the world around us is brought into being by observing it, and return to classical realism, where objective reality exists independent of observation. Observation simply tells us what is out there. An event stream is literally a stream of events: things that occur in time. Observing is utterly passive. It does not, in any way, change what events occur. It merely signs one up to be notified when they do.

The Rx Observable practically commands the Copenhagen outlook by making subscribe the abstract method to be overridden by the various subclasses returned by operators. It is what, exactly, subscribing (a synonym for observing) means that varies with different types of streams. This is where the trouble starts. It sets us up to have subscribe be what controls publish.

A sane approach to an event stream is for the subscribe method to be final. Subscribing is what it is: it just adds a callback to the list of callbacks to be triggered when an event is published. It should not alter what is published. The interesting behavior should occur exclusively in the constructor of a stream.

Let us recall the original purpose of the Observer Pattern. The primary purpose is not really to allow one-to-many communication. That’s a corollary of its main purpose. The main purpose is to decouple the endpoints of communication, specifically to allow one object to send messages to another object without ever knowing about that other object, not even the interfaces it implements.

Well, this is no different than any delegation pattern. I can define a delegate in class A, then have class B implement that delegate, allowing A to communicate with B without knowing about B. So what is it, specifically, about the Observer pattern that loosens the coupling even more than this?

The answer is that the communication is strictly one way. If an A posts an event, and B happens to be listening, B will receive it, but cannot (without going through some other interface that A exposes) send anything back to A, not even a return value. Essentially, all the methods in an observer interface must have void returns. This is what makes one-to-many broadcasting a trivial upgrade to the pattern, and why you typically get it for free. Broadcasting with return values wouldn’t make sense.

The one-way nature of the message flow creates an objective distinction between the publisher (or sender) and the subscriber (or receiver). The intermediary that moves the messages around is the channel, or broker. This is distinct from, say, the Mediator Pattern, where the two ends of communication are symmetric. An important consequence of the asymmetry of observers is that the presence of subscribers cannot directly influence the publisher. In fact, the publisher in your typical Observer pattern implementation can’t even query who is a subscriber, or even how many subscribers there are.

A mediator is like your lawyer talking to the police. An observer is like someone attending a public speech you give, where the “channel” is the air carrying the sound of your voice. What you say through your lawyer depends on what questions the police ask you. But the speech you give doesn’t depend on who’s in the audience. The speaker is therefore decoupled from his audience to a greater degree than you are decoupled from the police questioning you.

By moving the publishing behavior into subscribe, Rx is majorly messing with this concept. It muddles the distinction between publisher/sender and subscriber/receiver, by allowing the subscribe/receive end of the chain to significantly alter what the publisher/sender side does. It’s this stretching of the word “observe” to mean something closer to “discuss” that can cause confusion like “why did that web request get sent five times?”. It’s because what we’re calling “observing a response event” is more like “requesting the response and waiting for it to arrive”, which is a two-way communication.

We should view event streams as a higher abstraction level for the Observer Pattern. An EventStream is just a wrapper around a channel, that encapsulates publishing and defines transformation operators that produce new EventStreams. The publishing behavior of a derived stream is set up at construction of the stream. The subscribe method is final. Its meaning never changes. It simply forwards a subscribe call to the underlying channel.
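
Here is a minimal sketch of that shape (the names and details are mine, not an existing library): the stream wraps a callback list, subscribe is final and purely additive, and a derived stream wires up its publishing behavior in its constructor.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Function;

class EventStream<T>
{
    private final List<Consumer<T>> subscribers = new CopyOnWriteArrayList<>();

    // Subscribing never changes what gets published; it only adds a callback.
    public final void subscribe(Consumer<T> onEvent)
    {
        subscribers.add(onEvent);
    }

    protected final void publish(T event)
    {
        for (Consumer<T> subscriber : subscribers)
            subscriber.accept(event);
    }

    // The derived stream's behavior is fixed at construction: it eagerly subscribes to
    // its source and publishes transformed events as they occur, regardless of whether
    // anyone is subscribed to it yet.
    public <R> EventStream<R> map(Function<T, R> transform)
    {
        EventStream<R> derived = new EventStream<>();
        subscribe(event -> derived.publish(transform.apply(event)));
        return derived;
    }
}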

Event streams are always “hot”. If the events occur, they are published, if not, they aren’t. The transformation operations are eager, not lazy. The transform in map is evaluated on each event as soon as the event is published, independent of subscribers. This expresses the realism of this paradigm: those mapped events happen, period. Subscribing doesn’t make them happen, it just tells us about them. The way we handle whether derived streams continue to publish their derived events is by holding onto the stream. If a derived stream exists, it is creating and publishing derived events. If we want the derived events to stop firing, we don’t throw away subscriptions, we throw away the stream itself.

There’s no problem of duplication here. The subscribing is one-to-many, but the construction of the events, the only place where any kind of side effects can occur, is tied to the construction of derived streams, which only happens once. One stream = one instance of each event. The other side of that coin is that missed events are missed, period. If you want any kind of caching behavior, that’s not an event stream. It’s something else.

I think we’ll also find that by separating out the other concepts we’ll get to next, the need to ever create event streams that have any side effects is reduced to essentially zero.

Rx streams have behavior for handling the stream “completing”, and handling exceptions that get thrown during construction of an item to be emitted. I have gone back and forth over whether it makes sense for a strict event stream to have a notion of “completing”. I lean more toward thinking it doesn’t, and that “completion” applies strictly to the next concept we’ll talk about.

What definitely does not make sense for event streams is failures. Event streams themselves can’t “fail”. Events happen or they don’t. If some exception gets thrown by a publisher, it’s a problem for the publisher: it will either be trapped by the publisher, kill the publisher, or kill the process. Having it propagate to subscribers, and especially having it (by design) terminate the whole stream doesn’t make sense.

Data Streams

The next concept is a data stream. How are “data” streams different from “event” streams? Isn’t an event just some data? Well, an event holds data, but the event is the occurrence itself. With data streams, the items are not things that occur at a specific time. They may become available at a specific time, but that time is otherwise meaningless. The only significance of the arrival time of a datum is that we have to wait for it.

More importantly, in a stream of data, every datum matters. It’s really the order, not the timing, of the items that’s important. It’s critical that someone reading the data stream receive every element in the correct order. If a reader wants to skip some elements, that’s his business. But it wouldn’t make sense for a reader to miss elements and not know it.

We subscribe to an event stream, but we consume a data stream. Subscribing is passive. It has no impact on the events in the stream. Consuming is active. It is what drives the stream forward. The “next” event in a stream is emitted whenever it occurs, independent of who is subscribed. The “next” event of a data stream is emitted when the consumer decides to consume it. In both cases, once an element is emitted, it is never re-emitted.

Put succinctly, an event stream is producer-driven, and a data stream is consumer-driven. An event stream is a push stream, and a data stream is a pull stream.

This means a data stream cannot be one-to-many. An event stream can have arbitrarily many subscribers, only because subscribing is passive; entirely invisible to the publisher. But a data stream cannot have multiple simultaneous consumers. If we passed a data stream to multiple consumers who tried to read at the same time, they would step on each others’ toes. One would consume a datum and cause the other one to miss it.

To clarify, we’re talking about a specific data stream we call an input stream. It produces values that a consumer consumes. The other type of data stream is an output stream, which is a consumer itself, rather than a producer. Output streams are a separate concept not related to Rx Observables, because Observables are suppliers, not consumers (consumers in Rx are called Subscribers).

Most languages already have input and output stream classes, but they aren’t generic. Their element type is always bytes. We can define a generic one like this:

interface InputStream<Element>
{
    // Whether there is another element left to consume.
    boolean hasMore();

    // Return the next element, blocking until one is available.
    Element read();

    // Skip up to count elements, returning how many were actually skipped.
    long skip(long count);
}

This time it’s a pure interface. There’s no default behavior. Different types of streams have to define what read means.

Data streams can be transformed in ways similar to event streams. But since the “active” part of a data stream is the reading, it is here that a derived stream will interact with its source stream. This will look more like how Rx Observable implements operators. The read method will be abstract, and each operator, like map and filter, will implement read by calling read on the source stream and applying the transform. In this case, the operators are lazy. The transform is not applied to a datum until a consumer consumes the mapped stream.
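
For illustration, here is roughly what a lazy map over the InputStream interface above could look like; the transform runs only when the consumer actually reads:

import java.util.function.Function;

final class MappedStream<T, R> implements InputStream<R>
{
    private final InputStream<T> source;
    private final Function<T, R> transform;

    MappedStream(InputStream<T> source, Function<T, R> transform)
    {
        this.source = source;
        this.transform = transform;
    }

    @Override
    public boolean hasMore()
    {
        return source.hasMore();
    }

    @Override
    public R read()
    {
        // Pull one value from the source and transform it on demand.
        return transform.apply(source.read());
    }

    @Override
    public long skip(long count)
    {
        return source.skip(count);   // skipping requires no transformation at all
    }
}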

The obvious difference between this and Rx Observables is that this is a pull, rather than push, interface. The read method doesn’t take a callback, it returns a result. This is exactly what we want for a stream where the next value is produced by the consumer requesting it. A data stream is inherently a pull paradigm. A push-style interface just obscures this. Typical needs with data streams, for example reading “n” values, then switching to do other stuff and then returning to read some more, become incredibly convoluted with an interface designed for a stream where the producer drives the flow.

A pull interface requires that if the next datum isn’t available yet, the thread must block. This is the horror that causes people to turn everything into callbacks: so they never block threads. The phobia of blocking threads (which is really a phobia of creating your own threads that can be freely blocked without freezing the UI or starving a thread pool) is a topic for another day. For the sake of argument I’ll accept that it’s horrible and we must do everything to avoid it.

The proper solution to the problem of long-running methods with return values that don’t block threads is not callbacks. Callback hell is the price we pay for ever thinking it was, and Rx hell is really a slight variation of callback hell with even worse problems layered on top. The proper solution is coroutines, specifically async/await.

This is, of course, exactly how we’d do it today in .NET, or any other language that has coroutines. If you’re stuck with Java, frankly I think you should just let the thread block, and make sure you do the processing on a thread you created (not the UI thread). That is, after all, exactly how Java’s InputStream works. If you are really insistent on not blocking, use a Future. That allows consuming with a callback, but it at least communicates in some way that you only expect the callback to be called once. That means you get a Future each time you read a chunk of the stream. If that seems ugly/ridiculous to you, then just block the damn thread!

Data streams definitely have a notion of “completing”. Their interface needs to be able to tell a consumer that there’s nothing left to consume. How does it handle errors? Well, since the interface is synchronous, an exception thrown by a transformation will propagate to the consumer. It’s his business to trap it and decide how to proceed. It should only affect that one datum. It should be possible to continue reading after that. If an intermediate derived stream doesn’t deal with an exception thrown by a source stream, it will propagate through until it gets to an outer stream that handles it, or all the way out to the consumer. This is another reason why a synchronous interface is appropriate. It is exactly what try-catch blocks do. Callback interfaces require you to essentially try-catch on every step, even if a step actually doesn’t care about (and cannot handle) an error and simply forwards it. You know you hate all that boilerplate. Is it really worth all of that just to not block a thread?

(If I was told I simply cannot block threads I’d port the project to Kotlin before trying to process data streams with callbacks)

Observable Values

Rx named its central abstraction Observable. This made me think if I create an Observable<String>, it’s just like a regular String, except I can also subscribe to be notified when it changes. But that’s not at all what it is. It’s a stream, and streams aren’t values. They emit values, but they aren’t values themselves. What’s the difference, exactly? Well, if I had what was literally an observable String, I could read it, and get a String. But you can’t “read” an event stream. An event stream doesn’t have a “current value”. It might have a most recently emitted item, but those are, conceptually, completely different.

Unfortunately, in its endeavor toward “everything is me”, Rx provides an implementation of Observable whose exact purpose is to try to cram these two orthogonal concepts together: the BehaviorSubject. It is a literal observable value. It can be read to get its current value. It can be subscribed to, to get notified whenever the value changes. It can be written to, which triggers the subscribers.

But since it implements Observable, I can pass it along to anything that expects an Observable, thereby forgetting that it’s really a BehaviorSubject. This is where it advertises itself as a stream. You might think: well it is a stream, or rather changes to the value are a stream. And that is true. But that’s not what you’re subscribing to when you subscribe to a BehaviorSubject. Subscribing to changes would mean you don’t get notified until the next time the value gets updated. If it never changes, the subscriber would never get called. But subscribers to a BehaviorSubject always get called immediately with the current value. If all you know is that you’ve got an Observable, you’ll have no idea if this will happen or not.

Once you’ve upcast to an Observable, you lose the ability to read the current value. To preserve this, you’ll have to expose it as a BehaviorSubject. The problem then becomes that this exposes both reading and writing. What if you want to only expose reading the current value, but not writing? There’s no way to do this.

The biggest problem is that operators on a BehaviorSubject produce the same Observable types that those operators always do, which again loses the ability to read the current value. You end up with a derived Observable where the subscriber always gets called immediately (unless you drop or filter or do something else to prevent this), so it certainly always has a current value, you just can’t read it. This has forced me to do very stupid stuff like this:

BehaviorSubject<Integer> someInt = BehaviorSubject.createDefault(5);

Observable<String> stringifiedInt = someInt
    .map(value -> value.toString());

...

// A lambda can only capture effectively final locals, so we need a mutable holder
// just to smuggle the current value out of the callback.
String[] currentStringifiedInt = { null };

Disposable subscription = stringifiedInt
    .subscribe(value ->
    {
        currentStringifiedInt[0] = value;
    });

subscription.dispose();

System.out.print("Current value: " + currentStringifiedInt[0]);
...

This is ugly, verbose, obtuse and unsafe. I have to subscribe just to trigger the callback to produce the current value for me, then immediately close the subscription because I don’t want that callback getting called again. I have to rely on the fact that a BehaviorSubject-derived observable will emit items immediately (synchronously), to ensure currentStringifiedInt gets populated before I use it. If I turn the derived observable back into a BehaviorSubject (which basically subscribes internally and sticks each updated value into the new BehaviorSubject), I can read the current value, but I can write to it myself, thereby breaking the relationship between the derived observable and the source BehaviorSubject.

The fundamental problem is that observable values and event streams aren’t the same thing. We need a separate type for this. Specifically, we need two interfaces: one for read-only observable values, and one for read-write observable values. This is where we’re going to see the type of subscribe-driven lazy evaluation that we see inside of Rx Observables. Derived observables are read-only. Reading them triggers whatever cascade of processing and upstream reading is necessary to produce the value. When we subscribe, that is where it will subscribe to its source observables, inducing them to compute their values when necessary (when those values update) to send them downstream.

Furthermore, the subscribe method on our Observable should explicitly ask whether the subscriber wants to be immediately notified with the current value (by requiring a boolean parameter). Since we have a separate abstraction for observable values, we know there is always a current value, so this question always makes sense.
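
Here is a minimal sketch of that pair of interfaces, with illustrative names (ReadableValue, WritableValue, and a Subscription handle). The default map shows how a derived, read-only value stays lazy: it does its work only on read, or when an update actually arrives from the source.

import java.util.function.Consumer;
import java.util.function.Function;

interface Subscription
{
    void cancel();
}

interface ReadableValue<T>
{
    // Synchronous read; there is always a current value.
    T get();

    // The boolean makes the "notify me immediately with the current value" choice explicit.
    Subscription subscribe(Consumer<T> onUpdate, boolean notifyImmediately);

    // A lazily evaluated, read-only derived value: reading it reads the source and
    // transforms; subscribing to it subscribes to the source.
    default <R> ReadableValue<R> map(Function<T, R> transform)
    {
        ReadableValue<T> source = this;

        return new ReadableValue<R>()
        {
            @Override
            public R get()
            {
                return transform.apply(source.get());
            }

            @Override
            public Subscription subscribe(Consumer<R> onUpdate, boolean notifyImmediately)
            {
                return source.subscribe(
                    value -> onUpdate.accept(transform.apply(value)),
                    notifyImmediately);
            }
        };
    }
}

interface WritableValue<T> extends ReadableValue<T>
{
    // Writing is what triggers subscribers; only the owner should see this interface.
    void set(T newValue);
}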

Since the default is lazy, and therefore expensive and repetitious evaluation, we’ll need an operator specifically to store a derived observable in memory for quick evaluation. Is this comparable to turning a cold (lazy) Rx Observable into a hot (eager) one? No, because the thing you subscribe to with observable values, the changes, are always hot. They happen, and you miss them if you aren’t subscribed. Caching is purely a matter of efficiency, trading computing time for computing space (storage). It has no impact whatsoever on when updates get published.

Caching will affect whether transformations to produce a value are run repeatedly, but only for synchronous reads (multiple subscribers won’t cause repeated calculations). The major difference is that we can eliminate repeated side-effects from double-calculating a value without changing how or when its updates are published. What subscribers see is totally separate from whether an observable value is cached, unlike in Rx where “sharing” an Observable changes what subscribers see (it causes them to miss what they otherwise would have received).

A single Observable represents a single value. Multiple subscribers means multiple people are interested in one value. There’s no issue of “making sure all observers see the same sequence”. If a late subscriber comes in, he’ll either request the current value, whatever it is, or just request to be notified of later changes. The changes are true events (they happen, or they don’t, and if they do they happen at a specific time). We’d never need to duplicate calculations to make multiple subscribers see stale updates.

Furthermore, we communicate more clearly what, if any, “side effects” should be happening inside a transformation. They should be limited to whatever is necessary to calculate the value. If we have a derived value that requires an HTTP request to calculate it, this request will go out either when the source value changes, requiring a re-evaluation, or it will happen when someone tries to read the value… unless we cache it, which ensures the request always goes out as soon as it can. It is multiple synchronous reads that would, for non-cached values, trigger multiple requests, not multiple subscribers. This makes sense. If we’ve specified we don’t want to store the value, we’re saying each time we want to query the value we need to do the work of computing it.

Derived (and therefore read-only) observable values, which can both be subscribed to and read synchronously, are the most important missing piece in Rx. It’s so important I’ve gone through the trouble multiple times to build rudimentary versions of it in some of my apps.

“Completion” obviously makes no sense for observable values. They never stop existing. Errors should probably never happen in transformations. If a runtime exception sneaks through, it’s going to break the observable. It will need to be rethrown every time anyone tries to read the value (and what about subscribing to updates?). The possibility of failure stretches the concept of a value whose changes can be observed past, in my opinion, its range of valid interpretation. You can, of course, define a value that has two variations of success and failure (aka a Result), but the possibility of failure is baked into the value itself, not its observability.

Tasks

The final abstraction is tasks. Tasks are just asynchronous function invocations. They are started, and they do or do not produce a result. This is fundamentally different from any kind of “stream” because tasks only produce one result. They may also fail, in which case they produce one exception. The central focus of tasks is not so much on the value they produce but on the process of producing it. The fact the process is nontrivial and long-running is the only reason you’d pick a task over a regular function to begin with. As such, tasks expose an interface to start, pause/resume and cancel. Tasks are, in this way, state machines.

Unlike any of the other abstractions, tasks really do have distinct steps for starting and finishing. This is what ConnectableObservable is trying to capture with its addition (or rather, separation from subscribe) of connect. The request and the response are always distinct. Furthermore, once a task is started, it can’t be “restarted”. Multiple people waiting on its response doesn’t trigger the work to happen multiple times. The task produces its result once, and stores it as long as it hangs around in case anyone else asks for it.

Since the focus here is on the process, not the result, task composition looks fundamentally different from stream composition. Stream composition, including pipelines, focuses on the events or values flowing through the network. Task composition deals with results too, but primarily through dependency, which is the one thing task composition really cares about: exactly when the various subtasks can be started, relative to when other tasks start or finish. Task composition is concerned with whether tasks can be done in parallel or serially. This is even a concern for tasks that don’t produce results.

Since tasks can fail, they also need to deal with error propagation. An error in a task means an error occurring somewhere in the process of running the task: moving it from start to finish. It’s the finishing that is sabotaged by an error, not the starting. We expect starting a task to always succeed. It’s the finishing that might never happen due to an error. This is represented by an additional state for failed. This is why it is not starting a task that would throw an exception, but waiting on its result. It makes sense that in a composed task, if a subtask fails, the outer task may fail. The outer task either expects and handles the error by trapping it, or it doesn’t, in which case it propagates out and becomes a failure of the outer task.

This propagation outward of errors, through steps that simply ignore those errors (and therefore, ideally, should contain absolutely no boilerplate code for simply passing an error through), is similar to data streams, and it therefore demands a synchronous interface. This is a little more tricky though because tasks are literally concerned with composing asynchrony. Even if we’re totally okay with blocking threads, what if we want subtasks to start simultaneously? Well, that’s what separating starting from waiting on the result lets us do. We only need to block when we need the result. That can be where exceptions are thrown, and they’ll automatically propagate through steps that don’t deal with them, which is exactly what we want. This separates when an exception is thrown from when an exception is (potentially) caught, and therefore requires tasks to cache exceptions just like they do their result.
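
To illustrate the start/wait separation in Java terms, here is a small sketch using CompletableFuture as a stand-in for a Task: both subtasks start immediately and run in parallel, nothing blocks until a result is needed, and a stored exception from either subtask resurfaces at join(), where the composed task can either trap it or let it propagate.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class TaskCompositionSketch
{
    // Stand-ins for long-running subtasks.
    static int loadFromNetwork() { return 40; }
    static int loadFromDisk()    { return 2; }

    public static void main(String[] args)
    {
        // Starting always succeeds; both subtasks begin immediately, in parallel.
        CompletableFuture<Integer> network = CompletableFuture.supplyAsync(TaskCompositionSketch::loadFromNetwork);
        CompletableFuture<Integer> disk = CompletableFuture.supplyAsync(TaskCompositionSketch::loadFromDisk);

        try
        {
            // Only here do we wait. An exception cached by either subtask is rethrown
            // at this point, not when the subtask was started.
            int combined = network.join() + disk.join();
            System.out.println("combined result: " + combined);
        }
        catch (CompletionException failure)
        {
            System.out.println("composed task failed: " + failure.getCause());
        }
    }
}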

We can, of course, avoid blocking any threads by using coroutines. That’s exactly what the .NET Tasks do. If you’re in a language that doesn’t have coroutines, I have the same advice I have for data streams: just block the damn threads. You’ll tear your hair out with the handleResult/handleError pyramids of callback doom, where most of your handleError callbacks are just calling the outer handleError to pass errors through.

What’s missing in the Task APIs I’ve seen is functional transformations like what we have on the other abstractions. This is probably because the need is much less. It’s not hard at all to do what is essentially a map on a Task:

async Task<MappedResult> mapATask()
{
    Task<Result> sourceTask = getSourceTask();
    Func<Result, MappedResult> transform = getTransform();

    // Await the source task's result, then apply the transform to it.
    return transform(await sourceTask);
}

But still, we can eliminate some of that boilerplate with some nice extension methods:

static class TaskExtensions
{
    // Extension methods must live in a static class; Map awaits the source task and
    // applies the transform to its result.
    public static async Task<MappedResult> Map<Result, MappedResult>(this Task<Result> thisTask, Func<Result, MappedResult> transform)
    {
        return transform(await thisTask);
    }
}

...

Task<Result> someTask = getTask();

await someTask
  .Map(someTransform)
  .Map(someOtherTransform);

Conclusion

By separating out these four somewhat similar but ultimately distinct concepts, we’ll find that the “hot” vs. “cold” distinction is expressed by choosing the right abstraction, and this is exposed to the clients, not hidden in the implementation details. Furthermore, the implication of side effects is easier to understand and address. We make a distinction of how “active” or “passive” different actions are. Observing an event is totally passive, and cannot itself incur side effects. Constructing a derived event stream is not passive, it entails the creation of new events. Consuming a value in a data stream is also not passive. Notice that broadcasting requires passivity. The only one-to-many operations available, once we distinguish the various abstractions, are observing an event stream and observing changes to an observable value. The former alone cannot incur side effects itself, and the latter can only incur side effects when going from no observers to more than none, and thus is independent of the multiplicity of observers. We have, in this way, eliminated the possibility of accidentally duplicating effort in the almost trivial manner that is possible in Rx.

In the next part, we’ll talk about those transformation operators, and what they look like after separating the abstractions.

On ReactiveX – Part I

If a tree falls in the forest and no one hears it, does it make a sound?

The ReactiveX libraries have finally answered this age-old philosophical dilemma. If no one is listening for the tree falling, not only does it not make a sound, the tree didn’t even fall. In fact, the wind that knocked the tree down didn’t even blow. If no one’s in the forest, then the forest doesn’t exist at all.

Furthermore, if there are three people in the forest listening, there are three separate sounds that get made. Not only that, there are three trees, each one making a sound. And there are three gusts of wind to knock each one down. There are, in fact, three forests.

ReactiveX is the Copenhagen Interpretation on steroids (or, maybe, just taken to its logical conclusion). We don’t just discard counterfactual definiteness, we take it out back and shoot it. What better way to implement Schrodinger’s Cat in your codebase than this:

final class SchrodingersCat extends Observable<Boolean>
{
    public SchrodingersCat()
    {
        cat = new Cat("Mittens");
    }

    @Override
    protected void subscribeActual(@NonNull Observer<? super Boolean> observer)
    {
        if(!observed)
        {
            observed = true;

            boolean geigerCounterTripped = new Random().nextInt(2) == 0;
            if(geigerCounterTripped)
                new BluntInstrument().murder(cat);
        }

        observer.onNext(cat.alive());
    }

    private final Cat cat;

    boolean observed = false;
}

In this example, I have to go out of my way to prevent multiple observers from creating multiple cats, each with its own fate. Most Observables aren’t like that.

When you first learn about ReactiveX (Rx, as I will refer to it from now on), it’s pretty cool. The concept of transforming event streams, whose values occur over time, as opposed to collections (Arrays, Dictionarys, etc.), whose values occur over space (memory, or some other storage location), the same way that you transform collections (map, filter, zip, reduce, etc.) immediately struck me as extremely powerful. And, to be sure, it is. This began the Rx Honeymoon. The first thing I knew would benefit massively from these abstractions was the thing I had already learned to write reactively, but without the help of explicit abstractions for that purpose: graphical user interfaces.

But, encouraged by the “guides”, I didn’t stop there. “Everything is an event stream”, they said. They showed me the classic example of executing a web request, parsing its result, and attaching it to some view on the UI. It seems like magic. Just define your API service’s call as an Observable, which is just a map of the Observable for a general HTTP request (if your platform doesn’t provide one for you, you can easily write it by bridging a callback interface to an event stream). Then just do some more mapping and you have a text label that displays “loading…” until the data is downloaded, then it automatically switches to display the loaded data:

class DataStructure
{
    String title;
    int version;
    String displayInfo;
    ...
}

class AppScreenViewModel
{
    ...

    public final Observable<String> dataLabelText;

    ...

    public AppScreenViewModel()
    {
        ...

        Observable<HTTPResponse> response = _httpClient.request(
             "https://myapi.com/getdatastructure",
             HTTPMethod::get
         );

        Observable<DataStructure> parsedResponse = response
            .map(response -> new JSONParser().parse<DataStructure>(response.body, new DataStructure()));

        Observable<String> loadedText = parsedResponse
             .map(dataStructure -> dataStructure.displayInfo);

        Observable<String> loadingText = Observable.just("Loading...");

        dataLabelText = loadingText
             .merge(loadedText);
    }
}

...

class AppScreen : View
{
    private TextView dataLabel;

    ...

    private void bindToViewModel(AppScreenViewModel viewModel)
    {
        subscription = viewModel.dataLabelText
            .subscribe(text -> dataLabel.setText(text));
    }
}

That’s pretty neat. And you wouldn’t actually write it like this. I just did it like this to illustrate what’s going on. It would more likely look something like this:

    public AppScreenViewModel()
    {
        ...

        dataLabelText = getDataStructureRequest(_httpClient)
            .map(response -> parseResponse<DataStructure>(response, new DataStructure()))
            .map(dataStructure -> dataStructure.displayInfo)
            .merge(Observable.just("Loading..."));

        ...
    }

And, of course, you’d want to move the low-level HTTP client stuff out of the ViewModel. What you get is an elegant expression of a pipeline of retrieval and processing steps, with the end of the pipe plugged into your UI. Pretty neat!

But… hold on. I’m confused. I have my UI subscribe (that is, listen) to a piece of data that, through a chain of processing steps, depends on the response to an HTTP request. I can understand why, once the response comes in, the data makes its way to the text label. But where did I request the data? Where did I tell the system to go ahead and issue the HTTP request, so that eventually all of this will get triggered?

The answer is that it happens automatically by subscribing to this pipeline of events. That is also when it happens. The subscription happens in bindToViewModel. The request will be triggered by that method calling subscribe on the observable string, which triggers subscribes to all the other observables because that’s how the Observable returned by operators like map and merge work.

Okay… that makes sense, I guess. But it’s kind of a waste of time to wait until then to send the request out. We’re ready to start downloading the data as soon as the view-model is constructed. Minor issue, I guess, since in this case these two times are probably a fraction of a second apart.

But now let’s say I also want to send that version number to another text label:

class DataStructure
{
    String title;
    int version;
    String displayInfo;
    ...
}

class AppScreenViewModel
{
    ...

    public final Observable<String> dataLabelText;
    public final Observable<String> versionLabelText;

    ...

    public AppScreenViewModel()
    {
        ...

        Observable<DataStructure> dataStructure = getDataStructureRequest(_httpClient)
            .map(response -> parseResponse<DataStructure>(response, new DataStructure()));

        String loadingText = "Loading...";

        dataLabelText = dataStructure
            .map(dataStructure -> dataStructure.displayInfo)
            .merge(Observable.just(loadingText));

        versionLabelText = dataStructure
            .map(dataStructure -> Int(dataStructure.version).toString())
            .merge(Observable.just(loadingText));
    }
}

...

class AppScreen : View
{
    private TextView dataLabel;
    private TextView versionLabel;

    ...

    private void bindToViewModel(AppScreenViewModel viewModel)
    {
        subscriptions.add(viewModel.dataLabelText
            .subscribe(text -> dataLabel.setText(text)));

        subscriptions.add(viewModel.versionLabelText
            .subscribe(text -> versionLabel.setText(text)));
    }
}

I fire up my app, and then notice in my web proxy that the call to my API went out twice. Why did that happen? I didn’t create two of the HTTP request observables. But remember I said that the request gets triggered in subscribe? Well, we can clearly see two subscribes here. They are each to different observables, but both of them are the result of operators that begin with the HTTP request observable. Their subscribe methods call subscribe on the “upstream” observable. Thus, both chains eventually call subscribe, once each, on the HTTP request observable.

The honeymoon is wearing off.

Obviously this isn’t acceptable. I need to fix it so that only one request gets made. The ReactiveX docs refer to these kinds of observables as cold. They don’t do anything until you subscribe to them, and when you do, they emit the same items for each subscriber. Normally, we might think of “items” as just values. So at worst this just means we’re making copies of our structures. But really, an “item” in this world is any arbitrary code that runs when the value is produced. This is what makes it possible to stuff very nontrivial behavior, like executing an HTTP request, inside an observable. By “producing” the value of the HTTP response, we execute the code that calls the HTTP client. If we produce that value for “n” listeners, we literally have to produce it “n” times, which means we call the service “n” times.

The nontrivial code that happens as part of producing the next value in a stream is what we can call side effects. This is where the hyper-Copenhagen view of reality starts getting complicated (if it wasn’t already). That tree falling sound causes stuff on its own. It chases birds off, and shakes leaves off of branches. Maybe it spooks a deer, causing it to run into a street, which causes a car driving by to swerve into a service pole, knocking it down and cutting off the power to a neighborhood miles away. So now, “listening” to the tree falling sound means being aware of anything that was caused by that sound. Sitting in my living room and having the lights go out now makes me an observer of that sound.

There’s a reason Schrodinger put the cat in a box: to try as best he could to isolate events inside the box from events outside. Real life isn’t so simple. “Optimizing” out the unobserved part of existence requires you to draw a line (or box?) around all the effects of a cause. The Butterfly Effect laughs derisively at the very suggestion.

Not all Observables are like this. Some of them are hot. They emit items completely on their own terms, even if no subscribers are present. By subscribing, you’ll receive the same values at the same time as any other subscribers. If one subscriber subscribes late, they’ll miss any previously emitted items. An example would be an Observable for mouse clicks. Obviously a new subscriber can’t make you click the mouse again, and you can click the mouse before any subscribers show up.

To fix our problem, we need to convert the cold HTTP response observable to a hot one. We want it to emit its value (the HTTP response, which as a side effect will trigger the HTTP request) on its own accord, independent of who subscribes. This will solve both the problem of waiting too long to start the request, and having the request go out twice. To do this, Rx gives us a subclass of Observable called ConnectableObservable. In addition to subscribe, these also have a method connect, which triggers them to start emitting items. I can use the publish operator to turn a cold observable into a connectable hot one. This way, I can start the request immediately, without duplicating it:

        ConnectableObservable<DataStructure> dataStructure = getDataStructureRequest(_httpClient)
            .map(response -> parseResponse<DataStructure>(response, new DataStructure()))
            .publish();

        dataStructure.connect();

        String loadingText = "Loading...";

        dataLabelText = dataStructure
            .map(dataStructure -> dataStructure.displayInfo)
            .merge(Observable.just(loadingText));

        versionLabelText = dataStructure
            .map(dataStructure -> Int(dataStructure.version).toString())
            .merge(Observable.just(loadingText));

Now I fire it up again. Only one request goes out! Yay!! But wait… both of my labels still say “Loading…”. What happened? They never updated.

The response observable is now hot: it emits items on its own. Whatever subscribers are there when that item gets emitted are triggered. Any subscribers that show up later miss earlier items. Well, my dev server running in a VM on my laptop here served up that API response in milliseconds, faster than the time between this code running and the View code subscribing to these observables. By the time they subscribed, the response had already been emitted, and the subscribers miss it.

Okay, back to the Rx books. There’s an operator called replay, which will give us a connectable observable that begins emitting as soon as we call connect, but also caches the items that come in. When anyone subscribes, it first powers through any of those cached items, sending them to the new subscriber in rapid succession, to ensure that every subscriber sees the same sequence of items:

        ConnectableObservable<DataStructure> dataStructure = getDataStructureRequest(_httpClient)
            .map(response -> parseResponse<DataStructure>(response, new DataStructure()))
            .replay();

        dataStructure.connect();

        String loadingText = "Loading...";

        dataLabelText = dataStructure
            .map(dataStructure -> dataStructure.displayInfo)
            .merge(Observable.just(loadingText));

        versionLabelText = dataStructure
            .map(dataStructure -> Int(dataStructure.version).toString())
            .merge(Observable.just(loadingText));

I fire it up, still see one request go out, but then… I see my labels briefly flash with the loaded text, then go back to “Loading…”. What the fu…

If you think carefully about the last operator, the merge, well if the response comes in before we get there, we’re actually constructing a stream that consists first of the response-derived string, and then the text “Loading…”. So it’s doing what we told it to do. It’s just confusing. The replay operator, as I said, fires off the exact sequence of emitted items, in the order they were originally emitted. That’s what I’m seeing.

But wait… I’m not replaying the merged stream. I’m replaying the upstream event of the HTTP response. Now it’s not even clear to me what that means. I need to think about this… the dataStructure stream is a replay of the underlying stream that makes the request, emits the response, then maps it to the parsed object. That all happens almost instantaneously after I call connect. That one item gets cached, and when anyone subscribes, it loops through and emits the cached items, which is just that one. Then I merge this with a Just stream. What does Just mean, again? Well, that’s a stream that emits just the item given to it whenever you subscribe to it. Each subscriber gets that one item. Okay, and what does merge do? Well, the subscribe method of a merged stream subscribes to both the upstream observables used to build it, so that the subscriber gets triggered by either one’s emitted items. It has to subscribe to both in some order, and I guess it makes sense that it first subscribes to the stream on which merge was called, and then subscribes to the other stream passed in as a parameter.

So what’s happening is by the time I call subscribe on what happens to be a merged stream, it first subscribes to the replay stream, which already has a cached item and therefore immediately emits it to the subscriber. Then it subscribes to the Just stream, which immediately emits the loading text. Hence, I see the loaded text, then the loading text.

If I swapped the operands so that the Just is what I call merge on, and the mapped data structure stream is the parameter, then the order reverses. That’s scary. I didn’t even think to consider that the placement of those two in the call would matter.

Sigh… okay, I need to express that the loading text needs to always come before the loaded text. Instead of using merge, I need to use prepend. That makes sure all the events of the stream I pass in will get emitted before any events from the other stream:

    ConnectableObservable<DataStructure> dataStructure = getDataStructureRequest(_httpClient)
        .map(response -> parseResponse<DataStructure>(response, new DataStructure()))
        .replay();

    dataStructure.connect();

    String loadingText = "Loading...";

    dataLabelText = dataStructure
        .map(dataStructure -> dataStructure.displayInfo)
        .prepend(Observable.just(loadingText));

    versionLabelText = dataStructure
        .map(dataStructure -> Int(dataStructure.version).toString())
        .prepend(Observable.just(loadingText));

Great, now the labels look right! But wait… I always see “Loading…” briefly flash on the screen. All the trouble I just dealt with derived from my dev server responding before my view gets created. I shouldn’t ever see “Loading…”, because by the time the labels are being drawn, the loaded text is available.

But the above explanation covers this as well. We’ve constructed a stream where every subscriber will get the “Loading…” item first, even if the loaded text comes immediately after. The prepend operator produces a cold stream. It always emits the items in the provided stream before switching to the one we’re prepending to.

The stream is still too cold. I don’t want the subscribers to always see the full sequence of items. If they come in late, I want them to only see the latest ones. But I don’t want the stream to be entirely hot either. That would mean if a subscriber comes in after the loaded text is emitted, they’ll never receive any events. I need to Goldilocks this stream. I want subscribers to only receive the last item emitted, and none before that. I need to move the replay up to the concatenated stream, and I need to specify that the cached items should never exceed a single one:

    Observable<DataStructure> dataStructure = getDataStructureRequest(_httpClient)
        .map(response -> parseResponse<DataStructure>(response, new DataStructure()));

    String loadingText = "Loading...";

    dataLabelText = dataStructure
        .map(dataStructure -> dataStructure.displayInfo)
        .prepend(Observable.just(loadingText))
        .replay(1);

    dataLabelText.connect();

    versionLabelText = dataStructure
        .map(dataStructure -> Int(dataStructure.version).toString())
        .prepend(Observable.just(loadingText))
        .replay(1);

    versionLabelText.connect();

Okay, there we go, the flashing is gone. Oh shit! Now two requests are going out again. By moving the replay up to after the stream bifurcated, each stream is subscribing and caching its item, so each one is triggering the HTTP response to get “produced”. Uggg… I have to keep that first replay to “share” the response to each derived stream and ensure each one gets it even if it came in before their own connect calls.

This is all the complexity we have to deal with to handle a simple Y-shaped network of streams to drive two labels on a user interface. Can you imagine building an entire, even moderately complex, app as an intricate network of streams, and having to worry about how “hot” or “cold” each edge in the stream graph is?

Is the honeymoon over yet?

All this highly divergent behavior is hidden behind a single interface called Observable, which intentionally obscures it from the users of the interface. When an object hands you an Observable to use in some way, you have no idea what kind of observable it is. That makes it difficult or impossible to track down or even understand why a system built out of reactive event streams is behaving the way it is.

This is the point where I throw up my hands and say wait wait wait… why am I trying to say an HTTP request is a stream of events? It’s not a stream of any sort. There’s only one response. How is that a “stream”? What could possibly make me think that’s the appropriate abstraction to use here?

Ohhh, I see… it’s asynchronous. Rx isn’t just a library with a powerful transformable event stream abstraction. It’s the “all purpose spray” of concurrency! Any time I ever need to do anything with a callback, which I put in because I don’t want to block a thread for a long-running process, apparently that’s a cue to turn it into an Observable and then have it spread to everything that gets triggered by that callback, and so on and so on. Great. I graduated from callback hell to Rx hell. I’ll have to consult Dante’s map to see if that moved me up a level or down.

In the next part, I’ll talk about whether any of the Rx stuff is worth salvaging, and why things went so off the rails.

REST SchmEST

On almost every project I’ve worked on, everyone has been telling themselves the web services are RESTful services. Most of them don’t really follow RESTful principles, at least not strictly. But the basic ideas of REST are the driving force of how the APIs are written. After working with them for years, and reading into what a truly RESTful interface is, and what the justification is, I am ready to say:

REST has no reason to exist.

It’s perfectly situated in a middle ground no one asked for. It’s too high-level to give raw access to a database to make arbitrary queries using an actual query language like SQL, and it’s too low-level to directly drive application activities without substantial amounts of request forming and response stitching and processing.

It features the worst of both worlds, and I don’t know what the benefit is supposed to be.

Well, let’s look at what the justification for REST is. Before RESTful services, the API world was dominated by “remote procedure call” (RPC) protocols, like XML-RPC, and later Simple Object Access Protocol (SOAP). The older devs I’ve worked with told horror stories about how painful it was to write requests to those APIs, and they welcomed the “simplicity” of REST.

The decision to use REST is basically the decision to not use an RPC protocol. However, if we look at the original paper for REST, it doesn’t mention RPC protocols a single time. It focuses on things like statelessness, uniformity, and cacheability. But since choosing this architectural style for an API is just as much about not choosing the alternative, the discussion began to focus on a SOAP vs. REST comparison.

On the wiki page for SOAP, it briefly mentions one justification in the comparison:

SOAP is less “simple” than the name would suggest. The verbosity of the protocol, slow parsing speed of XML, and lack of a standardized interaction model led to the dominance of services using the HTTP protocol more directly. See, for example, REST.

This point tends to come up a lot. For example, this page repeats the idea that by using the capabilities of HTTP “directly”, REST can get away with defining less itself, which makes it “simpler”. So it seems the idea is that protocols like SOAP add unnecessary complexity to APIs by reinventing capabilities already contained in the underlying communication protocols. The RPC protocols were written to be application-layer agnostic. As such, they couldn’t take advantage of the concepts already present in HTTP. Once it became clear that all these RPC calls were being delivered over HTTP anyway, those protocols came to look redundant. We can instead simply use what HTTP gives us out of the box to design APIs.

See, for example, the answers on this StackOverflow question:

SOAP is a protocol on top of HTTP, so it bypasses a lot of HTTP conventions to build new conventions in SOAP, and is in a number of ways redundant with HTTP. HTTP, however, is more than sufficient for retrieving, searching, writing, and deleting information via HTTP, and that’s a lot of what REST is. Because REST is built with HTTP instead of on top of it, it also means that software that wants to integrate with it (such as a web browser) does not need to understand SOAP to do so, just HTTP, which has to be the most widely understood and integrated-with protocol in use at this point.

Bottom line, REST removes many of the most time-consuming and contentious design and implementation decisions from your team’s workflow. It shifts your attention from implementing your service to designing it. And it does so without piling gobbledygook onto the HTTP protocol.

It’s curious that this “SOAP reinvents what HTTP already gives us” argument did not appear in the original REST paper. It’s a bad argument, which leads directly to the no-man’s land between raw database access and high-level application interfaces.

HTTP does already give us what we need to make resource-based requests to a server; in short, CRUD. They claim that a combination of the HTTP request method (GET, POST, DELETE, etc.), path elements, and query parameters already does for us what the RPC protocols stuff into overcomplicated request bodies.

The problem with this argument is that what HTTP provides and what RPC provides are not the same, either in implementation or in purpose. Those features of HTTP (method, path, and query parameters) expose a much different surface than what RPC calls expose. RPC is designed to invoke code (procedure) on another machine, while HTTP is designed to retrieve or manipulate resources on another machine.

Somewhere along the line, the idea arose that REST tells you how to map database queries and transactions to HTTP calls. For example, a SELECT of a single row by ID from a table maps to a GET request with the table name being a path element, and the ID being the next (and last) path element. An INSERT with column-cell value pairs maps to a POST request, again with the table in the path, and this time no further path elements, and the value pairs as body form data.
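
Spelled out, the mapping people have in mind usually looks something like this (a sketch; the users table and its columns are placeholders, not anything from a real API):

    GET    /users/42     -- SELECT * FROM users WHERE id = 42;
    GET    /users        -- SELECT * FROM users;
    POST   /users        -- INSERT INTO users (name, email) VALUES ('...', '...');
    PUT    /users/42     -- UPDATE users SET name = '...' WHERE id = 42;
    DELETE /users/42     -- DELETE FROM users WHERE id = 42;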

This certainly didn’t come from the original paper, which doesn’t mention “database” a single time. It mentions resources. The notion most likely arose because REST was shoehorned into being a replacement for RPC, used to build APIs that almost always sit on top of a database. If you’re supposed to “build your API server RESTfully”, and that server is primarily acting as a shell with a database at its kernel, then a reading of “REST principles”, which drives your API toward dumb data access (rather than execution of arbitrary code), will inevitably become an “HTTP -> SQL” dictionary.

The mapping covers basic CRUD operations on a database, but it’s not a full query language. Once you start adding WHERE clauses, simple ones may map to query parameters, but there’s no canonical way to do the mapping, and there’s no way to do more sophisticated stuff like subqueries. There’s not even a canonical way to select specific columns. Then there’s joining. Since neither selecting specific columns nor joining map to any HTTP concept, you’re stuck with an ORM style interface into the database, where you basically fetch entire rows and all their related rows all at once, no matter how much of that data you actually need.
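
For instance (placeholder names again), a trivial filter might survive the translation, but anything beyond that has no HTTP spelling at all:

    -- maybe expressible as: GET /users?status=active
    SELECT * FROM users WHERE status = 'active';

    -- no canonical mapping for the join, the column selection, or the grouping:
    SELECT u.name, COUNT(o.id)
    FROM users u
    JOIN orders o ON o.user_id = u.id
    GROUP BY u.name;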

The original paper specifically called out this limitation:

By applying the software engineering principle of generality to the component interface, the overall system architecture is simplified and the visibility of interactions is improved. Implementations are decoupled from the services they provide, which encourages independent evolvability. The trade-off, though, is that a uniform interface degrades efficiency, since information is transferred in a standardized form rather than one which is specific to an application’s needs. The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction.

So, basically, this manifestation of “REST as querying through HTTP” gives us a lot less than what query languages already gave us, and nothing more. I’ll bet if you ask people why they picked REST over just opening SQL connections from the clients, they’ll say something about security, i.e. by supplying an indirect and limited interface to the database, REST allows you to build a sort of firewall in the server code on the way to actually turning the requests into queries. Well, you’re supposed to be able to solve that with user permissions. Either way, it’s nothing about the interface actually being nicer to work with or more powerful.

It’s only barely more abstract. You could probably make a fair argument it’s really not more abstract. REST is just a stunted query language. We can say this is a straw man, since the paper never said to do this. But unless the very suggestion to make application servers RESTful in general is a straw man (making the vast majority of “RESTful” services a misapplication of REST), I don’t see how it could have ended up any differently.

Claiming that this serves as a replacement for RPC fundamentally misunderstands what RPC protocols do. It’s in the name: remote procedure call. It’s a protocol to invoke a function on another machine. The protocol is there to provide a calling convention that works over network wires, just like a C compiler defines a calling convention for functions that works within a single process on a single machine. It defines how to select the function (by name), how to send the parameters, and how to receive the return value.
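
An XML-RPC call, for example, shows how little that calling convention asks of the transport (the method name and parameters here are made up): the function selection, the arguments, and the slot for the return value all live in the body, and HTTP is just the envelope:

    POST /RPC2 HTTP/1.1
    Content-Type: text/xml

    <?xml version="1.0"?>
    <methodCall>
        <methodName>invoicing.generateInvoice</methodName>
        <params>
            <param><value><string>acct-42</string></value></param>
            <param><value><string>2020-06</string></value></param>
        </params>
    </methodCall>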

How much does HTTP help with this? Well, I guess you can put the class name, or function name (or both) into the URL path. But there’s no one way to do this that jumps out to me as obviously correct. The HTTP methods aren’t of much use (functions don’t, generally, correspond to the short list of HTTP methods), and query parameters are quite inappropriate for function parameters, which can be arbitrarily complex objects. Any attempt to take what SOAP does and move “redundant” pieces into HTTP mechanisms isn’t going to accomplish much.

We can, of course, send RPC calls over HTTP. The bulk, if not entirety, of the calling convention goes into the request body. By limiting HTTP to RESTful calls, we’re foregoing the advantage of the very difference between an API and direct querying: that it triggers arbitrary code on the server, not just the very narrowly defined type of work that a database can do. We can raise the abstraction layer and simply invoke methods on models that closely represent application concerns, and execute substantial amounts of business logic on top of the necessary database queries. To demand that APIs be “RESTful” is to demand that they remain mere mappings to dumb data accesses, the only real difference being that we’re robbed of a rich query language.

What you get is clients forced to do all the business logic themselves, and make inefficient coarse fetches of entire rows or hierarchies of data, or even worse a series of fetches, each incurring an expensive round trip to the server, to stitch together a JOIN on their side. When people start noticing this, they’ll start breaking the REST rules. They’ll design an API a client really wants, which is to handle all that business logic on the server, and you end up with a /api/fetchStuffForThisUseCase endpoint that has no correspondence whatsoever to a database query (the path isn’t the name of any table, and the response might be a unique conglomerate and/or mapping of database entities into a different structure).

That’s way better… and it’s exactly what this “REST as HTTP -> SQL” notion tries to forbid you from doing.

The middle of the road is, as usual, the worst place to be. If the client needs to do its own queries, just give it database access, and don’t treat REST as a firewall. It’s very much like the fallacy of treating network address translation as being a firewall. Even if it can serve that role, it’s not designed for it, and there are much better methods designed specifically for security. Handle security with user permissions. If you’re really concerned with people on mobile devices MITM attacking your traffic and seeing your SQL queries, have your clients send those queries over an end-to-end encrypted request.

If you’re a server/database admin and you’re nervous about your client developers writing insane queries, set up a code review process to approve any queries that go into client code.

If the client doesn’t need to do the business logic itself, and especially if you have multiple clients that all need the same business logic, then implement it all on the server and use an RPC protocol to let your clients invoke virtual remote objects. You’re usually not supposed to hand-write RPC calls anyways. That’s almost as bad as hand-writing assembly with a C compiler spec’s calling convention chapter open in your lap. That can all be automated, because the point of RPC protocols is that they map directly to function or method calls in a programming language. You shouldn’t have to write the low-level stuff yourself.
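
As a sketch of what that looks like (the interface, the rpcClient.stub call, and all the names here are invented, standing in for whatever framework does the code generation):

    // Shared interface; an RPC framework generates the client stub and the
    // server-side dispatch from this one definition.
    interface AccountService {
        Invoice generateInvoice(String accountId, String period);
    }

    // Client code reads like an ordinary method call. The generated stub
    // serializes the arguments, makes the network round trip, and
    // deserializes the return value.
    AccountService accounts = rpcClient.stub(AccountService.class);
    Invoice invoice = accounts.generateInvoice("acct-42", "2020-06");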

What, then, is the purpose of the various features of HTTP? Well, again it’s in the name: hypertext transfer protocol. HTTP was designed to enable the very first extremely rudimentary websites on the very first extremely rudimentary internet to be built and delivered. It’s designed to let you stick HTML files somewhere on a server, in such a way they can be read back from the same location where you stuck them, to update or delete them later, and to embed links among them in the HTML.

The only reason we’re still using HTTP for APIs is the same reason we’re still using HTML for websites: because that’s the legacy of the internet, and wholesale swapping to new technology is hard. Both of them are completely outdated and mostly just get in our way now. Most internet traffic isn’t HTML pages anymore, it’s APIs, and it’s sitting on top of a communication protocol built for the narrow purpose of sticking HTML (to be later downloaded directly, like a file) onto servers. Even TCP is mostly just getting in our way, which is why it’s being replaced by things like QUIC. There’s really no reason not to run APIs directly on the transport protocol.

Rather than RPC, it’s really HTTP that’s redundant in this world.

Even for the traffic that is still HTML, it’s mostly there to bootstrap Javascript, and act as the final delivery format for views. Having HTML blend the concepts of composing view components and decorating text with styling attributes is, I believe, outdated and redundant in the same way.

To argue that APIs need to use such a protocol, to the point of being restricted to that protocol’s capabilities, makes no sense.

The original REST paper never talked about trying to map HTTP calls to database queries or transactions. Instead it focused on resources, which more closely correspond to (but don’t necessarily have to be) the filesystem on the server (hence why they are identified with paths in the URL). It doesn’t even really talk about using query parameters. The word “query” appears a single time, and not in the context of using them to do searches.

The driving idea of REST is more about not needing to do searches at all. Searches are ways to discover resources. But the “RESTful” way to discover resources is to first retrieve a resource you already know about, and that tells you, with a list of hyperlinks, what other resources exist and where to find them. If we were strict, a client using a RESTful API would have to crawl it, following hyperlinks to build up the data needed to drive a certain activity.
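
In other words, a strictly RESTful client starts from one known resource and finds everything else by following links, something like this (a made-up root resource and link format):

    GET /api

    {
        "users":    { "href": "/api/users" },
        "orders":   { "href": "/api/orders" },
        "invoices": { "href": "/api/invoices" }
    }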

The central justifications of REST (statelessness, caching and uniformity) aren’t really ever brought up much in API design discussions… well, caching is, and I’ll get to that. RPC protocols can be as stateless (or stateful) as you want, so REST certainly isn’t required to achieve statelessness. Nor does sticking to a REST-style interface guarantee statelessness. Uniformity isn’t usually a design requirement, and since it comes at the cost of inefficient whole-row (or more) responses, it usually just causes problems.

That leaves caching, the only really valid reason I can see to make an API RESTful. However, the way REST achieves caching is basically an example of its “uniformity”: it sticks to standard HTTP mechanisms, which you get “for free” on almost any platform that implements HTTP, but it comes at the cost of being very restricted. For it to work, you have to express your query as an HTTP GET, with the specific query details encoded as query parameters. As I’ve mentioned, there’s not really a good way to handle complex queries like this.

Besides, what does HTTP caching do? It tells the client to reuse the previous response for a certain time range, and then send a request with an ETag to let the server respond with a 304 and save bandwidth by not resending an identical body. The first part, the cache age, can be easily done on any system. All you need is a client-side cache with a configurable age. The second part… well, what does the server do when it gets a request with an ETag? Unless the query is a trivial file or row lookup, it has to generate the response and then compute the ETag and compare it. For example, any kind of search or JOIN is going to require the server to really hit the database to prove whether the response will change.
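
The exchange itself is standard HTTP (headers are illustrative):

    GET /users/42

    200 OK
    Cache-Control: max-age=60
    ETag: "abc123"
    ...full response body...

    (later, after the cached copy expires)
    GET /users/42
    If-None-Match: "abc123"

    304 Not Modified
    (no body; the client reuses its cached copy, but the server still had
     to regenerate the response in order to compute and compare the ETag)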

So, what are you really saving by doing this? Some bandwidth in the response. In most cases, that’s not the bottleneck. If we’re talking about a huge search that returns thousands (or more) of entities, and your customers are mostly using your app on slow cell networks in remote locations, then… sure, saving that response bandwidth is a big deal. But the more typical use case is that response bodies are small, bandwidth is plentiful, but the cloud resources needed to compute the response and prove it’s unchanged are what’s scarce.

You’ll probably do a much better job optimizing the system in this way by making sure you’re only requesting exactly what you need… the very capability that you lose with REST. This even helps with the response body size, which is going to be way bigger if you’re returning entire rows when all you need is a couple column values. Either that, or the opposite, where you basically dump entire blobs of the database onto the client so that it can do its querying locally, and just periodically asks for diffs (this also enables offline mode).

Again, the caching that REST gives us is a kind of no-man’s land middle ground that is suboptimal in all respects. It is, again, appropriate only for the narrow use case of hypertext driven links to resource files in a folder structure on a server that are downloaded “straight” (they aren’t generated or modified as they’re being returned).

The next time I have the authority to design an API, there’s either not going to be one (I’ll grant direct database access to the clients), or it will be a direct implementation of high-level abstract methods that can be mapped directly into an application’s views, and I’ll pick a web framework that automates the RPC aspect to let me build a class on one machine and call its methods from another machine. Either way, I’ll essentially avoid “designing” an API. I’ll either bypass the need and just use a query language, or I’ll write classes in an OOP language, decide where to slice them to run on separate machines, and let a framework write the requisite glue.

If I’m really bold I might try to sell running it all directly on top of TCP or QUIC.