September 17, 2023 – Agile Design

Introduction

By now, pretty much every major modern OOP language has gained async/await. Java is the odd man out (it handles lightweight concurrency in a very different way, with virtual threads).

So, you go on a rampage, async-ifying all the things, and basking in the warm rays of the much more readable and naturally flowing fruits of your labor. Everything is fine, everything is good.

Then you get to your tests.

And suddenly you’re having lots of problems.

What’s the issue, exactly? Well, it isn’t async code per se. In fact, usually the test frameworks that are used with each language support test methods that are themselves async. If you want to test an async method, simply make the test method async, call the async method you’re testing, and await the result. You end up with what is functionally the same test as if everything were synchronous.
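Here is a minimal Swift sketch of that pattern. `fetchGreeting(for:)` is a hypothetical stand-in for the async method under test, and `testFetchGreeting()` stands in for an async XCTest method. `precondition` plays the role of `XCTAssertEqual` so the sketch runs without XCTest:

```swift
// Hypothetical async method under test (a stand-in, not from the article).
func fetchGreeting(for name: String) async -> String {
    await Task.yield()  // a real suspension point, so the call genuinely awaits
    return "Hello, \(name)!"
}

// Stand-in for an async XCTest method: call, await, then assert.
// The assertion cannot run until the awaited call has produced its result.
func testFetchGreeting() async {
    let greeting = await fetchGreeting(for: "world")
    precondition(greeting == "Hello, world!", "unexpected greeting: \(greeting)")
}

await testFetchGreeting()
print("testFetchGreeting passed")
```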

But what if the async code you want to test isn’t awaitable from the test?

Async vs. Await

This is a good time to review the paradox that async and await are almost antonyms. Asynchrony allows for concurrency: by not synchronizing two events, other stuff is able to happen between the two events. But adding async to code just allows you to await it. “Waiting” means to synchronize with the result: don’t continue executing below the await until the result is ready. What exactly is “asynchronous” about that?

Let’s look at an example async method:

func doSomeStuff() async -> String {

  let inputs = Inputs(5, "hello!", getSomeMoreInputs())

  let intermediateResult = await processTheInputs(inputs)

  let anotherIntermediateResult = await checkTheResultForProblems(intermediateResult)

  logTheResults(anotherIntermediateResult)

  await save(result: anotherIntermediateResult)

  return "All done!"
}

Asynchrony appears in three places (where the awaits are). And yet this code is utterly sequential. That’s the whole point of async-await language features. Writing this code with callbacks, while functionally equivalent, obscures the fact that this code all runs in a strict sequence, with the output of one step being used as the input to later steps. Async-await restores the natural return-value flow of code while also allowing it to be executed in several asynchronous chunks.
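To make the callback comparison concrete, here is a sketch built on hypothetical, heavily simplified versions of two of the steps (plain `Int`s instead of real inputs, and no real work). Both versions run the same sequence of steps. Only the async one reads top to bottom:

```swift
// Hypothetical callback-based steps: each reports its result through a completion handler.
func processTheInputs(_ input: Int, completion: @escaping (Int) -> Void) {
    completion(input * 2)
}

func checkTheResultForProblems(_ value: Int, completion: @escaping (Int) -> Void) {
    completion(value + 1)
}

// Callback style: every later step is nested inside the previous step's handler.
func doSomeStuff(completion: @escaping (String) -> Void) {
    processTheInputs(5) { intermediateResult in
        checkTheResultForProblems(intermediateResult) { anotherIntermediateResult in
            completion("All done with \(anotherIntermediateResult)!")
        }
    }
}

// The same hypothetical steps as async functions.
func processTheInputs(_ input: Int) async -> Int { input * 2 }
func checkTheResultForProblems(_ value: Int) async -> Int { value + 1 }

// async/await style: the identical sequence, written as straight-line code.
func doSomeStuff() async -> String {
    let intermediateResult = await processTheInputs(5)
    let anotherIntermediateResult = await checkTheResultForProblems(intermediateResult)
    return "All done with \(anotherIntermediateResult)!"
}

doSomeStuff { result in print("callback:", result) }
print("async:", await doSomeStuff())
```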

So then why bother making stuff async and awaiting it? Why not just make the functions like processTheInputs synchronous? After all, when you call a regular old function, the caller waits for the function to return a result. What’s different?

The answer lies in how threads are used. If the function were synchronous, one thread would execute it from beginning to end, and not do anything else in the process. Now if functions like processTheInputs are just crunching through a bunch of CPU instructions, this makes sense, because the thread keeps the CPU busy throughout the entire process. But if the function is asking some other thread, possibly in another process or even on another machine, to do something for us, our thread has nothing to do, so it has to sit there and wait. Ultimately, that waiting is typically done with a condition variable, which tells the operating system that this thread is waiting for something and shouldn’t be given a slice of CPU time.

This doesn’t waste CPU resources (you aren’t busy waiting), but you do waste other resources, like memory for the thread’s stack. Using asynchrony lets multiple threads divide and conquer the work so that none of them have to sit idle. The await lets the thread go jump to other little chunks of scheduled work (anything between two awaits in any async function) while the result is prepared somewhere else.
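Here is a sketch of the two ways of waiting, assuming Dispatch is available. `computeElsewhere` is a hypothetical stand-in for “some other thread does the work”, and a `DispatchSemaphore` plays the role of the blocking primitive (a condition variable would serve the same purpose here):

```swift
import Dispatch

// Hypothetical stand-in for "another thread does the work and reports back".
func computeElsewhere(_ completion: @escaping @Sendable (Int) -> Void) {
    DispatchQueue.global().async {
        completion(6 * 7)
    }
}

// Minimal box for handing the value back to the blocked thread. The write happens
// before signal() and the read after wait(), so the semaphore orders the access.
final class ResultBox: @unchecked Sendable {
    var value = 0
}

// Blocking style: the calling thread parks on the semaphore until the worker signals.
// It burns no CPU while parked, but the thread and its stack are tied up the whole time.
func blockingCompute() -> Int {
    let done = DispatchSemaphore(value: 0)
    let box = ResultBox()
    computeElsewhere { result in
        box.value = result
        done.signal()
    }
    done.wait()
    return box.value
}

// Suspending style: the function is suspended at `await`, and the thread that was
// running it is free to pick up other scheduled work until the result arrives.
func suspendingCompute() async -> Int {
    await withCheckedContinuation { continuation in
        computeElsewhere { result in
            continuation.resume(returning: result)
        }
    }
}

print(blockingCompute())
print(await suspendingCompute())
```

Both functions return the same value. The difference is only in what the waiting costs: a parked thread in the first case, and a saved suspension point in the second.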

Okay, but this is a rather esoteric implementation detail about utilizing resources more efficiently. The code is logically identical to the same method with the async and awaits stripped out… that is, the entirely synchronous flavor of it.

So then why does simply marking a method as async, and replacing blocking operations that force a thread to go dormant with awaits that let the thread move on to something else (while a potentially different thread eventually resumes the work), suddenly make testing a problem?

It doesn’t. Just add async to the test method, and add await to the function call. Now the test is logically equivalent to the version where everything is synchronous.

The problem is when we introduce concurrency.

The Problem

How do we call async code? Well, if we’re already in an async method, we can just await it. If we’re not in an async method, what do we do?

It depends on the language, but in some way or another, you start a background task. Here, “background” means concurrency: something that happens “in the background”, meaning it hums along without interfering with whatever else you want to do in the meantime. In .NET, this means either calling an async function that returns a Task but not awaiting it (which causes the part of the task before the first suspension to run synchronously), or calling Task.Run(...) (which causes the entire task to run concurrently). In Swift, it means calling Task {...}. In Kotlin, it means calling launch { ... }. In JavaScript, similar to .NET, you call an async function that returns a Promise but don’t await it, or construct a new Promise that awaits the async function then resolves itself (and, of course, don’t await this Promise).
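In Swift, that looks something like the sketch below. `startBackgroundSum(upTo:)` is hypothetical. The point is that a synchronous function can start a `Task` and return immediately, and only code that holds the returned handle can wait for the result:

```swift
// Hypothetical synchronous entry point that kicks off async work "in the background".
// It returns immediately; the Task runs concurrently with whatever the caller does next.
@discardableResult
func startBackgroundSum(upTo n: Int) -> Task<Int, Never> {
    Task {
        await Task.yield()  // give up the thread at least once, as real async work would
        return (1...n).reduce(0, +)
    }
}

// Only code holding the returned handle can wait for the result.
let handle = startBackgroundSum(upTo: 100)
let sum = await handle.value
print(sum)  // 5050
```

If the handle is discarded, which `@discardableResult` allows (as does `Task.init` itself), nothing can wait for that work anymore.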

That’s where the trouble happens.

This is how you kick off async code from a sync method. The sync method continues along, while the async code executes concurrently. The sync method does not wait for the async code to finish. The sync code can, and often will, finish before the async code does. We can also kick off concurrent async work in an async method, the exact same way. In this case we’re allowed to await the result of this concurrent task. If we do, then the outer async function will only complete after that concurrent work completes, and so awaiting the outer function in a test will ensure both streams of work are finished. But if the outer function only spawns the inner task and doesn’t await it, the same problem exists: the outer task can complete before the inner task does.
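Here is a minimal sketch of that last case, with hypothetical names throughout (`EventLog`, `makeGate`, `outer`). The inner task waits on a one-shot gate built from an `AsyncStream`, which makes the outcome deterministic: awaiting `outer` returns while the inner task is still parked.

```swift
// Hypothetical shared log; an actor, so concurrent writers are safe.
actor EventLog {
    private(set) var events: [String] = []
    func record(_ event: String) { events.append(event) }
}

// Hypothetical one-shot gate: a stream that stays silent until someone yields or finishes it.
func makeGate() -> (AsyncStream<Void>, AsyncStream<Void>.Continuation) {
    var continuation: AsyncStream<Void>.Continuation?
    let stream = AsyncStream<Void> { continuation = $0 }
    return (stream, continuation!)
}

// The outer function spawns unstructured work and returns without awaiting it.
func outer(log: EventLog, gate: AsyncStream<Void>) async {
    Task {
        for await _ in gate { break }  // park here until the gate opens
        await log.record("inner finished")
    }
    await log.record("outer finished")
}

let eventLog = EventLog()
let (gate, openGate) = makeGate()
await outer(log: eventLog, gate: gate)
// Awaiting outer() did not wait for the inner task; it is still parked at the gate.
print(await eventLog.events)  // ["outer finished"]
openGate.finish()  // release the inner task
```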

Let’s look at an example. You create a ViewModel for a screen in your app. As soon as it gets created, in the init, it spawns a concurrent task to download data from your server. When the response comes in, it is saved to a variable that the UI can read to build up what it will display to the user. Before it’s ready, we show a spinner:

@MainActor
final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init() {
    Task {
      let (data, response) = try await urlSession.data(for: dataRequest)

      try await MainActor.run {
        self.results = try JSONDecoder().decode([String].self, from: data)
      }
    }
  }
}

struct MyView: View {
  @ObservedObject var viewModel: MyViewModel = .init()

  var body: some View {
    if let results = viewModel.results {
      ForEach(results, id: \.self) { result in 
        Text(result)
      }
    } else {
      ProgressView()
    }
  }
}

You want to test that the view model loads the results from the server and stores them in the right place:

final class MyViewModelTests: XCTestCase {
  func testLoadData() {
    let mockResults = setupMockServerResponse()

    let viewModel = MyViewModel()

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

You run this test, and it passes… sometimes. Sometimes it fails, complaining that viewModel.results is nil. It’s actually kind of surprising that it ever passes. The mock server response you set up can be fetched almost instantaneously (it doesn’t have to actually call out to a remote server), so the Task you spin off in the init completes in a few microseconds. It also takes a few microseconds for the thread running testLoadData to get from the let viewModel = ... line to the XCTAssertEqual line. The two threads are racing against each other: if the Task inside the init wins, the test passes. If not, it fails, and viewModel.results will be nil because that’s the initial value and the Task hasn’t set it yet.

Do we fix this by making things async? Let’s do this:

@MainActor
final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init() async throws {
    let (data, response) = try await urlSession.data(for: dataRequest)

    try await MainActor.run {
      self.results = try JSONDecoder().decode([String].self, from: data)
    }
  }
}

...

final class MyViewModelTests: XCTestCase {
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let viewModel = try await MyViewModel()

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

Now it passes every time, and it’s clear why: it’s now the init itself that awaits the work to set the results. And the test is able to (in fact it must) await the init, so we’re sure this all completes before we can get past the let viewModel = ... line.

But the view no longer compiles. We’re supposed to be able to create a MyView(), which creates the default view model without us specifying it. But that init is now async. We would have to make an init() async throws for MyView as well. But MyView is part of the body of another view somewhere, and that can’t be async and so can’t await this init.

Plus, this defeats the purpose of the ProgressView. In fact, since results is set in the init, we can make it a non-optional let, never assigning it to nil. Then the View will always have results and never need to show a spinner. That’s not what we want. Even if we push the “show a spinner until the results are ready” logic outside of MyView, we have to solve that problem somewhere.

This is a problem of concurrency. We want the spinner to show on screen concurrent with the results being fetched. The problem isn’t the init being async per se, it’s that the init awaits the results being fetched. We can keep the async on the init but we need to make the fetch concurrent again:

@MainActor
final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init() async throws {
    Task {
      let (data, response) = try await urlSession.data(for: dataRequest)

      try await MainActor.run {
        self.results = try JSONDecoder().decode([String].self, from: data)
      }
    }
  }
}

...

final class MyViewModelTests: XCTestCase {
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let viewModel = try await MyViewModel()

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

Now we’re back to the original problem. The async on the init isn’t helping. Sure, we await that in the test, but the thing we actually need to wait for is concurrent again, and we get the same flaky result as before.

This really has nothing to do with the async-await language feature per se. The exact same problem would have arisen had we achieved our desired result in a more old-school way:

@MainActor
final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init() {
    DispatchQueue.global().async { // This really isn't necessary, but it makes this version more directly analogous to the async-await one
            
      URLSession.shared.dataTask(with: .init(url: .init(string: "")!)) { data, response, _ in
        guard let data, let response else {
          return
        }
                
        DispatchQueue.main.async {
          self.results = try? JSONDecoder().decode([String].self, from: data)
        }
      }.resume() // a data task starts suspended; without resume() it never runs
    }
  }
}

There’s still no way for the test to wait for that work thrown onto a background DispatchQueue to finish, or for the work to store the result thrown back onto the main DispatchQueue to finish either.

So what do we do?

The Hacky “Solution”

Well, the most common way to deal with this is ye olde random delay.

We need to wait for the spun off task to finish, and the method we’re calling can finish slightly earlier. So after the method we call finishes, we just wait. How long? Well, for a… bit. Who knows. Half a second? A tenth of a second? It depends on the nature of the task too. If the task tends to take a couple seconds, we need to wait a couple seconds.

final class MyViewModelTests: XCTestCase {
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let viewModel = MyViewModel()

    try await Task.sleep(nanoseconds: 200_000_000) // 0.2 seconds. That's probably long enough!

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

(Note that Task.yield() is just another flavor of this: it merely suspends execution for some brief, indeterminate amount of time.)

To be clear, this “solves” the problem no matter what mechanism of concurrency we decided to use: async-await, dispatch queues, run loops, threads… doesn’t matter.

Whatever the delay is, you typically discover it by just running the test and seeing what kind of delay makes it usually pass. And that works… until random fluctuations in the machine running the tests make it not work, and you have to bump the delay up slightly.

That, ladies and gentlemen, is how you get brittle tests.

Is This a Solution?

The problem is that by adding a delay and then going ahead with our assertions as though the task completed in time, we’re encoding into our tests a requirement that the task complete within the time we chose. The time we chose was utterly random and tuned to make the test pass, so it’s certainly not a valid requirement. You don’t want tests that inadvertently enforce nonexistent requirements.

I’ve heard some interesting takes on this. One goes like this: let’s think about the requirements. Really, there is a requirement that mentions a time window. After all, the product owner wouldn’t be happy if this task completed 30 minutes after the sync method (the “when” of the scenario) got triggered. The solution, according to this perspective, is to sit down with the product owner, work out the nonfunctional requirements regarding timing constraints (that this ought to finish in no more than some amount of time), and voilà: there’s your delay amount, and now your tests are enforcing real requirements, not made-up ones.

But hold on… why must there be a nonfunctional requirement about timing? This is about a very technical detail in code that concerns how threads execute work on the machine, and whether it’s possible for the test method to know exactly when some task that got spun off has finished. Why does this implementation detail imply that an NFR about timing exists? Do timing NFRs exist for synchronous code? After all, nothing a computer does is instantaneous. If this were true, then all requirements, being fundamentally about state changes in a computer system, would have to mention something about the allowed time constraints for how long this state change can take.

Try asking the product owner what the max allowed time should be. Really, ask him. I’ll tell you what his answer will be 99% of the time:

“Uhh, I don’t know… I mean, as short as it can be?”

Exactly. NFRs are about tuning a system and deciding when to stop optimizing. Sure, we can make the server respond in <10ms, but it will take a month of aggressive optimization. Not worth it? Then you’ll get <100ms. The reason to state the NFR is to determine how much effort needs to be spent obtaining realistic but nontrivial performance.

In the examples we’re talking about with async methods, there’s no question of optimization. What would that even mean? Let’s say the product owner decides the results should be ready in no more than 10 seconds. Okay, first of all, that means this test has to wait 10 seconds every time it runs! The results will actually be ready in a few microseconds, but instead every test run costs 10 seconds just to take this supposed NFR into account. It would be great if the test could detect that the results came in sooner than the maximum time and go ahead and complete early. But if we could solve that problem, we’d solve the original problem too (the test knowing when the results came in).

Even worse, what do we do with that information? The product owner wants the results in 10ms, but the response is large and it takes longer for the fastest iPhone to JSON decode it. What do we do with this requirement? Invent a faster iPhone?

Fine, then the product owner will have to take the limitations of the system into account when he picks the NFR. Well now we’re just back to “it should be as fast as it reasonably can”. So, like, make the call as early as possible, then store the results as soon as they come in.

The Real Requirements

The requirement, in terms of timing, is, quite simply, that the task get started immediately and that it finish as soon as it can, which means that the task performs only what it needs to, and nothing else.

These are the requirements we should be expressing in our tests. We shouldn’t be saying “X should finish in <10 s”, we should be saying “X should start now, and X should only do A, B and C, and nothing else”.

The second part, that code should only do something, and not do anything else, is tricky because it’s open-ended. How do you test that code only does certain things? Generally you can’t, and don’t try to. But that’s not the thing that’s likely to “break”, or what a test in a TDD process needs to prove got implemented.

The first part… that’s what our tests should be looking for… not that this concurrent task finishes within a certain time, but that it is started immediately (synchronously!). We of course need to test that the task does whatever it needs to do. That’s a separate test. So, really, what starts off as one test:

Given some results on the server
When we create a view model
Then the view model’s results should be the server’s results

Ends up as two tests:

When we create the view model
Then the “fetch results” task should be started with the view model

Given some results on the server
And the “fetch results” task is running with the view model
When the “fetch results” task finishes
Then the view model’s results should be the server’s results

I should rather say this ended up as two requirements. Writing two tests that correspond exactly to these two requirements is a more advanced topic that I will talk about another time, so it doesn’t distract too much from the issue here.

For now let’s just try to write a test that indeed tests both of these requirements together, but still confirms that the task does start and eventually finish, doing what it is supposed to do in the process, without having to put in a delay.

Solution 1: Scheduler Abstraction

The fundamental problem is that an async task is spun off concurrently, and the test doesn’t have access to it. How does that happen? By initializing a Task. This is, effectively, how you schedule some work to run concurrently. By writing Task.init(...), we are hardcoding this schedule call to the “real” scheduling of creating a Task. If we can abstract this call, then we can substitute test schedulers that allow us more capabilities, like being able to see what was scheduled and await all of it.

Let’s look at the interface for Task.init:

struct Task<Success: Sendable, Failure: Error> {

}

extension Task where Failure == Never {

  @discardableResult
  init(
    priority: TaskPriority? = nil, 
    operation: @Sendable @escaping () async -> Success
  )
}

extension Task where Failure == Error {

  @discardableResult
  init(
    priority: TaskPriority? = nil, 
    operation: @Sendable @escaping () async throws -> Success
  )
}

There are actually some hidden attributes that you can see if you look at the source code for this interface. I managed to catch Xcode admitting to me once that these attributes are there, but I can’t remember how I did it and can no longer reproduce it. So really this is the interface:

extension Task where Failure == Never {

  @discardableResult
  @_alwaysEmitIntoClient
  init(
    priority: TaskPriority? = nil, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async -> Success
  )
}

extension Task where Failure == Error {

  @discardableResult
  @_alwaysEmitIntoClient
  init(
    priority: TaskPriority? = nil, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async throws -> Success
  )
}

Okay, so let’s write an abstract TaskScheduler type that presents this same interface. Since protocols don’t support default parameters, we need to deal with that in an extension:

protocol TaskScheduler {
  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async -> Success
  ) -> Task<Success, Never>

  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async throws -> Success
  ) -> Task<Success, Error>
}

extension TaskScheduler {
  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async -> Success
  ) -> Task<Success, Never> {
    schedule(priority: nil, operation: operation)
  }

  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async throws -> Success
  ) -> Task<Success, Error> {
    schedule(priority: nil, operation: operation)
  }
}

Now we can write a DefaultTaskScheduler that just creates the Task and nothing else:

struct DefaultTaskScheduler: TaskScheduler {
  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async -> Success
  ) -> Task<Success, Never> {
    .init(priority: priority, operation: operation)
  }

  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async throws -> Success
  ) -> Task<Success, Error> {
    .init(priority: priority, operation: operation)
  }
}

And we can introduce this abstraction into MyViewModel:

@MainActor
final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init(taskScheduler: some TaskScheduler = DefaultTaskScheduler()) {
    taskScheduler.schedule {
      let (data, response) = try await urlSession.data(for: dataRequest)

      self.results = try JSONDecoder().decode([String].self, from: data)
    }
  }
}

Now in tests, we can write a RecordingTaskScheduler decorator that records all tasks scheduled through the scheduler it wraps, and lets us await them all later. In order to do that, we need to be able to store tasks with any Success and Failure type. Then we need to be able to await them all. How do we await a Task? By awaiting its value:

extension Task {
  public var value: Success {
    get async throws
  }
}

extension Task where Failure == Never {
  public var value: Success {
    get async
  }
}

In order to store all running Tasks of any success type, throwing and non-throwing, and to be able to await their values, we need a protocol that covers all those cases:

protocol TaskProtocol {
  associatedtype Success

  var value: Success {
    get async throws
  }
}

extension Task: TaskProtocol {}
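Here is a runnable sketch of that type erasure, assuming Swift 5.7 or later (needed for `any` existentials over protocols with associated types). `awaitAllValues` is a hypothetical helper, and the underscored attributes are left out:

```swift
// Same shape as TaskProtocol above, minus any access modifier,
// since protocol requirements can't carry one.
protocol TaskProtocol {
    associatedtype Success
    var value: Success { get async throws }
}

// Task's unconstrained `value` (get async throws) is the witness for every
// Failure type; when Failure == Never it simply never throws.
extension Task: TaskProtocol {}

// Hypothetical helper: await each task in order, collecting results erased to Any.
func awaitAllValues(_ tasks: [any TaskProtocol]) async throws -> [Any] {
    var values: [Any] = []
    for task in tasks {
        let value = try await task.value  // opened existential; Success is erased to Any
        values.append(value)
    }
    return values
}

// Tasks with different Success and Failure types, stored side by side.
let mixedTasks: [any TaskProtocol] = [
    Task<Int, Never> { 1 },
    Task<String, Never> { "two" },
    Task<Double, any Error> { 3.0 },
]

do {
    let values = try await awaitAllValues(mixedTasks)
    print(values)
} catch {
    print("a task failed: \(error)")
}
```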

Now we can use this in RecordingTaskScheduler:

final class RecordingTaskScheduler<Scheduler: TaskScheduler>: TaskScheduler {
  private(set) var runningTasks: [any TaskProtocol] = []

  func awaitAll() async throws {
    // Be careful here. While tasks are running, they may schedule more tasks themselves. So instead of looping over runningTasks, we need to keep repeatedly pulling off the next one, if there is one.
    while !runningTasks.isEmpty {
      let task = runningTasks.removeFirst()
      _ = try await task.value
    }
  }

  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async -> Success
  ) -> Task<Success, Never> {
    let task = scheduler.schedule(priority: priority, operation: operation)
    runningTasks.append(task)
    return task
  }

  @discardableResult
  @_alwaysEmitIntoClient
  func schedule<Success>(
    priority: TaskPriority?, 
    @_inheritActorContext @_implicitSelfCapture operation: __owned @Sendable @escaping () async throws -> Success
  ) -> Task<Success, Error> {
    let task = scheduler.schedule(priority: priority, operation: operation)
    runningTasks.append(task)
    return task
  }

  init(scheduler: Scheduler) {
    self.scheduler = scheduler
  }

  let scheduler: Scheduler
}

extension TaskScheduler {
  var recorded: RecordingTaskScheduler<Self> {
    .init(scheduler: self)
  }
}

You probably want to make that runningTasks state thread safe.

Now we can use this in the test:

final class MyViewModelTests: XCTestCase {
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let taskScheduler = DefaultTaskScheduler().recorded
    let viewModel = MyViewModel(taskScheduler: taskScheduler)

    try await taskScheduler.awaitAll()

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

We’ve replaced the sleep for an arbitrary time with a precise await for all scheduled tasks to complete. Much better!
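As for making runningTasks thread safe, here is one way to do it, sketched under simplifying assumptions. The hypothetical `SimpleTaskScheduler` handles only non-throwing work, has no priority parameter, and drops the underscored attributes. `RecordingSimpleScheduler` guards its list with an `NSLock`. A lock rather than an actor is a deliberate choice: `schedule` has to stay synchronous to be callable from a synchronous init. The lock is only taken inside synchronous helpers, so it is never held across an `await`.

```swift
import Foundation  // NSLock

// Type-erased "wait for this task to end" handle, so tasks of any Success and
// Failure type can share one array. `result` never throws, so a failing task
// doesn't stop the drain.
protocol AwaitableTask: Sendable {
    func waitUntilFinished() async
}

extension Task: AwaitableTask {
    func waitUntilFinished() async {
        _ = await result
    }
}

// Hypothetical simplified scheduler: non-throwing work only, no priority.
protocol SimpleTaskScheduler: Sendable {
    @discardableResult
    func schedule<Success: Sendable>(
        _ operation: @escaping @Sendable () async -> Success
    ) -> Task<Success, Never>
}

struct DefaultSimpleScheduler: SimpleTaskScheduler {
    @discardableResult
    func schedule<Success: Sendable>(
        _ operation: @escaping @Sendable () async -> Success
    ) -> Task<Success, Never> {
        Task(operation: operation)
    }
}

// Decorator: forwards to `base` and remembers every task it started.
final class RecordingSimpleScheduler<Base: SimpleTaskScheduler>: SimpleTaskScheduler, @unchecked Sendable {
    private let base: Base
    private let lock = NSLock()
    private var pending: [any AwaitableTask] = []

    init(base: Base) {
        self.base = base
    }

    @discardableResult
    func schedule<Success: Sendable>(
        _ operation: @escaping @Sendable () async -> Success
    ) -> Task<Success, Never> {
        let task = base.schedule(operation)
        lock.lock()
        pending.append(task)
        lock.unlock()
        return task
    }

    // Synchronous, so the lock is never held across an await.
    private func popNext() -> (any AwaitableTask)? {
        lock.lock()
        defer { lock.unlock() }
        return pending.isEmpty ? nil : pending.removeFirst()
    }

    // Keep draining until nothing is left: running tasks may schedule more tasks.
    func awaitAll() async {
        while let task = popNext() {
            await task.waitUntilFinished()
        }
    }
}

actor Counter {
    private(set) var value = 0
    func increment() { value += 1 }
}

// Usage: one task schedules a second; awaitAll() waits for both.
func runRecordingDemo() async -> Int {
    let recorder = RecordingSimpleScheduler(base: DefaultSimpleScheduler())
    let counter = Counter()
    recorder.schedule {
        await counter.increment()
        recorder.schedule { await counter.increment() }
    }
    await recorder.awaitAll()
    return await counter.value
}

print(await runRecordingDemo())  // 2
```

Because `awaitAll()` pops one task at a time instead of iterating over a snapshot, the task scheduled from inside another task is still picked up.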

Now, there are other ways to schedule concurrent work in Swift. Initializing a new Task is unstructured concurrency: it creates a new top-level task that runs independently of any other Task, even if it was spawned by an async function that another Task is running. The other ways to spawn concurrent work are the structured concurrency APIs: async let and with(Throwing)TaskGroup. Do we need to worry about these causing problems in tests?

No, we don’t. Because the concurrency is structured, tasks spawned this way inside another Task are owned by that outer Task (they are “child tasks” of that “parent task”). Primarily, this means a parent task cannot complete until all of its child tasks complete. It doesn’t matter whether you explicitly await these child tasks: any you didn’t await earlier are implicitly awaited when the scope that created them ends (the end of the withTaskGroup closure, or of the block that declares the async let). One caveat: an async let that is never awaited is cancelled before it is implicitly awaited, so its work may stop early. Thus, as long as the top-level Task that owns all these child tasks is awaitable in the tests, awaiting it will await all those concurrent child tasks as well.
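Here is a small sketch of that guarantee for task groups, using hypothetical `Tally` and `fanOut` names. None of the child tasks is awaited explicitly (no `for await`, no `waitForAll()`), yet `withTaskGroup` doesn’t return until all of them have finished:

```swift
// Hypothetical counter; an actor, so concurrent child tasks can update it safely.
actor Tally {
    private(set) var count = 0
    func bump() { count += 1 }
}

// The group body only adds children and never awaits them, but withTaskGroup
// implicitly waits for every child before it returns.
func fanOut(_ n: Int, into tally: Tally) async {
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<n {
            group.addTask {
                await Task.yield()
                await tally.bump()
            }
        }
    }
}

let tally = Tally()
await fanOut(5, into: tally)
print(await tally.count)  // 5
```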

It’s only the unstructured part of concurrency we need to worry about. That is handled by Task.init, and our TaskScheduler abstraction covers it.

(It’s becoming popular to claim that “unstructured concurrency is bad” and that you should replace all instances of it with structured concurrency, but this doesn’t make sense. Structured concurrency might very well be called structured awaiting. When you actually don’t want one thing to await another, i.e. genuine concurrency, unstructured concurrency is exactly what you need. The view model where we made init async throws is an example: it’s not correct to use async let to kick off the fetch work, because that causes init to await that child task, and destroys the very concurrency we’re seeking to create.)

Things look pretty similar in other platforms/frameworks, with some small caveats. In Kotlin, the way to spawn concurrent work is by calling CoroutineScope.launch. There’s a global one available, but many times you need to launch coroutines from another scope. This is nice because this is already basically the abstraction we need. Just make it configurable in tests, and make a decorator for CoroutineScope that additionally records the launched coroutines and lets you await them all. You might even be able to do this by installing a spy with mockk.

In .NET, the equivalent to Task.init in Swift is Task.Run:

void SpawnATask() {
  Task.Run(async () => { await DoSomeWork(); });
}

async Task DoSomeWork() {
  ...
}

Task.Run is really Task.Factory.StartNew with parameters set to typical defaults. Whichever one you need, you can wrap it in a TaskScheduler interface. Let’s assume Task.Run is good enough for our needs:

interface ITaskScheduler {
  Task Schedule(Func<Task> work);
  Task<TResult> Schedule<TResult>(Func<TResult> work);
  Task<TResult> Schedule<TResult>(Func<Task<TResult>> work);
}

struct DefaultTaskScheduler: ITaskScheduler {
  public Task Schedule(Func<Task> work) {
    return Task.Run(work);
  }

  public Task<TResult> Schedule<TResult>(Func<TResult> work) {
    return Task.Run(work);
  }

  public Task<TResult> Schedule<TResult>(Func<Task<TResult>> work) {
    return Task.Run(work);
  }
}

Then we replace naked Task.Run with this abstraction:

void SpawnATask(ITaskScheduler scheduler) {
  scheduler.Schedule(async () => { await DoSomeWork(); });
}

And similarly we can make a RecordingTaskScheduler to allow tests to await all scheduled work:

using System;
using System.Collections.Immutable;
using System.Threading.Tasks;

sealed class RecordingTaskScheduler: ITaskScheduler {
  public IImmutableQueue<Task> RunningTasks { get; private set; } = ImmutableQueue<Task>.Empty;

  public async Task AwaitAll() {
    // Tasks may schedule more tasks while they run, so keep pulling off the next one until the queue is empty.
    while (!RunningTasks.IsEmpty) {
      Task task = RunningTasks.Peek();
      RunningTasks = RunningTasks.Dequeue();
      await task;
    }
  }

  public Task Schedule(Func<Task> work) {
    var task = _scheduler.Schedule(work);
    RunningTasks = RunningTasks.Enqueue(task);
    return task;
  }

  public Task<TResult> Schedule<TResult>(Func<TResult> work) {
    var task = _scheduler.Schedule(work);
    RunningTasks = RunningTasks.Enqueue(task);
    return task;
  }

  public Task<TResult> Schedule<TResult>(Func<Task<TResult>> work) {
    var task = _scheduler.Schedule(work);
    RunningTasks = RunningTasks.Enqueue(task);
    return task;
  }

  public RecordingTaskScheduler(ITaskScheduler scheduler) {
    _scheduler = scheduler;
  }

  private readonly ITaskScheduler _scheduler;
}

static class TaskSchedulerExtensions {
  // Extension methods have to live in a static, non-generic class.
  public static RecordingTaskScheduler Recorded(this ITaskScheduler scheduler) {
    return new(scheduler);
  }
}

Because Task<TResult> is a subclass of the non-generic Task in .NET, we don’t have to do any extra work to abstract over tasks of all result types.

There’s a shark in the water, though.

Async/await works a little differently in C# than in Swift. Kotlin is Swift-like, while TypeScript and C++ are C#-like in this regard.

In C# (and Typescript and C++), there aren’t really two types of functions (sync and async). The async keyword is just a request to the compiler to rewrite your function to deal with the awaits and return a special type, like Task or Promise (you have to write your own in C++). And correspondingly, an async function can’t just return anything, it has to return one of these special types. But that’s all that’s different. Specifically, you can call async functions from anywhere in these languages. What you can’t do in non-async functions is await. You can call that async function from a non-async function, you just can’t await the result, which is always going to be some awaitable type like Task or Promise.

Long story short, you can do this:

void SpawnATask() {
  DoSomeWork();
}

async Task DoSomeWork() {
  ...
}

There’s a slight difference in how this executes, compared to wrapping it in Task.Run, but I certainly hope you aren’t writing code that depends in any way on that difference. So you should be able to wrap it in Task.Run, and then change that to scheduler.Schedule.

But before you make that change, know that this is a sneaky little ninja lurking around in your codebase, and it’s really easy to miss. Say a test is failing, clearly because it isn’t waiting long enough, and you’re going crazy because you’ve searched for every last Task.Run (or Task. in general) in your codebase. The culprit can be one of these hidden task constructors: a plain, unawaited call to an async function that you’d never even notice is spawning concurrent work. Just keep that in mind. Again, it should be fine to wrap it in scheduler.Schedule.

This isn’t a thing in Swift or Kotlin, because they do async/await differently. In those languages there are two types of functions, and you simply aren’t allowed to call async functions (suspend functions, in Kotlin) from non-async ones. You have to explicitly call something like Task.init in Swift (or a coroutine builder such as launch in Kotlin) to jump from one to the other.
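For contrast, here is a minimal Swift sketch with the same shape as SpawnATask above. The names simply mirror the C# example and are illustrative only; the point is that the synchronous function has to create the Task explicitly, because calling doSomeWork() directly from it would not compile.

```swift
// Illustrative async function standing in for real work
func doSomeWork() async -> String {
    "done"
}

// A synchronous function cannot call doSomeWork() directly.
// Task { } is the explicit bridge from synchronous to asynchronous code.
func spawnATask() -> Task<String, Never> {
    Task {
        await doSomeWork()
    }
}
```

Unlike the C# version, the spawned work is visible at the call site: every jump into async code goes through a Task initializer, so searching for Task finds it.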

A Not So Good Solution

This is not a new problem. I showed earlier that we can handle the concurrency with DispatchQueue. Similarly (and I’ve done this plenty of times), you can write an abstraction that captures work scheduled on global queues so the test can synchronize with it…

…well, no, that’s not exactly what I did. I did something a little different. First, I made a protocol so that I can customize what happens when I dispatch something to a queue:

protocol DispatchQueueProtocol {
  func async(
    group: DispatchGroup?, 
    qos: DispatchQoS, 
    flags: DispatchWorkItemFlags, 
    execute work: @escaping @convention(block) () -> Void
  )
}

extension DispatchQueueProtocol {
  // Deal with default parameters
  func async(
    group: DispatchGroup? = nil, 
    qos: DispatchQoS = .unspecified, 
    flags: DispatchWorkItemFlags = [], 
    execute work: @escaping @convention(block) () -> Void
  ) {
    // Beware of missing conformance and infinite loops!
    self.async(
      group: group,
      qos: qos,
      flags: flags,
      execute: work
    )
  }
}

extension DispatchQueue: DispatchQueueProtocol {}

Just as with the scheduler, the view model takes its dispatch queues as init parameters:

import Combine
import Foundation

final class MyViewModel: ObservableObject {
  @Published
  var results: [String]?

  init(
    backgroundQueue: some DispatchQueueProtocol = DispatchQueue.global(),
    mainQueue: some DispatchQueueProtocol = DispatchQueue.main
  ) {
    backgroundQueue.async {
      // Placeholder URL; substitute the real endpoint
      URLSession.shared.dataTask(with: .init(url: .init(string: "")!)) { data, _, _ in
        guard let data else {
          return
        }

        // Publish on the main queue, since the view reads `results`
        mainQueue.async {
          self.results = try? JSONDecoder().decode([String].self, from: data)
        }
      }
      .resume() // Data tasks start suspended; without resume() the request never runs
    }
  }
}

Then I defined a test queue that doesn’t actually queue anything; it just runs the work outright:

struct ImmediateDispatchQueue: DispatchQueueProtocol {
  func async(
    group: DispatchGroup?, 
    qos: DispatchQoS, 
    flags: DispatchWorkItemFlags, 
    execute work: @escaping @convention(block) () -> Void
  ) {
    work()
  }
}

And if I supply this queue for both queue parameters in the test, then it doesn’t need to wait for anything. I have, in fact, removed concurrency from the code entirely. That certainly solves the problem the test was having!

This is often the go-to solution for these kinds of problems: if tests are intermittently failing because of race conditions, just remove concurrency from the code while it is being tested. How would you do this with the async-await version? You need to be able to take control of the “executor”: the underlying object that’s responsible for running the synchronous slices of async functions between the awaits. The default executor is comparable to DispatchQueue.global: it uses a shared thread pool to run everything. You would replace it with something like an ImmediateExecutor, which just runs each slice in-line. That makes the “async” functions effectively synchronous.

Substituting your own executor is possible in C# and Kotlin, but not in Swift. Swift 5.9 took one step in this direction with custom actor executors (SE-0392), but they’re still working on it.

However, even if it were possible, I don’t think it’s a good idea.

Changing the underlying execution model from asynchronous to synchronous significantly changes what you’re testing. Your test runs something quite different from what happens in prod code. For example, when everything runs synchronously on a single thread, everything becomes thread safe by default (though not necessarily reentrant). If the “real” version that runs concurrently has any thread safety issues, your test will be blind to them. Conversely, running everything synchronously can produce deadlocks that never happen when things run concurrently.
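To make the deadlock risk concrete, here is a small, self-contained sketch. WorkQueue, ImmediateQueue, ConcurrentQueue and handshake are hypothetical names (a simplified stand-in for DispatchQueueProtocol and ImmediateDispatchQueue), and the semaphore wait is bounded by a timeout so the demo reports the deadlock instead of hanging forever.

```swift
import Dispatch

// Hypothetical, simplified stand-in for DispatchQueueProtocol
protocol WorkQueue {
    func async(_ work: @escaping () -> Void)
}

// Runs work in-line, like ImmediateDispatchQueue
struct ImmediateQueue: WorkQueue {
    func async(_ work: @escaping () -> Void) {
        work()
    }
}

// Forwards work to a real concurrent dispatch queue
struct ConcurrentQueue: WorkQueue {
    let queue = DispatchQueue(label: "demo.concurrent", attributes: .concurrent)

    func async(_ work: @escaping () -> Void) {
        queue.async { work() }
    }
}

// A consumer waits for a signal that a producer sends from separately scheduled work.
// Returns true if the consumer got the signal, false if it gave up after `timeout` seconds.
func handshake(on queue: WorkQueue, timeout: Double = 1.0) -> Bool {
    let ready = DispatchSemaphore(value: 0)
    let done = DispatchSemaphore(value: 0)
    var gotSignal = false

    queue.async {
        // Consumer: blocks until the producer signals (bounded, so the demo can't hang)
        gotSignal = ready.wait(timeout: .now() + timeout) == .success
        done.signal()
    }
    queue.async {
        // Producer
        ready.signal()
    }

    done.wait() // also orders the write to gotSignal before the read below
    return gotSignal
}
```

With ConcurrentQueue the handshake succeeds; with ImmediateQueue the consumer runs in-line and blocks before the producer has even been scheduled, so it only gets out via the timeout. Production code would have worked while the “simplified” test deadlocked.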

It’s just not as accurate a test as I’d like. I want to exercise that concurrent execution.

That’s why I prefer to not mess with how things execute, and just record what work has been scheduled so that the test can wait on it. This is a little more convoluted to get working in the DispatchQueue version, but we can do it:

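import Foundation

final class RecordingDispatchQueue<Queue: DispatchQueueProtocol>: DispatchQueueProtocol {
  // Guards runningTasks, which is appended to from whichever thread schedules work
  private let lock = NSLock()
  private var runningTasks: [DispatchGroup] = []

  func waitForAll(onComplete: @escaping () -> Void) {
    // Take everything recorded so far and clear the list
    lock.lock()
    let groups = runningTasks
    runningTasks.removeAll()
    lock.unlock()

    let outerGroup = DispatchGroup()
    for innerGroup in groups {
      outerGroup.enter()
      innerGroup.notify(queue: .global(), execute: outerGroup.leave)
    }

    // Fires immediately if nothing was recorded
    outerGroup.notify(queue: .global(), execute: onComplete)
  }

  func async(
    group: DispatchGroup?,
    qos: DispatchQoS,
    flags: DispatchWorkItemFlags,
    execute work: @escaping @convention(block) () -> Void
  ) {
    // One group per scheduled work item, left once the work has run
    let recordedGroup = DispatchGroup()
    recordedGroup.enter()

    // Forward the caller's options to the wrapped queue
    queue.async(group: group, qos: qos, flags: flags) {
      work()
      recordedGroup.leave()
    }

    lock.lock()
    runningTasks.append(recordedGroup)
    lock.unlock()
  }

  init(queue: Queue) {
    self.queue = queue
  }

  private let queue: Queue
}

extension DispatchQueueProtocol {
  var recorded: RecordingDispatchQueue<Self> {
    .init(queue: self)
  }
}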

Then in the tests:

final class MyViewModelTests: XCTestCase {
  func testLoadData() {
    let mockResults = setupMockServerResponse()

    let backgroundQueue = DispatchQueue.global().recorded
    let mainQueue = DispatchQueue.main.recorded

    let viewModel = MyViewModel(
      backgroundQueue: backgroundQueue,
      mainQueue: mainQueue
    )

    let backgroundWorkComplete = self.expectation(description: "backgroundWorkComplete")
    let mainWorkComplete = self.expectation(description: "mainWorkComplete")

    backgroundQueue.waitForAll(onComplete: backgroundWorkComplete.fulfill)
    mainQueue.waitForAll(onComplete: mainWorkComplete.fulfill)

    waitForExpectations(timeout: 5.0, handler: nil)

    XCTAssertEqual(viewModel.results, mockResults)
  }
}

A lot less pretty than async-await, but functionally equivalent.
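A variation on the recording idea, sketched under its own assumptions rather than as a replacement for the code above: if every injected queue shares a single DispatchGroup, one wait covers both the background work and any follow-up work it schedules, because the follow-up enters the group before the earlier work leaves it. WorkQueue and GroupRecordingQueue are hypothetical names.

```swift
import Dispatch

// Hypothetical, simplified stand-in for DispatchQueueProtocol
protocol WorkQueue {
    func async(_ work: @escaping () -> Void)
}

// Every instance created with the same group records into that shared group
final class GroupRecordingQueue: WorkQueue {
    private let queue: DispatchQueue
    private let group: DispatchGroup

    init(queue: DispatchQueue, group: DispatchGroup) {
        self.queue = queue
        self.group = group
    }

    func async(_ work: @escaping () -> Void) {
        let group = self.group
        // enter() when the work is scheduled, leave() only after it has run,
        // so work scheduled from inside `work` keeps the count above zero
        group.enter()
        queue.async {
            work()
            group.leave()
        }
    }
}
```

Like the recording queue, this only sees work that passes through the injected queues. A completion handler that URLSession delivers on its own delegate queue is not tracked, so the group can drain before such a callback schedules anything.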

Solution 2: Rethink the Problem

Abstracting task scheduling gives us a way to add a hook for tests to record scheduled work and wait for all of it to complete before making their assertions. Instead of just randomly waiting and hoping it’s long enough for everything to finish, we expose the things we’re waiting for so we can know when they’re done. This solves the problem we had of the test needing to know how long to wait… but are we thinking about this correctly?

Why does the test need to be aware that tasks are being spawned and concurrent work is happening? Does the production code that uses the object we’re testing need to know all of that? It doesn’t look that way. After all, we started with working production code: no scheduling abstraction was present, the default scheduling mechanism (like Task.init) was hardcoded inside the private implementation details of MyViewModel, and yet… everything worked. Specifically, the object interacting with MyViewModel, MyView, didn’t know and didn’t care about any of this.

Why, then, do the tests need to care? After all, why in general do tests need to know about private implementation details? And it is (or at least was, before we exposed the scheduler parameter) a totally private implementation detail that any asynchronous scheduling was happening at all.

What were we trying to test? We wanted to test, basically, that our view shows a spinner until results become ready, that those results will eventually become ready, and that they will be displayed once they are ready. We don’t want to involve the view in the test, so we don’t literally look for spinners or text rows; instead we test that the view model instructs the view appropriately. The key is asking ourselves: why is this “we don’t know exactly when the results are ready” problem not a problem for the view? How does the view get notified that results are ready?

Aha!

The view model’s results are an @Published. It is publishing changes to its results to the outside world. See, we’ve already solved the problem we have in tests. We had to, because it’s a problem for production code too. It’s perhaps obscured a bit inside the utility types in SwiftUI, but the view is notified of updates by being bound to an ObservableObject, which has an objectWillChange publisher that fires any time any of its @Published members is about to change (specifically, in willSet). This triggers an evaluation of the view’s body in the next main run loop cycle, where it reads from viewModel.results.

So, we just need to simulate this in the test:

import Combine
import XCTest

final class MyViewModelTests: XCTestCase {
  @MainActor
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let viewModel = MyViewModel()

    let updates = viewModel
      .objectWillChange
      .prepend(()) // We want to inspect the view model immediately in case the change already occurred before we start observing
      .receive(on: RunLoop.main) // We want to inspect the view model after updating on the next main run loop cycle, just like the view does

    for await _ in updates.values {
      guard let results = viewModel.results else {
        continue
      }

      XCTAssertEqual(results, mockResults)
      break
    }
  }
}

Now this test is faithfully testing what the view does, and how a manual tester would react to it: the view model’s results are reevaluated each time the view model announces it updated, and we wait until results appear. Then we check that they equal the results we expect.
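Why hop to the next run loop cycle instead of reading the view model right away? Because, as described above, objectWillChange fires in willSet, before the new value is stored. This plain-Swift sketch (no Combine; MiniViewModel and the simulated run loop queue are hypothetical) shows the difference between reading inside the notification and reading one cycle later.

```swift
// Hypothetical stand-in for an ObservableObject with one @Published-style property.
// Like @Published, it notifies observers in willSet, before the new value is stored.
final class MiniViewModel {
    var onWillChange: [() -> Void] = []

    var results: [String]? {
        willSet { onWillChange.forEach { $0() } }
    }
}

let model = MiniViewModel()
var readImmediately: [[String]?] = []
var readNextCycle: [[String]?] = []
var nextCycle: [() -> Void] = [] // simulated "next main run loop cycle"

// Observer 1 reads the model synchronously, inside the notification
model.onWillChange.append { readImmediately.append(model.results) }

// Observer 2 defers its read by one cycle, as .receive(on: RunLoop.main) does
model.onWillChange.append { nextCycle.append { readNextCycle.append(model.results) } }

model.results = ["a", "b"]
nextCycle.forEach { $0() } // drain the simulated cycle
```

The synchronous observer sees the old value (nil), while the deferred one sees ["a", "b"]. Dropping the receive(on:) from the test would leave it checking stale state.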

With this approach, a stalling test is now a concern: if the prod code is broken, the results may never get set, and the loop will wait forever. So we should wrap it in some sort of timeout check. Test frameworks usually come with timeouts already, but unfortunately XCTest’s built-in limit (executionTimeAllowance) is rounded up to whole minutes. It would be nice to specify something like 5 seconds, so we can write our own withTimeout function (I won’t show the implementation here):

import Combine
import XCTest

final class MyViewModelTests: XCTestCase {
  @MainActor
  func testLoadData() async throws {
    let mockResults = setupMockServerResponse()

    let viewModel = MyViewModel()

    let updates = viewModel
      .objectWillChange
      .prepend(()) // We want to inspect the view model immediately in case the change already occurred before we start observing
      .receive(on: RunLoop.main) // We want to inspect the view model after updating on the next main run loop cycle, just like the view does

    try await withTimeout(seconds: 5.0) {
      for await _ in updates.values {
        guard let results = viewModel.results else {
          continue
        }

        XCTAssertEqual(results, mockResults)
        break
      }
    }
  }
}
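For readers who want a withTimeout to go with that test, here is one minimal sketch under stated assumptions: the name, the TimeoutError type and the signature are hypothetical (not a library API and not the implementation alluded to above). It races the operation against a sleeping child task in a throwing task group and keeps whichever finishes first.

```swift
// Hypothetical error thrown when the deadline wins the race
struct TimeoutError: Error {}

// Races `operation` against a sleeping child task; the first to finish decides the outcome
func withTimeout<T: Sendable>(
    seconds: Double,
    operation: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withThrowingTaskGroup(of: T.self, returning: T.self) { group in
        group.addTask {
            try await operation()
        }
        group.addTask {
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            throw TimeoutError()
        }
        // Rethrows TimeoutError (or the operation's own error) if that child finishes first
        guard let result = try await group.next() else {
            throw TimeoutError()
        }
        group.cancelAll() // stop the loser, normally the sleeping child
        return result
    }
}
```

One design consequence: a task group waits for all of its children before returning, so the timeout only takes effect once the operation reacts to cancellation. An operation that never checks for cancellation would delay the TimeoutError until it finished on its own.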

The mindset here is to think about why anyone cares that this concurrent work is started to begin with. The only reason why anyone else would care is that they have access to some kind of state or notification that is changed or triggered by the concurrent task. Whatever that is, it’s obviously public (otherwise no one else could couple to it, and we’re back to the original question), so have your test observe that instead of trying to hook into scheduling of concurrent work.

In this example, it was fairly obvious what the observable side effect of the task running was. It may not always be obvious, but it has to exist somewhere. Otherwise you’re trying to test a process that no one can possibly notice actually happened, in which case why is it a requirement (why are you even running that task)? Whether this is setting some shared observable state, triggering some kind of event that can be subscribed to, or sending a message out to an external system, all of those can be asserted on in tests. You shouldn’t need to be concerned about tasks finishing; that’s an implementation detail.
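As a sketch of asserting on a side effect rather than on task completion, suppose the object under test reports its results to an external system. ResultSink, StreamingSink and Importer are hypothetical names. The test double forwards every save into an AsyncStream, so a test can simply await the first save, while the task that produces it stays a private detail of Importer.

```swift
// Hypothetical external dependency that the object under test reports to
protocol ResultSink: Sendable {
    func save(_ results: [String])
}

// Test double: every save is forwarded into an AsyncStream the test can await
final class StreamingSink: ResultSink, @unchecked Sendable {
    let saves: AsyncStream<[String]>
    private let continuation: AsyncStream<[String]>.Continuation

    init() {
        var captured: AsyncStream<[String]>.Continuation!
        saves = AsyncStream { captured = $0 } // the build closure runs synchronously
        continuation = captured
    }

    func save(_ results: [String]) {
        continuation.yield(results)
    }
}

// Hypothetical object under test: it spawns a task internally and never exposes it
final class Importer {
    init(sink: ResultSink) {
        Task {
            let results = ["alpha", "beta"] // stand-in for real asynchronous work
            sink.save(results)
        }
    }
}
```

A test then iterates sink.saves and asserts on the first value. In an XCTest case you would likely wrap that wait in a timeout, for the same stalling reason as with the view model.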

Conclusion

We found a way to solve the immediate problem of a test not knowing how long to wait for asynchronous work to complete. As always, introducing an abstraction is all that’s needed to be able to insert a test hook that provides the missing information.

But the more important lesson is that inserting probes like this into objects to test them raises a question: why would you need to probe an object in a way production code can’t, just to test how that object behaves from the perspective of the production objects that interact with it (which is, after all, all that matters)? I’m not necessarily saying there’s never a good answer for this. At the very least, it may replace a more faithful (in terms of recreating what users do) but extremely convoluted, slow and unreliable test with a much more straightforward, fast and reliable one that “cheats” slightly. (Which is riskier: a convoluted test that gets skipped because it’s slow and ignored because it’s unreliable, or a test that cheats and might produce invalid results?)

But you should always ask this question, and settle for probing internals only after you have exhaustively concluded that it really is necessary, and for good reason. In the case of testing concurrent tasks, the point of waiting for tasks to “complete” is that the tasks did something, and it’s this side effect that you care about. You should assert on the side effect, which is the only truly meaningful evidence that the code executed (and it’s all that matters).

In general, if you write concurrent code, you will already have solved the notification problem for the sake of production code. You don’t need to insert more hooks to observe the status of jobs in tests. The only reason the status of that job could matter is because it affects something that is already visible, and it is possible to listen to and be immediately notified of when that change happens. Whether it’s an @Published, an Observable/Publisher, a notification, event, C# delegate, or whatever else, you have surely already introduced one of these things into your code after introducing a concurrent task. Either that, or you’re calling out to an external system, and that can be mocked.

Just find that observable state that the job is affecting, and then you have something to listen to in tests. Introducing the scheduling interface is a way to quickly cover areas with tests and get reliable passing, but you should always revisit this and figure out what the proper replacement is.


Testing Async Code