What Is “Agile” Anyway?

No one spends any significant amount of time working in an “agile” software shop, or going through any type of “agile” training, without encountering the distinction between “agile” and “waterfall”. In fact, the distinction is so important, it might be fair to say that the only real definition of an “agile” process is that it isn’t waterfall. At least that is the case in this age of big consulting firms teaching software shops how to be “properly agile” (Big Agile, or Agile, Inc., as we call it). Waterfall is the ultimate scapegoat. However much we try to pin down its essential features, what matters, and what ultimately verifies whether we have correctly described it, is that it is the cause of everything that ever went wrong in the software world.

You may think I’m being unfair, and I’m not necessarily saying this was conscious on anyone’s part (I’m also not necessarily saying it wasn’t conscious), but if you haven’t already, eventually you will encounter debates over process that amount to, “that’s waterfall, and waterfall is bad”. Accusing something of being “waterfall” is the ultimate diss in the agile world. The defense will always be to deny the charge as ridiculous.

This really hits home if you, like the proverbial heretic going through a crisis of faith, start doing some research and discover that “waterfall” is a hypothetical process created in a paper as an unrealistic and over-simplified thought experiment. No company has ever even tried to follow it. It was conjured up in order to use it as a comparison point for a more realistic process. The now prevalent charge that the software business was following waterfall, and that’s why things were so expensive, bug-prone, slow-to-adapt, or whatever else, is literally a straw man!

This is so important because, like I said earlier, the most coherent definition of “agile” that I have been able to identify across the various training programs is:

agile = !waterfall

(for those non-programmers reading this, the “!” means “not”)

So then the software business has been “agile” (not doing waterfall) all along! Why, then, are we paying this company millions of dollars to teach us how to stop doing what we’ve been doing?

In the effort to comprehend this, I have often seen people identify the essential feature of “waterfall” as being the existence of a sequence of steps that are performed in a specific order:

feature design -> architecture/implementation -> testing -> delivery

Any time this kind of step-by-step process rears its ugly head, we scream “waterfall!” and change what we’re doing. The results are humorous. It’s like yelling at someone to put their hands up and drop their pants at the same time.

These are necessary steps in building a piece of software, and no process is ever going to change that. You can’t implement a feature before the feature is specified. You can’t test code that doesn’t exist yet (TDD isn’t saying “test before implement”, it’s saying “design the test first and use it to drive implementation”), and you at least shouldn’t deliver code before testing it. The only way to not follow these steps, in that order, is to not build software. From the paper:

One cannot, of course, produce software without these steps

(While these are the unavoidable steps of building software, this does not imply that other things that have become commonplace in the industry are also unavoidable. This includes organizing a business into siloed “departments” around each of these steps, believing that “architecture” and “coding” are distinct steps that should be done by different people, etc.)

As the Royce paper explains, the existence and order of these steps is not the essential feature of the “waterfall” model he was describing. The essential feature, the “fall” of “waterfall”, is that the process is unidirectional and there is no opportunity to move “up”. Once a feature is specified and coding begins, feature design is over and cannot be revisited. Once coding is finished and testing begins, we can’t revisit coding (not in the sense that we can’t fix bugs uncovered by testing, but that we can’t go back and rearchitect the code). Royce then “corrects” the process by letting the water “flow up”. Not only can something that occurs in one step induce a return to a previous step, it can induce a return to any of the previous steps, even the first one.

If “agile” is promising a way to build software without this sequence of steps, it is promising something impossible, even nonsensical. So then what is agile promising? What is its point?

Let’s remind ourselves of what the word “agility” actually means. Anyone who’s played an action RPG like The Elder Scrolls should remember that “Agility” is one of the “attributes” of your character, which you can train and optimize. In particular, it is not “Speed”, and it is not “Strength”. Agility is the ability to quickly change direction. The easiest way to illustrate agility, and the fact that it competes with speed, is with aircraft: airplanes and helicopters. An airplane is optimized for speed. It can go very fast, and is very good at making a beeline from one airport to another. It is not very good at making a quick U-turn mid-flight. A helicopter, on the other hand, is much more maneuverable. It can change direction very quickly. To do so, it sacrifices top speed.

An airplane is optimized for the conditions of being 30,000 ft in the air. There are essentially no obstacles, anything that needs to be avoided is large and detectable from far away and for a long time (like a storm), and the flight path is something that can be pretty much exactly worked out ahead of time.

A helicopter is optimized for low altitude flight. There are many smaller obstacles that cannot feasibly be mapped out perfectly. The pilot needs to make visual contact with obstacles and avoid them “quickly” by maneuvering the helicopter. There is, in a sense, more “traffic”: constantly changing, unpredictable obstacles that prevent a flight path from being planned ahead of time. The flight path needs to be discovered and worked out one step at a time, during flight.

(This is similar to the example Eric Ries uses in The Lean Startup, where he compares the pre-programmed burn sequence of a NASA rocket to the almost instantaneous feedback loop between a car driver’s eyes and his hands and feet, operating the steering wheel, gas and brakes.)

The goal of an “agile” process in software development is to optimize for business agility over speed. Instead of deciding now that 5 years from now we want one specific software product, with all the requirements and features worked out today, and to get there in one fast-as-possible straight shot, we accept that the software industry is rapidly growing and changing, and both the evolution of technology and the changes in market trends are unpredictable. If we even know where we want to be in 5 years, we don’t know the best path to get there. It’s more likely we really don’t know what kind of software we want to have in 5 years.

That is fairly abstract. To put it more concretely: instead of building a large set of features into infrequent releases of software, say once or twice per year, we would like to frequently release small updates (anywhere from once or twice per week to several times per day), so we can get quick feedback on them. Why? So we can feed the data we can only acquire by releasing into the market back into the process of deciding what features to build next. That, I believe, is the essential goal of agile software development. We want to be able to discover empirically what software product we want by quickly, iteratively releasing small pieces of it, and using the results of each release as part of the information on what the product will eventually look like.

To be clear, “releasing” to internal groups like product owners doesn’t count. Agility isn’t prototyping. This is another standard practice that’s been going on for decades that Agile, Inc. is trying to claim it invented. Giving the designer a rough alpha build every week with a new feature isn’t something new. What I’m talking about is giving new builds on a weekly (or similar) basis to your paying customers (or, for a “free” app, the public).

What this is not about is being able to build a piece of software faster. In fact, as I will explain later, being able to achieve this type of rapid iteration process comes at the cost of raw speed, though this tends to be obscured. Becoming agile is often the force that pushes a software shop to adopt good engineering practices that actually do make them faster. Agility is not going to magically make it take less time than it otherwise would to build a large, complex software product. It enables the product to be built and released one piece at a time, instead of all-at-once.

Unlike agile consultants, I make no claim that this is, or should be, the goal of every software shop. In fact, I can imagine some industries that would find this quite useless. Does the control software for an automatic transmission in a car need to be iterated on and released to customers every two weeks? What can this software do with the same mechanical parts of the car (which aren’t updated more frequently than once per year) that it couldn’t do before? I’m not claiming to have an answer for that, and I don’t think agile consultants have a blanket right answer either.

As I mentioned, agility gets mixed up with mature engineering practices like test-driven development, design patterns, modularizing into APIs, test automation, and so on. Those are practices that every software shop will benefit from adopting, because they represent the proper way to build software. But those can be adopted without worrying about rapid release cycles. The two are, fundamentally, orthogonal to each other, even if the ability to rapidly release tends to come up against bad engineering practices more dramatically.

Regardless, most software shops, especially small businesses, value agility very highly. They need it to help guide exactly what software they should be building. It helps them beat larger, slower-moving corporations to market with new features. Assuming that a business values being able to rapidly release new features one-by-one, the question is now: what does it take to do this? There are businesses that are used to delivering software infrequently, and now want to do it frequently. It is this “transition to agile” problem that the consultant shops are really trying to solve. To understand the problem, we need to understand why a software shop would become optimized for infrequent release cycles to begin with.

First, let’s think about this sequence of steps in building software. Each one of these steps has an associated group of specialists who are skilled primarily, or exclusively, in one of these steps. As in any case where a structure has specialized parts for performing one step in a sequence, the steps will be pipelined. The organization won’t work on just one step at a time. The designers will design some features, hand them to engineering, then immediately begin designing the next round of features. By the time they finish this new round, hopefully engineering has finished the first round of implementation, handed it off to testers, and is ready to accept this new batch of features. Since any real software pipeline is not unidirectional like, say, a pipelined CPU (the fabled “waterfall”), it’s a bit more complicated, but the essentials are unchanged. Each group is constantly busy working on whatever is the next thing available for them to do.

Like any pipeline, this is only as fast as its slowest stage. If one stage in this pipeline takes 2 months, then nothing can flow from one end of the pipe to the other in less than 2 months. If stage X can complete an item in 1 week, but stage X + 1 takes 1 month per item, then a clog is going to occur at the junction between these stages. At the end of month 1 (that is, a month after stage X + 1 has begun), 3 weeks’ worth of work is piled up waiting. At the end of month 2, 6 weeks’ worth is piled up. Then 9, then 12, and so on. It is, in fact, quite pointless for the other stages to produce any more output than this one limiting stage can handle. This is the central consideration we need to keep in mind.
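The backlog arithmetic above can be sketched in a few lines (the one-item-per-week and one-item-per-month rates are the hypothetical figures from this example, not measurements):

```python
# Backlog growth between two mismatched pipeline stages: stage X emits
# one item per week, while stage X + 1 finishes only one item per
# "month" (4 weeks). Whatever stage X + 1 can't absorb piles up.
def backlog_after(weeks, items_per_week=1, items_per_month=1):
    produced = weeks * items_per_week
    consumed = (weeks // 4) * items_per_month  # one "month" = 4 weeks here
    return produced - consumed

# End of months 1, 2, 3, 4: backlog of 3, 6, 9, 12 weeks' worth of work.
```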

Now, we need to decide how “long” the pipe is going to be. Not how large, how long. Large refers to how much work can flow in parallel through the pipeline. This means how many workers you have in each stage, assuming they can all be kept busy. We need to decide how long it takes for each batch of work to flow through this pipeline. A very long pipe processes a very large amount of work in each cycle, and delivers a large quantity of finished work infrequently. A very short pipe processes a very small amount of work in each cycle, and delivers a small quantity of finished work frequently. Note the total rate of delivery is, we can assume, the same for both. What varies is the delay, in time, between input and output.

Agility is about building a very short pipe. An agile shop runs this whole pipeline, from beginning to end, with a delivered product coming out each time, on a small time scale, adding a small set of new features each time. A non-agile shop runs the whole pipeline on a large time scale, adding a large set of new features to each release. The essential goal of “agile” processes is to figure out how to operate a short pipeline.

Let’s assume that the time each stage takes to complete its part of the work is directly proportional to the “size” of the work. Let’s also assume a large software product can be divided into a large number of small, equally “sized” (in terms of effort) features (and this is a big assumption, we’ll need to relax it eventually). Now let’s say it takes designers X many days to work out the UI/UX specs for each of those features. It takes the engineers Y many days to implement each one. And it takes testers/devs Z many days to test, fix, and eventually deliver each one. The implicit assumption is that it takes 2X days to design 2 features, 2Y days to implement 2 features, and 2Z days to test/fix/deliver 2 features. We are assuming that the time every step of the pipeline takes is directly proportional to the number of (equally “sized”) features they are working on.

If this assumption is reasonably accurate (it does not need to be exact), then it would make no essential difference how long we make the pipe. We could push a single feature through the whole pipe, beginning to end, pretty rapidly, or we could push a big batch through, with each step completing all of them before handing them all off to the next step. It would be trivial to make the pipe shorter. Just reduce the batch size. That’s it.
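Under strict proportionality the tradeoff is easy to compute (the X, Y, Z day-counts below are made-up figures for illustration):

```python
# Under proportional time, batch size changes latency but not
# throughput: a batch of N features takes N * (X + Y + Z) days to
# cross the whole pipe, yet either way the pipe averages one finished
# feature per (X + Y + Z) days.
X, Y, Z = 2, 5, 3  # hypothetical days per feature: design, implement, test

def latency_days(batch_size):
    return batch_size * (X + Y + Z)

# One feature crosses in 10 days; a batch of 20 takes 200 days,
# but both arrangements deliver the same total work per year.
```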

But it isn’t this simple, because our assumption of proportionality is completely wrong for some of the steps. The reason why is that very important variables change over time and do not reset after each cycle. We need to keep in mind that the goal is not to release one tiny standalone app every week. The goal is to continue growing one app to have a larger and larger set of features. The crucial variable that is going to be different at the start of each cycle is the size of the current feature set that has already been delivered, and will need to be delivered again at the end of the next iteration. Today the app does X and Y. Tomorrow, we won’t want it to do just Z. We’ll want it to do X, Y and Z. It still needs to do X and Y. That basic fact of software development totally spoils our proportional time model.

We can assume proportional time is reasonably accurate for the design step. Designers will typically need about the same amount of time to whip up the new UI/UX for each new feature of roughly the same scope, and while admitting I am not a designer, based on what I’ve seen, the presence of other screens/experiences in the app doesn’t wildly impact the design time (if anything, it may make the time shorter because already designed widgets/concepts can be reused).

What about implementation? Does adding a button to a brand new project look wildly different, in terms of effort, than adding a button to a million-line project? Well, it depends. A million-line codebase with terrible design will make adding a button a Herculean task. A very well-designed million-line codebase will easily accommodate a new button and, like with feature design, provide reusable tools to make new work even easier. So now we’re starting to see what the answer to, “what does it take to be agile?” involves. Part of that answer is excellent code design.

Have you figured out yet that I am obsessed with design, and preach that it is the most important part of a developer’s job?

Now let’s move onto testing. This is where things get really interesting. Remember, we aren’t trying to deliver an isolated micro-app every week. We’re trying to add a feature to an existing feature set while preserving that existing feature set. One of the most fundamental lessons that any software QA department learns is that you can’t “just” test the new features. At every release, you have to test everything. It is crucially important that you do what is called regression testing: making sure the existing features weren’t broken by the latest round of development. The obvious implication is that the workload for the testers grows proportionally with the total number of features, not the total number of new features. As the app grows older, each cycle of testing is going to involve more and more work. This completely screws up our wish to deliver new features every week. If there are 200 existing features, they all have to be regression tested. How in the world are you going to do that every week!?
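The arithmetic here is stark (the hours-per-feature figure is invented for illustration):

```python
# Manual regression: every existing feature must be retested each
# cycle, so the testing bill tracks the *total* feature count, not
# the count of new features.
def regression_hours(total_features, hours_per_feature=2):
    return total_features * hours_per_feature

# 10 features: 20 hours -- manageable on a weekly release.
# 200 features: 400 hours -- hopeless on a one-week cycle.
```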

The answer (unless you solve this problem, which we’ll talk about next) is you don’t. Either accept that and don’t even try to release frequently, or deny it and have your QA team scrambling and working overtime just to fail builds and slip deadlines, or, worse, allow large quantities of bugs to escape to customers. Eventually you’ll end up releasing infrequently even though you “plan” to release frequently (most of the release candidates will fail).

It’s even worse than what I’ve described. The “testing” phase doesn’t just include testing. It’s a repeated cycle of:

test, find bugs, fix bugs, retest, find more bugs, fix bugs, retest, etc.

Just like the first round of testing needs to include regression, every one of these cycles needs full regression. This is another place where if people are under intense time pressure, they’ll try to “cheat”. We found a bug on screen X, got a new build from devs that fixed that bug, but we tested screen Y in the first round and found no bugs. Do we really need to test it again? Yes! That bug fix? That’s a change just like any other. It can break things. In fact, this is the end of the development process, when everyone is getting impatient, and deadlines are creeping up (or already slipping). This is where people get sloppy. Devs start throwing in patches and bandaids hoping to make one little symptom uncovered by QA disappear. This stage can easily generate most of the escaping defects.

The sensible way to deal with this problem, if the testing time cannot be prevented from growing linearly, is what successful software companies chose to do for decades: make the release cycle long enough to accommodate regression testing. If it’s going to take a few months to test and stabilize a large software product before releasing, then we can bundle feature designs and new implementation work so that those steps also take a few months each. This makes all the steps of the pipeline “fit” together nicely, everyone is kept busy, and the pipeline spits out working software iterations consistently.

A lot of the design and engineering practices of yesteryear were created to optimize for these conditions. If months of testing, bug fixing, retesting, and so on is necessary before each release, then it’s best to leave all the testing and bug fixing (or at least most of it) to this stage, and not worry about stability during development of new features. Remember, things are pipelined, so the people in the implementation step are busy coding the next round of new features at the same time the people in the test/stabilize step are tightening the screws on the upcoming release. A fundamental paradigm for working in this way is branching, and more specifically a branching policy called unstable master. The “trunk” of the codebase is the “latest but not the greatest”. This is where the “implementation” guys are coding new features, and spending little to no time making sure everything is still working fully (they will stabilize, but only enough to unblock their work on new features).

Simultaneously, there is a “release branch”, which is the code that will be delivered to customers at the end of the cycle. This branch is subject to a code freeze. No work is done on this code except to fix bugs found by the testers. In particular, adding a new feature to a release branch is the exception, not the rule, and typically requires a special “change request” process to allow it. This may be where some of the misconceptions about “waterfall” arose. Yes, within the release branch, going “back” to feature design or implementation is avoided as much as possible, but it isn’t forbidden. In fact there’s a special change request process made just for going back, even if it isn’t invoked haphazardly. This does legitimately express the inflexibility of this way of building software. If we decide this late that some feature really needs to be shipped with the other features in this upcoming release, inserting it is difficult, error-prone and likely to slip the delivery date for the entire batch of features. But this is out of necessity. If regression testing takes a really long time, then a stabilizing phase is required, and introducing large changes while trying to stabilize is disruptive. This is a consequence of the difficulty in testing large complex software, not of any “process” for building software.

An important point to consider is whether it is inherently “wasteful” to operate a long pipeline like this. If testing is the phase where some kind of flaw in the architecture, or even the feature design, gets uncovered, then didn’t we waste a lot of time building out feature specs and implementations that have to be scrapped anyways? This is a more complex problem than I think people give it credit for. The time at which a flaw is discovered that necessitates revisiting (thus triggering the “upflow” that a “waterfall” would forbid) is certainly shifted right. But this shift occurs within the large time window of a single release cycle. Since the testing phase is longer, there’s actually more time, potentially, between discovery and the upcoming planned release date. The issue here is not so much that the testing of any one feature began much later, but the very fact that you have to get all the way to testing to discover the problem. The same waste is implied, it’s just shifted around, unless that waste spills over into other features that were batched into the release.

For example, let’s say a whole batch of features are all handled with one overall architecture, and a flaw in that architecture is discovered during testing that necessitates reimplementing all of those features. It would be less wasteful to discover the flaw after implementing just one of those features, right? Yes, but this implies high coupling among those features, which typically means the whole architecture has to be laid down just to deliver any one of them. This also tends to imply coupling in the testing of features. In this scenario, the features aren’t really conceived of as standalone, so the prospect of building a single feature just to get to testing earlier will typically entail significant rethinking of the features and architecture themselves, just to make this kind of isolation possible. This is extra cost. Whether it outweighs the saved cost of implementing features that end up having to be reimplemented will depend on the exact details of a situation.

The real solution to eliminating the waste of late discovery is shifting the discovery to earlier in the process. This is a whole different matter worthy of its own discussion: how do you ensure design flaws are caught at the moment they appear instead of some later part of the process? Whatever the answer is, it can be applied to a large, infrequent release paradigm and a small, frequent release paradigm (spoiler alert, the solution is BDD and TDD).

Once a release branch is stabilized, it is merged back to master, hopefully carrying along all the stability with it (though some will be lost or made worse by merge conflicts, but since master is expected to be unstable, this is okay). Then, immediately, a new release branch is created for the next release from master (meaning it contains all the new features that were added after the last release branch was made), and work begins on stabilizing that.
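The whole “unstable master” cycle can be sketched with a few Git commands (the repo contents, branch names, and commit messages are all illustrative, and the cycle is compressed to a single bug fix and a single new feature):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q
git config user.email demo@example.com && git config user.name Demo
trunk=$(git symbolic-ref --short HEAD)   # whatever git names the trunk
echo "v1" > app.txt && git add . && git commit -qm "release candidate"

git checkout -qb release/1.0             # cut the release branch: code freeze
echo "fix" >> app.txt && git commit -qam "fix bug found by QA"

git checkout -q "$trunk"                 # meanwhile, trunk stays "unstable"
echo "wip" > feature.txt && git add . && git commit -qm "next round of features"

git merge -q --no-edit release/1.0       # release shipped: merge stability back
```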

If you want a shorter pipeline, you have to get rid of the linear growth of testing time. By this point, the industry knows full well what the solution is: test automation. Human testers will never be able to test 200 features in anywhere near the amount of time they could test 10 features. But computers? Now we’re talking. How long does it take a computer to test a feature? 200ms? Even if it’s 1 minute per feature, that means a computer could test 200 features in under three and a half hours. Not bad. Get that down to 1 second per feature, and you’re running full regression in about three and a half minutes.
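What one of those machine-run checks looks like, as a minimal standard-library sketch (the feature under test and its expected behavior are invented for illustration):

```python
import unittest

# A stand-in "feature": a pricing rule the app shipped months ago.
def apply_discount(price, percent):
    return round(price * (1 - percent / 100), 2)

class RegressionSuite(unittest.TestCase):
    # Every shipped feature keeps a test like this forever; the machine
    # reruns all of them, in milliseconds each, on every single build.
    def test_discount_feature_still_works(self):
        self.assertEqual(apply_discount(100.0, 15), 85.0)
```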

Now, you might hear this and think: “why do I need to retest everything? If I make a change to this screen, sure I should retest the stuff on that screen, but why do I need to retest everything?” The problem this brings up is: can you divide an app into truly separate components that can be validated independently? The answer is yes, and that is a very important part of the strategy to become agile, but for now I can only say you don’t get this for free. The fact something is on a different “screen” is an arbitrary division, and any company that has tried to get away with this kind of “only test around what you changed” strategy inevitably gets burned by it. Especially if the code is badly designed (there I go about design again), it will contain tons of tight coupling (“spooky action-at-distance”, as we call it) that will almost guarantee “random” bugs popping up everywhere after a change to one single area.

Assuming the problem of modularizing hasn’t been solved, which is a highly complex engineering problem, the only other solution is to somehow make the amount of time it takes to test a growing set of features independent of the total size of those features. Now obviously it can’t be completely independent. More features will take some more time to test, period. But whatever that proportionality constant is, we have to make it as small as possible.

It’s instructive to think about how the engineering practices change when testing becomes a rapid step and there is no longer a need to bundle features up together into large numbers of “releases”. When we do this large batching, we create branches for each release. When, instead, we want to release frequently, one feature at a time, the branching policy becomes inverted. We switch to a stable master paradigm, where the trunk is “the latest and the greatest”. Automated regression enables a large master branch to both be updated with new work frequently, and also remain stabilized. Developers instead create feature branches off of master, complete that one feature, then attempt to merge it back to master. If testing is automated, it can be integrated into the integration pipeline (the pipeline that takes a feature branch and merges it into master), so that a dev can simply submit a merge request, and it will either be automatically merged after running the test suite or, if a test fails, get rejected and notify the dev.

By decentralizing the new work, ideally a problem introduced in one feature branch only delays delivery of that feature. By preventing it from getting into master, other features being worked on can, if they have no issues, be merged to master and get delivered to customers. This is very much unlike the older paradigm, where the “pass/fail” condition is on the entire batch of features. This is continuous integration (CI). If master is kept stable by it, then there’s no reason to not automate the delivery of every new snapshot of master, which becomes continuous delivery (CD). So then with a frequent release cycle, CI/CD becomes a crucial part of the tools.
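The gate itself can be modeled abstractly (all names here are hypothetical; a real setup wires a CI service to the repository):

```python
# A toy model of a CI-gated merge: a feature branch lands on master
# only if the full automated regression suite passes; otherwise master
# is untouched and only that one feature's delivery is delayed.
def run_suite(tests):
    return all(test() for test in tests)

def submit_merge_request(master, branch, tests):
    if run_suite(tests):
        return master + [branch], "merged"
    return master, "rejected: author notified, master untouched"

suite = [lambda: 1 + 1 == 2, lambda: "ab".upper() == "AB"]
master, outcome = submit_merge_request(["feature-1"], "feature-2", suite)
# -> outcome == "merged", master now contains both features
```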

The more extreme form of this is to take feature branches down to the level of single commits, which removes the need to branch altogether. If master is protected by automated tests, and especially if those automated tests are so well-designed they take mere seconds to run, you might as well make quick, rapid single working changes and immediately commit and push them to master. But that means “half-finished” features are in master, and automatically getting delivered to customers. How do we handle that? With “dark releasing”, meaning some way of toggling access to a feature in code. This decouples making a feature “available” to the end user from having the code from that feature in the app the user is running.
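A minimal sketch of such a toggle (the flag name and checkout functions are invented; real systems usually read flags from a remote config service so they can be flipped without shipping a new build):

```python
# "Dark releasing": the new code ships in every build, but a runtime
# flag decides whether any user can actually reach it.
FLAGS = {"new_checkout": False}  # flipped per-user or globally at runtime

def legacy_checkout(cart):
    return sum(cart)

def new_checkout(cart):          # half-finished work, safely dark
    return round(sum(cart) * 0.95, 2)

def checkout(cart):
    flow = new_checkout if FLAGS["new_checkout"] else legacy_checkout
    return flow(cart)

# With the flag off, users see the old behavior even though the new
# code is already merged to master and delivered.
```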

There is then no reason to not get the code into master as quickly as possible. The less time spent between code getting typed into your code editor and getting pushed to master, the less you have to deal with merge conflicts and other unpleasant side effects of maintaining multiple divergent copies of the code (what’s really going on here is the realization that the problem we’ve delegated to “version control” software like Git isn’t really a problem of version control but a problem of variation, and that’s better solved with the same code that handles all the other variations in software).

Anyone who’s attempted this paradigm of working on a codebase with bad design can attest to how utterly impossible it is. You cannot make changes to master multiple times per day and expect it to stay “release-worthy” if you have not worked to keep your code in excellent shape.

So in summary, we’ve identified three engineering problems that need to be solved to enable agility:

  • Excellent code design that ensures the development effort for new features does not grow with the number of current features, and hopefully shrinks
  • Modularizing the software into small components, each of which can be independently worked on, tested/validated and delivered in isolation
  • Automating the regression tests, so that the only testing work in each cycle is to implement automation for the new features. Not everything needs to be automated, as long as the set of non-automated features remains small enough that manual testing of them can be completed quickly

Note that all of these problems are nonexistent at the beginning of an app’s life, and only become issues later, and they continue to become bigger and bigger issues if not addressed. This is the honeymoon period. I think the combination of starting a fresh new project and bringing in the expensive agile consultants at the same time is like mixing alcohol with coed college kids. It’s gonna be a fun, exhilarating time, until you wake up the next morning with a disaster on your hands trying to trace your steps and figure out what went wrong. It’s easy to be agile at the beginning. None of these problems need to be solved. The consultants might even tell you, “those aren’t essential problems. As long as you follow our process of goofy team names, backlog groomings, and Fibonacci sequences, you’ll get the agility you seek” (this isn’t a straw man. On multiple occasions I had the “Agile gurus” specifically tell my dev team to eschew good design and “just get the feature working”). And at this stage, that will appear true. It’s extremely seductive, especially for the product/marketing side who never fully understand the engineering problems anyways and get annoyed they exist in the first place.

Don’t fall for it.

Now that it is clear test automation is the central enabler of business agility for software, we can realize it’s quite an unfair accusation that the software industry wasn’t agile three decades ago. Test automation is a new, cutting edge technology. It simply didn’t exist that long ago. The only choice companies had was to manually test, and the process they built around this necessity was quite well-optimized. Today we’re spoiled by all the advanced tools for test automation, plus the knowledge gained from experience of engineering practices that effectively produce trustworthy automation (TDD and BDD).

I want to touch briefly on the problem of modularizing in order to allow “isolated testing”. This deserves an entire article, or even a series, of its own, but I want to at least mention what it entails. If you want to break an app up into pieces like this, each piece has to be its own product. It has to have its own pipeline, its own product requirements (even if it is a code library, the library needs requirements to be precisely defined), and its own deliveries. The central engineering practice that needs to be adopted for this is API design.

The goal is to build software like manufacturers build PCs. A PC is made up of a motherboard, a CPU, memory, peripherals, etc., and they can all be purchased independently and assembled in endless combinations. The number of combinations grows factorially, or worse (this is called combinatorial explosion). It is simply not possible for every motherboard manufacturer to plug in every single combination of CPU, memory, graphics card, hard disks, etc. that exists for testing. Instead, they design specifications for how these components will work together.

This includes physical specifications: the number of pins, the shape and size of the plugs, and so on. By following these specs, we can be sure a CPU will actually fit into the CPU slot on a motherboard. Then, they have to define electrical specs: what voltage levels the pins work on, and other aspects of the electronics. Finally, they have to define logical specs: what sequence of bits gets presented on each pin, what those bits mean, and what is expected to come back on another pin. Each manufacturer takes a very precisely defined specification and uses it to test that:

  • Its own components follow the specification correctly
  • Its own components perform the desired function when connected to other components that follow their specifications correctly

A motherboard manufacturer then tests its components across the range of valid behavior for CPUs, memory, etc. in the spec. If the spec says the pins operate between 1 and 3 volts, they’ll test to make sure the motherboard works within that range. Meanwhile, the CPU manufacturers test to make sure their CPUs work within that entire voltage range.

You may have conceptually divided your software up into areas of concerns, even defined them as “libraries” or “modules” in your project. The system architecture may have drawn clear lines between areas (the most obvious is the line between the client and the server). But this doesn’t mean you’ve truly modularized the software. There’s a simple check you can make: do you test the modules on their own, or do you test them by connecting them to the app and then testing the app? If your workflow on the server side is: make server updates, run unit tests you don’t really trust, then have the client developers connect to your new endpoint and make sure all the client features that use your service still work, then even the server isn’t a standalone module (and if it isn’t, you better not be making updates to it that go to the production environment without going through a normal “release cycle” for the client).

Let’s say, instead, you have an API spec that is a first-class citizen in the requirements world. The server code is updated, tested and delivered only against this spec (using acceptance tests for each item in the spec). The client app is developed and validated against a mock server based, again, on this API spec. And if a defect is ever found in a final, end-to-end test, it triggers root cause analysis to determine which component’s acceptance tests are broken and why the defect wasn’t caught earlier. Then you can safely say the server is a standalone module, and you can make changes to it without having to regression-test the client. Until then, you had better do proper regression testing after every change, and since those tests will necessarily be end-to-end, they’ll probably take a long time to run.
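A minimal sketch of this arrangement, with all names invented here for illustration: the API spec is captured as a shared contract, the real server module is acceptance-tested against it, and the client team develops against a mock that honors the same contract.

```java
// Hypothetical sketch: one API spec, two consumers.
// The spec is the single source of truth for both sides.
interface AccountApi {
    // Spec item: returns the balance in cents; unknown IDs yield -1.
    long balanceFor(String accountId);
}

// The server-side implementation, validated against each spec item.
class AccountService implements AccountApi {
    private final java.util.Map<String, Long> store = new java.util.HashMap<>();
    AccountService() { store.put("acct-1", 5000L); }
    public long balanceFor(String accountId) {
        return store.getOrDefault(accountId, -1L);
    }
}

// The mock server the client team develops against. It honors the
// same contract, so client tests never need the real server.
class MockAccountApi implements AccountApi {
    public long balanceFor(String accountId) {
        return accountId.equals("acct-1") ? 5000L : -1L;
    }
}

public class ContractCheck {
    // One acceptance check per spec item, runnable against ANY implementation.
    static boolean satisfiesSpec(AccountApi api) {
        return api.balanceFor("acct-1") == 5000L
            && api.balanceFor("no-such-id") == -1L;
    }
    public static void main(String[] args) {
        System.out.println(satisfiesSpec(new AccountService()));  // real server module
        System.out.println(satisfiesSpec(new MockAccountApi()));  // client's mock
    }
}
```

Because both implementations pass the same spec-derived checks, a server change that keeps its acceptance tests green cannot silently break the contract the client was built against.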

That brings us, finally, to the product side. At the beginning I said, “let’s assume a product has been divided up into a large set of small features”, and that this is a big assumption. As it turns out, this is no more trivial to accomplish than the engineering capabilities we just discussed. The agile consultants parrot the words “vertical slicing, vertical slicing” over and over, as if that provides any meaningful advice on how to conceive of a new capability as a composition of small capabilities. In reality, what I’ve seen is that a large feature is “broken up” into its various technological facets: the database and third-party library requirements, server-side work, client-side backend work, client UI, and so on. This is the kind of “horizontal slicing” we’re instructed not to do. But how do we not do it!? Usually the team can’t imagine how else to do it (this is evident in the creation of the so-called “enabler story” to formalize the workflow of building up a feature in multiple steps, with nothing being deliverable until the last step is completed).

What “vertical slicing” really looks like is defining minimum viable products (MVPs). Even if you can find a way to “vertically” slice, say, a new screen in an app, if it’s useless to customers until every widget is built, then what’s the point? You aren’t going to release the half-finished screen to customers anyways, so this isn’t “agility” (getting something new all the way to customers frequently) in the sense we mean here. That doesn’t mean getting working widgets into master one at a time isn’t valuable (it is). But business agility is figuring out what the bare minimum screen that is useful to customers is.

This probably isn’t a screen that is blank and just has a “back” button. You have to figure out what is “essential” to the new feature, and what is an enhancement. This can either be an aspect of the feature, or an aspect of quality (for example, an MVP might only cover the happy path, acknowledging that it might explode if any unexpected edge cases occur).

This tends to make product owners upset. No one wants to release “incomplete” or “unpolished” work like this. And no discussion of how to be agile can tell you whether you actually want to release incomplete/unpolished work, or exactly how complete/polished it needs to be before releasing it. All we can say is: this is the question you have to work out for your business to determine how agile you really want to be. The obvious question that comes up is: what’s the point? If you’re going to make a new screen with no bells or whistles, why give it to customers before it’s finished?

Well, there are some good answers to this question. Maybe doing so will let you beat your competition to market. They’re worried about perfecting it, but you get something out there quickly, and you’ll polish it by the time your competitor releases their polished version. In the meantime you can genuinely say you’re offering more than the competition.

Even more critically, are you sure this feature is worth the business’s time and money? Maybe you think this new screen will double your app’s purchases per month. But will it? Maybe a rough draft of it in customers’ hands will help answer that. If that experiment provides clear evidence that customers don’t care about this new screen, then it’s a waste of time to polish it! Better to know that before you’ve polished it than after.

However, if you’re dead set on this batch of features, they’re all already designed, you’re not worried about getting to market quickly, and no market feedback is going to make those features budge, then it’s quite likely that it serves no purpose to iterate like this (again, this doesn’t mean there’s no purpose in making small, frequent working changes to master). It will inevitably make the eventual goal of getting to the polished feature set take longer. It is extra work to make sure early versions of the features are a useful customer experience by themselves and good enough to release, compared to there being no requirement whatsoever that any of it works or provides value until everything, in its polished form, is working at the end. It’s just a needless cost if you’re going to ignore any feedback this early releasing will afford you anyways.

Okay, so dividing a new capability into small features that can be iteratively released quickly involves defining an MVP. What does this mean exactly? Primarily it means making difficult choices. Designers will typically conceive of features in the final state they hope to one day achieve. The main thing they need to do is decide now what is going to be sacrificed for the time being, to enable what’s left to get out the door. The more they’re willing to sacrifice, the faster what’s left gets delivered. Product owners and designers have trouble with this. The stripped-down, bare minimum feature isn’t something they really want to ever exist. It is only a temporary stop on the way to a long-term destination. But they have to let it go. They have to avoid the emotional attachment to the finished product, and recognize arguments like, “we’ll get bad reviews if we don’t polish it”, as justification for this emotional attachment.

Okay, so the product people let go of their pot of gold at the end of the rainbow. How do you carve a feature up in a way that identifies a small piece as the first minimally viable one? This is a product development skill that requires continual practice, ideally some training, and patience. It’s a craft just like software engineering. The techniques of behavior-driven development, particularly expressing requirements in Gherkin, are well-suited to aiding this task. The Gherkin, developed collaboratively among the Three Amigos, helps flesh out all the minute details involved in building a feature.

This will, hopefully, result in a very large scenario with a lot of givens and, even better, a lot of whens and thens (from what initially looked like a small, “atomic”, indivisible feature). This provides a path forward to slicing it up. Separate along the “whens” and the “thens”. Then try to identify “given-when-then” sequences embedded in the givens of one scenario, and factor them out. Then start organizing the collection of small scenarios. Some of them are independent. Others are tied sequentially to each other (the “then” of one is the “given/when” of another). This will reveal dependencies among these scenarios. Dependent ones can’t be implemented without first implementing the ones they depend on. So identify where the dependency chains begin. How long does each chain need to be to provide some valuable unit of customer experience?
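To make this concrete, here is a hypothetical illustration (the feature and wording are invented here) of slicing along the whens and thens:

```gherkin
Feature: Saved payment methods

  # The large scenario the Three Amigos might first write down.
  # Note the given is hiding a whole behavior of its own.
  Scenario: Pay with a newly saved card
    Given I am logged in
    And I have added a card ending in 4242
    When I check out with the saved card
    Then my order is confirmed
    And the card remains saved for next time

  # Slice 1: the given-when-then buried in the given, factored out.
  Scenario: Add a card
    Given I am logged in
    When I add a card ending in 4242
    Then the card appears in my saved payment methods

  # Slice 2: depends on slice 1 (its "then" is this one's "given").
  Scenario: Pay with a saved card
    Given I have a saved card ending in 4242
    When I check out with the saved card
    Then my order is confirmed
```

Slice 1 begins a dependency chain and is already a candidate for a valuable unit of customer experience on its own; slice 2 can only be implemented once slice 1 exists.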

This is, like the engineering problems above, a dedicated topic of its own. But that is, roughly, what starting to solve the problem of product development would look like.

The main point here is that solving these engineering and product development problems is the essential task of increasing and optimizing for business agility. If you want to release working software frequently, this is where your time needs to be spent. If you work on learning about and solving these problems, you will work toward achieving business agility.

If you spend all your effort thinking about how to define your org chart, whether to call this organization a “release train” or “solution train”, how many planning meetings to have per week and what to call them, etc., and you’re ignoring or downplaying the engineering and product development problems, then you won’t.

Tests vs. Test Specifications

When you first get introduced to the idea of test driven development, it may seem strange that tests become such a central focus. Sure, testing is important, and without doing so you wouldn’t catch unintended behavior before releasing it to customers. But why should tests be the driving concern of development? Why are they the driver instead of, say, a design document, or user stories?

In order to understand this, we need to draw an important distinction. I’ll start with an example. Consider the following sequence of instructions:

1: Place 4 cups of water in a large pot
2: Bring the water to a boil, then reduce to a simmer
3: Place 2 cups of white rice into the water, and cover
4: Cook for 25 minutes

Are you looking at food? No, you’re looking at a recipe for food. The recipe is a sequence of instructions that, if followed, will result in food. You can’t eat a recipe. You can follow the recipe, and then eat the food that gets created.

When we talk about “tests” in test-driven development (TDD), we’re not actually talking about the act of “testing”. We’re actually talking about the recipes for testing. When a developer who writes “automated” tests hears the word “test”, he most likely thinks of something like this:

@Test
public void testSomeBehavior() {

    prepareFixturesForTest();

    SomeClass objectUnderTest = createOUT();

    Entity expectedResult = createExpectedResult();
    Entity actualResult = objectUnderTest.doSomeBehavior();

    Assert.assertEquals(expectedResult, actualResult);
}

That sequence of instructions is what we mean when we say “test”. But calling this a “test” is potentially confusing, because it would be like calling the recipe I printed above “food”. The “test”, meaning the process that is performed and ends with a “success” or “failure”, is what happens when we follow the instructions in this block of code. The code itself is the instructions for how to run the test. A more accurate term for it is a “test recipe”, or test specification. It is the specification of how to run a test. Testing means actually executing this code, either by compiling it and executing it on a machine, or by having a human perform each step “manually”.

Before “automated” tests that developers write in the same (or similar) language in which they write their production code, testers were writing documents in English to describe what to do when it is time to test a new version. The only difference is the language. Both of these are test specifications, which are the instructions followed when doing the actual testing.

When we say “test-driven development”, we’re not talking about the driving force being the act of running tests on an application. We’re really talking about the creation of test specifications. We really mean “test-specification-driven development”. Once that is clear, it starts to make sense why it is so effective for test specifications to be the driver.
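To make the distinction concrete, here is a minimal, hypothetical sketch (plain Java, no test framework; the names and the discount rule are invented) of a test specification written before the production code it specifies:

```java
// Hypothetical sketch: the test specification exists first, as an
// executable recipe. Writing it forces every detail to be pinned down
// (units, rounding, the qualifying threshold) before any production
// code is written.
public class DiscountSpec {

    // The recipe: set up, act, compare. Executing it is the "test".
    static boolean ordersOver100GetTenPercentOff() {
        Discount discount = new Discount();
        return discount.apply(200_00) == 180_00;   // prices in cents
    }

    static boolean smallOrdersPayFullPrice() {
        Discount discount = new Discount();
        return discount.apply(50_00) == 50_00;
    }

    public static void main(String[] args) {
        System.out.println(ordersOver100GetTenPercentOff());
        System.out.println(smallOrdersPayFullPrice());
    }
}

// The production code, written second, exists to make the spec pass.
class Discount {
    int apply(int priceInCents) {
        return priceInCents > 100_00 ? priceInCents * 90 / 100 : priceInCents;
    }
}
```

Before `Discount` is implemented, the spec doesn’t even compile; that failing state is itself information, and the moment both methods print true, the behavior is done by definition.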

The full, explicit realization of what test specifications actually are is, arguably, the defining characteristic of “behavior-driven development” (BDD). By building on top of TDD, BDD recognizes that tests (really, test specifications) are the most thorough, accurate and meaningful form in which the specification for behavior/requirements exist. After all, what is the difference between a “story”, or “design spec”, or some other explanation of what a piece of software is supposed to do, and the instructions for how to validate whether it actually does that or not? The answer is… nothing! Well, the real difference is that stories or design specs can be vague, ambiguous, missing details, etc., and it’s not obvious. When you interpret a design spec as the step-by-step instruction for how to validate the behavior, so exact and detailed that a machine can understand it, suddenly those missing details will become obvious, and they’ll need to be filled in.

Before the underlying equivalence of design spec and test spec was properly understood, testers often became the ones who filled in the missing details, as they turned vague spec requirements into fleshed-out test scripts (whether they were writing them down, or just storing them in their heads). In effect, the testers were the true product owners. They dictated the minute details of behavior in the app, by defining exactly what behavior is a “pass” and what is a “fail”. Of course a necessary step in releasing software is that it “passes” QA. When the software ends up in the hands of product owners and they aren’t happy with what they see despite it “passing” the tests (or, the opposite, they are happy with what they see but QA insists it “failed” the tests), it creates a lot of confusing noise in the development pipeline, in the form of undocumented change requests (which will typically re-trigger confusion on future releases) or bogus bug reports. Furthermore, developers won’t really know if they coded what they were supposed to until after they send something to testers and get the feedback. In the “classic”, more siloed shops, with less communication between the “dev” org and the “QA” org, devs often wouldn’t see the test scripts QA was using, and would have to gradually discover what QA considers “correct” behavior through a game of back-and-forth of releasing, failing, re-releasing, etc.

TDD and BDD are the solution to these problems. Even if the developers who implement a behavior are not the same people who write the tests for that behavior (one of the common objections to TDD is that coders and testers should be different people; they still are: automated tests are run by machines, not by the coders), they at least have access to those tests and use them as the basis for what code to write and for deciding when it is satisfactorily completed. The creation of a test specification is correctly placed at the beginning, rather than the end, of the development cycle, and is actively used by the developers as a guide for implementation. This is exactly what they used to do, except they used the “design spec” or “story acceptance criteria” instead of the exact sequence of steps, plus the exact definition of “pass” and “fail”, that the testers will eventually use to validate the work.

The alternative to TDD is “X-driven development”, where X is whatever form a design requirement takes in the hands of developers as they develop it. Whatever that requirement is, the testers also use it to produce the test script. The error in this reasoning is failing to understand that when the testers do this, they are actually completing the “design spec”, which is really an incomplete, intermediate form of a behavioral requirement. TDD, and especially BDD, move this completion step to where it should be (at the beginning), and involve all the parties that should be in attendance (most importantly the product owners and the development team).

Also note that while the creation of the test spec is moved to the beginning of the development, the passing execution of the test is still at the end, where it obviously must be (another major benefit TDD has is adding test executions earlier, when they are supposed to fail, which tests the test to ensure it’s actually a valid test). The last step is still to run the test and see it pass. Understanding this requires explicitly separating what we typically call “tests” (which are actually test specifications) from the act of running tests.

With this clarified, hopefully developers will acquire the appropriate respect for the tests in their codebase. They aren’t just some “extra” thing that gets used at the end as a double-check. They are your specifications. In your tests lies the true definition of what your code is supposed to do. They are the best form of design specification and code documentation that could possibly exist (much better than a code comment explaining, in often vague words, what the author intends, is a test that can be read to understand exactly what will make it pass, plus the ability to actually run it and confirm that it does pass). That’s why they are arguably more important than the production code itself, and why a developer who has truly been touched by the TDD Angel and “seen the light” will regard the tests as his true job, and the production code as the secondary step.

This, I believe, is the underlying force that additionally makes TDD a very effective tool at discovering the best design for code, which I think is its most valuable feature. Well-designed code emerges from a thorough understanding of exactly what problem you are trying to solve. The fact that writing unit tests helps you discover this design earlier than you otherwise would (through writing a version of it, then experiencing the pain points of the initial design firsthand and refactoring in response to them) is because tests (test specifications) are specifications placed on every level, in every corner, of the codebase.

Code that is “not testable” is code whose behavior cannot be properly specified. The reason why “badly designed” code is “bad” is because it cannot be made sense of (if it works, it’s a happy, and typically quite temporary, accident). Specifying behavior down to unit levels requires making sense of code, which will quickly reveal the design forces contributing to it being un-sensible. This is really the same thing that happens on the product level. Instead of waiting until a defective product is out and discovering misunderstandings, the misunderstandings get resolved in the communal development of the behaviors. Likewise, developers who use TDD to drive design, which is when development truly becomes test-driven, don’t have to wait until a problem is solved to realize that the solution is problematic. Those design defects get discovered and corrected early on.
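A tiny, hypothetical illustration of this (the classes and the clock example are invented here): a hidden dependency makes the behavior impossible to specify exactly, and forcing the behavior to be specifiable improves the design as a side effect.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

// The "not testable" smell: this version reaches out to a hidden
// dependency (the system clock), so a spec cannot even state what
// the correct output is at any given moment.
class HardToTestGreeter {
    String greeting() {
        int hour = Instant.now().atZone(ZoneOffset.UTC).getHour();
        return hour < 12 ? "Good morning" : "Good afternoon";
    }
}

// Making the behavior specifiable forces the dependency into the
// open -- and the resulting design is more flexible too.
class Greeter {
    private final Clock clock;
    Greeter(Clock clock) { this.clock = clock; }

    String greeting() {
        int hour = Instant.now(clock).atZone(ZoneOffset.UTC).getHour();
        return hour < 12 ? "Good morning" : "Good afternoon";
    }
}

public class GreeterSpec {
    public static void main(String[] args) {
        // With the clock injected, the spec can pin down exact behavior.
        Clock nineAm = Clock.fixed(Instant.parse("2024-01-01T09:00:00Z"), ZoneOffset.UTC);
        Clock threePm = Clock.fixed(Instant.parse("2024-01-01T15:00:00Z"), ZoneOffset.UTC);
        System.out.println(new Greeter(nineAm).greeting());
        System.out.println(new Greeter(threePm).greeting());
    }
}
```

The refactoring wasn’t done “for the tests”; the attempt to specify the behavior is what exposed the design flaw.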

What’s driving development in TDD isn’t the act of validating whether the code is correct. It is the act of precisely defining what correctness means that drives development.

Massive View Controller

The Model-View-Controller (MVC) set of design patterns for GUI application development has devolved into what is derisively called “Massive View Controller”.  It is a good lesson in design thinking to follow how this devolution occurred.  The most interesting point, and what is in most need of explanation, is that in the original formulation of MVC, the controller was meant to be the smallest of the three components.  How did it end up engulfing almost all application code?

The answer, I believe, is that two forces have contributed to the controller becoming the dumping ground for almost everything.  One is in how the application frameworks for various platforms are designed.  When we look at mobile platforms like iOS and Android, both instruct developers to create an application by first creating a new subclass of their native “controller” class.  On iOS, this is UIViewController, and on Android, it is Activity (the fact either of these is seen as the C of MVC is a problem already, which we’ll get to).  This is a required step to hook into the framework and get an opportunity for your application code to begin executing.  But there is no similar requirement to create customized components for the M or V of MVC.  With no other guidance, novice developers will take this first required step, and put as much of their application code into this subclass they are required to create as possible.

The other is a widespread misunderstanding among developers of what the “model” and “view” of MVC are supposed to be.  Both “Model” and “View” are somewhat vague terms that mean different things in different contexts.  The word “model” is often used to refer to the data objects that represent different concepts in a code base.  For example, in an application for browsing a company’s employees, there will be a class called Person, with fields like name, title, startDate, supervisor, and so on.  A lot of developers, especially mobile developers, have apparently assumed that the M in MVC refers to these data objects.

But the authors of MVC weren’t instructing people to define data objects.  This is already a given in object-oriented programming.  Obviously you’re going to have data objects.  They didn’t think it was necessary to say this.  The M in MVC refers to the model for an application page, which specifically means the “business logic”.  It is the class representing what a page of an application does.  It handles the data, state and available actions (both user-initiated and event-driven) of a certain screen or visual element of a GUI application.  Most of what developers tend to stuff into the controller actually belongs in the model.  The old joke of MVC is that it’s the opposite of the fashion industry: we want fat models, not thin models.  Models should contain most of the code for any particular page or widget of an application.
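As a hypothetical sketch (this LoginModel is invented for illustration, not taken from any framework), a “fat model” for a login page might look like this: it owns the page’s data, state, and available actions, with no widget references anywhere.

```java
// Hypothetical sketch of the M in MVC: the model for a login page.
// This is where most of the page's code belongs.
public class LoginModel {
    private String username = "";
    private String password = "";
    private boolean loggingIn = false;

    // State the page exposes for observation
    String username() { return username; }
    boolean isLoggingIn() { return loggingIn; }

    // Business rule: when is submitting even allowed?
    boolean canSubmit() {
        return !loggingIn && !username.isEmpty() && !password.isEmpty();
    }

    // Actions available on this page
    void updateUsername(String value) { username = value; }
    void updatePassword(String value) { password = value; }

    void submit() {
        if (!canSubmit()) return;
        loggingIn = true;
        // ...kick off the login call; on completion, update state and
        // notify observers (the observation mechanism is omitted here)
    }

    public static void main(String[] args) {
        LoginModel model = new LoginModel();
        System.out.println(model.canSubmit());   // fields empty: cannot submit
        model.updateUsername("alice");
        model.updatePassword("secret");
        System.out.println(model.canSubmit());   // now submitting is allowed
    }
}
```

Everything here is business logic for the login page; the View merely renders this state, and the Controller merely routes user interactions to these actions.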

Similarly, a lot of developers tended to assume “View” meant widgets: the reusable, generic toolbox of visual components that are packaged with a platform framework.  Buttons, labels, tables, switches, text boxes, and so on.  Unless some kind of custom drawing was needed, any “view” of an application is really just a hierarchical combination of these widgets.  Assuming that a custom “View” is only needed when custom drawing is needed, the work of defining and managing a hierarchy of widgets was put into the controller.

With these two misunderstandings, clearly none of the application-specific code would go into models, which are generic data objects not associated at all with any particular screen/form/activity, or into views, which are generic widgets usable by any graphical application.  Well, there’s only one other place for all the actual application logic to go.  And since developers were being told, “you need three components”, it appears many of them interpreted this as meaning, “all the application code goes into this one component”.  And thus, Massive View Controller was born.

As this antipattern spread throughout the community, the blame was misplaced on MVC itself, and new pattern suites to “fix” the “problems” with MVC emerged.  One of the better known ones in the iOS community is the VIPER Pattern.  This renames “Model” which, remember, devs think means the data objects, to “Entity”, and according to most of what you read about it, “splits” the Controller into a Presenter, which handles presentation logic, and Interactor, which handles the use case or business logic.

Now that we understand the confusion about MVC, we can see that VIPER is just Model-View-Presenter (MVP) reinvented. All that happened here is that the mistaken notions were corrected, but it was framed as the invention of a new pattern, instead of the reassertion of the correct implementation of an old pattern.  The “entities” were never part of the GUI design patterns to begin with.  The “Model” is actually what VIPER calls the “Interactor”, and always has been.  The only really novel part is the concept of a Router, which is supposed to handle higher-level navigation around an application.  But the need for such a component arose from another misunderstanding about MVC that I’ll talk about in a moment.  There are some more specific suggestions in VIPER about encapsulation: specifically, to hide the backend data objects entirely from the presentation layer, and instead define new objects for the Interactor to send to the Presenter.  This wasn’t required in MVC, but it isn’t incompatible with it either.  If anything that’s an additional suggestion for how to do MVC well.

As I mentioned before, the intention of MVC was that the Model would contain most of the code.  In fact, the Controller was supposed to be very thin.  It was intended to do little more than act as a strategy for the View to handle user interaction.  All the controller is supposed to do is intercept user interactions with the views, and decide what, if anything, to do with them, leaving the actual heavy lifting to the Model.  The Model is supposed to present a public interface of available actions that can be taken, and the controller is just supposed to decide which user interaction should invoke which action.  In MVC, the Controller is not supposed to talk back to the View to change state, because the Model would become out of sync with what is being displayed.  The Controller is only supposed to listen, and talk to the Model.  The Controller is not supposed to manage a view hierarchy.  The view hierarchy is a visual concern, to be handled by the visual component: the View.  A page in an application that is made up of a hierarchy of widgets should have its own View class that encapsulates and manages this hierarchy.  Any visual changes to the hierarchy should be handled by this View class, which observes the Model for state changes.  The presentation logic is all in the View, and the business logic is all in the Model.

This leaves very little in the Controller.  The Controller is just there to avoid having to subclass the View to support variations in interaction.  Views can be reused in different scenarios. For example, an “Edit Details” screen can be used to edit the details for a person in an organization, and also to edit the details for a department in an organization, by allowing the displayed fields to vary. But another variation here is what happens when the user presses “Save”. In one situation, that triggers a person object to be saved to a backing store. In the other, it may trigger a prompt displaying the list of people that will be impacted by the update. To avoid having to subclass the EditDetailsView component, the decision of which Model action to invoke is delegated out to an EditDetailsController.
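A minimal sketch of this arrangement, with hypothetical names and deliberately toy models: the controller is nothing but a strategy that decides which model action a given interaction invokes.

```java
// Hypothetical sketch of the thin controller as a strategy.
// The View reports interactions; the controller only decides
// which Model action each interaction should invoke.
interface EditDetailsController {
    void onSavePressed();
}

class PersonModel {
    boolean saved = false;
    void save() { saved = true; }                         // persist the person
}

class DepartmentModel {
    boolean promptShown = false;
    void promptForImpactedPeople() { promptShown = true; } // show impact prompt
}

// Two controllers let one EditDetailsView serve both situations.
class PersonEditController implements EditDetailsController {
    private final PersonModel model;
    PersonEditController(PersonModel model) { this.model = model; }
    public void onSavePressed() { model.save(); }
}

class DepartmentEditController implements EditDetailsController {
    private final DepartmentModel model;
    DepartmentEditController(DepartmentModel model) { this.model = model; }
    public void onSavePressed() { model.promptForImpactedPeople(); }
}

public class EditDetailsDemo {
    public static void main(String[] args) {
        PersonModel person = new PersonModel();
        new PersonEditController(person).onSavePressed();
        System.out.println(person.saved);

        DepartmentModel dept = new DepartmentModel();
        new DepartmentEditController(dept).onSavePressed();
        System.out.println(dept.promptShown);
    }
}
```

The view never changes between the two cases; only the small strategy object does, which is exactly why the Controller was meant to be the thinnest of the three components.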

Another major point of confusion is that in the original MVC, every component on the page was an MVC bundle.  For example, if we have a login page, which contains two rows, each of which has a label and a textbox, the first row being for entering the username and the second for the password, a submit button, and a loading indicator that can be shown or hidden, the typical way developers will do this is to build one big MVC bundle for this entire page, which manages everything down to the individual labels, textboxes, button, etc.  But originally, each one of these components was supposed to have a Model, View and Controller.  Each label would have a Model and a View (the Controller wouldn’t be necessary, since the labels are passive visual elements that cannot be interacted with), each textbox would have a Model, View and Controller, same for the button, and so on.

This is another point where the framework designers encouraged misunderstandings.  The individual widgets are designed as single classes that contain everything.  A Label, for example, not only contains the drawing logic for rendering text, it also holds all the data for what needs to be drawn (namely, the text string), and all the presentation data for how to draw it (text attributes, alignment, font, color, etc.).  The same is true of text boxes.  Only the Controller part is delegated out. iOS, as with all Apple platforms, uses targets and selectors for this delegation, but the target may or may not be what Apple frameworks call the “controller” (though it almost always is), and the granularity is on the level of individual interactions. Android uses a more standard OOP pattern of callback interfaces, but they are still one-per-interaction.

Along with this pattern of having the page-level components do all the management for the entire page, the inverse problem emerged of what to do when different pages need to communicate.  Thus the “Router” of VIPER was born, out of a perceived need to stick this orchestration logic somewhere.  But if you understand that MVC is inherently hierarchical, with all three components existing on each level of the view hierarchy, then it becomes clear where this “routing” behavior goes: in the M of whatever container view holds the different pages of an app and decides when and how to display them.  Since the platform frameworks are so inheritance-based, and typically give you subclasses with little to no configurability for these “container” views (examples on iOS would be UINavigationController, UITabBarController, etc.), they really don’t give you a way to follow their intended patterns and also have a sensible place for this “routing” logic to go.  But if the navigation or tab-bar (or other menu-selecting) views were all MVC bundles, then that logic would naturally live in the Models of those views.

Examples are also helpful, so I developed four implementations of a login page in an Android app to illustrate what traditional MVC is intended to look like.  The first one is Massive View Controller, what so many devs think MVC means.  There is a LoginService class that performs the backend work of the login web call, but all the business logic, visual logic, and everything in between is stuffed into a LoginController, which subclasses Activity.

public class LoginController extends AppCompatActivity implements LoginService.OnResultHandler {

    private static final int MAX_USERNAME_LENGTH = 16;
    private static final int MAX_PASSWORD_LENGTH = 24;

    private TextView usernameLabel;
    private EditText usernameField;

    private TextView passwordLabel;
    private EditText passwordField;

    private Button submitButton;

    private ProgressBar loadingIndicator;

    private View errorView;
    private TextView errorLabel;
    private Button errorDismissButton;

    private LoginService loginService;

    @Override
    protected void onCreate(Bundle savedInstanceState) {

        super.onCreate(savedInstanceState);

        setContentView(R.layout.activity_main);

        // Assign view fields
        usernameLabel = findViewById(R.id.username_label);
        usernameField = findViewById(R.id.username_field);

        passwordLabel = findViewById(R.id.password_label);
        passwordField = findViewById(R.id.password_field);

        submitButton = findViewById(R.id.submit_button);

        loadingIndicator = findViewById(R.id.loading_indicator);

        errorView = findViewById(R.id.error_view);
        errorLabel = findViewById(R.id.error_label);
        errorDismissButton = findViewById(R.id.error_dismiss_button);

        // Configure Views
        usernameLabel.setText("Username:");
        passwordLabel.setText("Password:");

        submitButton.setText("Submit");
        errorDismissButton.setText("Try Again");

        // Assign text update listeners
        usernameField.addTextChangedListener(new TextWatcher() {

            @Override
            public void beforeTextChanged(CharSequence s, int start, int count, int after) {

            }

            @Override
            public void onTextChanged(CharSequence s, int start, int before, int count) {

                handleUserNameUpdated(s.toString());
            }

            @Override
            public void afterTextChanged(Editable s) {

            }
        });

        passwordField.addTextChangedListener(new TextWatcher() {

            @Override
            public void beforeTextChanged(CharSequence s, int start, int count, int after) {

            }

            @Override
            public void onTextChanged(CharSequence s, int start, int before, int count) {

                handlePasswordUpdated(s.toString());
            }

            @Override
            public void afterTextChanged(Editable s) {

            }
        });

        // Assign click handlers
        usernameField.setOnClickListener(v -> usernameFieldPressed());
        passwordField.setOnClickListener(v -> passwordFieldPressed());
        submitButton.setOnClickListener(v -> submitPressed());
        errorDismissButton.setOnClickListener(v -> errorDismissPressed());

        // Create Service
        loginService = new LoginService(this);
    }

    private void usernameFieldPressed() {

        usernameField.requestFocus();
    }

    private void passwordFieldPressed() {

        if(usernameField.length() > 0)
        {
            passwordField.requestFocus();
        }
        else
        {
            showErrorView("Please enter a username");
        }
    }

    private void submitPressed() {

        loadingIndicator.setVisibility(View.VISIBLE);

        loginService.submit(usernameField.getText().toString(), passwordField.getText().toString());
    }

    private void errorDismissPressed() {

        errorView.setVisibility(View.INVISIBLE);
    }

    // OnResultHandler
    @Override
    public void onResult(boolean loggedIn, String errorDescription) {

        loadingIndicator.setVisibility(View.INVISIBLE);

        if(loggedIn)
        {
            // Start home page activity
        }
        else
        {
            showErrorView(errorDescription);
        }
    }

    private void handleUserNameUpdated(String text) {

        if(text.length() > MAX_USERNAME_LENGTH)
            usernameField.setText(text.substring(0, MAX_USERNAME_LENGTH));

        updateSubmitButtonEnabled();
    }

    private void handlePasswordUpdated(String text) {

        if(text.length() > MAX_PASSWORD_LENGTH)
            passwordField.setText(text.substring(0, MAX_PASSWORD_LENGTH));

        updateSubmitButtonEnabled();
    }

    private void updateSubmitButtonEnabled() {

        boolean enabled = usernameField.length() > 0 && passwordField.length() > 0;

        submitButton.setEnabled(enabled);
    }

    private void showErrorView(String errorDescription) {

        errorLabel.setText(errorDescription);
        errorView.setVisibility(View.VISIBLE);
    }
}

The features implemented here are a basic login screen with two rows of text entry, one for the username, and one for the password. There is a “submit” button that initiates the login request, during which time a loading indicator is shown. If the login fails, an error view is shown with a description of the error, and a “Try Again” button that dismisses the error view and allows the user to make another attempt. There are some additional requirements I added to make the example more illustrative: the username and password fields have maximum length limitations, and attempting to edit the password field while the username is empty causes an error to be shown.

If we want to start refactoring this, the first step is to create proper MVC components for the Login page. This is the correction of the main misunderstanding about MVC. The Model is not a backend object representing a logged-in user or a login request, or a service object for performing the web call. The Model is for the login page. It is where the business logic of this page should live, independently of any logic for actually displaying it to a user. The Model is concerned with data, but it is data for the login page. Hence we call it the LoginModel. Likewise, everything about the view hierarchy, i.e. which widgets are on the screen, should be encapsulated into a LoginView, which does not expose this hierarchy to the outside world. I left it to the Activity to inflate a layout, and then pass the inflated view into the LoginView, but it would also be acceptable to have the View do this privately (the downside of course is that the layout is inflexible in that case).

Also, I started moving away from inheritance. A common way to do MVC is to have the View inherit from the framework View class. But this recreates the classic problem of inheritance, which in this Android example would mean hardcoding which type of ViewGroup the Login page should be (ConstraintLayout, LinearLayout, FrameLayout, etc.). Instead I opted for composition: the LoginView doesn’t inherit from anything, but contains the View object that holds the framework view hierarchy. The Activity subclass, which Android requires, was factored out into a separate component that only creates and holds onto the MVC bundle. The Controller is reduced to its intended role: a strategy for how the View triggers behavior in the Model, which allows a different strategy to be chosen without changing the View code, which deals exclusively with displaying the page. The Activity is now this:

public class LoginActivity extends AppCompatActivity {

    private LoginView view;

    @Override
    protected void onCreate(Bundle savedInstanceState) {

        super.onCreate(savedInstanceState);

        setContentView(R.layout.activity_main);
        this.view = new LoginView(findViewById(R.id.login_view));
    }
}

It creates and holds onto the LoginView, which looks like this:

public class LoginView implements LoginModel.LoginModelObserver {

    private final View view;

    private TextView usernameLabel;
    private EditText usernameField;

    private TextView passwordLabel;
    private EditText passwordField;

    private Button submitButton;

    private ProgressBar loadingIndicator;

    private View errorView;
    private TextView errorLabel;
    private Button errorDismissButton;

    private LoginController controller;
    private LoginModel model;

    public LoginView(View view) {

        this.view = view;

        this.model = new LoginModel();
        this.controller = new LoginController(model);

        // Assign model observer
        model.observer = this;

        // Assign view fields
        usernameLabel = view.findViewById(R.id.username_label);
        usernameField = view.findViewById(R.id.username_field);

        passwordLabel = view.findViewById(R.id.password_label);
        passwordField = view.findViewById(R.id.password_field);

        submitButton = view.findViewById(R.id.submit_button);

        loadingIndicator = view.findViewById(R.id.loading_indicator);

        errorView = view.findViewById(R.id.error_view);
        errorLabel = view.findViewById(R.id.error_label);
        errorDismissButton = view.findViewById(R.id.error_dismiss_button);

        // Configure Labels
        usernameLabel.setText(this.model.getUsernameLabelText());
        passwordLabel.setText(this.model.getPasswordLabelText());

        submitButton.setText(this.model.getSubmitButtonText());
        errorDismissButton.setText(this.model.getErrorDismissButtonText());

        // Assign text update listeners
        usernameField.addTextChangedListener(new TextWatcher() {

            @Override
            public void beforeTextChanged(CharSequence s, int start, int count, int after) {

            }

            @Override
            public void onTextChanged(CharSequence s, int start, int before, int count) {

                controller.usernameFieldEdited(s.toString());
            }

            @Override
            public void afterTextChanged(Editable s) {

            }
        });

        passwordField.addTextChangedListener(new TextWatcher() {

            @Override
            public void beforeTextChanged(CharSequence s, int start, int count, int after) {

            }

            @Override
            public void onTextChanged(CharSequence s, int start, int before, int count) {

                controller.passwordFieldEdited(s.toString());
            }

            @Override
            public void afterTextChanged(Editable s) {

            }
        });

        // Assign click handlers
        usernameField.setOnClickListener(v -> controller.usernameFieldPressed());
        passwordField.setOnClickListener(v -> controller.passwordFieldPressed());
        submitButton.setOnClickListener(v -> controller.submitPressed());
        errorDismissButton.setOnClickListener(v -> controller.errorDismissPressed());
    }

    public View getView() {

        return this.view;
    }

    @Override
    public void beginEditingUsername() {

        usernameField.requestFocus();
    }

    @Override
    public void beginEditingPassword() {

        passwordField.requestFocus();
    }

    @Override
    public void usernameUpdated(String username) {

        usernameField.setText(username);
    }

    @Override
    public void passwordUpdated(String password) {

        passwordField.setText(password);
    }

    @Override
    public void enableSubmitUpdated(boolean enabled) {

        submitButton.setEnabled(enabled);
    }

    @Override
    public void processingUpdated(boolean processing) {

        loadingIndicator.setVisibility(processing ? View.VISIBLE : View.INVISIBLE);
    }

    @Override
    public void errorUpdated(boolean hasError, String description) {

        errorLabel.setText(description);
        errorView.setVisibility(hasError ? View.VISIBLE : View.INVISIBLE);
    }

    @Override
    public void finishLogin() {

        // Start home page activity
    }
}

The Controller now looks like this:

public class LoginController {

    private LoginModel loginModel;

    public LoginController(LoginModel loginModel) {

        this.loginModel = loginModel;
    }

    public void usernameFieldPressed() {

        loginModel.requestEditUsername();
    }

    public void passwordFieldPressed() {

        loginModel.requestEditPassword();
    }

    public void usernameFieldEdited(String text) {

        loginModel.setUsername(text);
    }

    public void passwordFieldEdited(String text) {

        loginModel.setPassword(text);
    }

    public void submitPressed() {

        loginModel.attemptLogin();
    }

    public void errorDismissPressed() {

        loginModel.dismissError();
    }
}

And finally the Model, where the business logic lives:

class LoginModel implements LoginService.OnResultHandler {

    private static final int MAX_USERNAME_LENGTH = 16;
    private static final int MAX_PASSWORD_LENGTH = 24;

    private final String usernameLabelText;
    private final String passwordLabelText;
    private final String submitButtonText;
    private final String errorDismissButtonText;

    public static interface LoginModelObserver
    {
        void beginEditingUsername();
        void beginEditingPassword();

        void usernameUpdated(String username);
        void passwordUpdated(String password);

        void enableSubmitUpdated(boolean enabled);

        void processingUpdated(boolean processing);
        void errorUpdated(boolean hasError, String description);

        void finishLogin();
    }

    LoginModelObserver observer;

    private LoginService loginService;

    private String username;
    private String password;

    private boolean processing;
    private String errorDescription;

    public LoginModel() {

        this.loginService = new LoginService(this);

        // Start with empty entries rather than nulls
        this.username = "";
        this.password = "";

        this.usernameLabelText = "Username:";
        this.passwordLabelText = "Password:";

        this.submitButtonText = "Submit";
        this.errorDismissButtonText = "Try Again";
    }

    public String getUsernameLabelText() {

        return usernameLabelText;
    }

    public String getPasswordLabelText() {

        return passwordLabelText;
    }

    public String getSubmitButtonText() {

        return submitButtonText;
    }

    public String getErrorDismissButtonText() {

        return errorDismissButtonText;
    }

    public void requestEditUsername()
    {
        observer.beginEditingUsername();
    }

    public void requestEditPassword()
    {
        if(username.length() > 0)
        {
            observer.beginEditingPassword();
        }
        else
        {
            setError("Please enter a username");
        }
    }

    public void setUsername(String username)
    {
        if(username.length() > MAX_USERNAME_LENGTH)
            username = username.substring(0, MAX_USERNAME_LENGTH);

        if(username.equals(this.username))
            return;

        this.username = username;

        observer.usernameUpdated(this.username);
        updateSubmitEnabled();
    }

    public void setPassword(String password)
    {
        if(password.length() > MAX_PASSWORD_LENGTH)
            password = password.substring(0, MAX_PASSWORD_LENGTH);

        if(password.equals(this.password))
            return;

        this.password = password;

        observer.passwordUpdated(this.password);
        updateSubmitEnabled();
    }

    public void attemptLogin()
    {
        setProcessing(true);

        loginService.submit(username, password);
    }

    public void dismissError() {

        setError(null);
    }

    @Override
    public void onResult(boolean loggedIn, String errorDescription) {

        setProcessing(false);

        if(loggedIn)
        {
            observer.finishLogin();
        }
        else
        {
            setError(errorDescription);
        }
    }

    private void updateSubmitEnabled() {

        boolean enabled = username.length() > 0 && password.length() > 0;

        observer.enableSubmitUpdated(enabled);
    }

    private void setProcessing(boolean processing) {

        this.processing = processing;
        observer.processingUpdated(this.processing);
    }

    private void setError(String errorDescription) {

        this.errorDescription = errorDescription;
        observer.errorUpdated(this.errorDescription != null, this.errorDescription);
    }
}

Now we have components that aren’t much smaller, but are at least more cohesive. The View only manages the hierarchy of Android View components, and the Model only manages the business logic. They communicate by the View observing the Model. In this case the observation is one-to-one. Typically the Observer Pattern is one-to-many, but we don’t need multiple observers yet.

Now, the next refactoring step would be to introduce MVC components for the parts of the login screen. The login screen has two text entry rows. We can define an abstraction for a text entry row, which has a label (to describe what the entry is for) and a text field for making the entry. Following the MVC pattern, there will be three components for this abstraction. The first is a TextEntryRowView:

public class TextEntryRowView implements TextEntryRowModel.Observer {

    private final View view;

    private TextView label;
    private EditText field;

    private TextEntryRowController controller;
    private TextEntryRowModel model;

    public TextEntryRowView(View view, TextEntryRowModel model) {

        this.view = view;

        this.model = model;
        this.controller = new TextEntryRowController(model);

        // Add model observer
        model.addObserver(this);

        // Assign view fields
        label = view.findViewById(R.id.label);
        field = view.findViewById(R.id.field);

        // Configure Label
        label.setText(model.getLabelText());

        // Assign text update listeners
        field.addTextChangedListener(this.controller);

        // Assign click handlers
        field.setOnClickListener(v -> controller.fieldPressed());
    }

    public View getView() {

        return this.view;
    }

    @Override
    public void editRequestDeclined(TextEntryRowModel model) {

    }

    @Override
    public void beginEditing(TextEntryRowModel model) {

        field.requestFocus();
    }

    @Override
    public void fieldTextUpdated(TextEntryRowModel model, String text) {

        field.setText(text);
    }
}

Then we have a TextEntryRowController:

class TextEntryRowController implements TextWatcher {

    private TextEntryRowModel model;

    public TextEntryRowController(TextEntryRowModel model) {

        this.model = model;
    }

    public void fieldPressed() {

        model.requestEdit();
    }

    @Override
    public void beforeTextChanged(CharSequence s, int start, int count, int after) {

    }

    @Override
    public void onTextChanged(CharSequence s, int start, int before, int count) {

        model.setFieldText(s.toString());
    }

    @Override
    public void afterTextChanged(Editable s) {

    }
}

The intention here is really that the Controller should be called when the user interacts with the keyboard to type into the text field. What is actually happening is that the Controller is implementing the TextWatcher interface provided by Android. This will get called even if the text is changed programmatically. For the sake of this example, I didn’t go through the trouble of filtering out those programmatic changes, but ideally the Controller would intercept only user-initiated text-change events, and those events would not change the text in the field unless the Controller decided to tell the Model to do so. This way, simply omitting the call to the Model would effectively disable editing (by the user) of the text field.
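
One way to do that filtering, sketched here in plain Java with hypothetical names (GuardedTextField, UserEditListener) rather than real Android wiring, is to set a guard flag around programmatic updates so that only genuine user edits are reported:

```java
// Sketch of distinguishing user edits from programmatic ones, assuming all
// programmatic changes go through setTextProgrammatically. In real Android
// code, onTextChanged would be driven by TextWatcher callbacks, which fire
// for both kinds of change.
public class GuardedTextField {

    public interface UserEditListener {
        void userEditedText(String text);
    }

    private String text = "";
    private boolean settingProgrammatically = false;
    private UserEditListener listener;

    public void setUserEditListener(UserEditListener listener) {
        this.listener = listener;
    }

    public String getText() {
        return text;
    }

    // The View calls this when applying a Model update; the guard flag
    // suppresses the "user edited" notification for this change.
    public void setTextProgrammatically(String newText) {
        settingProgrammatically = true;
        onTextChanged(newText);
        settingProgrammatically = false;
    }

    // Stands in for the text-changed callback; only user-initiated changes
    // (guard flag not set) are reported to the listener.
    public void onTextChanged(String newText) {
        text = newText;
        if(!settingProgrammatically && listener != null)
            listener.userEditedText(newText);
    }
}
```

With something like this in place, the Controller would be registered as the UserEditListener, and simply omitting its call to the Model would effectively disable user editing, as described above.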

And now the TextEntryRowModel:

public class TextEntryRowModel {

    public static interface Observer {

        void editRequestDeclined(TextEntryRowModel model);

        void beginEditing(TextEntryRowModel model);
        void fieldTextUpdated(TextEntryRowModel model, String text);
    }

    public TextEntryRowModel(String labelText, int maxLength) {

        this.labelText = labelText;
        this.maxLength = maxLength;

        // Start with an empty entry rather than a null
        this.fieldText = "";

        observers = new ArrayList<>();
    }

    private List<Observer> observers;

    private int maxLength;

    private boolean editable;

    private String labelText;
    private String fieldText;

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public boolean getEditable() {

        return editable;
    }

    public void setEditable(boolean editable) {

        this.editable = editable;
    }

    public String getLabelText() {

        return labelText;
    }

    public String getFieldText() {

        return fieldText;
    }

    public void setFieldText(String fieldText) {

        if (fieldText.length() > maxLength)
            fieldText = fieldText.substring(0, maxLength);

        if (fieldText.equals(this.fieldText))
            return;

        this.fieldText = fieldText;

        for(Observer observer: observers)
            observer.fieldTextUpdated(this, this.fieldText);
    }

    public void requestEdit() {

        if(editable) {

            for(Observer observer: observers)
                observer.beginEditing(this);
        }
        else {

            for(Observer observer: observers)
                observer.editRequestDeclined(this);
        }
    }
}

Notice that in this case, the observers are one-to-many. This is now necessary. You can see the TextEntryRowView needs to observe its Model, to know when the field text is updated. Also notice that the only publicly visible place to change the field text is in the model, not the view. The Android TextView holds the text being displayed, because that’s how the Android framework is designed. But that TextView is a private member of TextEntryRowView. The intention is that anyone, including the Controller that receives the user’s typing events, must tell the Model to update the text. The Model then broadcasts that change, allowing any number of interested objects to be notified that the text changed.

Also notice that in the setter for the text, we are checking whether the incoming text is the same as what is already stored in the Model. The Model, View and Controller are tied to each other in a loop. A change to the Model will trigger the View to update, and if we really want to ensure the two stay in sync, a change to the View will trigger the Model to update (in this case that happens because we are using the TextWatcher interface, which gets notified by all changes to the field’s text). This can cause an infinite loop, in which the View updates the Model, which updates the View, which updates the Model, and so on. To prevent this, at some point in the chain we need to check to make sure we aren’t making a redundant update. Doing so will terminate the loop after it makes one full cycle. This is a common pattern, especially in reactive programming. I call these loops “reactive loops”.

We do the same thing for the error view, which is another abstraction we can identify. We start with an ErrorView:

public class ErrorView implements ErrorModel.Observer {

    private final View view;

    private TextView descriptionLabel;
    private Button dismissButton;

    private ErrorController controller;
    private ErrorModel model;

    public ErrorView(View view, ErrorModel model) {

        this.view = view;

        this.model = model;
        this.controller = new ErrorController(model);

        // Add model observer
        model.addObserver(this);

        // Assign view fields
        descriptionLabel = this.view.findViewById(R.id.label);
        dismissButton = this.view.findViewById(R.id.dismiss_button);

        dismissButton.setText(this.model.getDismissButtonText());

        dismissButton.setOnClickListener(v -> controller.dismissPressed());
    }

    public View getView() {

        return this.view;
    }

    @Override
    public void dismissRequested() {

    }

    @Override
    public void descriptionUpdated(String description) {

        descriptionLabel.setText(description);
    }
}

Then the ErrorController:

class ErrorController {

    private ErrorModel model;

    public ErrorController(ErrorModel model) {

        this.model = model;
    }

    public void dismissPressed() {

        model.dismiss();
    }
}

And the ErrorModel:

public class ErrorModel {

    public static interface Observer {

        void dismissRequested();

        void descriptionUpdated(String description);
    }

    public ErrorModel(String dismissButtonText) {

        this.dismissButtonText = dismissButtonText;

        this.observers = new ArrayList<>();
    }

    private List<Observer> observers;
    private String description;

    private String dismissButtonText;

    public void setDescription(String description) {

        this.description = description;

        for(Observer observer: observers)
            observer.descriptionUpdated(this.description);
    }

    public String getDismissButtonText() {

        return dismissButtonText;
    }

    public void dismiss() {

        for(Observer observer: observers)
            observer.dismissRequested();
    }

    public void addObserver(Observer observer) {

        this.observers.add(observer);
    }
}

Now, the Login components will use these new classes. The LoginView now looks like this:

public class LoginView implements LoginModel.Observer {

    private final View view;

    private TextEntryRowView usernameRow;
    private TextEntryRowView passwordRow;

    private Button submitButton;

    private ProgressBar loadingIndicator;

    private ErrorView errorView;

    private LoginController controller;
    private LoginModel model;

    public LoginView(View view) {

        this.view = view;

        this.model = new LoginModel();
        this.controller = new LoginController(model);

        // Assign model observer
        model.addObserver(this);

        // Assign view fields
        usernameRow = new TextEntryRowView(this.view, this.model.getUsernameModel());
        passwordRow = new TextEntryRowView(this.view, this.model.getPasswordModel());

        errorView = new ErrorView(this.view, this.model.getErrorModel());
        submitButton = view.findViewById(R.id.submit_button);
        loadingIndicator = view.findViewById(R.id.loading_indicator);

        submitButton.setText(this.model.getSubmitButtonText());
        submitButton.setOnClickListener(v -> controller.submitPressed());
    }

    public View getView() {

        return this.view;
    }

    @Override
    public void enableSubmitUpdated(boolean enabled) {

        submitButton.setEnabled(enabled);
    }

    @Override
    public void processingUpdated(boolean processing) {

        loadingIndicator.setVisibility(processing ? View.VISIBLE : View.INVISIBLE);
    }

    @Override
    public void hasErrorUpdated(boolean hasError) {

        errorView.getView().setVisibility(hasError ? View.VISIBLE : View.INVISIBLE);
    }

    @Override
    public void finishLogin() {

        // Start home page activity
    }
}

The LoginController looks like this:

public class LoginController {

    private LoginModel model;

    public LoginController(LoginModel model) {

        this.model = model;
    }

    public void submitPressed() {

        model.attemptLogin();
    }
}

And the LoginModel looks like this:

public class LoginModel implements LoginService.OnResultHandler, TextEntryRowModel.Observer, ErrorModel.Observer {

    public interface Observer {

        void enableSubmitUpdated(boolean enabled);

        void processingUpdated(boolean processing);
        void hasErrorUpdated(boolean hasError);

        void finishLogin();
    }

    public static final int MAX_USERNAME_LENGTH = 16;
    public static final int MAX_PASSWORD_LENGTH = 24;

    private List<Observer> observers;

    private TextEntryRowModel usernameModel;
    private TextEntryRowModel passwordModel;
    private ErrorModel errorModel;

    private final String submitButtonText;

    private LoginService loginService;
    private boolean submitEnabled;
    private boolean processing;
    private boolean hasError;

    LoginModel() {

        this.usernameModel = new TextEntryRowModel("Username:", MAX_USERNAME_LENGTH);
        this.passwordModel = new TextEntryRowModel("Password:", MAX_PASSWORD_LENGTH);
        this.errorModel = new ErrorModel("Try Again");

        this.observers = new ArrayList<>();

        this.submitButtonText = "Submit";

        this.loginService = new LoginService(this);

        // The username row is always editable; the password row becomes
        // editable once a username has been entered (see fieldTextUpdated)
        this.usernameModel.setEditable(true);

        this.usernameModel.addObserver(this);
        this.passwordModel.addObserver(this);
        this.errorModel.addObserver(this);
    }

    public TextEntryRowModel getUsernameModel() {

        return usernameModel;
    }

    public TextEntryRowModel getPasswordModel() {

        return passwordModel;
    }

    public ErrorModel getErrorModel() {

        return errorModel;
    }

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public String getSubmitButtonText() {

        return submitButtonText;
    }

    public void attemptLogin() {

        setProcessing(true);

        loginService.submit(usernameModel.getFieldText(), passwordModel.getFieldText());
    }

    private void setSubmitEnabled(boolean submitEnabled) {

        this.submitEnabled = submitEnabled;

        for(Observer observer: observers)
            observer.enableSubmitUpdated(this.submitEnabled);
    }

    private void setProcessing(boolean processing) {

        this.processing = processing;

        for(Observer observer: observers)
            observer.processingUpdated(processing);
    }

    private void setHasError(boolean hasError) {

        this.hasError = hasError;

        for(Observer observer: observers)
            observer.hasErrorUpdated(this.hasError);
    }

    private void setError(String errorDescription) {

        setHasError(errorDescription != null);
        errorModel.setDescription(errorDescription);
    }

    // LoginService.OnResultHandler
    @Override
    public void onResult(boolean loggedIn, String errorDescription) {

        setProcessing(false);

        if(loggedIn)
        {
            for(Observer observer: observers)
                observer.finishLogin();
        }
        else
        {
            setError(errorDescription);
        }
    }

    // TextEntryRowModel.Observer
    @Override
    public void editRequestDeclined(TextEntryRowModel model) {

        if(model == passwordModel && model.getFieldText().length() == 0)
            setError("Please enter a username");
    }

    @Override
    public void beginEditing(TextEntryRowModel model) {

    }

    @Override
    public void fieldTextUpdated(TextEntryRowModel model, String text) {

        if(model == usernameModel)
            passwordModel.setEditable(text.length() > 0);

        boolean submitEnabled = usernameModel.getFieldText().length() > 0 && passwordModel.getFieldText().length() > 0;
        setSubmitEnabled(submitEnabled);
    }

    // ErrorModel.Observer
    @Override
    public void dismissRequested() {

        setHasError(false);
    }

    @Override
    public void descriptionUpdated(String description) {

    }
}

Here is the one-to-many Observer Pattern in action, and this demonstrates fundamentally how various parts of an application, on any level, communicate with each other: by inter-model observation. That is how changes are propagated around a page of the app, keeping various components in sync with each other. This does not disrupt a View staying in sync with its own Model because there can be multiple observers. It is a business logic concern that one part of the use case changing requires another part of the use case to change. This is not a visual logic concern, and should not be done in views.

The code is in a fairly good state now, but for the sake of illustration I will do one more round of refactoring and create MVC bundles on the level of individual widgets. At this point we’re actually hiding and overriding certain aspects of the framework. The Android framework is not designed for individual widgets to be MVC bundles. We can essentially adapt/wrap the framework, which additionally decouples our application code almost entirely from the platform on which it is running.

First we’ll make the MVC components for a text field (which can be either editable or read-only), starting with a TextFieldView:

public class TextFieldView implements TextFieldModel.Observer {

    private TextView view;

    private TextFieldController controller;
    private TextFieldModel model;

    public TextFieldView(TextView view, TextFieldController controller, TextFieldModel model) {

        this.view = view;
        this.controller = controller;
        this.model = model;

        this.view.setText(this.model.getText());
        this.view.setTypeface(this.model.getFont());
        this.view.setTextColor(this.model.getTextColor());

        this.view.setOnClickListener(this.controller);
        this.view.addTextChangedListener(this.controller);

        this.model.addObserver(this);
    }

    @Override
    public void beginEditing(TextFieldModel model) {

        view.requestFocus();
    }

    @Override
    public void textUpdated(TextFieldModel model, String text) {

        view.setText(text);
    }
}

TextFieldController is just an interface, because what exactly should happen when a user interacts with a text field depends on the context:

public interface TextFieldController extends View.OnClickListener, TextWatcher {

}

The Android framework provides interfaces for handling view clicks and text updates (again, ideally we’d want only user-initiated text updates, but for brevity we’ll just piggyback on what Android gives us). So the Controller just extends these already existing interfaces.

One useful implementation we can provide right off the bat is a Controller that does nothing, which disables user interaction with the text field and makes it read-only. This is the NullTextFieldController:

public class NullTextFieldController implements TextFieldController {

    @Override
    public void onClick(View v) {

    }

    @Override
    public void beforeTextChanged(CharSequence s, int start, int count, int after) {

    }

    @Override
    public void onTextChanged(CharSequence s, int start, int before, int count) {

    }

    @Override
    public void afterTextChanged(Editable s) {

    }
}

(Disabling a TextField also requires setting focusable to false; I’m ignoring this in the example.)

Then we have the TextFieldModel:

public class TextFieldModel {

    public static interface Observer {

        void beginEditing(TextFieldModel model);
        void textUpdated(TextFieldModel model, String text);
    }

    private List<Observer> observers;

    private String text;
    private Typeface font;
    private int textColor;

    public TextFieldModel(String text) {

        this.text = text;

        // Default presentation, so the View never applies a null typeface
        // or an uninitialized (transparent) color
        this.font = Typeface.DEFAULT;
        this.textColor = Color.BLACK;

        this.observers = new ArrayList<>();
    }

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public String getText() {

        return text;
    }

    public void setText(String text) {

        if(text.equals(this.text))
            return;

        this.text = text;

        for(Observer observer: observers)
            observer.textUpdated(this, this.text);
    }

    public Typeface getFont() {

        return font;
    }

    public int getTextColor() {

        return textColor;
    }

    public void beginEditing() {

        for(Observer observer: observers)
            observer.beginEditing(this);
    }
}

The key distinction here is that the data being displayed by a text field now lives in a Model, not in the View (as is typically the case in these platform frameworks). The data for a text field includes the text it is displaying, plus any presentation data (font, color, etc.). Because a Model is observable, anyone (including multiple listeners at once) can listen for changes to what this text field is displaying. As with the previous example, the Model becomes the one place where the outside world can and should change what the text field displays.

Now let’s do the same for a button, starting with a ButtonView:

public class ButtonView implements ButtonModel.Observer {

    private Button view;

    private ButtonController controller;
    private ButtonModel model;

    public ButtonView(Button view, ButtonController controller, ButtonModel model) {

        this.view = view;
        this.controller = controller;
        this.model = model;

        this.model.addObserver(this);

        this.view.setText(this.model.getText());
        this.view.setOnClickListener(this.controller);
    }

    @Override
    public void enabledUpdated(boolean enabled) {

        view.setEnabled(enabled);
    }
}

Again, the ButtonController is just an interface, so we can decide in each case what happens when a button is pressed:

public interface ButtonController extends View.OnClickListener {

}

And the ButtonModel:

public class ButtonModel {

    public static interface Observer {

        void enabledUpdated(boolean enabled);
    }

    private List<Observer> observers;

    private String text;
    private boolean enabled;

    public ButtonModel(String text) {

        this.text = text;

        this.observers = new ArrayList<>();
    }

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public String getText() {

        return text;
    }

    public void setEnabled(boolean enabled) {

        this.enabled = enabled;

        for(Observer observer: observers)
            observer.enabledUpdated(this.enabled);
    }
}

A fully featured ButtonModel would hold everything about a button’s state, including whether it is selected and/or highlighted, any icons, etc.
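Note that the ButtonView above wraps the platform Button rather than subclassing it. That relationship can be sketched in plain Java; here `PlatformButton` is an invented stand-in for Android's Button, and the model and view classes are condensed versions of the ones above:

```java
import java.util.ArrayList;
import java.util.List;

// Platform-free sketch: the View *wraps* a platform widget instead of
// subclassing it, and pushes Model changes into the widget.
public class ButtonMvcSketch {

    // Stand-in for Android's Button (an assumption for this demo).
    interface PlatformButton { void setEnabled(boolean enabled); }

    // Condensed ButtonModel: just the enabled flag and its observers.
    static class ButtonModel {
        interface Observer { void enabledUpdated(boolean enabled); }
        private final List<Observer> observers = new ArrayList<>();
        private boolean enabled;
        void addObserver(Observer o) { observers.add(o); }
        void setEnabled(boolean enabled) {
            this.enabled = enabled;
            for (Observer o : observers) o.enabledUpdated(enabled);
        }
    }

    // The View observes the Model and forwards state to the wrapped widget.
    static class ButtonView implements ButtonModel.Observer {
        private final PlatformButton view;
        ButtonView(PlatformButton view, ButtonModel model) {
            this.view = view;
            model.addObserver(this);
        }
        @Override public void enabledUpdated(boolean enabled) {
            view.setEnabled(enabled);
        }
    }

    public static void main(String[] args) {
        List<Boolean> widgetStates = new ArrayList<>();
        PlatformButton widget = widgetStates::add; // records setEnabled calls
        ButtonModel model = new ButtonModel();
        new ButtonView(widget, model);
        model.setEnabled(true);   // flows Model -> View -> widget
        model.setEnabled(false);
        System.out.println(widgetStates); // [true, false]
    }
}
```

Because the widget sits behind an interface, the same Model/View pair could drive any control that can be enabled and disabled.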

Now we can use these to implement TextEntryRow, starting with the View:

public class TextEntryRowView implements TextEntryRowModel.Observer {

    private final View view;

    private TextFieldView label;
    private TextFieldView field;

    private TextEntryRowController controller;
    private TextEntryRowModel model;

    public TextEntryRowView(View view, TextEntryRowController controller, TextEntryRowModel model) {

        this.view = view;

        this.model = model;
        this.controller = controller;

        // Add model observer
        model.addObserver(this);

        // Assign view fields
        label = new TextFieldView(view.findViewById(R.id.label), this.controller.getLabelController(), this.model.getLabelModel());
        field = new TextFieldView(view.findViewById(R.id.field), this.controller.getFieldController(), this.model.getFieldModel());
    }

    public View getView() {

        return this.view;
    }

    @Override
    public void editRequestDeclined(TextEntryRowModel model) {


    }

    @Override
    public void fieldTextUpdated(TextEntryRowModel model, String text) {
        
    }
}

Then the Controller:

class TextEntryRowController {

    private final NullTextFieldController labelController;
    private TextFieldController fieldController;

    private TextEntryRowModel model;

    TextEntryRowController(TextEntryRowModel model) {

        this.model = model;

        this.labelController = new NullTextFieldController();

        this.fieldController = new TextFieldController() {

            @Override
            public void beforeTextChanged(CharSequence s, int start, int count, int after) {

            }

            @Override
            public void onTextChanged(CharSequence s, int start, int before, int count) {

                model.setFieldText(s.toString());
            }

            @Override
            public void afterTextChanged(Editable s) {

            }

            @Override
            public void onClick(View v) {

                model.requestEdit();
            }
        };
    }

    TextFieldController getLabelController() {

        return labelController;
    }

    TextFieldController getFieldController() {

        return fieldController;
    }
}

Now it is the TextEntryRowController that decides the first TextField (the label) is read-only, by assigning it a NullTextFieldController. For the other TextField, the Controller sends a message to its own Model, not to the TextField’s Model. This makes the TextEntryRowModel responsible for how, and whether, to update the field.

Here is the Model:

public class TextEntryRowModel implements TextFieldModel.Observer {

    public static interface Observer {

        void editRequestDeclined(TextEntryRowModel model);
        void fieldTextUpdated(TextEntryRowModel model, String text);
    }

    public TextEntryRowModel(String labelText, int maxLength) {

        this.labelModel = new TextFieldModel(labelText);
        this.fieldModel = new TextFieldModel("");

        this.maxLength = maxLength;

        observers = new ArrayList<>();

        this.fieldModel.addObserver(this);
    }

    private List<Observer> observers;

    private int maxLength;

    private boolean editable;

    private TextFieldModel labelModel;
    private TextFieldModel fieldModel;

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public TextFieldModel getLabelModel() {

        return labelModel;
    }

    public TextFieldModel getFieldModel() {

        return fieldModel;
    }

    public boolean getEditable() {

        return editable;
    }

    public void setEditable(boolean editable) {

        this.editable = editable;
    }

    public String getFieldText() {

        return fieldModel.getText();
    }

    public void setFieldText(String fieldText) {

        fieldModel.setText(fieldText);
    }

    public void requestEdit() {

        if(editable) {

            fieldModel.beginEditing();
        }
        else {

            for(Observer observer: observers)
                observer.editRequestDeclined(this);
        }
    }

    @Override
    public void beginEditing(TextFieldModel model) {

    }

    @Override
    public void textUpdated(TextFieldModel model, String text) {

        if (text.length() > maxLength) {

            // Too long: write the truncated text back into the field model.
            // That setText call notifies us again with the truncated text,
            // and the observer loop below runs on that second pass.
            fieldModel.setText(text.substring(0, maxLength));
            return;
        }

        for(Observer observer: observers)
            observer.fieldTextUpdated(this, fieldModel.getText());
    }
}

Here we can see TextEntryRowModel updating the TextField by calling the underlying TextFieldModel, which is a private member of TextEntryRowModel.
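Here is a runnable, condensed sketch of that relationship (class names invented, Android types removed): the row model observes its own private field model, and enforces the max-length rule by writing the truncated text back into it.

```java
import java.util.ArrayList;
import java.util.List;

// Condensed sketch of the TextEntryRowModel/TextFieldModel relationship.
public class TextEntryRowSketch {

    static class FieldModel {
        interface Observer { void textUpdated(String text); }
        private final List<Observer> observers = new ArrayList<>();
        private String text = "";
        void addObserver(Observer o) { observers.add(o); }
        String getText() { return text; }
        void setText(String text) {
            if (text.equals(this.text)) return;
            this.text = text;
            for (Observer o : observers) o.textUpdated(this.text);
        }
    }

    // The row model owns a field model privately and observes it.
    static class RowModel implements FieldModel.Observer {
        private final FieldModel fieldModel = new FieldModel();
        private final int maxLength;
        RowModel(int maxLength) {
            this.maxLength = maxLength;
            fieldModel.addObserver(this);
        }
        String getFieldText() { return fieldModel.getText(); }
        void setFieldText(String text) { fieldModel.setText(text); }
        @Override public void textUpdated(String text) {
            if (text.length() > maxLength)
                fieldModel.setText(text.substring(0, maxLength)); // re-notifies
        }
    }

    public static void main(String[] args) {
        RowModel row = new RowModel(5);
        row.setFieldText("abcdefgh"); // too long: the row model truncates it
        System.out.println(row.getFieldText()); // abcde
    }
}
```

Anyone can try to set the field text, but the business rule (the length limit) lives in exactly one place: the row model.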

Now let’s look at the Error view, implemented with the MVC widgets:

public class ErrorView implements ErrorModel.Observer {

    private final View view;

    private TextFieldView descriptionLabel;
    private ButtonView dismissButton;

    private ErrorController controller;
    private ErrorModel model;

    public ErrorView(View view, ErrorController controller, ErrorModel model) {

        this.view = view;
        this.controller = controller;

        this.model = model;

        // Add model observer
        model.addObserver(this);

        // Assign view fields
        descriptionLabel = new TextFieldView(this.view.findViewById(R.id.label), this.controller.getDescriptionController(), this.model.getDescriptionModel());
        dismissButton = new ButtonView(this.view.findViewById(R.id.dismiss_button), this.controller.getDismissController(), this.model.getDismissModel());
    }

    public View getView() {

        return this.view;
    }

    @Override
    public void dismissRequested() {

    }
}

And the Controller:

class ErrorController {

    private ErrorModel model;
    private TextFieldController descriptionController;
    private ButtonController dismissController;

    public ErrorController(ErrorModel model) {

        this.model = model;

        this.descriptionController = new NullTextFieldController();

        this.dismissController = new ButtonController() {

            @Override
            public void onClick(View v) {

                model.dismiss();
            }
        };
    }

    public TextFieldController getDescriptionController() {

        return descriptionController;
    }

    public ButtonController getDismissController() {

        return dismissController;
    }
}

And the Model:

public class ErrorModel {

    public static interface Observer {

        void dismissRequested();
    }

    public ErrorModel(String dismissButtonText) {

        this.descriptionModel = new TextFieldModel("");
        this.dismissModel = new ButtonModel(dismissButtonText);

        this.observers = new ArrayList<>();
    }

    private List<Observer> observers;

    private TextFieldModel descriptionModel;
    private ButtonModel dismissModel;

    public void setDescription(String description) {

        descriptionModel.setText(description);
    }

    public void addObserver(Observer observer) {

        this.observers.add(observer);
    }

    public TextFieldModel getDescriptionModel() {

        return descriptionModel;
    }

    public ButtonModel getDismissModel() {

        return dismissModel;
    }

    public void dismiss() {

        for(Observer observer: observers)
            observer.dismissRequested();
    }
}

Here we see the ErrorModel updating the text displayed in the ErrorView by calling the TextFieldModel it holds as a member. All the data coordination is done through a hierarchy of Models. This is the business logic, and it is separated out and collected into the Models of the application. The Views only decide how to turn this use case data into visuals, making sure to stay up to date when the use case data changes.

Now we can update the Login components to use the Button MVC classes. First the View:

public class LoginView implements LoginModel.Observer {

    private final View view;

    private TextEntryRowView usernameRow;
    private TextEntryRowView passwordRow;

    private ButtonView submitButton;

    private ProgressBar loadingIndicator;

    private ErrorView errorView;

    private LoginController controller;
    private LoginModel model;

    public LoginView(View view) {

        this.view = view;

        this.model = new LoginModel();
        this.controller = new LoginController(model);

        // Assign model observer
        model.addObserver(this);

        // Assign view fields (each row and the error view get their own
        // Controller, constructed from the corresponding child Model)
        usernameRow = new TextEntryRowView(this.view, new TextEntryRowController(this.model.getUsernameModel()), this.model.getUsernameModel());
        passwordRow = new TextEntryRowView(this.view, new TextEntryRowController(this.model.getPasswordModel()), this.model.getPasswordModel());

        errorView = new ErrorView(this.view, new ErrorController(this.model.getErrorModel()), this.model.getErrorModel());
        submitButton = new ButtonView(view.findViewById(R.id.submit_button), this.controller.getSubmitButtonController(), this.model.getSubmitButtonModel());
        loadingIndicator = view.findViewById(R.id.loading_indicator);
    }

    public View getView() {

        return this.view;
    }

    @Override
    public void processingUpdated(boolean processing) {

        loadingIndicator.setVisibility(processing ? View.VISIBLE : View.INVISIBLE);
    }

    @Override
    public void hasErrorUpdated(boolean hasError) {

        errorView.getView().setVisibility(hasError ? View.VISIBLE : View.INVISIBLE);
    }

    @Override
    public void finishLogin() {

        // Start home page activity
    }
}

Then the Controller:

public class LoginController {

    private ButtonController submitButtonController;

    private LoginModel model;

    public LoginController(LoginModel model) {

        this.model = model;

        this.submitButtonController = new ButtonController() {

            @Override
            public void onClick(View v) {

                model.attemptLogin();
            }
        };
    }

    public ButtonController getSubmitButtonController() {

        return submitButtonController;
    }
}

Then the Model:

public class LoginModel implements LoginService.OnResultHandler, TextEntryRowModel.Observer, ErrorModel.Observer {

    public interface Observer {

        void processingUpdated(boolean processing);
        void hasErrorUpdated(boolean hasError);

        void finishLogin();
    }

    public static final int MAX_USERNAME_LENGTH = 16;
    public static final int MAX_PASSWORD_LENGTH = 24;

    private List<Observer> observers;

    private TextEntryRowModel usernameModel;
    private TextEntryRowModel passwordModel;
    private ErrorModel errorModel;

    private ButtonModel submitButtonModel;

    private LoginService loginService;
    private boolean processing;
    private boolean hasError;

    LoginModel() {

        this.usernameModel = new TextEntryRowModel("Username:", MAX_USERNAME_LENGTH);
        this.passwordModel = new TextEntryRowModel("Password:", MAX_PASSWORD_LENGTH);
        this.errorModel = new ErrorModel("Try Again");

        this.submitButtonModel = new ButtonModel("Submit");

        this.observers = new ArrayList<>();

        this.loginService = new LoginService(this);

        this.usernameModel.addObserver(this);
        this.passwordModel.addObserver(this);
        this.errorModel.addObserver(this);
    }

    public TextEntryRowModel getUsernameModel() {

        return usernameModel;
    }

    public TextEntryRowModel getPasswordModel() {

        return passwordModel;
    }

    public ErrorModel getErrorModel() {

        return errorModel;
    }

    public ButtonModel getSubmitButtonModel() {

        return submitButtonModel;
    }

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public void attemptLogin() {

        setProcessing(true);

        loginService.submit(usernameModel.getFieldText(), passwordModel.getFieldText());
    }

    private void setProcessing(boolean processing) {

        this.processing = processing;

        for(Observer observer: observers)
            observer.processingUpdated(processing);
    }

    private void setHasError(boolean hasError) {

        this.hasError = hasError;

        for(Observer observer: observers)
            observer.hasErrorUpdated(this.hasError);
    }

    private void setError(String errorDescription) {

        setHasError(errorDescription != null);
        errorModel.setDescription(errorDescription);
    }

    // LoginService.OnResultHandler
    @Override
    public void onResult(boolean loggedIn, String errorDescription) {

        setProcessing(false);

        if(loggedIn) {

            for(Observer observer: observers)
                observer.finishLogin();
        }
        else {

            setError(errorDescription);
        }
    }

    // TextEntryRowModel.Observer
    @Override
    public void editRequestDeclined(TextEntryRowModel model) {

        if(model == passwordModel && model.getFieldText().length() == 0)
            setError("Please enter a username");
    }

    @Override
    public void fieldTextUpdated(TextEntryRowModel model, String text) {

        if(model == usernameModel)
            passwordModel.setEditable(text.length() > 0);

        boolean submitEnabled = usernameModel.getFieldText().length() > 0 && passwordModel.getFieldText().length() > 0;
        submitButtonModel.setEnabled(submitEnabled);
    }

    // ErrorModel.Observer
    @Override
    public void dismissRequested() {

        setHasError(false);
    }
}

Now we have a design that properly represents the original intention of MVC. Notice that the Controllers are now tiny; they are the smallest components in the code. As intended, the biggest components are the Models. And with Models at every level of the hierarchy, no single Model has too many responsibilities (the LoginModel is about 25% smaller than the MassiveViewController we started with, and almost half of it is boilerplate like property accessors). In this small example, the total amount of code grew by quite a bit, but as an application grows larger and more complex, and reuse increases, this pattern will start to significantly reduce the total amount of code needed. All of the classes except LoginModel are available for reuse in other areas. Clearly, whatever valid criticisms there are of MVC, “MassiveViewController” isn’t one of them.

There are, of course, other GUI application patterns, like MVP and MVVM, but that’s another topic. When properly understood, any of these patterns, including MVC, will help you factor your applications into small, often reusable components, with high cohesion and encapsulation (with the unit-testability that comes along with these), and none of them will grow too large. If you see an application with huge “view controllers”, especially if they are subclassing framework classes, whatever it is, it isn’t MVC.

Abstraction Layers

Over time, the software we write continues to grow in sophistication.  We are solving more and more advanced problems, expressing each solution as a specific sequence of ones and zeroes stored in the memory of a Turing-complete machine.  These programs are steadily growing in size (today’s executables are typically on the order of a few megabytes, which is a few million 8-bit numbers), and require faster or more sophisticated (i.e. multicore) hardware to execute in reasonable time.  But the human mind is not getting more advanced over time, at least not at nearly the same rate.  We are as mentally capable today as we were in the 1950s, but our computer programs are several orders of magnitude more advanced and complex.  How can the human developers who produce this code keep their understanding in step with something whose complexity is increasing so rapidly?

The answer is by abstraction.  Today’s programmers do not hand-craft the millions of bytes of machine code instructions that ultimately form cutting-edge software.  Nor could they ever read the code in that format and hope to comprehend what it does, much less how it does it.  Attempting to read and follow a modern software program in its compiled machine-language format is a clear demonstration of how vastly more complex computer software has become since the days when computer programs were hand-written machine code.  Instead, programmers write code in high-level programming languages.  These languages express much more abstract concepts than “add”, “move”, “jump”, or the other operations that machine instructions represent.

Properly understood, even machine code itself is an abstraction.  Each machine instruction represents an abstract operation done to the state of the machine.  We can take this abstraction away and watch the operation of a machine while executing a program, from the perspective of its electronic state.  We can record the time at which different gates are switched to produce different sub-circuits.  But even this is an abstraction.  We can go below the level of switching gates and hook voltmeters to the circuits to produce a graph of voltage over time.  While trying to understand what a computer program does by reading its machine code is hopeless, trying to understand what it does by recording the physical state of a machine running it is far more hopeless.  It would be hopeless even to understand hand-written machine code from the 50s in this way (and doing so would proceed by first trying to rediscover the machine code).  Even the very low-level (but very high-level, from the perspective of the electronic hardware that implements our Turing machines) abstraction of a central processor and a sequence of instructions to process aids massively in our ability to comprehend and compose software.  Not even a trivial computer program could feasibly be designed by specifying the voltage on the terminals of circuits as a function of time.

But even when looking at modern programs at the higher level of their source code, they are still, as a whole, intractable to human comprehension.  It is not uncommon for a modern computer program to contain millions of lines of source code.  How could a human possibly understand something with millions of interworking parts?

The answer, again, is by abstraction.  A “line” of source code is the lowest level of abstraction in a program’s source code.  These lines are grouped together into functions.  Functions are grouped together into classes.  Classes are grouped together into modules.  Modules are grouped together into subsystems.  Subsystems are grouped together into libraries.  Libraries are grouped together into applications.  We can understand and follow something that ultimately takes millions of lines of code to express because we do not digest it in the form of raw lines.  We understand the application as a composition of a handful of libraries.  We do not attempt to understand how the libraries do what they do, as we try to understand the applications that use them.  We only understand the libraries by what they do, and from that, we understand the application in terms of how it uses those libraries to implement its own behaviors.  In a well-designed software application, the application-level code is not millions of lines of code.  It is thousands, or even merely hundreds, of lines of code.  This is what makes it tractable to the human mind.

But each of those lines is now far removed from what a computer can understand.  A line of code calling a high-level library ultimately gets compiled down to what could be thousands of machine code instructions.  By identifying a boundary between what and how, and equivalently why and what, we are able to take what is otherwise a massive and impenetrable problem, and factor it into individually digestible pieces.  We do not simply divide the problem into parts by size.  We cannot understand a compiled binary of millions of machine instructions by taking the first thousand and looking at them in isolation, then the next thousand, and so on.  The division is conceptual and builds up a hierarchy of higher and higher-level concepts, linked together by a why-how relationship.  The result is a series of layers, like the floors of a building.  We call these abstraction layers.

An abstraction layer is a “why”, “what” or “how” depending on from what perspective we are looking at it.  When considering a particular abstraction layer, it becomes the subject of our consideration: the “what”.  The layer immediately above it is, from this perspective, the “why”.  This abstraction layer exists because the abstraction layer above it needs it.  The layer immediately below it is, from this perspective, the “how”.  It is what the current abstraction layer uses as its implementation details.  If we then focus on the next layer down, it becomes the “what”, the previous layer becomes “why”, and the next layer down becomes “how”.

Abstraction layers are represented in different ways with different types of programming languages.  In object-oriented languages, an abstraction layer is identified by a class, which has two parts: an interface and an implementation.  When focusing on an interface, an implementation of that interface is the how.  When focusing on the implementation, the interface is the why.  Classes then link to each other through composition.  The implementation of a class contains fields, which are references to other interfaces.  A class’s implementation calls methods on its members, and on the parameters of its own methods.  So then these other interfaces are the how of an implementation.  When interface A is implemented by AImp, and AImp is composed of interfaces B and C, then A, AImp, and the set containing B and C each form abstraction layers, in order of most abstract to least abstract.  AImp is the “how” of “A”, while A is the “why” of AImp.  B and C are the “how” of AImp, while AImp is the (or a) “why” of B and C.  Somewhere there will be a BImp and CImp, which continues the sequence of abstraction layers.
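The A / AImp / {B, C} layering described above can be written out directly in Java.  All of the names below are placeholders:

```java
// A is the abstract "what"; AImp is its "how"; AImp's own "how" is the
// pair of interfaces B and C it is composed of. BImp and CImp continue
// the sequence of abstraction layers one level down.
public class AbstractionLayers {

    interface B { String partOne(); }
    interface C { String partTwo(); }

    interface A { String doWork(); }

    // AImp is implemented in terms of the interfaces B and C, not their
    // implementations; BImp and CImp can be swapped without touching AImp.
    static class AImp implements A {
        private final B b;
        private final C c;
        AImp(B b, C c) { this.b = b; this.c = c; }
        @Override public String doWork() { return b.partOne() + " " + c.partTwo(); }
    }

    static class BImp implements B {
        @Override public String partOne() { return "hello"; }
    }

    static class CImp implements C {
        @Override public String partTwo() { return "world"; }
    }

    public static void main(String[] args) {
        // Composition links the layers: A's why/how chain runs
        // A -> AImp -> {B, C} -> {BImp, CImp}.
        A a = new AImp(new BImp(), new CImp());
        System.out.println(a.doWork()); // hello world
    }
}
```

Following a call in main to AImp.doWork, and from there into BImp and CImp, is exactly the "go to definition" navigation of abstraction layers described below.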

In functional languages, function declarations and function bodies perform the analogous roles to interfaces and implementations.  A function body is the “how” of a function declaration, and the function declaration is the “why” of a function body.  Meanwhile, a function body contains a sequence of calls to other function declarations (note that a call to a function is a reference to a function declaration, not to a function body).  When function declaration A has a function body ABody, and ABody calls function declarations B and C, then A, ABody, and the set containing B and C form the analogous abstraction layers to A, AImp and {B, C} in the object-oriented example above.

Programmers navigate a computer program by starting at one implementation, and if needed, clicking on a line of code and selecting “go to definition”, which takes them to another implementation, with other lines of code that can also be followed to their definition.  This is a navigation of abstraction layers, and demonstrates how they link together repeatedly.

This structure of different layers being linked together is fractal.  On one level, a block of code is formed as multiple lines that call other blocks of code.  Those blocks are similarly formed as multiple lines to yet other blocks of code.  Thus the structure of code exhibits self-similarity at different scales.

Note that I said a well-designed application will contain a few hundred or thousand lines of code, in the form of calls to highly abstract library functions.  But a poorly designed application may not organize itself into libraries at all, or may do so in a poor fashion that prevents one from truly forgetting about what is under the hood of a single line of code.  Lacking or improper abstractions force one to digest a larger amount of information in a single “bite” in order to comprehend what the program does.  This makes the program more difficult to understand, because a larger chunk of its inherent complexity must be considered all at once.  Dealing with any small part of that chunk’s complexity requires dealing with all the rest of it.  While no human programmer could possibly understand the machine-code version of a modern software program, it is commonplace for the source code of a modern application to stretch the ability of human comprehension to its limits.  This takes the form of poorly designed code that is missing the proper, well-formed abstractions that would truly divide the problem into small, and truly distinct, bite-size pieces.

This leads us to the following principle of good software design, that serves as the foundation for all other software design principles:

The complexity of understanding computer code varies primarily with the distance between its abstraction layers

Each part of a computer program, however abstract, is implemented with less abstract, more concrete code, until one reaches the level of machine code.  By “distance between abstraction layers”, we mean how much less abstract a certain layer’s implementation is than its interface.  If the gap is very large, a class’s methods will inevitably be very long, difficult to follow and difficult to understand.  The gaps can be closed by introducing dividing abstractions: an abstraction layer placed above the low-level implementation details as they currently are, but below the high-level interface being implemented.  The implementation of those intermediate abstractions is simpler because the abstraction is closer to the implementation details.  Meanwhile, the implementation of the higher abstraction in terms of this intermediate abstraction is simpler than the original implementation, for the same reason.
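As a small illustration, here is a hypothetical dividing abstraction in Java: a Formatter layer introduced between a high-level “report” requirement and low-level string manipulation, so that each implementation stays close to the interface it implements.  All names here are invented for the example:

```java
// The intermediate layer: closer to the character-level details than
// the goal, but more abstract than raw string manipulation.
public class DividingAbstraction {

    interface Formatter {
        String heading(String text);
        String bullet(String text);
    }

    // Implementing the intermediate abstraction is simple, because it
    // sits just above the low-level details.
    static class PlainTextFormatter implements Formatter {
        @Override public String heading(String text) {
            return text.toUpperCase() + "\n";
        }
        @Override public String bullet(String text) {
            return " - " + text + "\n";
        }
    }

    // The high level now reads as its own "what": it says nothing about
    // casing or dashes, only that a report is a heading plus bullets.
    static String report(Formatter f, String title, String... items) {
        StringBuilder out = new StringBuilder(f.heading(title));
        for (String item : items)
            out.append(f.bullet(item));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report(new PlainTextFormatter(), "Todo", "write", "test"));
    }
}
```

Without the Formatter layer, the report method would contain all the character-level detail inline, and its length and difficulty would reflect the full distance between “produce a report” and “concatenate characters”.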

From this it is clear that more and more advanced computer programs, with human designers, are only made possible by building upon the less advanced programs we have already built.  This is quite literally how computing has advanced.  The first computer programs had to be hand-written in machine code.  But then programmers hand-wrote the machine code for an assembler, which enabled them to write programs in assembly.  With an assembler, they could then write a self-hosting assembler: an assembler whose own source code is not only written in its own assembly language, but that can be successfully assembled by itself.  Then they wrote a BASIC compiler in assembly, which enabled writing a self-hosting BASIC compiler.  Then they wrote a C compiler in BASIC, and then a self-hosting C compiler.  Then they wrote a C++ or Smalltalk compiler in C, and then self-hosting C++/Smalltalk compilers.  Today, we have high-level programming languages like Java, which were, and often still are, implemented with lower-level languages.  Each step is a tool that becomes the means to constructing a more sophisticated tool, which in turn becomes the means to construct an even more sophisticated tool, and so on.  The tools, which are computer programs themselves, become more and more sophisticated, which enables the creation not only of sophisticated programs in general, but of more sophisticated tools in particular.

This process is not peculiar to the development and advancement of computer software.  It is rather the general means by which humans have produced all the extremely advanced and sophisticated things they have produced.  A man uses his bare hands to fashion tools from stone, which he then uses to fashion a forge, which he then uses to fashion metal tools, which he then uses to fashion mechanical devices, which he then uses to fashion engines, which he then uses to fashion factories, which he then uses to fashion electronics, which he then uses to fashion all the advanced technology that surrounds us today.  We start by hand-crafting consumer goods.  Then we hand-craft tools that we use to craft consumer goods.  Then we hand-craft tools to make other tools.  Then we craft tools to make the tools we use to make other tools.  And so on.

Economists call this process the elongation of the production structure: a process by which production comes to involve more and more steps.  Instead of directly building the thing we want, first we build a thing that builds the thing we want.  Even more indirectly, we build the thing that builds the thing that builds the thing we want.  This continues until, in a modern industrial and electronic economy, the actual end-to-end process of manufacturing a good from nature-given resources, when taking into account the production of all the tools used in its production, takes hundreds or thousands of steps, involving resources acquired from and shipped around all parts of the world, and occurring over many years or even decades.

A modern economy is so complex that it can never be understood all at once by any of the humans whose actions constitute that economy.  Nor does it need to be understood all at once by its operators in order to function.  Instead, parts of it are understood in isolation, and in how they fit into the parts immediately adjacent.  No one organizes or coordinates the economy as a whole.  A person focuses on one small part, which is possible because the incredibly complex process of modern production has been factored into conceptually well-formed parts (repeatedly, in a fractal way) that remain well-defined and identifiable in isolation.  Anyone who attempted to understand a production process by reading a graph of the positions of all the factors involved, as a function of time, would be hopelessly lost.

The factors of production (tools and machines) of modern industry are implementations of abstractions.  We are able to define the requirement of a tool or machine as a derived requirement of producing something else (another tool/machine or a consumer good/service), because we are able to identify a high-level concept in the process of producing something.  If we defined the production of a good in terms of the sequence of movements of physical objects over time that takes place in the process as it is done now, we would have no way of moving to a different sequence of movements of different objects and meaningfully saying that the same production process has occurred (or even that the same thing has been produced).  By identifying the “what” as distinct from the “how”, the “how” becomes interchangeable.  This is the only way to correctly express a requirement.  The definition of producing a sandwich does not include details like taking a bladed piece of metal and moving it in a zig-zag pattern through a piece of bread.  Such details do not define a sandwich.  What defines a sandwich is sliced bread.  That definition relies on our ability to identify a high-level abstraction called “sliced”, which can be independently defined and verified.  It is not just a matter of allowing variation in the implementation details of making a sandwich.  It is about correctness.  It is simply wrong to define a sandwich by how the bread was sliced.

This is what we do in computer software when we abstract it.  We correctly define the requirement, which defines the what and not the how.  At the same time, the requirement itself is the “how” of some other, higher-level and more abstract requirement.  For example, the requirement to present an upgrade screen to a user is the “how” of a more abstract requirement to enable users to upgrade their accounts, which itself is the “how” of a still more abstract requirement to maximize profits.  On each level, it is not simply inconvenient or inflexible to put the “how” into the definition of a requirement.  It is simply wrong.  It does not correctly express what the requirement actually is, in the sense of specifying what conditions need to be met in order to say the requirement has been satisfied.

This is so deeply entwined in the structure of human thought that it is not really possible for us to imagine anything without it.  What we call “abstractions” here are what in language are called “words”.  Every word in a language is an abstraction.  A word has a definition, which is another collection of words.  A word is a high-level abstraction, with the words in its definition being lower-level abstractions.  The process of the human mind taking in data and structuring it into a form that is comprehensible to logical thought is a process of abstraction.  To try to think about something without abstractions at all is to try to think without using language (even one you invented yourself), which is a contradiction in terms.

Recognizing the fundamental role of abstracting, and more specifically of abstracting properly, in designing computer software is nothing other than recognizing that abstracting underlies the very process of logical structuring the human mind performs to make reality understandable.  It has perhaps required more explicit emphasis in software than elsewhere (in manufacturing, for example), because the virtual worlds we create in software are more malleable than the real one.  Because we create the higher-level concepts in our code from scratch (in a sense), it is less obvious that they must follow a logical structure than it is for the higher-level physical entities we construct in the real world.  It is perhaps easier to see why a car needs to be built as a composition of an engine, a transmission, an axle, and so on, than to see why an application needs to be built as a composition of a user interface, bindings, models, use cases, services, stores, and so on.  After all, aren’t all of these things just “made up”?  It’s all 1s and 0s in the end, right?

But these are all just mental constructs.  That a car is composed of an engine, a transmission, an axle, and so on is only apparent to the mind of a rational observer.  It is not part of the physics of the car itself, which is, ultimately, just a distribution of mass and energy throughout space over time.  These “parts” of a car are just as “made up” as the parts of a software application.  They are both abstractions above the “raw” physical level of reality.  As they belong to the same category, they are just as important.  Trying to build software without abstractions (specifically, proper abstractions) is as hopeless as building a car as a big jumbled pile of moving masses.  Good design of computer software ultimately comes down to whether the problem being solved has been correctly understood and broken down, with all the lines between why and what, and between what and how, drawn in the right place.  Good design derives from identifying the proper abstractions, and expressing them as such in code.

If you find yourself straining to comprehend the codebase you are working on, it could be that the problem you are trying to solve is so irreducibly complex that it is almost impossible to grasp.  But much more likely (especially if you are working on a GUI application), your codebase is poorly abstracted and needs to be conceptually reorganized.  Good design all flows downhill from having the virtual world of a codebase composed of well-defined abstractions (the “well-defined” part typically goes by names like “high cohesion” in design discussions, which really means the “thing” being considered has a concise and straightforward definition).  The benefit you reap from discovering and using such abstractions is as great as the benefit to human society of producing its wealth with a series of tools and machines rather than by hand.  It will be the difference between an impoverished software shop and an affluent one.
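A small, hypothetical sketch of what “a concise and straightforward definition” looks like in practice (both classes below are invented for illustration): the cohesive class can be defined in one sentence; the grab-bag cannot be defined at all except by listing its contents.

```python
# Low cohesion: no one-sentence definition covers what this "thing" is.
# It is a pile of unrelated "hows" with no "what" holding them together.
class MiscHelpers:
    def format_price(self, cents: int) -> str:
        return f"${cents / 100:.2f}"
    def send_email(self, to: str, body: str) -> None: ...
    def parse_date(self, raw: str) -> tuple: ...

# High cohesion: "formats monetary amounts for display" -- a concise,
# independently understandable definition, like "sliced" for bread.
class PriceFormatter:
    def __init__(self, currency_symbol: str = "$"):
        self.currency_symbol = currency_symbol

    def format(self, cents: int) -> str:
        return f"{self.currency_symbol}{cents / 100:.2f}"
```

The test of cohesion suggested here is simply whether you can state the abstraction's definition without the word “and” doing most of the work.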