Introduction
What is “agile” software development?
Well, more than anything, it’s definitely not waterfall.
At least in this age of big consulting firms teaching software shops how to be “properly agile” (Big Agile, or Agile, Inc., as we call it), that’s pretty much its definition. Waterfall is the ultimate scapegoat. However much we try to pin down its essential features, what matters, and what ultimately verifies whether we have correctly described it, is that it is the cause of everything that ever went wrong in the software world.
Accusing something of being “waterfall” is the ultimate diss in the agile world. The defense will always be to deny the charge as ridiculous.
This really hits home if you, like the proverbial heretic going through a crisis of faith, start doing some research and discover that “waterfall” originated in a 1970 paper by Winston Royce (which never actually uses the name) as the description of a failure mode in software development: fundamental flaws (flaws in the very design of a feature) are discovered only after designing and developing the software, during testing at the very end. Royce’s solution to this problem not only has nothing to do with agility, it largely runs directly contrary to the popular mantras of agile shops (he placed a ton of emphasis on meticulous documentation, whereas in agile circles not documenting things is elevated to a principle).
In the effort to comprehend this, I have often seen people identify the essential feature of “waterfall” as being the existence of a sequence of steps that are performed in a specific order:
feature design -> architecture/implementation -> testing -> delivery
Any time this kind of step-by-step process rears its ugly head, we scream “waterfall!” and change what we’re doing.
The results are… humorous.
After all, these steps are necessary steps to building a piece of software, and no process is ever going to change that. You can’t implement a feature before the feature is specified. You can’t test code that doesn’t exist yet (TDD isn’t saying “test before implementing”, it’s saying “design or specify the test before implementing”), and you at least shouldn’t deliver code before testing it. The only way to not follow these steps, in that order, is to not build software. From the paper:
One cannot, of course, produce software without these steps
(While these are the unavoidable steps of building software, this does not imply that other things, commonplace in the industry, are unavoidable. This includes organizing a business into siloed “departments” around each of these steps, believing that “architecture” and “implementation” are distinct steps that should be done by different people, etc.)
As the Royce paper explains, the existence and order of these steps is not the essential feature of the “waterfall” model he was describing. The essential feature, the “fall” of “waterfall”, is that the process is unidirectional and there is no opportunity to move “up”. More specifically, once the process starts, it either goes to the very end, or must be aborted and restarted all the way from the top. There is no built-in capability to walk back a single step to address and correct problems. This is really because failures that require jumping back to the beginning are discovered too late. Royce’s correction is intended to ensure that flaws in the execution of one step are always discovered at latest in the next step, and thus no need to jump back more than one step arises.
But what does this have to do with agility? Well, nothing. Royce wasn’t talking about agility. In fact, his solution is to do a ton more up-front analysis and design of software before starting any other steps, in order to anticipate and correct fundamental flaws at the beginning. This is basically the opposite philosophy of agile, which is to embrace failure at the end but to speed up the pipeline so that you get to the end quickly (this hinges on the idea that you can compartmentalize success and failure into individual features, and thus if a single feature fails it “wastes” only the effort invested into that one feature rather than an entire release. In my opinion this is an extremely dubious idea).
What even is agility, in the context of building software?
The Definition of Agility
Let’s remind ourselves of what the word “agility” actually means. Anyone who’s played an action RPG like the Elder Scrolls should remember that “Agility” is one of the “attributes” of your character, for which you can train and optimize. In particular, it is not “Speed”, and it is not “Strength”. Agility is the ability to quickly change direction. The easiest way to illustrate agility, and the fact it competes with speed, is with air travel vehicles: airplanes and helicopters. An airplane is optimized for speed. It can go very fast, and is very good at making a beeline from one airport to another. It is not very good at making a quick U-turn mid-flight. A helicopter, on the other hand, is much more maneuverable. It can change direction very quickly. To do so, it sacrifices top speed.
An airplane is optimized for the conditions of being 30,000 ft. in the air. There are essentially no obstacles, anything that needs to be avoided is large and detectable from far away and for a long time (like a storm), and the flight path is something that can be pretty much exactly worked out ahead of time.
A helicopter is optimized for low altitude flight. There are more smaller obstacles that cannot feasibly be mapped out perfectly. The pilot needs to make visual contact with obstacles and avoid them “quickly” by maneuvering the helicopter. There is, in a sense, more “traffic”: constantly changing, unpredictable obstacles that prevent a flight path from being planned ahead of time. The flight path needs to be discovered and worked out one step at a time, during flight.
(This is similar to the example Eric Reis uses in The Lean Startup, where he compares the pre-programmed burn sequence of a NASA rocket to the almost instantaneous feedback loop between a car driver’s eyes and his hands and feet, operating the steering wheel, gas and brakes)
The airplane is optimized for speed, and the helicopter is optimized for agility. You cannot optimize for both. This is a simple matter of the “engineering triangle”. Airplanes and helicopters are both faster and more agile than, say, a giant steamboat, but the choice to upgrade from a steamboat to air travel is obvious and uninteresting. Once you make this obvious choice, you then need to make the unobvious choice of whether to upgrade to an airplane or to a helicopter.
The reason I say this is because “agile” practices are often confused with what are simply good engineering practices; practices that are more akin to upgrading from steam powered water travel to air travel than choosing an airplane or a helicopter. Those practices are beneficial whether you want to be fast or agile.
So what does “agility” (choosing the helicopter) mean in software?
Agility means rapid iteration.
Okay, what does iteration mean, and how rapid is “rapid”?
Iteration means taking a working software product, making changes to it that keep the product in a working state, and delivering the new version of the product.
What does “working” mean? That’s nontrivial, but the spoiler is, it’s decided entirely by the users of the software. Believe me, they’ll tell you whether it’s working or not.
Rapid means, basically, significantly more frequently than what you’d see in a “classical”, “non-agile” shop… which I’d say typically releases new versions every 6-12 months. So maybe one release every 2-3 months is the longest release cycle that can still count as “agile”. The goal is usually in the range from every 2 weeks down to multiple times per day.
Let’s be absolutely clear that agility has nothing to do with speed. Speed refers to how many features or quality improvements per unit time (on average) your shop can deliver (let’s ignore the problem of how to define a “unit” of feature/quality with which to “count” or “measure” them). If you are a traditional shop who delivers 120 units of feature/quality improvements every 6 months, then your speed is 5 units/week. If you are an agile shop who delivers 5 units of feature/quality improvements every week, your speed is also 5 units/week. One shop delivers a big release every 6 months, the other delivers small releases every week. One has low agility, the other high agility, but both have the same speed.
We would measure agility not as the units of feature/quality delivered per unit time, but as the inverse of the average release period: the release frequency. The first shop’s release frequency is 1 / (6 months), or (treating 6 months as 24 weeks, consistent with the numbers above) 1/24 per week. The second shop’s release frequency is 1 per week, making it 24 times more agile than the first shop.
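To make the arithmetic concrete, here is a minimal sketch using the made-up numbers above (the “units” are whatever you like; only the ratios matter):

```python
# Toy comparison of speed vs. agility, using the made-up numbers from the text
# (treating 6 months as 24 weeks).

classical = {"units_per_release": 120, "weeks_per_release": 24}
agile = {"units_per_release": 5, "weeks_per_release": 1}

for name, shop in [("classical", classical), ("agile", agile)]:
    speed = shop["units_per_release"] / shop["weeks_per_release"]  # units per week
    frequency = 1 / shop["weeks_per_release"]                      # releases per week
    print(f"{name:9s} speed = {speed} units/week, frequency = {frequency:.3f} releases/week")

# classical speed = 5.0 units/week, frequency = 0.042 releases/week
# agile     speed = 5.0 units/week, frequency = 1.000 releases/week
```

Same speed, 24x difference in agility.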
We can see from this that agility and speed are separate variables that, at least in principle, can vary independently. If the practice you’re proposing would make a shop both faster and more agile, it has nothing to do with agility per se, but is a whole-system optimization. Using a high-level language instead of assembly is a whole-system optimization. Writing better architected code is a whole-system optimization. You don’t do those things because you want agility (implying you wouldn’t do them if you didn’t care about agility), you do those things because they are better, more advanced practices that improve software development in general.
But after you’re done adopting all the whole-system improvements you can think of, you then face a choice of adopting practices that further optimize one aspect of the system, but you can’t choose them all simultaneously (to do one is to not do the other), and thus you choose to either further optimize one aspect or to further optimize the other. You can choose to optimize for speed, but at the cost of not further optimizing for agility, resulting in a shop who can work faster (higher average feature/quality per week) but deliver less frequently. Or, you can choose to optimize for agility, but at the cost of not further optimizing for speed, resulting in a shop who can release more frequently but works slower.
Objecting that it’s possible to pick both and optimize everything at once is asserting that mutually exclusive practices that optimize one over the other do not exist. Yes, obviously some practices improve both agility and speed, and perhaps everything else too. The point isn’t that such choices don’t exist. The point is that rivalrous choices (choices that cannot coexist, to do one is to not do the other) exist too, and you eventually have to make those choices.
I’m belaboring this because I encounter what I call “anti-economic” thinking a lot… this is where people declare that opportunity costs as a category don’t exist, and choice-making is a straightforward process of figuring out what choices are better in every aspect. This is not how life works. Choices that have no opportunity costs are so uninteresting and unconscious there’s little point in talking about them. This is why economists say that all choices have opportunity costs… “choices” that don’t have costs basically don’t even count as “choosing”. Even something that at first appears to be a non-choice, like adopting air travel over steamboats, comes with temporary opportunity costs (you have to build the airplanes, helicopters, airports, etc., which could take years or decades until you have a robust infrastructure, whereas your steamboats work today).
The Fundamental Problem
Now, what does it take to become agile? A legitimate answer to this question will make it obvious how optimizing for agility does not optimize for speed. If you want to genuinely release every two weeks, you’ll end up with an overall lower average units of feature/quality per week delivery rate than if you were willing to sacrifice frequent delivery.
What we’re asking is, if you’re able to deliver 120 units of features/quality every 6 months, why is it not trivial to start releasing 5 units every week? You have to run the pipeline (feature design -> architect/implement -> test -> deliver) for each feature. Do we just need to reorder things so that we run the whole pipeline for individual features instead of batching? Is that all it takes?
Well, why would shops ever batch to begin with? Why would they design 120 units of features before starting to implement any of them, and then implement all of them before starting to test any of them? Understanding why any shop would batch in this way (and they do) is key to understanding what it takes to become agile.
Let’s imagine we start a new greenfield project by releasing one feature at a time, beginning to end. Now, the first obvious problem is that different people perform each of the pipeline steps. Your designers spend a day designing, then what? They just sit around waiting until the developers finish implementing, then the testers finish testing, the feature is released and the next one is ready to begin? Same question for all the other people.
No, you do what CPUs do, and stagger the pipeline: once the designer designs Feature 1 on Day 1, she starts on Feature 2 on Day 2, and so on. By the time Feature 1 is delivered at the end of Day 4, she’s designed Features 1-4, the developers have implemented Features 1-3, the testers/bug fixers have tested and stabilized Features 1-2, and DevOps has released Feature 1.
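A minimal sketch of that staggering, assuming the toy four-stage, one-day-per-stage pipeline described above:

```python
# Toy model of the staggered pipeline: four stages, each taking one day,
# and each role picks up the next feature as soon as it hands off the current one.
STAGES = ["design", "implement", "test", "deliver"]

def work_on(day):
    """Which feature number each stage is working on during a given day (1-indexed)."""
    return {stage: day - offset for offset, stage in enumerate(STAGES) if day - offset >= 1}

for day in range(1, 5):
    print(f"Day {day}: {work_on(day)}")

# Day 1: {'design': 1}
# Day 2: {'design': 2, 'implement': 1}
# Day 3: {'design': 3, 'implement': 2, 'test': 1}
# Day 4: {'design': 4, 'implement': 3, 'test': 2, 'deliver': 1}
```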
So, we’re already batching by the amount equal to what it takes to run the whole pipeline. But that’s fine. We batch only that amount, so we’re only at most 4 features ahead (and only in the design phase) of what’s released. That’s pretty damn agile.
At the end of the first cycle, we have a software product that performs Feature 1.
Now we start on Feature 2. Is this exactly the same? What’s different now compared to the beginning? The difference is instead of working on totally greenfield software, we’re working on slightly less greenfield software: specifically software that performs Feature 1.
The designer now has to design Feature 2 not in a vacuum, but alongside Feature 1. The developers have to write code not in a brand new repo, but in one with code that implements Feature 1.
And the testers? What do they have to do?
That’s the key question.
Do they need to just test Feature 2 once it’s implemented? No, they have to test both Feature 1 and Feature 2. You can’t release the next increment of the software unless all the features it implements are in acceptable condition.
In short, testers have to regression test.
Being agile now is trivial: just run the whole pipeline each time. This is the first crucial insight we have:
Agility is trivial at the beginning of a project
The important question is then: does it stay trivial? How does this situation evolve as the software grows?
How does a designer’s job change as she designs in the context of a growing set of already designed features? Well, I’m not a designer, so don’t quote me on this, but presumably things should get easier. As long as you’re gradually building up a design language, with reusable components, each feature becomes more of a drag-and-drop assembly process and less of a ground-up component design process. That should accelerate each feature design. If you do poorly at this, maybe you get slower and slower over time, as it becomes harder to cram things in alongside other features, and you haven’t built up reusable components.
How does a developer’s job change as he develops on a growing codebase? I am a developer and I can talk for hours about this problem. If the code is well architected, it becomes easier to add new features, for a reason similar to design: there’s a growing set of reusable code that just needs to be assembled into a new feature, utilities have been written to solve common problems, the architecture is flexible and admits modification and extension gracefully, and so on. In this case, feature development accelerates as the set of existing features grows. If, instead, the code is tightly coupled spaghetti with random breakages popping up on the slightest change (“jenga code”), then feature development decelerates, and you might eventually find that adding a single new button to a page takes a week.
How does a tester’s job change? Given that testers must test the entire software, there’s simply no opportunity for manual testing to accelerate. The testing burden clearly grows linearly as the software grows, and there’s nothing you can do about this (except skip regression testing, which amounts to skipping the testing pipeline stage). This means once you have 10x the number of features you do today, you can expect testing, which has to occur before each feature is released, will take roughly 10x as long.
Do you see the problem?
Now, I’m talking about manual testing. In manual testing, the bulk (not the entirety, just most) of the cost is in running the tests, rather than designing them. This is because design is done once, but running has to be done over and over. But if we substitute automated tests for manual tests, suddenly the cost of running them becomes trivial, leaving only the design cost. The design cost is significantly higher, because telling a computer to do something is harder than telling a human to do it, especially if it involves any actual intelligence. This is huge, because it’s the cost of running tests that grows linearly with the existing feature set. Reduce that cost far enough to effectively eliminate it, and you kill the linear growth of testing cost with the existing features.
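Here’s a crude back-of-the-envelope model of that difference. Every number in it is invented for illustration; the shape of the growth is the point:

```python
# Crude cost model, all numbers invented for illustration.
# Manual tests are cheap to design but must all be re-run before every release.
# Automated tests are expensive to write but nearly free to re-run.
MANUAL_DESIGN, MANUAL_RUN = 1, 2
AUTO_DESIGN, AUTO_RUN = 6, 0.01

def testing_cost_per_release(existing_features, new_features, automated):
    design, run = (AUTO_DESIGN, AUTO_RUN) if automated else (MANUAL_DESIGN, MANUAL_RUN)
    # Tests are designed only for the new features, but everything gets regression-run.
    return new_features * design + (existing_features + new_features) * run

for existing in (10, 100, 1000):
    manual = testing_cost_per_release(existing, new_features=5, automated=False)
    auto = testing_cost_per_release(existing, new_features=5, automated=True)
    print(f"{existing:4d} existing features: manual ≈ {manual:.0f}, automated ≈ {auto:.0f}")

#   10 existing features: manual ≈ 35, automated ≈ 30
#  100 existing features: manual ≈ 215, automated ≈ 31
# 1000 existing features: manual ≈ 2015, automated ≈ 40
```

The manual column grows with the size of the product; the automated column is dominated by the cost of writing tests for the new features.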
Now, you have to pay the cost of designing a feature for every feature, and you have to pay the cost of implementing a feature for every feature. But you only have to pay the cost of testing for every release.
Classical Software Development
The “classical” practices of software development emerged to solve for these two facts:
- Manual testing effort grows linearly with the number of existing features
- Manual testing must be done once per release
When I talk about the “testing” phase, this doesn’t just involve the QA folks. It also involves some contribution from engineers. Testing is a cycle run between devs and testers, and it is typically run many times during a single testing phase. Testers receive the first build, file a bunch of bug tickets, developers fix those bugs and deliver the next build, testers close some bugs and open some others, rinse and repeat until all showstopper bugs are resolved.
Every new build given to testers has to be fully regression tested. Therefore, it is least wasteful to minimize the number of builds testers need to test. This involves two parts: minimizing the number of times the test-fix-deliver cycle is run for a single release, and minimizing the frequency of releases.
For the first part, minimizing the number of builds testers have to test within a single release means two things: fixing as much as possible in a single pass before delivering a build, and minimizing the chance that new bugs will arise in the next build. For the first, developers fix all filed showstopper bugs before cutting a new build; they don’t fix one bug and cut a build, then fix another and cut another build. For the second, developers only fix the filed bugs and make no other changes to the code.
This is what we call a code freeze. A code freeze is a crucial aspect of the classical, non-agile software development process.
As the software grows, running this whole process on a single build, which for a mature product covers a large number of features, can take months. A pipeline can only be run as fast as its slowest stage, so the software delivery pipeline can only be run once every several months. This means designers and developers will have several months of implementation time before the next testing phase can begin. They’ll implement months’ worth of features, and the next testing phase will run once, testing all of those new features together.
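To put the “slowest stage” point in sketch form (the stage durations here are invented):

```python
# The release cadence is bounded by the slowest stage of the pipeline.
# Durations are invented; "test_and_stabilize" dominates because it
# regression-tests the whole (growing) product.
stage_duration_weeks = {"design": 2, "implement": 6, "test_and_stabilize": 12}

fastest_release_period = max(stage_duration_weeks.values())
print(f"Fastest possible cadence: one release every {fastest_release_period} weeks")
# Fastest possible cadence: one release every 12 weeks
```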
It’s not necessary that design deliver in batches that will ultimately be tested in a single release. Designers can deliver features one after another every few days and developers can implement them, and they’ll pile up on the doorstep of the testing phase until the next release goes out. But consider what happens if we decide, during the testing phase for a release, we want to change the features of that release, either by adding, taking out or modifying features. Developers will have to make those changes, which are much more likely than mere bug fixes to generate more bugs, which will likely have a ripple effect of multiple additional test-fix-deliver cycles.
That’s very expensive. It’s a request to unfreeze the code during a code freeze.
For this reason, such requests have to be formally approved and rigorously justified.
Since there’s essentially “no backsies” on what gets admitted into a particular release, and releases are infrequent, the business (especially marketers and designers) typically want to think carefully about what goes into a particular release, and they’ll work that out ahead of time. This leads to batching the feature design for an entire release before handing it to developers.
Now, another important aspect of this is that finding and fixing bugs is taken care of during the testing phase, which means it’s wasteful to also try to do this during the implementation phase. Since you must pay the cost of the test-stabilize cycles during testing, you might as well not pay that cost during development. This means developers implement but don’t stabilize. They will only fix bugs that block development. For anything else, fixing a bug during development only risks the bug regressing later in development, and requiring it to be re-fixed. That’s wasteful, just fix it once. The point of the code freeze is minimizing the chance of regression. That’s why it’s most efficient to do only the bare minimum of bug fixing during the development cycle.
Due to staggering, the development team needs to be busy working on implementing features for the next release simultaneously with the current release being tested and stabilized. This translates into a branching policy called unstable master: the “master” or “trunk” branch of the code repo is where developers implement new features. Since they’re doing only bare minimum stabilization, master is always unstable, and never in a releasable state.
When the features for a release are all implemented, a new branch is created off of master, a release branch, and the code freeze is applied to work in this branch: only bug fixes are allowed to go into this branch (unless a change request is approved). Once testing is completed, a release is cut from the release branch. During this time, development for the next release is occurring in master. Once the release is cut, the release branch is merged back into master, in order to get the bug fixes into master.
The release branch is maintained after releasing, in order to make emergency fixes. For example, if we release version 1.5 from the release-1.5 branch, and customers discover a showstopper, we apply the bug fix to the release-1.5 branch and release it again. This ensures that if we need to make patches to the current live version, we have the exact version of the code currently live, and we can apply only the bug fix to it. Each time this is done, the release branch is merged back to master to get the emergency bug fix in.
Hopefully, after the build is released from the release branch, or at least soon after, feature development for the next release is done, and you can then create the next release branch off of master.
You don’t want multiple simultaneous release branches. Trust me, you don’t. I had to do that once.
You have to merge the bug fixes into master, and then merge them into every open release branch. The staggering works by working on the next release in master, and on the current release in the one open release branch. Obviously this gets screwed up when you have to make emergency fixes, but that’s just another reason why you want to minimize the chance that ever happens.
And thus we have the classical, non-agile development process:
- Business/marketing carefully plans a large (6-12 month) batch of features to release all at once, and figures out how they’re going to market them.
- Design takes the release roadmap and produces a design document with all the requisite features. Marketing starts working on the marketing campaign.
- Developers receive the design document, and work in master implementing but not stabilizing the features.
- A release branch is made off of master, the testing phase is run, with test-fix-deliver cycles repeatedly done on release branch builds until all showstopper bugs are fixed.
- The final build that passed QA is released publicly and the marketing campaign goes live.
This process evolved naturally out of the fact that testing requires full regression but this only needs to be done once per release.
Agile Software Development
The goal of agile software development is to be able to release a small number of features frequently. The logical conclusion of agile development is to release each single feature one after another, and thus do no batching of features in releases at all.
The most obvious practice we have to adopt is test automation.
If you want to release, say, once every two weeks, you simply cannot run this manual test-fix-deliver release-build cycle every time. Even for a greenfield project, doing regression this way and still releasing biweekly will probably become infeasible within a matter of months.
The goal is not to eliminate the QA department (as it is often misunderstood to be), but rather to focus manual QA entirely on exploratory testing. All known requirements, either discovered during product development or from exploratory testing, must lead to an automated test.
Quantitatively, the amount of behavior that would be a showstopper if broken, and that is not covered by automation, must remain roughly constant. This is the fundamental criterion that eliminates the linear growth of testing effort with the number of existing features. The constant amount of uncovered behavior must remain small enough that test-fix-deliver cycles focusing only on those few uncovered areas can feasibly be done every two weeks, or whatever your release cadence is.
You don’t have to achieve 100% coverage, you just have to keep the amount (not the percentage) of uncovered stuff constant. Since the total amount of behavior (the denominator) grows while the uncovered amount (the numerator) stays roughly fixed, the uncovered fraction shrinks toward zero, which means you’ll asymptotically approach 100% coverage.
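A minimal sketch of that arithmetic (the numbers are made up; the only assumption is a fixed amount of uncovered behavior):

```python
# If the amount of uncovered showstopper behavior stays constant while the total
# amount of behavior grows, coverage drifts toward 100% on its own.
UNCOVERED = 20  # fixed "budget" of uncovered behaviors, small enough to test manually

for total_behaviors in (100, 1000, 10000):
    coverage = (total_behaviors - UNCOVERED) / total_behaviors
    print(f"{total_behaviors:6d} behaviors -> {coverage:.1%} covered")

#    100 behaviors -> 80.0% covered
#   1000 behaviors -> 98.0% covered
#  10000 behaviors -> 99.8% covered
```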
The goal, really, is to eliminate the need for a code freeze. We are, in a sense, inverting the process. Instead of implementing while letting the code destabilize and then stabilizing at the end, we have to prevent the code from ever destabilizing, which moves the stabilization work up to right after the implementation of each feature (really, to every modification of the code).
This leads naturally to the inverse branching policy of stable master. Rather than create release branches and stabilize there, master is kept stable, and development work is done in feature branches that are quickly merged back to master. Master gets new features one at a time, and with the assurance that all existing features still work. This means automation is enforced in such a way that master cannot accept a feature branch unless all automated tests pass.
The presence of automation changes the way developers work. Rather than discover much later that something broke, they are given early news of this by the automated tests, and are required to fix it now. This means bugs will get fixed repeatedly, and very frequently, as high as once per feature. That’s a key point, we’ll come back to it.
Automation eliminates the linear growth of the testing phase. It gets us off that green line in the graph above (testing effort growing linearly with the number of existing features) and onto either the blue or the orange curve. Then all three of the phases have similar-looking graphs of effort per feature as a function of the number of existing features. In all cases, the effort grows or shrinks independently of the number of existing features, depending instead on how well each phase is executed. This is the fundamental challenge of maintaining agility: keeping the effort needed to get a single feature all the way to delivered roughly constant, so that it doesn’t grow steadily as the project goes on.
But while we have now decoupled the pipeline from the number of existing features directly, we can still see that poor practices will eventually lead to the per feature effort growing. This will kill agility over time. This means maintaining agility requires adopting best practices in designing, implementing, and building test automation.
But these are not specific to agility. Remember that manual testing effort grows linearly with the number of existing features, but it only has to be paid per release. All the rest, including the effort of building automated tests, must be paid per feature. If you end up on the orange curves, the whole process is going to slow down whether you’re releasing frequently or not.
In other words, poor design, implementation and automation practices will slow down any shop, even the classical non-agile ones. This is essentially a tautology: such practices are deemed “poor” precisely because they work to the detriment of any software development process.
Good engineering practices are, therefore, whole-system optimizations (this is, again, a tautology). Every software shop should be doing their best to adopt the best design, implementation and automation practices. They should be working to make reusable components that can be easily composed, building code in a manner that makes modification easier rather than harder over time, and so on. What exactly these practices are is nontrivial to determine. Discovering and executing them is the essence of being a craftsman in this highly technical industry. A good developer is one who knows what those practices are and how to practice them. Same with designers.
That is irrelevant to agility per se, beyond the obvious fact that failing to adopt good practices will also screw up your ability to be agile, along with screwing up everything else.
Thus, at the end of all of this, the practices that are specifically about optimizing for agility come down to one thing:
Test Automation
That’s it. I could have just told you that at the beginning, but I doubt you would have believed me.
To optimize for agility, you dive headfirst into thorough test automation, and you take extremely seriously the requirement that you must keep the amount of uncovered scenarios roughly constant as the software grows. Basically, you’ll achieve high agility when you’re confident enough in your test automation that you’re willing to release without manual testing a build first.
The Competition with Speed
Now that we’ve discovered the key practice for optimizing for agility, let’s explore how optimizing in this way competes with optimizing purely for speed.
In short, how does making yourself able to release frequently necessarily make you slower overall?
Now, remember, the biggest reason why most shops are nowhere close to agile isn’t simply because they don’t have good automation. They’re rife with poor design and engineering practices. Addressing that is a whole-system optimization and will make them both faster and more agile. Remember, we’re talking about the choice we have after we make all these whole-system optimizations. Let’s say we’re already using top-tier practices across the board. How, then, does optimizing for agility make us slower than we could be if we still kept all those practices top-tier, but were willing to sacrifice frequent releasing?
Obfuscating what I’m about to explain is a big part of Big Agile’s consulting practices. I’ll talk about what their goals are, and what their executive/middle management customers want to be told by process consultants another day.
The most obvious way that we sacrifice potential speed is by spending so much time writing and maintaining automated tests. Now, when I say this, someone is surely going to respond, “but automated testing is a whole-system optimization!” Yes, absolutely… to a degree. Having some automation is surely a whole-system optimization over having no automation whatsoever. Developers experienced with it will tell you that whipping up a few unit tests honestly makes the job easier in many cases. This is because even development without stabilization requires some form of testing during development (to at least confirm the happy path is functional), and developers can easily waste a lot of time running this cycle with manual testing.
If you’re adding a button to a screen, and it takes 30-60 seconds to open up the app and get to that screen, and you’re doing this over and over, dozens of times, in the process of working on the button, you could definitely be slowing yourself down by not spending 5-10 minutes writing a unit test that performs the same verification in 3 seconds.
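As a minimal sketch of what I mean (the SubmitButtonModel and its enable/disable rule are invented purely for illustration), something like this runs in a couple of seconds instead of a minute of clicking through the app:

```python
# Hypothetical example: a tiny view-model for the new button, plus a unit test that
# replaces dozens of manual "launch the app, navigate, click" checks.
import unittest

class SubmitButtonModel:
    """Invented for illustration: the button is clickable only when the form is valid."""
    def __init__(self, form_is_valid: bool):
        self.form_is_valid = form_is_valid

    @property
    def enabled(self) -> bool:
        return self.form_is_valid

class SubmitButtonTests(unittest.TestCase):
    def test_disabled_when_form_is_invalid(self):
        self.assertFalse(SubmitButtonModel(form_is_valid=False).enabled)

    def test_enabled_when_form_is_valid(self):
        self.assertTrue(SubmitButtonModel(form_is_valid=True).enabled)

if __name__ == "__main__":
    unittest.main()
```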
I’m not talking about simply having some automation. I’m talking about having so much test automation that it allows you to release a mature software product without manual testing at all.
That’s a s***load of automation, man.
Remember, our metric for success is that the total (not proportional) amount of critical (broken = showstopper, can’t release) functionality that is not covered by automated tests is constant over time. We have to asymptotically approach 100% test coverage… and I mean real, quality tests that will genuinely prove system integrity.
It takes a lot of time to both create and maintain that level, and that quality, of automated tests. You simply don’t have to do all of this if you only want to release infrequently. It’s going to slow you down significantly, relative to classical development, to invest so much time and effort into test automation. What you lose is raw development speed. What you gain is extremely fast, and reliable, assurance that the system still works, and can therefore be released again.
Next, let’s talk about what all those tests actually do for us. Tests don’t fix our code, they just announce to us that it’s wrong. Who fixes code in response to a failing test? Developers!
To release more frequently, you inevitably have to fix bugs more frequently. Assuming a certain quality of code and developers, things will tend to break (and re-break) at a particular frequency. I’m not saying things break more frequently in an agile process… they break exactly as frequently. But, you have to fix things more frequently… at least once per release cycle, and agility means more frequent release cycles.
Remember that I emphasized that in classical software development, developers working on new features only implement, and don’t stabilize, in master. Then they stabilize only at the end (ideally once, but in reality maybe two or three times, since the test-fix-release cycle gets run and some things regress during this phase). They don’t have to keep re-stabilizing every week, or every day, but that’s exactly what all that automation makes you do.
This may seem like it evens out because, being more frequent, releases in agile let less time pass by, and therefore (for a given frequency of breakages) less gets broken. Having to fix dozens and dozens of bugs “all at once” in a classical shop may feel daunting, while in an agile shop you only ever produce a couple of bugs before you fix them, and that’s less intimidating. But this is deceptive (in the same way it “feels” less destructive to your budget to buy tons of cheap stuff compared to buying one big expensive thing). You’re ultimately spending more time fixing bugs in the agile process, because bugs are often repetitive (the whole point of regression testing is to address this). You end up fixing a bug every week instead of every 3 months, which (unless it’s such a severe bug it interferes with development work) is wasted effort if the code isn’t getting shipped out every week.
There’s other stuff you need to build to effectively release frequently, including robust rollback mechanisms, but those are smaller issues. The big one is that you have to write automated tests for every little tiny nook and cranny of the app, and the presence of those tests are literally just going to slow you down by making you fix stuff as soon as you break it, and fix it again as soon as you break it again. That’s not a bad thing… if you want to release frequently. But it’s going to cost you in raw speed.
Conclusion
If you decide that agility really is important, I hate to be the one to tell you, but your goofy team names, weekly “demos” (the quotes there are very intentional) and “backlog groomings”, and story points are completely irrelevant to that goal. You need to instead go all in on test automation, and also make sure you’re not building spaghetti code that’s going to collapse under its own weight in 6-12 months (the latter is always important, but spaghetti code might collapse an agile project faster than a classical one). And you need to not let yourself get tricked by the agility you demonstrated at the beginning (typically way before the software is ready to be delivered to any real customer). The fact you were able to show increments frequently in greenfield says nothing about your continued ability to do so on a maturing product.
No matter what kind of shop you are, stay on top of the crafts of product design and engineering. That will help you in all aspects and make you better overall. There’s no reason not to (the upfront investment will always pay for itself many times over).
With that out of the way (and emphasizing it’s unrelated to agility), go hard on automation, really hard, and you’ll be able to achieve agility. Whether you want to… that’s for your business to decide.