Why Most Software Rewrites Fail and What to Do Instead

Every developer has looked at a legacy codebase and thought the same thing: this needs to be rewritten from scratch. The code is a mess. The architecture is wrong. It would be faster to start again than to keep patching this thing together. I have had that thought dozens of times across my career, and I have learned the hard way that it is almost always wrong.

The urge to rewrite software from scratch is one of the most dangerous instincts in our industry. It feels productive. It feels clean. You get to use the latest frameworks, fix all the architectural mistakes, and build something you are actually proud of. The problem is that, in my experience, rewrites fail at an alarming rate, and the ones that succeed usually take two to three times longer than anyone estimated.

The second system effect is still catching people out

Fred Brooks described the second-system effect in The Mythical Man-Month back in 1975, and it is still true today. When you rewrite a system, you are not just rebuilding what exists. You are rebuilding what exists plus every improvement everyone has been wanting for years. The scope balloons immediately because the rewrite becomes the opportunity to fix everything that has ever annoyed anyone.

I have seen this happen on Dynamics 365 projects, on bespoke SaaS platforms, and on internal tools at companies of every size. The initial estimate says six months. By month three, someone has added a new integration requirement. By month five, the team realises they underestimated the complexity of the business logic buried in the old system. By month eight, the project is over budget, the old system is still running, and everyone is stressed.

The old system knows things you do not

This is the bit that catches people out every single time. That messy, tangled, ugly codebase you want to replace contains years of accumulated business knowledge. Every weird if statement, every strange edge case, every hack that makes you wince is there because something in the real world required it.

When you rewrite from scratch, you lose all of that institutional knowledge. You will discover the edge cases again, but you will discover them in production, when customers are affected, which is the worst possible time to learn. The old system handled those cases because someone already got burned by them. The new system will need to learn the same lessons all over again.

Your business does not stop while you rewrite

This is probably the biggest practical problem with any major software rewrite. While your development team is heads down on the new version, the existing product still needs maintenance, bug fixes, and feature updates. Customers do not care that you are building something better. They care that the thing they are paying for works properly right now.

So you end up maintaining two systems simultaneously. Your team is split between keeping the old thing running and building the new thing. Neither gets enough attention. The old system starts accumulating more technical debt because nobody wants to invest in something being replaced. The new system falls behind schedule because half the team keeps getting pulled back to fix urgent production issues.

I have watched companies spend eighteen months on a rewrite only to ship something that was barely at feature parity with what they already had. The customers saw no improvement. The team was exhausted. And six months later the new codebase was already accumulating its own technical debt because the underlying organisational problems that created the mess in the first place had not been addressed.

The strangler fig approach actually works

Instead of burning everything down and starting fresh, the approach that consistently works is incremental replacement. You wrap the old system, build new components alongside it, and gradually migrate functionality over time. Martin Fowler calls this the strangler fig pattern and it is the single most reliable way to modernise legacy software.

The beauty of this approach is that you are always deploying working software. There is no big bang cutover where everything needs to work perfectly on day one. You replace one module at a time. If something goes wrong, you roll back that one module. The risk is contained and manageable.
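To make the module-at-a-time idea concrete, here is a minimal sketch of a routing facade in Python. Everything in it is hypothetical and only for illustration: the `StranglerFacade` class, the `quote` and `cancel` operations, and the handler functions are assumptions, not part of any real system. The point is simply that callers talk to the facade, and each operation can be moved to new code, or moved back, on its own.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class StranglerFacade:
    """Routes each named operation to the legacy or the new implementation."""

    def __init__(self, legacy: Dict[str, Handler]) -> None:
        self._legacy = dict(legacy)
        self._replacements: Dict[str, Handler] = {}

    def migrate(self, name: str, handler: Handler) -> None:
        """Send one operation to the new implementation."""
        if name not in self._legacy:
            raise KeyError(f"unknown operation: {name}")
        self._replacements[name] = handler

    def roll_back(self, name: str) -> None:
        """Send one operation back to the legacy implementation."""
        self._replacements.pop(name, None)

    def handle(self, name: str, request: dict) -> dict:
        # Prefer the replacement if one is registered, else use legacy code.
        handler = self._replacements.get(name, self._legacy[name])
        return handler(request)


# Hypothetical handlers standing in for real legacy and new modules.
def legacy_quote(request: dict) -> dict:
    return {"total": request["nights"] * 100, "source": "legacy"}


def legacy_cancel(request: dict) -> dict:
    return {"cancelled": True, "source": "legacy"}


def new_quote(request: dict) -> dict:
    return {"total": request["nights"] * 100, "source": "new"}


facade = StranglerFacade({"quote": legacy_quote, "cancel": legacy_cancel})
```

Calling `facade.migrate("quote", new_quote)` moves only the quote operation to the new code, while `cancel` keeps running on the old system. If the new quote logic misbehaves, `facade.roll_back("quote")` restores the old behaviour without touching anything else.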

With CampSuite, we have taken this approach multiple times. Rather than rewriting the entire booking engine from scratch, we built a new version of one component, tested it thoroughly alongside the old one, switched traffic over gradually, and then retired the old code. It took longer than a clean rewrite would have in theory, but it actually shipped, which is more than most rewrites can say.
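One common way to switch traffic over gradually is a percentage rollout keyed on a stable identifier. The sketch below is a generic illustration in Python, not a description of how CampSuite actually did it; the function names and the idea of keying on a customer ID are assumptions. Hashing the ID means the same customer always lands in the same bucket, so raising the percentage only ever adds customers to the new path.

```python
import hashlib


def rollout_bucket(customer_id: str) -> int:
    """Map a customer to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def use_new_component(customer_id: str, rollout_percent: int) -> bool:
    """Return True if this customer should be served by the new component."""
    return rollout_bucket(customer_id) < rollout_percent
```

You would start with `rollout_percent` at a small number, watch the error rates, and raise it in steps. Setting it back to zero sends everyone to the old code again.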

When a rewrite genuinely makes sense

I am not saying rewrites are always wrong. There are situations where starting fresh is the right call. If the technology stack is genuinely end of life and cannot be supported, if the codebase is so small that rewriting it takes weeks rather than months, or if the product needs to do something fundamentally different from what it does today, then a rewrite might be justified.

The key question is whether you are rewriting because you need to or because you want to. Wanting a cleaner codebase is not a good enough reason on its own. If the business is working, if customers are happy, and if the system is maintainable even if it is ugly, then the rewrite is solving a developer experience problem, not a business problem. Those are very different things.

If you absolutely must rewrite, do these things

First, document every single piece of business logic in the old system before you write a line of new code. Not the technical implementation. The business rules. Why does this calculation work this way? What happens when a customer does this specific thing? You need to understand the what and why before you rebuild the how.

Second, set a hard scope boundary and defend it with your life. The rewrite rebuilds what exists. No new features. No improvements. Feature parity first, improvements second. Every feature you add to the scope extends the timeline and increases the risk of failure.

Third, run the old and new systems in parallel for as long as you can. Compare outputs. Find the discrepancies. Fix them before you cut over. The parallel run is where you discover all the edge cases you forgot about, and you want to discover them in testing, not in production.
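A parallel run can be as simple as feeding the same requests to both systems and recording every difference. Here is a minimal sketch in Python; `parallel_run`, `Discrepancy`, and the two pricing functions are hypothetical names made up for the example, and a real setup would replay recorded production requests rather than a hand-written list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class Discrepancy:
    request: Any
    old_result: Any
    new_result: Any


def parallel_run(
    old_system: Callable[[Any], Any],
    new_system: Callable[[Any], Any],
    requests: Iterable[Any],
) -> List[Discrepancy]:
    """Run both systems on the same requests and collect every difference."""
    discrepancies = []
    for request in requests:
        old_result = old_system(request)  # the old system is the reference
        try:
            new_result = new_system(request)
        except Exception as exc:
            # A crash in the new system counts as a discrepancy too.
            discrepancies.append(Discrepancy(request, old_result, repr(exc)))
            continue
        if new_result != old_result:
            discrepancies.append(Discrepancy(request, old_result, new_result))
    return discrepancies


# Hypothetical example: the old code has a long-stay discount nobody documented.
def old_price(booking: dict) -> int:
    total = booking["nights"] * 50
    if booking["nights"] >= 7:
        total -= 50
    return total


def new_price(booking: dict) -> int:
    return booking["nights"] * 50


diffs = parallel_run(old_price, new_price, [{"nights": 2}, {"nights": 7}])
```

In this made-up case `diffs` holds one entry, for the seven-night booking, which is exactly the kind of forgotten rule you want to find before the cutover.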

Fourth, accept that it will take longer than you think. Whatever your estimate is, add at least fifty percent. That is not pessimism. That is seventeen years of watching software estimates be wrong.

The real lesson about legacy code

The urge to rewrite is usually a symptom of a deeper problem. Maybe the team does not understand the existing code well enough. Maybe there is not enough documentation. Maybe the architecture has evolved without any clear direction. A rewrite does not fix any of those problems. It just resets the clock.

If you are struggling with a legacy codebase, start by investing in understanding it. Write tests for the existing behaviour. Document the business logic. Refactor the worst bits incrementally. It is less exciting than a ground-up rewrite, but it actually bloody works.
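Tests for existing behaviour are sometimes called characterization tests: you record what the code does today, ugly or not, and then check that a refactored version does the same. Below is a minimal sketch in Python. The late fee functions and the helper names are hypothetical, invented purely to show the shape of the technique.

```python
import json
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple


def record_golden_master(func: Callable[..., int], inputs: Iterable[dict]) -> Dict[str, int]:
    """Capture the current output of func for each input, keyed by the input."""
    return {json.dumps(args, sort_keys=True): func(**args) for args in inputs}


def check_against_golden_master(
    func: Callable[..., int], golden: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (input, expected, actual) for every case where func differs."""
    mismatches = []
    for key, expected in golden.items():
        actual = func(**json.loads(key))
        if actual != expected:
            mismatches.append((key, expected, actual))
    return mismatches


# Hypothetical legacy function with a cap and a member grace period.
def legacy_late_fee(days_late: int, is_member: bool) -> int:
    fee = 0
    if days_late > 0:
        fee = days_late * 5
        if fee > 40:
            fee = 40
        if is_member and days_late <= 2:
            fee = 0
    return fee


# Hypothetical refactored version that should behave identically.
def refactored_late_fee(days_late: int, is_member: bool) -> int:
    if days_late <= 0:
        return 0
    if is_member and days_late <= 2:
        return 0
    return min(days_late * 5, 40)


cases = [
    {"days_late": d, "is_member": m}
    for d, m in product([-1, 0, 1, 2, 3, 8, 9, 30], [True, False])
]
golden = record_golden_master(legacy_late_fee, cases)
mismatches = check_against_golden_master(refactored_late_fee, golden)
```

An empty `mismatches` list means the refactor preserved the recorded behaviour for those inputs. It says nothing about inputs you did not record, so pick the cases around every boundary you can find in the old code.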

If you are dealing with technical debt in a small team, the incremental approach is even more important because you simply do not have the resources to maintain two systems at once. And if you are weighing up whether to build custom software or buy something off the shelf, factor in the long-term maintenance cost. That is where the real expense lives, and it is the bit most people forget about until the codebase is ten years old and someone is pitching a rewrite.
