What Is Legacy Code? All Code Is Legacy Code!

Software developers are superstitious out of necessity. Given the task of keeping a software system functioning properly without the ability to reason about or memorize its entire operation, developers have a healthy fear of unexpected breakages. We often raise the steps intended to mitigate breakages to the level of ritual. One of the most notable manifestations of superstition in software is legacy code.

Legacy code is described as:

- Having value, the same way a haunted house has real estate value
- Causing fear when refactoring is considered, like changing the rungs on a ladder while you’re on it
- Untested, like a rope bridge encountered on a hike
- Old or crufty, the unfinished basement of your application, full of cobwebs and dark corners

But there’s no need to fear code just because it’s legacy code. In fact, code that doesn’t inspire anxiety can be just as much legacy as code that does.

Table of Contents

The Legacy of Legacy Code
- What is legacy code, then?
Legacy Code Myths
How to Handle Your Legacy Code
- Test It
- Document It
- Harden It
- Debride It
- Review It
- Present It
- Unrepeat It
How Bitovi Can Help

The Legacy of Legacy Code

The term “Legacy Code” originally referred to a system that was in the process of being replaced to distinguish it from an in-development system. The modern equivalent would be if a Ruby on Rails system were re-engineered in Go for performance reasons. The Rails system would be a "legacy" system, continuing to function and provide business value while the Go system is under active development.

However, modern software systems tend towards large scope and complexity over time and are less likely to be dependent on a single platform like a mainframe. It's much more common to replace components of existing systems rather than completely replace an old system with a new one. Thus, your legacy code might be commingled with your active development project, and the boundary between legacy and active code becomes quite hazy!

What is legacy code, then?

One time, this guy handed me a picture of him, he said "Here's a picture of me when I was younger." Every picture is of you when you were younger.
-- Mitch Hedberg, Strategic Grill Locations

Legacy code is just that - code. It isn’t any of the things that surround code, like documentation and assets. By themselves, docs and assets cannot cause a catastrophic house-of-cards collapse from minor changes in the way that code can. The unique challenges involved in code changes and their potential knock-on effects are why legacy code on its own is an important topic.

Legacy code exists - it's not in the programmer's mind yet to be written or in a buffer waiting to be saved. It has meaning and function within an existing system. Code that exists only in a debugger, a REPL, or a command line history might not qualify, as those things are meant to be transient, and code stops being legacy code when it's erased from existence. To use a photography analogy, your face on your viewfinder screen before you press the shutter button isn't a photo of you when you were younger, but as soon as you snap that shutter, the resulting image is. So when you save your code into a runnable system, that code now has a legacy.

Beyond that, there are few qualifiers that would meaningfully distinguish legacy code from any other code, a point expanded on below. This definition, then, is so broad as to encompass practically all code.

That's the point - all code is legacy code!

What makes code legacy is its ability to be replaced or modified, with the added implication that the decision to replace or modify it is a value judgment. The value gained from making code more organized, more optimized, more legible, etc. is always balanced against the effort required to make it so. In an actively developed software project, the value gained from improving legacy code must also be weighed against the value gained by writing new code. After all, developer time is a finite resource.

Legacy Code Myths

The definition of legacy code as "all code [that exists in a running system]" runs counter to other common characterizations of legacy code. These other definitions of legacy code rely on myths about software development and organization.

Myth 1: "Code only becomes legacy after a few weeks or months"

It's understandable why this myth persists. Developers often look at the ease of refactoring as a proxy for determining whether code is fresh or entrenched, and of course, brand-new code hasn't yet had a chance to be used in multiple consuming code paths.

However, the more useful code is, the more it will be used. Another integration could crop up as soon as your work gets merged to mainline and becomes available for use by other developers. If published as a library, every release cements code and API changes in a way that requires extra work to undo or further modify, especially if you are properly documenting and following Semantic Versioning standards.
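As a rough illustration of how a release cements an API, the sketch below compares two lists of exported names and reports the smallest Semantic Versioning bump that would be honest for the change. The function `requiredBump` and the idea of modelling an API as a flat list of names are assumptions made for this example; a real package also has to account for changed signatures and changed behavior.

```typescript
// Hypothetical helper: models a public API as a flat list of exported names.
type Bump = "major" | "minor" | "patch";

function requiredBump(previousExports: string[], nextExports: string[]): Bump {
  // Removing anything a consumer could import is a breaking change.
  const removed = previousExports.some((name) => nextExports.indexOf(name) === -1);
  if (removed) return "major";

  // Adding names is backward compatible, but it is new surface to maintain.
  const added = nextExports.some((name) => previousExports.indexOf(name) === -1);
  if (added) return "minor";

  // Same surface: only internals changed.
  return "patch";
}

const renamed = requiredBump(["parse", "format"], ["parse", "stringify"]);
const extended = requiredBump(["parse"], ["parse", "format"]);
const unchanged = requiredBump(["parse"], ["parse"]);
```

Here `renamed` comes out as "major", `extended` as "minor", and `unchanged` as "patch". Once a name has shipped, taking it away again costs a major version, which is one concrete sense in which freshly released code already has a legacy.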

Myth 2: "Tested code is not legacy code"

In my career in React consulting at Bitovi and elsewhere, I've worked with dozens of old code bases. I've worked with code that's tested to varying degrees and code that's not unit-tested at all. One characteristic the two share is that they're both very clearly legacy code.

I recently ran up against a problem with a library not passing its unit tests when it was run on a more recent version of Node.js. However, the software still worked for my use cases without error in the application I needed it for. Asserting that testing makes code non-legacy seems to fall flat in a case like this, where the code may be harder to refactor because of the existence of tests rather than because of their nonexistence. If I wanted to do anything to improve this library, I would first have to fix all of the tests, but if I wanted to merely apply it to a use-case where later Node.js was involved, I’d just need to avoid making the incorrect assumptions about later Node.js that the test battery did. I made the legacy code value judgment that my development efforts were better spent charging ahead with new features on the application.

Tests themselves are not tested code, so even by this more restrictive idea, two-thirds of a well-tested codebase could still be legacy! Testing batteries are implicitly trusted in a way that live code is not. This is especially true when mocking is involved - mocks can drift from the underlying code unit that they're mocking out, and it's difficult to notice deteriorating mocks until you’re running an integration test that does no mocking or until trying the live code.
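To make mock drift concrete, here is a minimal sketch in which a production function has been refactored but a hand-written mock has not. All of the names (`loadUser`, `staleMockLoadUser`, `satisfiesUserContract`) are hypothetical. The idea shown is a shared contract check that runs against both the real code and the mock, which is one possible way to notice drift early and not the only one.

```typescript
// Hypothetical production code: a refactor renamed `name` to `fullName`.
interface UserRecord {
  id: number;
  fullName: string;
}

function loadUser(id: number): UserRecord {
  return { id, fullName: "Ada Lovelace" };
}

// Hypothetical hand-written mock that was never updated after the rename.
// It is not tied to the real function's type, so nothing flags the drift.
const staleMockLoadUser = (id: number) => ({ id, name: "Test User" });

// A mock typed against the real function cannot drift without a compile error.
const typedMockLoadUser: typeof loadUser = (id) => ({ id, fullName: "Test User" });

// Shared contract check, run against the real function and every mock.
function satisfiesUserContract(load: (id: number) => object): boolean {
  const user = load(1) as { id?: unknown; fullName?: unknown };
  return typeof user.id === "number" && typeof user.fullName === "string";
}

const realPasses = satisfiesUserContract(loadUser);
const staleMockPasses = satisfiesUserContract(staleMockLoadUser);
const typedMockPasses = satisfiesUserContract(typedMockLoadUser);
```

The real function and the typed mock pass the check, while the stale mock fails it. Any test that relied on the stale mock would keep passing against a shape the live code no longer produces.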

Myth 3: "Someone else's code is legacy code [to me]"

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.
-- Hyrum's Law, as described by Hyrum Wright

This myth is technically true but incomplete. Namely, it doesn't mean that your own code is not legacy code! As soon as your code is available for integration in some way, it meets the standard of providing value that must be weighed when planning to improve it. As Hyrum's Law states, you may unintentionally break somebody's usage of your code simply by changing its behavior in a subtle way, and you may be surprised to learn that the somebody is you.
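A small sketch of Hyrum's Law in action, using invented names throughout: the hypothetical `uniqueTagsV1` only promises to return unique tags, but a consumer has come to rely on the order it happens to return them in.

```typescript
// Hypothetical library function. Its documented contract promises only
// "returns the unique tags" and says nothing about their order.
function uniqueTagsV1(tags: string[]): string[] {
  // Implementation detail: first-seen order happens to be preserved.
  return tags.filter((tag, index) => tags.indexOf(tag) === index);
}

// A later "harmless" change that still honors the documented contract.
function uniqueTagsV2(tags: string[]): string[] {
  return uniqueTagsV1(tags).sort();
}

// Hypothetical consumer that quietly depends on the undocumented order:
// it treats the first tag as the primary one.
function primaryTag(
  unique: (tags: string[]) => string[],
  tags: string[],
): string | undefined {
  return unique(tags)[0];
}

const input = ["react", "canjs", "react", "angular"];
const before = primaryTag(uniqueTagsV1, input);
const after = primaryTag(uniqueTagsV2, input);
```

With the first version the primary tag is "react"; with the second it is "angular". Neither version violates the stated contract, yet the consumer's behavior changed, and that consumer could easily live in your own codebase.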

Bitovi's CanJS library spans over 200 packages, and not every feature is documented on canjs.com because some things need to be exposed between components but are not meant for users. Even when it came to patch releases, I still had to consider the time it took to track down consuming libraries and discover what, if any, updates they needed. Even a simple patch widens the scope of adding or fixing a feature and takes developer time.

How to Handle Your Legacy Code

Even for those who espouse a more restrictive definition of legacy code than the one presented here, it's still good practice to treat all code as if it is legacy and take steps to mitigate any difficulty in further working with it. The following suggestions are listed in rough order of importance to your system's continued good operation, but most of them are some variant on "familiarize, or refamiliarize, yourself with the system, how it works, its design decisions, its domain requirements, and its history, then go from there."

Test It

Even though tested code is still legacy code, testing is still worthwhile! Unit tests are invaluable when you want to rewrite, refactor, or optimize code. However, be mindful of the fact that any mocks of your code will have to be updated when your code is, and unit testing is not the end of testing but merely the beginning of a process that includes human-powered and automated testing working together.
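One way to put this into practice before a refactor is a characterization test: record what the existing code does today, then check that the rewrite does the same thing on the same inputs. The sketch below assumes a made-up `legacySlug` function and a proposed `refactoredSlug` replacement; both are invented for illustration, and the check only covers the samples it is given.

```typescript
// Hypothetical legacy function whose current behavior we want to pin down.
function legacySlug(title: string): string {
  let result = "";
  for (let i = 0; i < title.length; i++) {
    const ch = title.charAt(i).toLowerCase();
    if ((ch >= "a" && ch <= "z") || (ch >= "0" && ch <= "9")) {
      result += ch;
    } else if (result.length > 0 && result.charAt(result.length - 1) !== "-") {
      result += "-";
    }
  }
  return result;
}

// Proposed refactor of the same behavior.
function refactoredSlug(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-/, "");
}

// Returns every sample on which the two implementations disagree.
function findDifferences(
  legacy: (input: string) => string,
  candidate: (input: string) => string,
  samples: string[],
): string[] {
  return samples.filter((sample) => legacy(sample) !== candidate(sample));
}

const samples = ["Hello, World!", "  Leading spaces", "already-a-slug", "", "Node.js 18"];
const differences = findDifferences(legacySlug, refactoredSlug, samples);
```

An empty `differences` array means the refactor preserves behavior on these samples, including quirks such as the trailing hyphen in "hello-world-". It says nothing about inputs that are not in the list, which is why the choice of samples matters as much as the harness.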

Though outside the scope of this article, Bitovi recommends choosing a testing triangle or triangles appropriate to the realms in which your software system resides. A testing triangle generally involves a large amount of testing written at a high level of isolation (unit tests) and fewer, more carefully chosen tests at greater levels of integration.

Document It

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.
-- Donald Knuth, Literate Programming

Code often doesn't capture intent, and it's important to have a written record of why code is written in a certain way. Long-form commentary is still the best way to capture the why of code, and having to learn enough about a piece of code to understand why it exists in its current form is a strong motivator for documenting that knowledge for future ease of re-familiarization.
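As an example of commentary that records the why, here is a sketch of a small money helper. The function names and the rationale in the comment are hypothetical, written only to show the shape such a comment might take.

```typescript
/**
 * Converts a dollar amount to whole cents.
 *
 * Why this exists (hypothetical rationale, for illustration only):
 * totals are computed in integer cents because adding decimal dollar
 * amounts directly accumulates floating-point error. In JavaScript,
 * 0.1 + 0.2 is not exactly 0.3, which is enough to break an equality
 * check on an invoice total.
 */
function toCents(dollars: number): number {
  return Math.round(dollars * 100);
}

function sumInCents(pricesInDollars: number[]): number {
  // Convert first, then add: integer addition is exact at these magnitudes.
  return pricesInDollars.reduce((total, price) => total + toCents(price), 0);
}

const naiveTotalMatches = 0.1 + 0.2 === 0.3;
const centsTotal = sumInCents([0.1, 0.2]);
```

The code alone tells a reader that cents are being used; only the comment explains what would go wrong if someone "simplified" it back to plain addition.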

Harden It

A QA engineer walks into a bar and orders a beer. Orders 2 beers. Orders 0 beers. Orders -1 beers. Orders a lizard. Orders nothing. The bar opens for business. The first real customer comes in and asks where the bathroom is. The bar shuts down immediately.

There's always the possibility that your code will be exposed to a misbehaving agent, whether that's intentional, as in a malicious actor, or unintentional, as in a bug-ridden API consumer. In the Internet age, avoiding security holes, and patching any that slip through as soon as they are discovered, becomes all the more critical. When you audit your code for security or other defects, you refamiliarize yourself with how the code works, what it does, and why, and you're then empowered to make other fixes or improvements.
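In the spirit of the joke above, here is a minimal sketch of hardening a single entry point against unexpected input. The `parseBeerOrder` function, its result type, and the 1 to 99 limit are all assumptions made for the example.

```typescript
// Hypothetical order handler: accepts anything, returns an explicit result.
type OrderResult =
  | { ok: true; quantity: number }
  | { ok: false; reason: string };

function parseBeerOrder(input: unknown): OrderResult {
  // Reject non-numbers, NaN, infinities, and fractional quantities.
  if (typeof input !== "number" || !isFinite(input) || Math.floor(input) !== input) {
    return { ok: false, reason: "quantity must be a whole number" };
  }
  // Assumed business rule for this sketch: between 1 and 99 beers per order.
  if (input < 1 || input > 99) {
    return { ok: false, reason: "quantity must be between 1 and 99" };
  }
  return { ok: true, quantity: input };
}

const accepted = [1, 2].map((value) => parseBeerOrder(value));
const rejected = [0, -1, "a lizard", undefined].map((value) => parseBeerOrder(value));
```

Every entry in `accepted` succeeds and every entry in `rejected` carries a reason. As the punchline suggests, validating the inputs you thought of does not cover requests you never modelled at all, so a check like this is a starting point for an audit, not the end of one.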

Debride It

Dead code creates unnecessary maintenance tasks. It's important to identify when code is no longer used and remove it from your system. You can always pull it out of source control later if you discover that it's needed and someone remembers that it used to be there.
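One simple way to look for candidates is to walk the import graph from your entry points and list whatever is never reached. The sketch below assumes you already have that graph as a plain object; the `ImportGraph` shape, the function name, and the file names are invented for illustration, and a real project would more likely get this information from its bundler or a dedicated tool.

```typescript
// Hypothetical import graph: each module name maps to the modules it imports.
type ImportGraph = Record<string, string[]>;

function findUnreachableModules(graph: ImportGraph, entryPoints: string[]): string[] {
  // Object without a prototype, so module names never collide with built-ins.
  const reachable: Record<string, boolean> = Object.create(null);
  const pending = entryPoints.slice();

  // Depth-first walk over everything the entry points import, transitively.
  while (pending.length > 0) {
    const current = pending.pop() as string;
    if (reachable[current]) continue;
    reachable[current] = true;
    for (const imported of graph[current] ?? []) {
      pending.push(imported);
    }
  }

  // Anything in the graph that was never visited is a removal candidate.
  return Object.keys(graph).filter((name) => !reachable[name]).sort();
}

const graph: ImportGraph = {
  "app.ts": ["router.ts", "api.ts"],
  "router.ts": ["api.ts"],
  "api.ts": [],
  "old-checkout.ts": ["old-payment.ts"],
  "old-payment.ts": [],
};
const dead = findUnreachableModules(graph, ["app.ts"]);
```

For this graph, `dead` contains "old-checkout.ts" and "old-payment.ts". Note that the two old modules import each other's neighborhood but nothing live imports them, so a naive "is this file imported anywhere?" search would have missed "old-payment.ts".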

Review It

Some older and less well-known parts of your code base may benefit from a collaborative review. A group review of older or forgotten parts of a system invites conversation and discussion, fosters better understanding, and mitigates the fear of the unknown. Dev teams may want to schedule reviews as a lunch and learn or similar less-structured meeting time.

Present It

if I can’t explain something I’m doing to a group of bright undergraduates, I don’t really understand it myself.
-- Daniel Dennett, Intuition Pumps and Other Tools for Thinking

A developer who writes something novel for a software team should present new and notable features, hacks, time-saving devices, useful abstractions, library updates, and lessons learned to enlighten and enrich the team as a whole. At Bitovi, we encourage greater shared understanding through both full department meetings with presentations of interesting topics and less structured office hours within each dev team.

Unrepeat It

DRY code (code wherein you "Don't Repeat Yourself") is generally considered preferable to code that is copied and pasted with minimal changes. If you find an opportunity to combine two functions or modules with similar purposes into one, then you are less likely to let that one function or module fall into obscurity or disrepair. But don’t go overboard on this; it’s OK to leave your code slightly DAMP (wherein you “Don’t Aggregate Many Parameters” into one function’s parameter list).
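A minimal sketch of the kind of consolidation meant here, using made-up formatting helpers: two functions that differ only in a currency symbol become one function with a single extra parameter.

```typescript
// Before: two hypothetical near-duplicates that differ only in the symbol.
function formatUsd(amount: number): string {
  return "$" + amount.toFixed(2);
}

function formatEur(amount: number): string {
  return "€" + amount.toFixed(2);
}

// After: one function, one meaningful parameter for the part that varies.
function formatMoney(amount: number, symbol: string): string {
  return symbol + amount.toFixed(2);
}

const usd = formatMoney(3.5, "$");
const eur = formatMoney(12, "€");
```

Here `usd` is "$3.50" and `eur` is "€12.00", matching what the two originals produce. The merge stays worthwhile as long as the parameter list stays short; if the combined function starts growing flags for every caller's special case, keeping two small functions may be the better trade.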

How Bitovi Can Help

We have more than a developer-century of experience in working with legacy codebases, polishing and modernizing them, and leaving them better for the next generation of developers. Whether you have old code, new code, or no code yet, Bitovi can help all of your code become ready to be your valuable legacy. Our React consulting page summarizes our expertise and services, links to some of our favorite tools and training, and even includes a consultation form to get in touch with us quickly and easily.

What do you think?

How does your organization handle legacy code? Do you have a different definition of legacy code to share? Join our Community Discord and let us know!