How much duplicated code do you tolerate? [closed]

Refactoring: Improving the Design of Existing Code

The Rule of Three

The first time you do something, you just do it. The second time you do
something similar, you wince at the duplication, but you do the duplicate
thing anyway. The third time you do something similar, you refactor.

Three strikes and you refactor.


Coders at Work

Seibel: So for each of these XII calls you're writing an implementation.
Did you ever find that you were accumulating lots of bits of very similar code?

Zawinski: Oh, yeah, definitely. Usually by the second or third time you've cut and pasted
that piece of code it's like, alright, time to stop cutting and pasting and put it in a subroutine.


I tolerate none. I may end up having some due to time constraints or whatnot. But I still haven't found a case where duplicated code is really warranted.

Saying that it'll hurt readability only suggests that you are bad at picking names :-)


Personally, I prefer keeping code understandable, first and foremost.

DRY is about easing the maintenance in code. Making your code less understandable in order to remove repeated code hurts the maintainability more, in many cases, than having some repeated lines of code.

That being said, I do agree that DRY is a good goal to follow, when practical.


If the code in question has a clear business or technology-support purpose P, you should generally refactor it. Otherwise you'll have the classic problem with cloned code: eventually you'll discover a need to modify code supporting P, and you won't find all the clones that implement it.

Some folks suggest 3 or more copies is the threshold for refactoring. I believe that if you have two, you should do so; finding the other clone(s) [or even knowing they might exist] in a big system is hard, whether you have two or three or more.

Now this answer is provided in the context of not having any tools for finding the clones. If you can reliably find clones, then the original reason to refactor (avoiding maintenance errors) is less persausive (the utility of having a named abstraction is still real). What you really want is a way to find and track clones; abstracting them is one way to ensure you can "find" them (by making finding trivial).

A tool that can find clones reliably can at least prevent you from making failure-to-update-clone maintenance errors. One such tool (I'm the author) is the CloneDR. CloneDR finds clones using the targeted langauge structure as guidance, and thus finds clones regardless of whitespace layout, changes in comments, renamed variables, etc. (It is implemented for a number a languages including C, C++, Java, C#, COBOL and PHP). CloneDR will find clones across large systems, without being given any guidance. Detected clones are shown, as well as the antiunifier, which is essentially the abstraction you might have written instead. Versions of it (for COBOL) now integrate with Eclipse, and show you when you are editing inside a clone in a buffer, as well as where the other clones are, so that you may inspect/revise the others while you are there. (One thing you might do is refactor them :).

I used to think cloning was just outright wrong, but people do it because they don't know how the clone will vary from the original and so the final abstraction isn't clear at the moment the cloning act is occurring. Now I believe that cloning is good, if you can track the clones and you attempt to refactor after the abstraction becomes clear.