"Duplicate data" or "duplicated data"?

I'm making a presentation and I need to know whether I should use "duplicated data" or "duplicate data". Is there any difference? I'm talking about removing duplicate observations from a database.


Solution 1:

The difference is subtle, but in this case, Duplicate Data would be preferred. I would interpret the two phrases as follows:

Duplicate Data: Entries that a system user has added more than once, for example by re-registering because they have forgotten their details.

Duplicated Data: Someone has deliberately taken a precise copy of the data, or a portion of it, maybe for backup or reporting purposes. The copy may then have been accidentally added back to the original.

In the context of what you are talking about, the difference is important, because the second implies exact duplicates, whereas the first is a much more complex issue.

And yes, "exact duplicate" and "partial duplicate" are misnomers - it is either a duplicate or not - but these are the terms used.
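To make the exact/partial distinction concrete, here is a minimal sketch in plain Python. The records, the field names, and the choice of a normalised email as the matching key are all made up for illustration; real partial-duplicate matching is usually much messier than this.

```python
# Hypothetical sample data: one exact copy and one re-registration.
records = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Ann Lee", "email": "ann@example.com"},   # exact copy of the first row
    {"name": "ann lee ", "email": "Ann@Example.com"},  # same person, typed differently
    {"name": "Bob Roy", "email": "bob@example.com"},
]

def drop_exact(rows):
    """Keep the first occurrence of each row that matches field for field."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

def drop_by_email(rows):
    """Keep the first row per email, ignoring case and surrounding spaces.

    Using the email as the identity of a record is an assumption made
    for this example only.
    """
    seen = set()
    kept = []
    for row in rows:
        key = row["email"].strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

exact_only = drop_exact(records)
by_email = drop_by_email(records)
print(len(records), len(exact_only), len(by_email))
```

The exact comparison only catches the copied row, while the re-registration survives until you decide which fields define "the same record". That decision is the complex part.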

Solution 2:

Other answers are good, but in my words:

Duplicated data is made by copying existing data, so the old data is the source of the new data.

process(X) -> A
A -> B
A is duplicated as B

Duplicate data comes from the same input, so the records end up matching.

process(X) -> A
process(X) -> B
B is a duplicate of A
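The same two cases as a small Python sketch, in case it helps. Here `process` is a hypothetical stand-in for whatever step builds a record from an input; it is not from any real system.

```python
import copy

def process(x):
    # Hypothetical processing step: builds a record from raw input.
    return {"name": x.strip().lower()}

# Duplicated data: B is produced from A itself.
a = process(" Alice ")
b = copy.deepcopy(a)   # A is duplicated as B

# Duplicate data: both records are built independently from the same input.
c = process(" Alice ")
d = process(" Alice ")  # D is a duplicate of C

# In both cases the records compare equal but are separate objects;
# what differs is how the second record came to exist.
print(a == b, a is b)
print(c == d, c is d)
```

Once the records exist you cannot tell the two cases apart by looking at the values alone, which is why the wording describes where the second record came from rather than what it contains.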

Solution 3:

Duplicate data.

What you discover and remove are instances of duplicate data.

What you or the processes create, mostly for a purpose, is duplicated data.

Solution 4:

When I hear (or read) duplicate data, I presume duplicate is an adjective that modifies the word data.

Most of the dictionaries I consulted listed duplicate as a noun, verb, and adjective, but duplicated only as a verb form. For that reason, I'd use duplicate data.