How come we say "data set" instead of "datum set"?

Given that "data" is the plural form, and it's playing the role of an adjective here, how come we don't use the singular form? Other kinds of sets, for example "point set", "skill set", "stationery set" all use the singular form of the noun.


There are several things that probably contribute to the use of "data set".

First, some background. It's a bit of a simplification to say that "data" is playing the role of an adjective here. It acts like an ordinary adjective (for example, "large") in some ways: it's being used to modify another noun attributively, and its position is before the noun. But in other ways, it doesn't act like an ordinary adjective: for example, it cannot be modified by an adverb (we can say "a very large set" but not "a very data set"). For this reason, many modern grammarians would say that it should not be classified as an adjective. Instead, a word in this kind of construction is often considered to remain a noun, and called an "attributive noun".

Furthermore, as VampDuc points out, the fact that we can write "dataset" without spaces indicates that it may in fact be a single word grammatically. In other words, it may have turned into a compound word. In that case, it may not make sense to assign "data" a part of speech here: it would be like assigning a part of speech to "black" in "blackbird".

Plural nouns can be used attributively/as the first element of compounds

While attributive nouns are usually singular (as in your examples of "point set", "skill set", "stationery set"), they can sometimes be plural.

A few words seem to only have a plural form, and no singular, even when used in compounds (I mention a few in my answer to "Are there nouns that are always plural - have no singular counterpart?"). Two words like this are clothes (a basket for clothes can only be called a "clothes basket", not a "cloth basket") and glasses (a case for glasses can only be called a "glasses case", not a "glass case"). It's probably relevant that "cloth basket" and "glass case" are valid phrases that have another meaning. For words such as eyeglasses and scissors where such confusion isn't likely, it seems to be possible to use either the usual (plural) form, or a singularized form in compounds: "eyeglass(es) case", "scissor(s) kick".

Nouns that have the plural suffix -s but take singular verb agreement also seem to strongly resist singularization when used attributively or in compound words. Examples of nouns of this type are news (which forms newspaper, not *newpaper), mathematics (mathematics professor), and physics (physics textbook).

These are just the examples of plural attributive nouns that I think are the most obvious and uncontroversial; there are more (see for example the following question: "Irregular plurals in noun adjuncts"). In some cases, speakers of different dialects tend to have different judgements about whether a plural use is acceptable.

My point here is, we can't conclusively rule out the use of "data" in this position just because it is a plural form.

Irregular plurals seem to sound less bad as the first element of compounds

An observation commonly made in the literature, although I don't actually think it's very significant, is that irregular plurals generally sound a bit less bad than regular plurals when used attributively/as the first element of compounds. (I do agree with this idea to a certain extent, but I think the acceptability of expressions like "mice trap" or "teeth cleaning" is often overstated. They both sound pretty bad to me despite using irregular plurals.)

And of course, "data" is not a regular plural.

"Data" is used as a singular non-count/mass noun by many people

See this question for more information about this: "Is "data" treated as singular or plural in formal contexts?"

Summary

It's possible for plural nouns to be used attributively, or as the first element of compound words. It is more common to use a singular noun, but two factors that seem to facilitate the use of a plural noun in this position are irregularity (and "data" is not a regular plural) and a significant difference in meaning between the singular and plural ("datum" is not often used, and when it is used, its meaning often differs from that of the plural "data").

Additionally, for many people "data" is used as a singular mass noun, which would make "data" a singular form, explaining its use in "data set".