Is "data" treated as singular or plural in formal contexts?

I have actually considered this quite a bit, being both a linguist who studies these things, and a scholar who publishes papers.

Etymologically speaking, the word data is the plural of datum in Latin. In Latin, data would get plural verb agreement. Now, languages borrow words and do whatever they want with them, so this historical fact about data has no relevance in judging what is "correct" in English. There is significant evidence that data has established itself as a mass noun in English, suggesting that, for most people, "data is" is the most natural way to speak.

However, in a university/scholarly paper, I would recommend using "data are", rather than "data is".

The reason: some stickler professors and pedantic scholars believe that if datum is an English word for a single piece of data (which it is), then data must logically be plural. The fact that most people do things differently only means, to them, that most people are doing it wrong. Whether you agree with that or not is somewhat irrelevant.

So you have two choices.

  1. If you use "data is", then reasonable people (yes, I am biased) who read your paper will not bat an eye, but stickler professors might judge you on your perceived ignorance or inappropriate level of informality.

  2. If you use "data are", then the stickler professors will not judge you to be ignorant, and the reasonable people will think "that's an acceptable variant" or "this person is a stickler for language" (or if they are me, will think "this person is pandering to the sticklers — a necessary evil"), but nobody will think you are ignorant.

So choosing (2), "data are", is clearly your safest bet, and is what I always do (and what I find nearly all of my colleagues do).

This is intended as a clarification of the "correctness" of using data as a mass noun, for those strict-minded sticklers (there's plenty of them) who might be unconvinced by Kosmonaut's "languages borrow words and do whatever they want with them":

1 - "Datum" and "data (plural)" are historically correct, so "data (mass noun)" must be wrong. How can "data" have a mass noun form as well as a singular and plural? You'd never say "Oh, I spilled rice on the floor. Wait, it's okay, I only spilled 4 rices". There's a separate noun phrase for the singular and plural ("grains of rice").

Consider potato. It has a singular form, meaning one distinct root vegetable, a plural form, meaning multiple distinct root vegetables, and a mass form, meaning an amount of foodstuff made from potatoes. Imagine a dinner table, where each diner has a baked potato on their plate (singular), and everyone is sharing a platter of roast potatoes (plural) and a bowl of mashed potato (mass) (hopefully among other things...). If you ask someone to "pass the potato", they'll understand that you mean the bowl of mass mash, not the tray of plural potatoes or the singular potato on their plate.

2 - There can be such a thing as "a datum" in a way which is not true for "a water". Imagine someone looking at a database full of data and saying, "There is so much data in this, I can't see where to start". Surely this is like standing in a migration of birds and saying "There is so much bird in the sky, I can't see the sun..."? Since data can be countable, surely "data" can't be primarily a mass noun?

Data is not necessarily countable. Data in a neat Excel sheet might have countable cells, but what about the data photo editors mean when they talk about "data loss" from increasing the contrast of a digital photo made of binary data? There's no clear way of defining where one datum starts and the next one stops: would a datum in this context be a bit, a byte, or the data defining one pixel? Such a line would be arbitrary, like looking for units of rice in a processed flat rice cracker. It's an amount measured in units of mass: 67 KB of data in a jpg, 2 grams of rice in a rice cracker.

Even seemingly trivial cases aren't so trivial. What's one datum in a modern relational database? One value, one row? What about where there are table joins and foreign keys? Is a structural definition a datum? You can create a convention-specific definition, but it's not a universal definition like one bird.

3 - Following that pattern, shouldn't the mass noun of data be datum (the singular), like how the mass noun of potatoes is potato?

No. It's rare, but not completely unique, for a mass noun to develop from a plural, in cases where the singular over time becomes less and less universally meaningful. "Physics" used to mean the set of countable, defined, distinct natural sciences - until the field developed such that it became clear that the lines between one physic and another weren't as sharp or universal as previously thought.

You could answer "What's happening at CERN?" with "A lot of physics", but you wouldn't expect the reply "How many?". This is because there's no longer a clear established universal dividing line between one physic and another. Your answer would interpret the question as, "How much?" and would be a measurement of amount: "Enough to occupy 4,000 physicists". In the same way, you could answer "What does this supercomputer store?" with "A lot of data", but the reply "How many?" would incorrectly assume that all data has one clear common countable unit and that there is a clear universal dividing line between one datum and another across all contexts. Even if this data did happen to have a consistent countable convention, replying "7 million data" would be ambiguous unless the asker already knew this convention. A more useful answer would be to interpret it as "How much?" and give an answer in terms of a measurement of amount: "Nearly 220 petabytes".

I'm a strong proponent of data as a mass noun, taking the singular in grammatical usage ("the data shows us something"). Use of data as a plural ("the data show us...") seems pretentious and pedantic, as if to make a show of your knowledge that in Latin, data is a plural form of datum.

I have several reasons for being stubborn about data as a mass noun:

  1. Datum is a reference line in a mechanical drawing. More than one of these may be called data, if you must show off your knowledge of Latin, but I think in this case they'll usually be referred to as datum lines.

  2. If you can tell me how many data you have, then I will use plural verbs to refer to your data, but as long as you need quantitative units to tell me the size of your data, then I will call it a collective singular: e.g. "There is too much data to load into memory." I can't imagine anyone being comfortable saying, "There are too many data to load." Likewise, we say, "My hard drive holds up to 1 TB of data." It's nonsensical to talk about there being "1 trillion data in there."

  3. Even semantically, there is not an easy concept of singular data, as we currently use the term. No data point can stand on its own, but rather it derives meaning and significance from its context. What were the conditions of its measurement? What were the other measurements? Etc. It doesn't make semantic sense to refer to a single datum unless it has that specific meaning, as a reference point or baseline. What we mean by data as a plural is semantically different from what we mean by data as a collective singular.

It's a great example of a word in transition.

"Traditionally" it was the plural form of datum.

The fact is, though, that more and more "authorities" are using it as a singular.

"The Oxford English Dictionary defines it like this:

In Latin, data is the plural of datum and, historically and in specialized scientific fields, it is also treated as a plural in English, taking a plural verb, as in the data were collected and classified. In modern non-scientific use, however, despite the complaints of traditionalists, it is often not treated as a plural. Instead, it is treated as a mass noun, similar to a word like information, which cannot normally have a plural and takes a singular verb. Sentences such as data was (as well as data were) collected over a number of years are now widely accepted in standard English."

In contrast to that:

"The official view from the Office for National Statistics takes the traditional approach. The ONS style guide for those writing official statistics says:

The word data is a plural noun so write "data are". Datum is the singular."

It's worth remembering the priceless words from an introduction to the OED: "This book is descriptive, not prescriptive."

Once again, "data" is a great example of a word in transition. A reminder that with questions of spelling and grammar, the concept of what's "right" is a difficult one. All truth is social, and all the more so with language correctness.

As addressed in the question linked, it depends if you use the uncountable noun, meaning "a collection of data", or the plural form of datum. If it is the former, then the verb would be singular, otherwise it would be plural.

Now, I would say that in most university papers you would use the uncountable singular form. The exceptions would be when data describes an ensemble of measurements, or when data is used in a philosophy paper (according to Wiktionary's definition).