July 24, 2016 / jpschimel

Language and language change: what are the “data”?

Language changes. How else do you explain George Bernard Shaw’s famous quote, “England and America are two countries divided by a common language”?

Language changes with time and distance. Words are created, are lost, and change meaning. Words adopted from another language, in particular, often shift in meaning and usage. In English, a common battle is over whether the rules of the source language still apply to the word as used in English. If we adopt a Latin word, must we still follow Latin rules?

“Data” is the word that has probably been fought over the most in science. Many (though a decreasing number) feel that those who would ever use the word “data” as a singular noun are ignoramuses who are debasing the language.

“Data” (in English) is derived from a Latin word, and in Latin it is the plural of “datum.” In Latin, therefore, to ever use “data” as a singular would be a complete and gross error. But is it an equal error in English? According to the Oxford English Dictionary (the OED), the Latin word means “given, that which is given, neuter past participle of ‘dare’ to give.”

That isn’t the meaning we apply to the word in English, and particularly not in science.

So is the English “data” the same word as the Latin word “data”? No, it isn’t. The OED gives our definition as “In pl. Facts, esp. numerical facts, collected together for reference or information.” So, should the same rules apply?

Some argue yes: since “data” is originally a Latin word, Latin rules should always apply. But standard English usage often treats “data” as a mass, or collective, noun: it is the collection of facts.

In American English usage, collective nouns are treated as singular. “The population is…” is correct usage; “The population are…” would be incorrect.

So in dealing with the word “data,” we are left with two issues. The first is whether it is ever correct to use “data” as a singular, collective noun. The second is whether you should.

Based on the OED, the Chicago Manual of Style (the CMS), and other sources of grammatical wisdom, you can correctly use “data” as either a singular mass noun or a plural, depending on your meaning:

The plural form: “The data indicate…” implies that it is through evaluating each datum and then synthesizing that information that you establish what is indicated.

The singular form: “The data indicates…” implies that the data have been aggregated into a single mass, and that the whole data set, acting as a single entity, indicates something.

Don’t forget, though, that if you have a single fact, it remains a datum (or a data point). You can’t have “a data.” Don’t use “data” as a true singular.

But then there is the issue not of what is grammatically correct, but of what people think is grammatically correct. There remain those who reject the mass-noun use, and they tend to be senior colleagues: people you might want to impress. Although the OED and the CMS acknowledge and accept the mass-noun usage, the OED notes, “However, in general and scientific contexts it is still sometimes regarded as objectionable,” and the CMS says, “In formal writing (and always in the sciences), use data as a plural.” For me, the ability to use “data” as a mass noun is a tool too useful to ignore, but it is one that you should use thoughtfully and deliberately, and some conservatism is wise.


Footnote: The OED does record the use of “data” as a count noun, with a 2010 citation stating, “These datas were likely not missing at random.” But please don’t do that. Not only does it sound horrible and wrong, but almost every reader will be sure that it is.


  1. Alon / Jul 26 2016 5:33 am

    In Latin, therefore to ever use “data” as a singular would be a complete and gross error.

    Or a Hellenism 🙂

    (In Classical Greek, plural neuter subjects take a verb in the singular. AFAIK, nothing like that ever happens in Latin, though.)

