Digital Spell-Checking May Be Killing Off Words
The death rate of words appears to have risen recently, while fewer new words are entering languages, possibly in both cases because of digital spell-checking, according to a Google-aided analysis of more than 10 million words.
More than 4 percent of the world's books have now been digitized, a trove that includes seven languages and dates back to the 16th century. All of this text offers new opportunities to study how language evolves.
Researchers analyzed English, Spanish and Hebrew texts from 1800 to 2008 that had been digitized by Google.
"We are now able to analyze language comprising not only the common words, but also the extremely rare words, and not just for yesterday but for yesteryear, and not just for yesteryear, but back to a time before most people can track their family lineage," said researcher Alexander Petersen, a physicist at the IMT (Institutions, Markets, Technologies) Lucca Institute for Advanced Studies in Italy.
The scientists concentrated on fluctuations in how often words were used and how often they "died," or fell out of common use.
"Words don't actually die — they only disappear in a statistical sense," Petersen said. "Unlike animal species, which undergo irreversible extinction, words can come in and out of use. Thus, any reader that goes back and likes a word or phrase that is out of style, can conceivably resurrect its use. After all, our society is notably prone to fads which wax and wane."
The investigators found words began dying more often in the past 10 to 20 years than they had in all the time measured before. At the same time, they discovered languages were seeing fewer entirely new words emerging. They suggest that automatic spell-checkers may be partly responsible, killing misspelled or unusual counterparts of accepted words before they see print.
Other factors might include a natural bias toward shorter words, which are easier to communicate, and the adoption of English as the leading language of science. Either or both of these considerations might help explain why, for example, "X-ray" has outcompeted its synonyms "radiogram" and "roentgenogram."
Meanwhile, words born in the past 20 to 30 years have seen greater use than words born before then. The researchers suggest that these recent newborn words likely correspond to popular social or technological innovations, such as "email," and thus will probably become core words that see wide use.
The research team also found that within about 40 years, new words either saw enough use to be accepted into their language or were largely abandoned. This matches the typical time a word takes to be included in a standard dictionary. It is also close to the length of a human generation, supporting the idea that a language can change drastically within a single generation.
This two-century span of data also revealed the way that international conflicts and other major social, cultural and political events can impact language. For instance, during World War II, the languages of participating countries apparently underwent significant changes, brought together as they were by a common event. In contrast, regions mostly isolated from the war, such as Spain and Latin America, were minimally affected.
Petersen is fascinated by the possibility of teasing out new patterns from these data.
"Such word-frequency patterns might serve as a sociopolitical thermometer, to look back in time, and to also monitor current events," Petersen told LiveScience.
The scientists detailed their findings online March 14 in the journal Scientific Reports.
Copyright 2012 LiveScience, a TechMediaNetwork company. All rights reserved. This material may not be published, broadcast, rewritten or redistributed.