word of the month
A new word is born: How do new words make it into dictionaries?
New words are entering English all the time, as people creatively exploit the building blocks of the language in order to find ways of representing new concepts. With electronic communication and the World Wide Web now integral features of everyday life, new words and expressions occurring in both written and spoken English have a bigger platform for usage and propagation than ever before. Some of these new words will of course wither away as the concepts they represent fade in significance. Others stay the course and become long-lasting or even permanent features of the language.
Perhaps the greatest accolade for any new word is its formal recognition through entry into a dictionary. For many, the perception is that any word which has gained enough currency to be officially recorded is a 'proper' word, here to stay for the use of future generations. So just how many hoops does a new word or expression have to jump through to be awarded the honour of a dictionary entry? In this month's article we investigate some of the criteria dictionary makers use to decide at what point a word should make it into their official description of English.
In the 1930s, editorial staff at the dictionary publisher Merriam-Webster were responsible for one of the most famous errors in lexicographic history, when they created an entry for the word dord in the second edition of their New International Dictionary.
A chemistry editor had submitted a slip which read "D or d, cont./density." What the editor meant was that both lower and upper case 'd' can stand as an abbreviation for 'density'. The information was misinterpreted, however, when the slip was misdirected to a department that dealt with words rather than abbreviations. A subsequent editor thought that the phrase 'D or d' was a single, run-together word dord, which meant 'density', and created a corresponding entry. The word appeared in the dictionary in 1934, and it was not until five years later, when an editor noticed that dord lacked an etymology and investigated, that the error was spotted.
In fact, over the history of lexicography, errors of this kind have been incredibly rare, even in the days when lexicographers worked primarily on paper, a testimony to their thoroughness and precision. What the case of dord does highlight, however, is that when writing a dictionary there is absolutely no substitute for researching evidence of a word's actual usage. For a word to enter a dictionary, it must be seen to be used, and widely. Early lexicographers had to rely on their own knowledge, often including things that, quite literally, caught their eye. The world of lexicography is very different in this electronic age, where dictionary makers have access to vast quantities of searchable language data, and can therefore base their criteria for inclusion on significant amounts of hard evidence.
The Oxford English Dictionary (OED), begun in 1860 and currently containing over 300,000 main entries, is universally regarded as the definitive description of the English language. For many, the perception is that any word which has made it into the OED is a bona fide member of the English lexicon. So just how do the OED's editors assess new words for potential inclusion in the dictionary?
Candidate new words are picked up by the OED's Reading Programme, a group of around fifty readers who are employed to look at a range of contemporary printed (paper and electronic) material, including novels, television scripts, song lyrics, newspapers and magazines. These sources are searched for words which are entirely new, or new uses of existing words, and findings are recorded in a database of citations. The Reading Programme is supplemented by sources such as Internet databases, subject-specific glossaries, and even the dictionary's so-called 'paper files', which include submissions from members of the public throughout the English-speaking world. A candidate word is searched for in all these sources and assessed with respect to frequency of occurrence, time-span (i.e. whether the word has cropped up over a number of years) and variety of sources (i.e. whether the word appears in printed, electronic and spoken form in a range of text types and contexts). The rule of thumb is that a word can be included in the OED if it has appeared at least five times, in five different sources, over a period of five years.
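As a rough illustration, the rule of thumb above can be written as a simple check over a list of citations. This is only a sketch: the Citation record, the function name and the way the five-year period is measured (latest year minus earliest year) are assumptions made for the example, not a description of how the OED's editors actually work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """A hypothetical citation record: the word, where it was found, and when."""
    word: str
    source: str  # e.g. the title of the publication
    year: int


def meets_rule_of_thumb(citations, min_count=5, min_sources=5, min_span_years=5):
    """Return True if the citations satisfy the 'five, five, five' rule of thumb.

    Assumption: the period is measured as latest year minus earliest year.
    """
    if not citations or len(citations) < min_count:
        return False
    distinct_sources = {c.source for c in citations}
    years = [c.year for c in citations]
    span = max(years) - min(years)
    return len(distinct_sources) >= min_sources and span >= min_span_years


# Six citations from six invented sources, spread over 2000 to 2005.
evidence = [Citation("podcast", f"source {i}", 2000 + i) for i in range(6)]
print(meets_rule_of_thumb(evidence))
```

A word with plenty of citations that all come from one publication, or that all fall within a single year, fails the check, which mirrors the point that frequency alone is not enough.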
The five-year constraint can lead to a significant time lag between the widespread appearance of a word and its formal recognition in the dictionary. Exceptions are occasionally made: chav, for example, which first emerged in 2004, has already made it into the Oxford dictionaries. Other Oxford dictionaries, such as the Concise Oxford Dictionary and the New Oxford Dictionary of English, undergo regular revision for the release of new editions and so are more flexible about the inclusion of new words. One of the most recent publications is the latest edition of the one-volume Oxford Dictionary of English, published in August 2005. This already contains entries for some of the new words we've discussed in MED Magazine and the Macmillan Dictionary Word of the Week, including chugger, Hinglish, podcast, supersize and wiki.
So how do other dictionary makers decide what words to include in their publications?
The method referred to as "reading and marking", the process of scouring a range of published material (including books, newspapers, magazines and electronic publications) for information about new words and new uses of existing words, is pretty much standard practice among dictionary makers. Some publishers also use spellcheckers based on their own dictionaries, which, when applied to new texts, provide a list of items not included in their dictionary. Words of interest are then stored in a database of citations, including the word itself, an example of how it is used, and bibliographic information about the source from which the word and the example were taken. Citations are also available in the form of searchable language corpora drawn from a large variety of sources.
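The spellchecker approach can be pictured as a comparison between the words in a new text and the dictionary's existing headword list. The sketch below is a deliberately simplified assumption of how such a tool might work: the function name, the tokenising pattern and the sample data are invented for illustration, and a real tool would also need to recognise inflected forms such as plurals and past tenses.

```python
import re
from collections import Counter


def unknown_words(text, headwords):
    """Count the words in `text` that are not in the headword list.

    Simplification: words are lower-cased and matched exactly, so inflected
    forms are only recognised if they are listed themselves.
    """
    known = {h.lower() for h in headwords}
    # Runs of letters, allowing internal apostrophes and hyphens.
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    return Counter(t for t in tokens if t not in known)


sample = "The chugger stopped me; another chugger waved. I downloaded a podcast."
headword_list = {"the", "stopped", "me", "another", "waved", "i", "downloaded", "a"}
for word, count in unknown_words(sample, headword_list).most_common():
    print(word, count)
```

Each item on the resulting list is only a candidate. It would still need an example of use and source details before it could be stored as a citation.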
Before a new word can be added to the dictionary, editors must be sure that it has enough citations to show that it is widely used. But the number of citations is not the sole criterion for entry into a general dictionary. Citations must come from a wide range of different types of publication, so that a word may be rejected if all of its citations come from a single source or only from specialized publications that reflect the terminology of a particular subject field.
A word must also be seen to be used over a considerable period of time, with citations spanning a number of years, usually somewhere between two and five. This last criterion is probably the most flexible, however. Occasionally, a word springs up which is instantly prevalent and seems likely to last, and in this case exceptions are made. A good example is the term AIDS, which emerged in the mid-eighties and entered the dictionary almost immediately.
The size and type of dictionary that is being produced or updated may also have an influence on the number and range of citations a word needs in order to be admitted. If a dictionary has more limited space, only the most commonly used words can be entered, and so in order to find a place, a word must be supported by a very significant number of citations. If a dictionary is larger and therefore has room for more words, or only covers words from a particular subject field, terms with fewer citations or with citations from a more restricted range of sources might still be included.
There are no hard and fast rules about what kinds of words dictionary makers will include or exclude. In practice, these considerations are largely dependent on the size of the dictionary and its intended audience. General dictionaries, however, do not routinely include proper names (although some include geographical terms, often in the form of an appendix). On the other hand, they do not deliberately exclude rude or offensive words, but label them as such. A good general dictionary does not aim to be prescriptive, but simply to describe the language as it is being used at any particular time. As well as charting changes in meaning (e.g. in 1911, the word gay simply meant 'happy' but would never be defined like this today), dictionaries may also deliberately include words that reflect current thinking and changing attitudes to language, particularly in sensitive areas such as race, disability and gender (e.g. several years ago, dictionaries would have recorded the word handicapped as the standard term for describing someone with mental or physical disabilities, but this has now been superseded by words such as disabled).
The OED is the only dictionary which retains entries for every single word that has ever existed in the English language. Most other dictionaries routinely exclude words that are dated or obsolete. For instance, the word Chunnel, an informal term for the Channel Tunnel dating from 1928, has pretty much disappeared from use now that the tunnel is a reality. So although it still appears in the OED, it would not find a place in the latest edition of the Concise Oxford Dictionary.
Publishers of learner's dictionaries monitor the development of new words in English in much the same way as is done for native speaker or other specialist dictionaries, capitalising on the vast amounts of searchable language data on the Internet. They gather data from printed and online texts and a range of other sources, such as special-subject glossaries and online encyclopaedias, to create a huge database of potential headwords.
However, since the main objective of learner's dictionaries is to deal primarily with the core vocabulary of English, the frequency of occurrence of a particular word is a very important factor in determining its potential inclusion. Equally important is what is referred to as 'dispersion', i.e. how well spread the candidate word is across different text types. If a word is frequent in one particular field but barely visible elsewhere, it is unlikely to make it into a learner's dictionary.
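One very simple way to picture dispersion is as the proportion of text types in which a word turns up at all. The sketch below uses that measure purely for illustration. The function name, the text types and the counts are all invented, and corpus linguists generally use more refined statistics than this.

```python
def dispersion_range(counts_by_text_type):
    """Return the share of text types in which the word occurs at least once.

    1.0 means the word appears in every text type; values near 0 mean it is
    confined to very few of them.
    """
    if not counts_by_text_type:
        return 0.0
    present = sum(1 for n in counts_by_text_type.values() if n > 0)
    return present / len(counts_by_text_type)


# Invented counts for two candidate words across four text types.
well_spread = {"news": 12, "fiction": 7, "academic": 3, "spoken": 9}
specialist = {"news": 0, "fiction": 0, "academic": 40, "spoken": 0}

print(dispersion_range(well_spread))
print(dispersion_range(specialist))
```

In this invented data the second word is actually the more frequent of the two, yet it is confined to a single text type, which is exactly the situation described above.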
Learner's dictionaries do, however, sometimes include words from special subject domains if user research has identified these areas as particularly useful for learners. Typical areas are business and new technology. The Macmillan English Dictionary includes words from the linguistics and grammar domain that have special relevance to people involved in language teaching and learning.
Perhaps the main consideration in deciding what words to include in a learner's dictionary is the level of the learner the dictionary is intended for. An advanced learner's dictionary will offer broader coverage of English relative to lower level dictionaries, both in terms of the range of words selected for inclusion and the level of detail of the entries. For any particular word, an advanced level dictionary usually includes more derivatives, more idioms and fixed expressions, and gives a more detailed classification of senses. This means that, generally speaking, new words are more likely to make it into higher level dictionaries, where there is more scope for extending coverage and the primary aim goes beyond the description of core vocabulary. One approach adopted by the editors of the Macmillan English Dictionary for maximising the number of headwords they could include was to draw a distinction between entries for 'decoding' (understanding spoken and written English) and 'encoding' (writing and speaking English). Rarer words, which are more likely to be encountered in reading than used by students in their own writing or speech, are given more compact entries, thus enabling the dictionary writers to include more of them.
For more information about new and topical words and phrases, read Kerry's Word of the Week articles on the MED Resource Site.