How the MED was Written
by Michael Rundell

Introduction

1 Linguistic data

2 Guiding principles

3 People

4 The editorial process


 

Introduction

Creating a completely new dictionary is a great opportunity. It’s a chance to ask fundamental questions, such as:

  • why do students consult dictionaries?

  • what reference skills and linguistic competence do they bring with them?

  • what types of information do they really need from their dictionaries?

  • what can we do to make this experience more rewarding for them (and as painless as possible)?

Writing a new dictionary is a great challenge, too: there are plenty of good learner’s dictionaries around, so we knew that there was no point in simply producing another "me-too" book.

There are three things you need in order to create a really good dictionary:

  • linguistic resources: corpus data, and the software needed for analyzing it

  • ideas: a coherent set of principles and objectives to guide the whole process and give it a clear focus

  • people: creative and highly skilled lexicographers who can synthesize the data according to the agreed objectives

Each component is vital: you may have the best corpus and the best lexicographers in the world, but you won’t produce a good dictionary unless you apply well-thought-through principles to the complex process of analyzing corpus data and converting it into useful, relevant, and easy-to-use dictionary text.

Let’s look at these three aspects in turn.

1 Linguistic data

Like all good dictionaries, the MED is based on a large corpus. Our editors had access to around 200 million words of authentic English, representing a wide range of text-types (including novels, newspapers, academic writing, and recorded conversation) from all the main varieties of English.

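Corpus analysis is normally done with "concordancing" software, and our lexicographers used this approach too. A concordance lists every occurrence of a search word, each on its own line with its surrounding context, so that recurring patterns stand out. Concordances help us to discover the most typical ways in which words behave and combine with other words.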
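To make the idea concrete, here is a minimal sketch of a key-word-in-context (KWIC) concordance in Python. It is an illustration only: the function name `concordance`, its `width` parameter, and the sample sentence are hypothetical, and real corpus-querying tools typically work on large indexed, tokenized corpora rather than on a raw string.

```python
import re

def concordance(text, keyword, width=30):
    """Return key-word-in-context (KWIC) lines for every match of `keyword`.

    Each line shows up to `width` characters of left and right context,
    padded so that the keyword lines up in a single column.
    """
    # \b word boundaries stop "agree" from matching inside "agreement"
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        lines.append(f"{left.rjust(width)} {m.group(0)} {right.ljust(width)}")
    return lines

sample = "I agree with you. We reached an agreement. They agree to differ."
for line in concordance(sample, "agree", width=12):
    print(line)
```

Running it prints one aligned line for each occurrence of agree, while agreement is skipped. Sorting such lines alphabetically by their right-hand context is a common next step, because it groups recurring patterns such as agree with and agree to.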

But lexicographers now face "information overload": when we search our corpus for a common English word (such as agree, bright, or consequence), we find thousands (sometimes tens of thousands) of instances of the word. No one can reliably analyze so much data for a single word, so we need new programs that will do some of the work for us. The MED’s editorial team were fortunate in having access to the most advanced language-analysis software currently available: a "lexical profiling" program that searches the corpus and compiles a summary (or "Word Sketch") of the key features of a word’s behaviour. The resulting profiles were supplied to our team through a collaboration with the University of Brighton’s Information Technology Research Institute, an internationally known centre for computational linguistics.
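The Word Sketch software is far more sophisticated than anything that fits here, but the basic idea of summarizing the company a word keeps can be sketched in a few lines of Python. This is a hypothetical illustration, not the Brighton program: the functions `tokenize` and `collocates`, the two-word `window`, and the use of the logDice association score are all assumptions made for this example. Real Word Sketches also group collocates by grammatical relation (for example, the objects of a verb or the adjectives that modify a noun) rather than by simple proximity.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens only; a real system would also lemmatize and POS-tag.
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, window=2, min_freq=1):
    """Rank words found within `window` tokens of `node` by logDice score."""
    freq = Counter(tokens)
    co = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if tokens[j] != node:  # skip the node word itself
                co[tokens[j]] += 1
    scores = {
        # logDice = 14 + log2(2 * f_xy / (f_x + f_y)); higher means a stronger pairing
        w: 14 + math.log2(2 * f_xy / (freq[node] + freq[w]))
        for w, f_xy in co.items()
        if f_xy >= min_freq
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

text = ("They reached an agreement. We signed an agreement. "
        "An agreement was reached. The weather was fine.")
for word, score in collocates(tokenize(text), "agreement"):
    print(f"{word:10} {score:.2f}")
```

In this toy text the article an comes out on top, which shows why raw proximity counts are noisy; classifying collocates by grammatical relation, as Word Sketches do, is one way to filter out such function words.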

This combination of corpus data and state-of-the-art software has given us the basis for an unrivalled description of the way words behave and combine with one another. Reliable word frequency data underpins decisions about which words, meanings, and phrases to include in the dictionary and about the order in which information is supplied in the more complex entries. And the Word Sketches have enabled us to provide a uniquely rich account of collocations in English: essential collocates are shown (as in all good learner’s dictionaries) in the body of the dictionary entry, but the MED also lists thousands of strong collocates in its 450 special "Collocation Boxes".

2 Guiding principles

We knew that we had two major tasks if we were going to produce an even better dictionary. First, we would have to do all the "ordinary" things extraordinarily well: providing really good, easy-to-use definitions, example sentences that are both natural and pedagogically useful, and an account of syntactic and collocational behaviour appropriate to learners at the advanced level. Second, we would have to give users much more than this, by going into areas where dictionaries had not yet ventured.

The starting point for the MED team was the belief that "the customer is always right" – or, to put it another way: if students go to their dictionary and can’t find and use the information they need, then the fault lies with the dictionary, not with the students. This fundamental belief gives rise to a number of guiding principles:

  • in paper dictionaries, space is very limited, so it should not be wasted on telling students what they already know or what they do not need to know

  • consulting a dictionary in another language is a form of reading comprehension, with a certain amount of "guessing from context": therefore, we as editors must ensure that using the dictionary is as simple and straightforward as possible

  • most students are not interested in dictionaries per se, but in using the dictionary to find solutions to individual communicative problems: therefore, we need to have a clear idea of what students are looking for when they go to their dictionary

  • students use dictionaries for different purposes at different times, depending on whether they are in "receptive" or "productive" mode: therefore, it is helpful to make a distinction between information that meets users’ receptive needs and information useful for successful language production

What makes the MED really different – and, we hope, uniquely useful for learners – is this clear distinction between productive and receptive information types. The words in the dictionary are divided into two main classes, core vocabulary and more peripheral items, as follows:

core (or productive) vocabulary

  • the 7500 or so most central and frequent words in English

  • words that are likely to be found in most types of text (e.g. in fiction, newspapers, academic discourse, or conversation)

  • words that are likely to be needed in productive tasks (writing and speaking) as well as in reading and listening

These words are shown in red in the dictionary, and each headword gets a frequency rating: three stars indicate a word of very high frequency, among the 2500 most common words in English (e.g. big, place, or change); two stars mean the word is still very common but slightly less central than a three-star word (e.g. confident, favourite, or grasp); and one star indicates a word of medium frequency that is nevertheless worth learning (e.g. confidential, feasible, or gamble).

non-core (or receptive) vocabulary

  • about 40,000 words

  • many are found in only one specific type of text (e.g. technical writing, literary fiction, or informal conversation)

  • words that are needed mainly for receptive purposes

  • words that the learner needs to find out about for a specific task, but does not usually need to learn

  • words such as amplitude, bifurcate, cantankerous, and degenerative

The core vocabulary items (the "red" words) are given detailed treatment, with these main characteristics:

  • "Meaning Menus" at complex words, to help users find the right meaning fast

  • clear, open layout, with each new meaning shown on a new line

  • full information on syntactic behaviour, using clear codes backed up by numerous examples

  • detailed information about collocates, often in a "Collocation Box"

  • detailed description of meaning, with subsenses showing semantic or pragmatic nuances

  • usage notes giving information on synonyms and common errors

Entries for the non-core (or "black") words aim to give learners the information they need – and no more. Typical black entries provide help with:

  • spelling and pronunciation

  • basic grammar (word class, and transitivity or countability)

  • meaning, with short, simple definitions

These black entries are usually very short, and this has two great benefits for the learner: first, we can include a lot more vocabulary (so students’ chances of finding the words they need are much higher); second, the entries are quick and easy to use, allowing users to spend as little time as possible in the dictionary.

3 People

In the days of Samuel Johnson, dictionaries were often compiled by a single writer. Nowadays, producing a new dictionary is a highly complex operation that can involve hundreds of people.

The MED is the work of a skilled and creative editorial team, backed up by proven expertise in project management and IT resourcing, and benefiting from the advice of academics, teachers, and language learners from all over the world.

  • the editorial team: the MED is the first new learner’s dictionary of the Internet age. New technology, especially the arrival of email, enabled us to staff the project with editors working mainly from home – and "home" could mean anywhere from San Francisco to Edinburgh to Sydney. We had two main teams, one in the US and the other in the UK, and they included many of the best lexicographers in the business. Editors worked on their own computers, using a specially customized program for writing dictionaries onscreen, a powerful corpus-querying program, and a set of Word Sketches.

  • project management and IT: this side of the project was taken care of by Bloomsbury Publishing Plc, whose reference division had already produced – in association with Microsoft – the highly praised Encarta World English Dictionary (1999). The complex process of sending text files all over the world, monitoring their progress, and integrating them into the developing database benefited enormously from Bloomsbury’s expertise and experience in this field.

  • advisors: at every stage of the project, the editorial team was in close consultation with our panel of advisors. This group, led by Professor Michael Hoey, was made up of practising teachers, writers of ELT materials, and academics specializing in relevant disciplines such as second language acquisition, language teaching methodology, and discourse analysis. We also benefited from the advice, suggestions, and constructive criticism of numerous teachers and students from all over the world – many of whom tried out sections of the dictionary text while it was still evolving.

4 The editorial process

Finally, a few words about how the dictionary was actually compiled. Writing the MED took a little over three years. The process can be divided into four main parts:

  • research and development

  • piloting

  • main writing and editing stages

  • finalization

Research and development: during the first three months of the project, a small group of experienced editors developed and tested ideas, with the objective of finding better ways of meeting students’ reference needs.

Piloting: next, we started producing samples of text to test the feasibility of the ideas we had developed, and shared these with our advisors and with selected groups of teachers and learners in several different countries.

Writing and editing: this is the main stage in any dictionary project, and usually involves dozens of people in several different roles. On the MED project there were several innovations in the way we worked:

  • we used a separate team to write entries for the 7500 core vocabulary items – these are usually the most complex words to describe, and require high levels of lexicographic skill

  • a small team – just three people – worked separately on the "function words": the 200 or so basic grammatical words in English (words like out, of, but, and would)

  • we used a set of about 75 "Template" entries to help our geographically dispersed team of editors achieve a high degree of consistency when writing entries for words in specific categories (such as Animals, Trees, Body Parts, and Illnesses)

  • most importantly, every stage and every component of the project had a British English phase and an American English phase: dictionary text went back and forth across the Atlantic repeatedly, so that a "dual-track" database was gradually built up as the project progressed

Finalization: this is the stage where all the components of the dictionary are brought together – not just the A-Z text, but also the various usage notes, the Collocation Boxes, the unique boxes on metaphor and academic writing skills, the Language Awareness pages written by academic experts, and the hundreds of illustrations and cartoons. Bloomsbury’s IT team had to mould all of these elements into a single whole, and then the dictionary was at last ready for the printers.