Corpus linguistics for ELT

Research and Practice


Ivor Timmis


Routledge Corpus Linguistic Guides

London: Routledge, 2015

Paperback. xv+213p. ISBN 978-0415747127. £29.99


Reviewed by Claire Tardieu

Université Sorbonne Nouvelle – Paris 3



Timmis’s Corpus linguistics for ELT: Research and Practice is a milestone in the academic literature on Corpus Linguistics and foreign language teaching. The main concern of teachers who are familiar with this relatively new field of linguistics is to connect it with their teaching. Although they are well aware that some applications of Corpus Linguistics could be particularly effective in the language class, they often feel they lack a ‘how to use’ guide.

The purpose of Timmis’s book is precisely to present the state of the art in Corpus Linguistics and to explain how it works before making proposals for the language class. “This book”, he says, “does not propose, then, to ‘revolutionize’ language teaching through Corpus Linguistics. It does, however, seek to show how the long proclaimed potential of corpora can significantly contribute to the evolution of language teaching” [8]. It should also be noted that three kinds of activity are offered to the reader throughout the book (corpus search, corpus question, and discussion), which makes the reading even more instructive.

The 213 pages are divided into nine chapters, including the introduction and the conclusion. In the introduction, Timmis gives definitions and presents different types of corpus. He also explains what we can do with them. The second chapter focuses on building one’s own corpus. Chapter 3 deals with corpora and lexis, Chapter 4 with research and grammar, and Chapter 5 with spoken corpus research; each of these chapters includes a specific section on implications for pedagogy. Chapter 6 examines teaching-oriented corpora and data-driven learning (DDL). Chapter 7 deals more specifically with English for Specific Purposes (ESP), with a triple focus on Academic English, Engineering English and Business English. Chapter 8, called Corpora in Perspective, tackles the limitations of corpora and English as a Lingua Franca, as well as classroom options for models of English. Chapter 9 concludes by addressing a series of questions that the book aims to help us answer in a positive way. There are two appendices (the second consisting of a list of existing corpora) and a very useful index.

The Introduction offers striking references, such as that to Sinclair (1991), “who likened the value of corpora for linguistics to the value of the telescope for astronomy” [7]. Timmis quotes Brazil’s definition of a corpus, “a collection of used language” (1995: 24), and insists that “language in a corpus is naturally occurring” [2]. He then lists corpora such as the BNC and COCA (large general corpora), CANCODE (a spoken corpus), MICASE, CANBEC and the Hong Kong Engineering Corpus (ESP corpora), and VOICE or ICLE (learner corpora). Corpora can benefit FLT in three ways:

First of all, they can inform ELT reference works and ELT materials, syllabuses, tests, and course books. They can also be used directly to teach English in the classroom. Finally, in the case of learner corpora, they can influence the very teaching of the language.

As for building one’s own corpus (Chapter 2), Timmis considers the fundamental questions we should bear in mind: What for and how? What language use are we trying to represent? [15] What kind of genres and contexts? And above all, who are the potential users of our corpus? He refers to de Cock (2010), who distinguishes between two types of transcription, broad and narrow, depending on how precise and complex a use we intend to make of them. Encoding can specify metadata such as speakers, date and source of text. Tagging can also be added to support lexical or grammatical searches, for instance with online tools such as CLAWS. Timmis then explains the three basic analytical operations: frequency counts, concordance and collocation [17].

Frequency lists are easy to generate. Once you know how frequent an item is, you may want to learn more about its use by means of the concordancing tool, which “displays all the instances of the word or phrase you are looking for in the corpus with a limited amount of co-text either side of the target word” [18].
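By way of illustration, the first two operations can be sketched in a few lines of Python. This is only a minimal sketch and not the tools Timmis describes: the tokenizer is deliberately crude, and the function names (tokenize, frequency_list, concordance) and the sample sentence are assumptions made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, target, width=3):
    """Return every hit for `target` as (left co-text, target, right co-text).

    `width` is the number of tokens of co-text kept on either side.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical miniature "corpus", invented for this example.
sample = "I bet you know what I mean. You know, I mean it. I doubt you mind."
tokens = tokenize(sample)

for word, count in frequency_list(tokens)[:3]:
    print(word, count)

for left, hit, right in concordance(tokens, "know", width=2):
    # Align the target word in a central column, as concordancers usually do.
    print(f"{left:>12}  {hit}  {right}")
```

A real corpus would of course call for a proper tokenizer and a dedicated concordancer; the point here is simply that a frequency list is a count of tokens, and a concordance is a list of hits with their co-text.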

Finally, you may focus on specific collocations and check whether they are strong or weak. More is said about this particular functionality in Chapter 3: Corpora and Lexis. What is at stake in this chapter is the reappraisal of the status of lexis and, above all, of the relationship between lexis and grammar [23]. Many teachers will appreciate this middle-way position and agree with Wilkins (1972: 111) that “without grammar little can be conveyed; without vocabulary, nothing can be conveyed” [23] or with Lewis (1993: iv): “Language consists of grammaticalised lexis, not lexicalised grammar” [23].
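To make the notion of strong and weak collocations concrete, here is a minimal sketch of one common association measure, pointwise mutual information (PMI), computed over adjacent word pairs. The review does not say which measure Timmis uses, so the choice of PMI, the function name pmi and the toy token list are all assumptions for the sake of the example.

```python
import math
from collections import Counter


def pmi(tokens, w1, w2):
    """Pointwise mutual information of the adjacent pair (w1, w2).

    PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ).
    A higher score suggests a stronger collocation; a pair that never
    occurs gets minus infinity.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair_count = bigrams[(w1, w2)]
    if pair_count == 0:
        return float("-inf")
    p_pair = pair_count / (n - 1)  # there are n - 1 adjacent pairs
    p_w1 = unigrams[w1] / n
    p_w2 = unigrams[w2] / n
    return math.log2(p_pair / (p_w1 * p_w2))


# Hypothetical toy data, invented for this example.
toy = "strong tea and strong coffee but powerful computers and strong tea".split()

print(pmi(toy, "strong", "tea"))        # positive: the pair co-occurs more than chance predicts
print(pmi(toy, "strong", "computers"))  # -inf: the pair is never attested
```

On so small a sample the scores mean little; PMI is also known to favour rare words, which is why corpus tools often offer several measures side by side.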

All teachers will certainly be interested to know that, according to Nation (2013), 80 to 90% of the language in conversations or books is made up of the 2,000 most frequent words [42]. O’Keeffe, McCarthy & Carter (2007: 48-49) consider that a receptive vocabulary of 5,000 to 6,000 words corresponds to upper intermediate level [42].
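The kind of coverage figure quoted above can be understood as a simple ratio: the share of all running words (tokens) in a text that belong to the N most frequent word types. The following is a minimal sketch under that assumption; the function name coverage and the toy sentence are made up, and the figures it yields have nothing to do with Nation’s.

```python
from collections import Counter


def coverage(tokens, top_n):
    """Share of all tokens accounted for by the `top_n` most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


# Hypothetical toy text, invented for this example.
toy_text = "the cat sat on the mat and the dog sat on the rug".split()

for n in (1, 3, 8):
    print(n, round(coverage(toy_text, n), 2))
```

Note that this sketch ranks words by their frequency in the text itself, whereas published coverage figures are normally computed against a frequency list drawn from a large reference corpus.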

These findings, which have significant pedagogical implications, have been made possible by frequency analyses. Along the same lines, Martinez & Schmitt (2012) have drawn up a list of 505 frequent phrasal expressions, which happen to fall within the list of the 2,000 most frequent words [51]. Timmis argues that corpus-informed teachers would certainly understand the relevance of an integrated view of vocabulary and agree to move towards a syllabus made up of words, collocations and lexical chunks [53]. Timmis’s choice of the expression “corpus-informed” rather than “corpus-driven” approach reveals his sincere pedagogical concern as well as his thorough knowledge of the profession and of the field. It should also be said that his language and explanations are always clear and easy to follow, which is certainly one of the assets of the whole book.

Following on from Chapter 3, Chapter 4 directly addresses language curricula and course materials. All teachers who are old enough to know what the audiovisual method was about will readily understand Biber & Reppen’s (2002) remark about the overuse of the progressive form in the English class relative to its corpus-attested frequency [60]. Conversely, the get passive seems to have received too little attention in ELT materials. One might conclude that Timmis recommends that course books follow the frequency rates observed in natural language corpora, yet this is not the case. In fact, a clear distinction is made: “The corpus-informed approach […] should not dictate what we do. Alongside the frequency of a structure, we need to take into account its difficulty and usefulness for a specific group of learners” [61]. In other words, a corpus-informed approach could have an awareness-raising effect on teachers and help them cast a critical eye on the materials they are provided with. Timmis mentions the ‘used to’ form for referring to the past. Why is this form often emphasised in coursebooks? Timmis suggests that some items lend themselves to PPP (present – practice – produce) better than others. To put it differently, they are pedagogically rewarding, easily “packaged” in the classroom like “Grammar McNuggets”, as Thornbury (2000) calls them. Just as some lexical items can be neglected, some grammatical structures may be unduly discarded, such as the new quotative verbs (be like, go, be all) (Barbieri & Eckhardt, 2007). Indeed, if one is to reduce the discrepancy between the language used in everyday life and the language taught in the classroom, one had better consult natural language corpora.

An interesting point is made about connecting Grammar and Lexis [71]: we learn from corpus analysis that verbs like “bet, doubt, know, matter, mean, mind, reckon, suppose, thank” occur over 80% of the time in the present tense, while others such as “exclaim, eye, glance, grin, nod, pause, remark, reply, shrug, sigh, smile, whisper” occur mostly in the past tense [71]. Once more, Timmis refrains from adopting a dogmatic position that emphasises conformity: “It seems reasonable to suggest that these corpus-based descriptions should at least inform the grammatical descriptions we give in the classroom, though we acknowledge that there is sometimes a need to simplify and perhaps over-generalise to arrive at ‘workable’ pedagogic rules” [77]. This statement recalls the 1990s, when the French Educational Authorities implemented enunciative grammar at all levels (with the recommendation to adapt the metalanguage to each level) and advocated that nothing should be taught about the foreign language that would become untrue in the long run.

To conclude on this issue, Timmis predicts three possible developments in the teaching of grammar in the 21st century: there will be a shift from “monolithic description of English grammar” to “register specific descriptions”; the teaching of grammar will be “more integrated with the teaching of vocabulary”; and emphasis will shift from “structural accuracy to the appropriate conditions of use for alternative grammatical constructions” [77].

Chapter 5 [81] examines a relatively recent phenomenon: spoken corpus research. Spoken corpora are of three types. They may be spoken components of large general corpora such as the BNC or COCA, or specific spoken corpora such as the Limerick Corpus of Irish English, the Santa Barbara Corpus of Spoken American English, or the Longman Corpus of Spoken American. A third category consists of genre-specific spoken corpora, for instance the Switchboard Corpus (recorded telephone conversations from the early 1990s) and the Corpus of American Soap Operas. Pragmatic categories have been identified in relation to the most frequent chunks in CANCODE: discourse marking, face and politeness, hedging, and vagueness and approximation. We can easily imagine how our students could profit from such findings when they have to take the floor. They could also usefully be shown the importance of ellipsis and tails in spoken interactions [94-95].

Timmis does not hesitate to ask the right questions (Is the item “useful, frequent, complex? Socioculturally appropriate? What will the spoken language feature enable us to do communicatively?” [104]) or to answer them by suggesting that we should prioritise very common, socially unmarked spoken lexis [106]. Having discussed Corpus Linguistics and its implications for the classroom in the first five chapters, Timmis continues with a different type of corpus: learner corpora.

Chapter 6: Corpora and the classroom

How should a learner corpus be designed, and for what purpose? Timmis provides the reader with guidelines on appropriate design criteria: the learning environment, age, proficiency level of the learners, mother tongue, stage of learning, nature of the task, topic, genre, setting, use of reference resources, etc. [120]. One main distinction is made by de Cock (2010) between Mono-L1 and Multi-L1 learner corpora. The Japanese EFL Learner corpus is cited as a Mono-L1 corpus, and the English Profile (CUP), the Louvain ICLE and LINDSEI as Multi-L1 corpora. Multi-L1 corpora enable users to identify typical errors made in English according to the L1 and to compare the English of speakers of different L1s. This is in keeping with Granger’s observation, made as early as 1994, that many of the lexical problems encountered by learners are L1-specific [126].

Timmis also refers to teaching-oriented corpora, that is, “corpora exploited for pedagogical purposes or designed for pedagogical purposes” [128]. In this connection, he mentions the ELISA corpus (interviews with native speakers of different varieties of English), the SACODEYL corpus (a collection of teen talk in seven European languages, including English), and the BACKBONE corpus (video-recorded interviews). These teaching-oriented corpora can indeed provide authentic material adapted to the classroom (especially teen talk). Yet which teaching method is the most appropriate?

A whole range of pedagogical implications of Corpus Linguistics is summed up in what is known as Data-driven Learning (DDL). DDL involves a student-oriented approach in which students themselves become researchers and are invited to make discoveries. According to Timmis, “the twin foundations of the rationale for DDL are authenticity and autonomy” [135]. DDL involves inductive reasoning, is helpful for lexical learning, and can also be used to improve writing. Boulton (2009) argues that DDL is within reach of lower level learners provided that it is used judiciously [139], and according to Timmis, DDL should be part of the repertoire of teachers and material writers. At this point, Natalie Kubler’s work, in particular with her undergraduate and postgraduate ESP students, could have been mentioned. Both her seminal research and her pedagogical experience would have provided further concrete examples.

Chapter 7: Corpora and ESP [146] may be particularly interesting for higher education teachers. This chapter focuses on three types of English: English for Academic Purposes (EAP), Engineering English and Business English. Regarding EAP, Nesi (2014) identifies four types of corpus:

· Corpora of ‘expert’ writing
· Learner corpora
· Corpora of university student writing
· Spoken academic corpora

Timmis lists the British Academic Written English Corpus, the Michigan Corpus of Upper Level Student Papers (MICUSP), the British Academic Spoken English corpus (BASE), MICASE, the BNC Spoken Academic component, and the English as a Lingua Franca in Academic Settings corpus (ELFA), which comes from Finland. In this context, DDL can help students to improve their writing skills and gain overall confidence in writing. They will be invited to focus on word frequency, lexical collocations and academic formulas. Regarding academic formulas, Simpson-Vlach & Ellis (2010) distinguish between referential expressions, stance formulas and discourse-organizing expressions [156].

In Chapter 8: Corpora in perspective, Timmis mentions some classroom ideas and quotes Adel’s (2010) questions: “What do Academic writers say when a) they give an example, b) refer to other texts or researchers, c) introduce the topic, d) start their conclusion section?” [177]. Timmis is aware that DDL should be adapted to the ability of the learners; otherwise, learners may get lost in concordance lines. Yet he has no doubt that giving students the opportunity to get to grips with corpora will make them more aware of language use and more motivated in their learning.

In Chapter 9 [198], Timmis invites us to explore one issue in particular: What methodologies might be suited to help students come to terms with the inherent context-dependence of language use? The question is not new (see the communicative approach), but Corpus Linguistics and DDL seem to offer one workable solution.

To conclude, this book covers a wide range of issues relating to Corpus Linguistics and its pedagogic implications. It is very well documented and illustrated. Readers will no doubt find a great deal of information about the topic. They will also be constantly invited to do “exercises”, to ask and answer relevant questions, and to design class activities. It is a sort of “hands-on” reading combining research content and pedagogical issues. Of course, Timmis does not do the entire job for us, and it would be advisable to conduct experiments in the classroom at different levels of the curriculum to gather more data on Corpus Linguistics and pedagogical issues. This book offers a solid background for that.


