From ‘car motor’ to ‘fishing boat’:
Tracking intermediate learners’ phraseological development by use of association measures
Kaja Helchie Sletten Østmo Evang
A thesis presented to the Department of Literature, Area Studies and European Languages
University of Oslo
November 2019
© Kaja Helchie Sletten Østmo Evang 2019
http://www.duo.uio.no/
Print: Reprosentralen, University of Oslo
Abstract
Making use of a subset of a new, longitudinal learner corpus, Tracking Written Learner Language (TRAWL), the present study investigates t-scores and MI scores in bigrams produced by Norwegian intermediate learners of English. An association measure is a way of calculating the collocational strength and certainty of the words in a word pair (bigram). Two such measures, MI score and t-score, have been shown in previous studies to be reliable measures for telling learner language apart from native language. Durrant and Schmitt (2009) found advanced learners to produce more bigrams with a high t-score, and fewer bigrams with a high MI score, than native writers did. Granger and Bestgen (2014) found less advanced learners to use more bigrams with a high t-score, and fewer bigrams with a high MI score, than more advanced learners. The present study investigates learner language from a lower proficiency level than has previously been available. Learner language from a group of Norwegian pupils is compared longitudinally at three stages of development (8th, 9th, and 10th grade), and pseudo-longitudinally to older pupils, and to Norwegian and native students at university. The pattern of association measures found for the intermediate learners does not match the previous findings for advanced learners. Instead, a U-shaped pattern is uncovered, in which the youngest pupils produce more bigrams with high MI scores and high t-scores than the same learners do one and two years later. At higher levels of proficiency, the scores increase again, and here the pattern resembles the previous findings, in that the proportion of high t-scores rises first, and only towards the later stages of proficiency does the proportion of high t-scores fall and the proportion of high MI scores rise. The study considers the possibility that this U-shaped pattern represents a general L2 development, corresponding to the processability model of grammar, in which case there are direct consequences for teaching, as it would mean that the high-frequency, high t-scoring bigrams would have to be learnt first, before they can be substituted by lower-frequency, high MI-scoring bigrams. However, this remains only a hypothesis, as it has not been shown that the U-shape is caused by general learner development. The hypothesis should be tested in a more rigorous study informed by SLA theory, following the trajectory of individual learners.
Keywords:
L2 English, phraseology, intermediate learners, longitudinal corpus, learner corpus research, second language acquisition, contrastive interlanguage analysis, word pairs, n-grams, bigrams, collgrams, collocation, association measure, mutual information, MI score, t-score, Norwegian learners, Norwegian L1, TRAWL corpus, teaching phraseology, processability model, teachability hypothesis, U-shape
Acknowledgements
First and foremost, I would like to thank my supervisor Signe Oksefjell Ebeling for her patience and thoroughness, for all her invaluable feedback, for making time for me at inconvenient hours, and for the practical help with the POS-tagging of the corpora. To Hildegunn Dirdal, my heartfelt gratitude for letting me take part in the TRAWL project, for granting me access to the texts before completion of the corpus, for answering all my questions of which there have been many, and above all for reading parts of my thesis and providing constructive feedback. I would also like to thank the pupils who have taken part in the TRAWL project, without whom this study would have been impossible.
I would like to thank my employer, Ullensaker kommune, and the Kompetanse for Kvalitet scheme, through which I have received funding for writing this thesis. I would also like to thank my colleagues and supervisors at my workplace, who have granted me their patience and curiosity throughout this process, as have my pupils. I will finally get around to marking those papers now. Thank you for your patience!
My most heartfelt appreciation to my family – to my father Einar, for inspiring in me the love and curiosity of language, to Hanne and Lise, for letting me have our shared cabin to myself every weekend this autumn, to my children, Sara, Ruth, and Rakel, who have shown patience and understanding well beyond their years during these last months and weeks.
And to Marius – my editor, my first proof-reader, my chef and janitor and provider of emotional support, my solid rock in times of disheartenment, my keeper-up-of-spirits during long nights and early mornings. My best friend. My love.
Thank you.
Table of Contents
Abstract ... V
Acknowledgements ... VII
List of Tables and Figures ... XIII
Tables ... XIII
Figures ... XIII
List of Abbreviations ... XV
1 Introduction ... 1
1.1 Motivation ... 1
1.2 Aims and Scope ... 1
1.3 Research Questions ... 2
1.4 Background: English in Norway ... 3
1.5 An Overview of the Study ... 5
2 Theory, Definitions, and Previous Studies ... 7
2.1 Introduction ... 7
2.2 Corpus Linguistics and Learner Corpora: Definition, design, and desires ... 7
2.3 Second Language Acquisition ... 10
2.3.1 Interlanguage ... 11
2.3.2 Second Language Acquisition and Learner Corpus Research ... 12
2.4 Lexis ... 13
2.5 Phraseology ... 16
2.5.1 Words & co. ... 16
2.5.2 Two traditions... 16
2.5.3 Definitions of phraseological items ... 18
2.5.4 Previous studies on phraseology in learner language ... 20
2.5.5 How to teach phraseology ... 22
2.6 Three Previous Studies ... 23
2.6.1 Durrant and Schmitt 2009 ... 23
2.6.2 Granger and Bestgen 2014 ... 24
2.6.3 Bestgen and Granger 2018 ... 25
3 Material and Method ... 29
3.1 Material ... 29
3.1.1 The TRAWL corpus ... 29
3.1.2 The writing situation... 30
3.1.3 An overview of the texts in this study ... 32
3.1.4 The writing prompts ... 33
3.1.5 Reference corpora ... 35
3.2 Method ... 36
3.2.1 Contrastive Interlanguage Analysis ... 36
3.2.2 Variability ... 39
3.2.3 Variables and comparability of corpora ... 40
3.2.4 Effect of writing prompts ... 42
3.2.5 Statistical methods ... 43
3.2.6 Corpus Linguistics and Learner Corpora revisited: Method and material in this study ... 46
3.2.7 Tagging, text collection, preparation, and recall ... 50
3.2.8 Extracting bigrams ... 51
3.2.9 Precision ... 54
3.2.10 The curious case of the BT category ... 54
4 Results ... 57
4.1 Initial Observations: Before comparing scores ... 57
4.1.1 Lexical richness ... 57
4.1.2 Number of bigram types sorted by category and learner group ... 57
4.1.3 Tip of the iceberg: How much of the material has been investigated ... 59
4.2 Categories of MI scores and t-scores ... 60
4.3 Adjective + Noun (JJ NN) ... 61
4.3.1 MI scores ... 62
4.3.2 t-scores ... 63
4.3.3 Finer grained categories ... 64
4.3.4 Grids ... 65
4.4 Noun + Noun (NN NN) ... 66
4.4.1 Below threshold (BT) ... 68
4.4.2 MI and t-scores ... 68
4.4.3 Grids ... 70
4.4.4 Finer grained categories ... 71
4.5 Degree Adverb + Adjective (RG JJ) ... 72
4.5.1 Below threshold (BT) ... 72
4.5.2 MI scores ... 75
4.5.3 t-scores ... 75
4.5.4 Grids ... 76
4.5.5 The case of fun: An example of L1 transfer ... 77
4.5.6 The case of very: How to avoid general modifiers ... 80
4.6 General Adverb + Adjective (RR JJ) ... 83
4.7 Pupil P01041 ... 85
4.8 Impact of the Writing Prompts ... 88
4.8.1 Writing prompt score (WP) ... 88
4.8.2 WP scores of all the pupils and of P01041 ... 89
4.8.3 WP scores and BT percentages combined ... 90
4.8.4 P01041 BT bigrams ... 93
5 Conclusion: Limitations, Implications, and Further Research ... 95
5.1 Independent and Confounding Variables... 95
5.1.1 Intra-learner variables ... 95
5.1.2 Inter-learner variables ... 96
5.1.3 General variables ... 96
5.2 U-shapes ... 98
5.2.1 A tentative explanation of the U-shapes ... 99
5.3 Revisiting the Research Questions: A summary of findings ... 101
5.3.1 RQ 1 ... 101
5.3.2 RQ 2 ... 102
5.3.3 RQ 3 ... 103
5.4 Recommendations for Further Research ... 103
References ... 105
Corpora and software... 113
Appendix 1: Complete pupil text, tagged ... 115
Appendix 2: Concordance plots... 116
a) 8th grade texts: “so fun” ... 116
b) 10th grade texts: “mother tongue” ... 117
Appendix 3: Pupil’s answer with two parts, complete ... 118
Appendix 4: 8th grade assignment (TEEN) ... 120
Appendix 5: 9th grade assignment (JIPS) ... 123
Appendix 6: 10th grade assignment (ONOF), ENG0012 exam 2017 ... 125
Appendix 7: Assignment codes and topics ... 127
Appendix 8: P01041 NN NN and JJ NN bigrams ... 130
Appendix 9: P01041 RG JJ and RR JJ bigrams ... 131
Appendix 10: Pupil P01041’s 8th grade NN NN bigrams... 132
List of Tables and Figures
Tables
Table 1: Sub-sections of material, including reference corpora. ... 32
Table 2: List of assignment codes with number of answers. ... 34
Table 3: Intervals for categories of MI and t-scores (Granger and Bestgen 2014). ... 53
Table 4: Word types, tokens and types pr. 100,000 words. ... 57
Table 5: Number of bigram types ranging ≥5 texts and in total, absolute and normalised pr 100,000 ... 58
Table 6: Tokens of bigrams ranging ≥5 texts. ... 59
Table 7: Tokens of all bigrams. ... 59
Table 8: Intervals for categories of MI and t-scores (Granger and Bestgen 2014). (Repetition of table 3). ... 60
Table 9: Finer grained categories of MI and t-scores. ... 71
Table 10: 10th grade RG JJ bigrams. ... 81
Table 11: Suggested replacements for bigrams with very, 10th grade RG JJ... 82
Table 12: Bigrams produced by pupil P01041. ... 85
Table 13: Percentage of BT bigrams, all pupils and pupil P01041. ... 86
Table 14: Categories of Writing Prompt scores. ... 89
Table 15: Writing prompt scores, all pupils and pupil P01041. ... 89
Table 16: Overlap between BT percentage and WP score, all pupils and pupil P01041. ... 91
Table 17: BT bigrams with WP score 3 or 4 without obvious origins. ... 92
Figures
Figure 1: CIA2 (Granger 2015a). ... 37
Figure 2: AntConc search interface. Results for *_RR *_JJ in 9th grade texts. ... 52
Figure 3: JJ NN bigrams, MI scores. ... 61
Figure 4: JJ NN bigrams, t-scores. ... 61
Figure 5: Scatter plot of 10th grade JJ NN bigrams. ... 64
Figure 6: Grids: 8th grade JJ NN bigrams... 65
Figure 7: Grids: 9th grade JJ NN bigrams... 65
Figure 8: Grids: 10th grade JJ NN bigrams. ... 65
Figure 9: Grids: Upper secondary, JJ NN bigrams. ... 65
Figure 10: Grids: ICLE-NO JJ NN bigrams. ... 65
Figure 11: Grids: LOCNESS JJ NN bigrams... 65
Figure 12: NN NN MI scores. Old categories. ... 67
Figure 13: NN NN t-scores. Old categories. ... 67
Figure 14: NN NN MI scores. Finer grained categories. ... 67
Figure 15: NN NN t-scores. Finer grained categories. ... 67
Figure 16: Grids: 8th grade NN NN bigrams. ... 70
Figure 17: Grids: 9th grade NN NN bigrams. ... 70
Figure 18: Grids: 10th grade NN NN bigrams. ... 70
Figure 19: Grids: Upper secondary, NN NN bigrams. ... 70
Figure 20: Grids: ICLE-NO NN NN bigrams. ... 70
Figure 21: Grids: LOCNESS NN NN bigrams. ... 70
Figure 22: RG JJ bigrams, MI scores. Old categories. ... 74
Figure 23: RG JJ bigrams, t-scores. Old categories. ... 74
Figure 24: RG JJ bigrams, MI scores. Finer grained categories... 74
Figure 25: RG JJ bigrams, t-scores. Finer grained categories. ... 74
Figure 26: Grids: 8th grade RG JJ bigrams. ... 77
Figure 27: Grids: 9th grade RG JJ bigrams. ... 77
Figure 28: Grids: 10th grade RG JJ bigrams. ... 77
Figure 29: Grids: Upper secondary RG JJ bigrams. ... 77
Figure 30: Grids: ICLE-NO RG JJ bigrams. ... 77
Figure 31: Grids: LOCNESS RG JJ bigrams. ... 77
Figure 32: Concordance lines of so fun in all texts from lower secondary school. ... 78
Figure 33: Concordance lines of very fun in all texts from lower secondary school. ... 78
Figure 34: Collocates for fun in the BNC. ... 79
Figure 35: RR JJ bigrams, MI scores. ... 84
Figure 36: RR JJ bigrams, t-scores. ... 84
Figure 37: A model of the trajectories of MI and t-scores throughout the informant groups. ... 99
List of Abbreviations
ADJ: Adjective
ADV: Adverb
AJ: Adverb + adjective
BNC: The British National Corpus
BT: Below Threshold, i.e. fewer than 5 tokens in the BNC
CEFR: The Common European Framework of Reference for Languages
CIA: Contrastive Interlanguage Analysis
CLAWS7: Constituent Likelihood Automatic Word-tagging System, 7th version
COCA: The Corpus of Contemporary American English
ESL: English as a Second Language
ESP: English for Special Purposes
GiG: Growth in Grammar (corpus)
H: High (association score)
ICLE: International Corpus of Learner English
ICLE-NO: The Norwegian part of the International Corpus of Learner English
JJ NN: Adjective + noun
JN: Adjective + noun
KWIC: Keyword in Context
L: Low (association score)
L1: First language
L2: Second language
LC: Learner corpus
LCR: Learner Corpus Research
LOCNESS: Louvain Corpus of Native English Essays
LONGDALE: Longitudinal Database of Learner English
M: Medium (association score)
MI: Mutual Information
MI3: Cubed mutual information
NC: Non collocational (association score)
NN: Noun + noun
NN NN: Noun + noun
POS: Part-of-speech
PU: Phraseological unit
RG JJ: Degree adverb + adjective
RQ: Research question
RR JJ: General adverb + adjective
SLA: Second Language Acquisition
SVO: Subject-Verb-Object
TRAWL: Tracking Written Learner Language (corpus)
WP: Writing prompt
1 Introduction
1.1 Motivation
Teaching English to pupils in lower secondary school in Norway, I regularly come across pupils who master the rules of spelling, sentence structure, and grammar – and who nevertheless produce texts that do not sound “English”. What these pupils seem to lack is phraseological competence: they fail to find words that fit well together. This study is motivated by a desire to help Norwegian learners of English develop their writing skills, in terms of phraseological awareness. Millar (2011) found that native speakers take longer to process unidiomatic language than idiomatic language, concluding that learners should be taught to use formulaic language in order to be better understood. However, in the case of phraseological competence, there are few hard and fast rules, and the best advice on offer at present is spending time in a target language community, a solution not available to everyone. In a classroom setting, we wish to address the pupils’ weak points directly and administer remedies. To improve the phraseological proficiency of language learners, we need a way to describe their current level of proficiency and identify realistic teaching goals.
While the phraseological competence of advanced learners of English has been explored in detail (Granger, Gilquin, and Meunier, eds. 2015), intermediate learners have not been studied to the same extent, chiefly due to a lack of suitable empirical material.
1.2 Aims and Scope
In an attempt to gain more knowledge about phraseological competence among Norwegian learners of English, this study takes a statistical approach to the identification of phraseological units in texts written by intermediate learners. More specifically, anchored in the field of Learner Corpus Research (LCR, cf. section 2.2), the investigation is inspired by previous studies exploring bigrams (two-word units) (Durrant and Schmitt 2009; Granger and Bestgen 2014; Bestgen and Granger 2018). These studies employ statistical association measures (MI score and t-score, see section 3.2.5) to determine the phraseological competence of advanced learners of English. Granger and Bestgen (2014) and Bestgen and Granger (2018) found that advanced learners, as they become more proficient in English, use more bigrams with a high MI score, such as immortal souls, and proportionally fewer bigrams with a high t-score, such as hard work. This reflects that as the learners become more proficient, they acquire a more precise and sophisticated repertoire of expressions, as MI score is likely to give high scores to rarer, more fixed phrases, while t-score picks out frequently recurring collocations. Similar differences were found by Durrant and Schmitt (2009) between advanced learners and native English writers. Applying some of the same methodology, the present study explores newly collected longitudinal material, only recently made available for research, aiming to measure the phraseological proficiency and development of Norwegian learners of English at an intermediate level, and to compare this to some of the previous research.
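The contrast between the two measures can be illustrated with a minimal Python sketch. It uses the definitions commonly found in corpus linguistics, MI = log2(O/E) and t = (O - E)/sqrt(O), where O is the observed bigram frequency and E = f(w1) * f(w2) / N is the frequency expected by chance in a corpus of N tokens. That these definitions match the exact computation described in section 3.2.5 is an assumption, and the function names and all frequency counts below are invented for illustration.

```python
import math


def expected_frequency(f_w1: int, f_w2: int, corpus_size: int) -> float:
    """Frequency of the bigram expected if the two words were independent."""
    return f_w1 * f_w2 / corpus_size


def mi_score(observed: int, f_w1: int, f_w2: int, corpus_size: int) -> float:
    """Mutual information: log2 of observed over expected frequency."""
    expected = expected_frequency(f_w1, f_w2, corpus_size)
    return math.log2(observed / expected)


def t_score(observed: int, f_w1: int, f_w2: int, corpus_size: int) -> float:
    """t-score: difference between observed and expected, scaled by sqrt(observed)."""
    expected = expected_frequency(f_w1, f_w2, corpus_size)
    return (observed - expected) / math.sqrt(observed)


# Hypothetical counts in a reference corpus of one million tokens.
N = 1_000_000

# A rare, fixed pair: both words are infrequent but nearly always co-occur.
rare_mi = mi_score(observed=8, f_w1=20, f_w2=10, corpus_size=N)
rare_t = t_score(observed=8, f_w1=20, f_w2=10, corpus_size=N)

# A frequent pair: both words are very common and co-occur often.
freq_mi = mi_score(observed=3000, f_w1=50_000, f_w2=40_000, corpus_size=N)
freq_t = t_score(observed=3000, f_w1=50_000, f_w2=40_000, corpus_size=N)

print(f"rare pair:     MI = {rare_mi:.2f}, t = {rare_t:.2f}")
print(f"frequent pair: MI = {freq_mi:.2f}, t = {freq_t:.2f}")
```

With these invented counts, the rare pair receives the higher MI score and the frequent pair the higher t-score, which mirrors the tendency described above: MI rewards exclusivity, while the t-score rewards sheer frequency of co-occurrence.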
This is chiefly a quantitative study, which means that most results involve statistical measures and percentages, and only a few select items that appear interesting are considered more closely. Although the study is statistical, only basic statistical measures are employed; no sophisticated statistics are used. A statistical approach is also taken in the definition of phraseological expressions, more precisely in that these are identified as bigrams. Other n-grams are not considered, nor are discontinuous stretches of prefabricated language. The items for study have been retrieved by the use of Part-of-Speech tags (POS-tags), identifying bigrams containing nouns, adjectives, and adverbs. No verbs are considered in the quantitative analyses, and bigrams containing grammatical words have not been included for study. All the texts used in the study are written by learners of English who have Norwegian as their first language, except those in the reference corpora of native British and American English. Although texts are included from learners ranging from 12-year-olds to university level students, the focus rests mainly on the lower levels, i.e. the newly collected material from secondary school in Norway.
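The retrieval of bigrams by POS-tag pattern can be sketched in Python as follows. This is not the procedure used in the study, only an illustration of the principle. It assumes that each token is written as word_TAG, that the four patterns of interest are JJ NN, NN NN, RG JJ, and RR JJ, and that tags are matched by prefix (so that a tag such as NN1 counts as NN). The prefix matching, the function names, and the tagged example sentence are all assumptions made for the sake of the example.

```python
from collections import Counter

# Tag patterns of interest, given as (first tag prefix, second tag prefix).
PATTERNS = (("JJ", "NN"), ("NN", "NN"), ("RG", "JJ"), ("RR", "JJ"))


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split 'word_TAG' items into (lower-cased word, tag) pairs."""
    tokens = []
    for item in text.split():
        word, _, tag = item.rpartition("_")
        if word and tag:  # skip items without a word_TAG structure
            tokens.append((word.lower(), tag))
    return tokens


def extract_bigrams(text: str, patterns=PATTERNS) -> Counter:
    """Count adjacent word pairs whose tags match one of the patterns."""
    tokens = parse_tagged(text)
    counts = Counter()
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        for p1, p2 in patterns:
            if t1.startswith(p1) and t2.startswith(p2):
                counts[(f"{p1} {p2}", w1, w2)] += 1
                break
    return counts


# Invented example sentence with hypothetical tags.
sample = "The_AT old_JJ fishing_NN1 boat_NN1 was_VBDZ very_RG small_JJ ._."
bigrams = extract_bigrams(sample)
for (pattern, w1, w2), n in sorted(bigrams.items()):
    print(pattern, w1, w2, n)
```

Because only adjacent tokens are paired, punctuation and grammatical words act as natural boundaries: a pair is counted only when both of its tags belong to one of the listed patterns.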
1.3 Research Questions
The research questions can be stated as follows:
1. What insights can be gained into the phraseological proficiency of Norwegian intermediate learners of English by measuring the MI and t-score of bigrams in written texts?
2. Does the same pattern that has been shown for advanced learners, i.e. an increase in high MI scores and a decrease in high t-scores, hold for Norwegian intermediate learners of English as their level of proficiency increases?
3. What suggestions can be made regarding the teaching of English vocabulary and phraseology to intermediate learners based on the answers to research questions 1 and 2?
The present study is corpus-driven, in the sense that it takes a bottom-up approach, and the main objective of RQ1 is to explore the data and see what patterns can be revealed. RQ2 aims to fill a knowledge gap by tracing the phraseological development of intermediate learners along the lines of investigations that have previously provided insights about advanced learners of English. With RQ3 I hope to show that the results and knowledge gained in this study can be applied to teaching.
1.4 Background: English in Norway
This study concerns Norwegian learners of English at different levels of proficiency. In order to provide a context for the material in this study, this section introduces the status of English in Norway.
Whether English should be considered a foreign or a second language in Norway is an ongoing discussion. According to traditional definitions it is a foreign language, as it is not the mother tongue of a substantial minority of the population, nor is it used in an official capacity in administration, nor does it have official status in any way. Stig Johansson, in the presentation of the ICLE-NO corpus (section 3.1.5), asserts that English “[f]rom being a foreign language, […] is approaching the status as a second language,” (2009, 192). Gilquin (2015, 16) distinguishes between a second language, which is spoken in the learner’s environment, and a foreign language, which is exclusively found within the classroom. The situation in Norway today is arguably somewhere in between these two situations.
The general proficiency in English in Norway is very high. Norway scores 68.38 on the EF English Proficiency Index for 2018 (https://www.ef.com/wwen/epi/, accessed 19 October 2019), enough to secure fourth place behind Sweden, the Netherlands, and Singapore. English is indispensable in commerce, trade, and business, and is the everyday language at work in many technical, economic, and academic professions. English (especially American) cultural influence is massive. Most films and TV series are in English, whether watched in cinema, on TV, downloaded, or streamed. Of 33 films showing in cinemas in Oslo on 16 October 2019, 15 were in English, 12 in Norwegian, 5 in other languages, and one animation film was available in both Norwegian and English (https://www.filmweb.no/program/?location=Oslo). Films are rarely dubbed, except when intended for small children. Most pop music is in English, both by native English-speaking artists and by Norwegians singing in English. On the top 40 chart of 1 August 2019 (https://acharts.co/norway_singles_top_20), 32 of 40 songs had an English title. Norwegians pride themselves on the fact that their English is superior to the level they encounter when travelling abroad, yet they might not be as accomplished as they believe.
English has a different status from other foreign languages in the education system in Norway. English is taught from an earlier age, and with a higher level of ambition for proficiency, compared to other foreign languages such as Spanish, German, or French. There is both a written and an oral exam in English in lower secondary school, the only subject to have this besides Norwegian and Mathematics, and students get two separate marks for English, one for written and one for oral proficiency. A third language, i.e. a foreign language introduced in 8th grade, is awarded only a single mark and is, at the time of writing, assessed only by an oral examination in lower secondary school (The Norwegian Directorate for Education and Training 2006).
In teacher education, English as a subject is treated differently from other foreign languages. A distinction is made between “second language didactics” (English) and “foreign language didactics” (Spanish, German, French). Teachers of English in secondary school are encouraged to speak only English in the classroom. Whether this goal is realistic is another question entirely.
Norwegian pupils start school the year they turn 6, and English is introduced from the first year. In the beginning, when they cannot yet read or write, it is taught by oral instruction, starting with colours, numbers, and days of the week. By the time they reach secondary school, the pupils have written texts of varying length where they describe familiar situations and events, express opinions, narrate, and retell stories (The Norwegian Directorate for Education and Training 2013, 7-8). Through secondary school, the pupils are expected to produce texts of increasing length, complexity, and level of reflection (ibid., 10). It is from this level of education that texts have recently been collected for the compilation of a new corpus of learner English (see section 3.1.1), and these texts form the basis of the present study.
1.5 An Overview of the Study
Chapter 2 provides an account of the traditions of Learner Corpus Research and Second Language Acquisition, as well as the research fields of lexis and phraseology. It provides definitions of central concepts and gives an account of some previous studies. Chapter 3 describes the material and method used in the present study. Chapter 4 presents the data and some observations that can be made from them. Chapter 5 attempts an explanation of some central findings, revisits the research questions, discusses limitations and implications of the study, and rounds off with suggestions for further research.
2 Theory, Definitions, and Previous Studies
2.1 Introduction
This chapter aims to describe the theoretical framework within which this study is situated, to give definitions of central aspects and terminology, and to review a limited selection of previous studies considered relevant to the present one. The fields of Learner Corpus Research, Second Language Acquisition, and phraseology will be presented in succession. Sometimes considered theory, and sometimes methodology, corpus linguistics will be considered both in this chapter and in the next (section 3.2.6).
This study is situated in the tradition of Learner Corpus Research (LCR), while also inspired by Second Language Acquisition (SLA, section 2.3). A fairly young discipline about 30 years of age, LCR has already produced a vast number of studies, only a few of which will be treated separately in the following sections. An excellent overview of the field exists in the form of the Cambridge Handbook of Learner Corpus Research (Granger et al. 2015). Covering central topics such as corpus design, methodology, analysis, and its relation to the adjacent fields of Second Language Acquisition and Natural Language Processing, the Handbook offers corpus compilers, researchers, teachers, and students alike a comprehensive examination of the field of LCR both in historical terms, citing numerous discussions and developments in the field throughout its (at the time) 25 years of existence, and as a reference tool. This chapter is largely based on the Cambridge Handbook. Where appropriate, other sources are referred to. A few studies that are of particular significance for the present study are presented in a separate section (section 2.6).
2.2 Corpus Linguistics and Learner Corpora: Definition, design, and desires
Learner Corpus Research is situated within the broader field of Corpus Linguistics. The following section aims to give a brief account of Corpus Linguistics, to define a few central terms in the field, and to expand on the particular circumstances pertaining to Learner Corpus Research, a tradition within which the present study is conducted.
In its simplest form, a corpus is a collection of texts. As McEnery and Wilson remark, “In principle, any collection of more than one text can be called a corpus” (2001, 29); however, a stricter definition is usually intended when referring to a linguistic corpus. McEnery, Xiao, and Tono (2006) list four requirements: A corpus should be machine-readable (1) and contain authentic text (2) which is sampled (3) to be representative (4) of the language to be studied. In addition, they discuss the notion of balance, arguing that, depending on the focus of the research, a corpus need not be balanced. Discussing their own definition, McEnery et al. (2006) also question the definitions and indispensability of sampling and representativeness.
Authenticity involves several aspects. A common conception is that a corpus should include naturally occurring language, not produced with the corpus in mind (Gries 2009). However, what is meant by ‘naturally occurring’ can be problematic in the context of learner language, especially in a formal educational context (section 3.2.6 b). As for sampling and balance, McEnery et al. (2006) state that sampling is inevitable to achieve a balanced and representative corpus. Sampling involves choosing which parts of available text to include in the corpus, with the aim of reproducing in the samples any characteristics that are true for the population the corpus is supposed to represent. Balance is defined as the range of text categories included in a corpus. A corpus should offer a “small scale model of the linguistic material” intended for study (Atkins et al. 1992, 6).
In addition to the general considerations mentioned above, further considerations apply to learner corpora. Gilquin (2015, 16-19) describes the considerations that should guide the compilation of a learner corpus. (For a discussion of how the data for the present study meet these requirements, see section 3.2.6.) Learner corpora (LC) can contain written or spoken data, or both, and any genre learners produce could be included. Supply and accessibility have made advanced learner argumentative essays and novice academic writing the most common material in LC to date. LC can be classified by the target language they cover, with English currently accounting for 60% (Granger et al. 2015, 2), as well as by the learners’ native language (L1), and by whether more than one L1 is represented.
The age and proficiency of the learners should be considered. If more than one level of proficiency is desired, this can be achieved either cross-sectionally, sampling material from learners at different levels of proficiency at the same time, or longitudinally, by collecting material from the same learners at separate intervals. Longitudinal corpora should ideally be “dense”, meaning they should have many points of data collection, in order to track the trajectory of the learner output (Meunier 2015, 380-381). Detailed metadata are also of interest when forming and testing hypotheses of language development, which is the primary concern of Second Language Acquisition (SLA, see section 2.3). Longitudinal corpora are scarce for various reasons. They are costly to compile: funding needs to be committed for a long period of time, and there is a risk that informants withdraw from the project. Pseudo-longitudinal corpora, where different stages of development are documented at the same point in time by collecting data from different groups of learners, have been a popular alternative. Internal variation can be measured in these groups, but not the trajectory of individual learners, which, by the method of multi-level modelling, can be analysed in truly longitudinal corpora (Gries 2015, 161). A comprehensive overview of the longitudinal corpora available to date is offered in chapter 17 of the Cambridge Handbook (Meunier 2015, 382-384). While harder to collect, true longitudinal corpora are arguably more valuable to the researcher than cross-sectional corpora.
One caveat of cross-sectional corpora is the measurement employed to assess proficiency. Being in the same year in school, of the same age, or having had the same amount of teaching are not necessarily reliable measures of proficiency (Meunier 2015, 396). While time spent learning a language usually is a reliable indicator of proficiency level, many factors are at play when learners learn – some of which might be hard to measure when compiling large corpora. Motivation and attitude have been shown to have great impact on learning (Imsen 2005, 375-410), and can be difficult to measure reliably when collecting data from large numbers of learners (see section 3.2.2).
Studies exploring phraseology in learner language report diverging results. Some studies have found learners to produce nativelike prefabricated language, others have not. Myles (2004) in a longitudinal study of French L2 learners found a correlation between level of proficiency and the ability to produce “lexical chunks”. (See section 2.5.3 for definitions of phraseological items.) She also found that the learners produced chunks of greater
grammatical complexity than what their level of grammatical proficiency allowed, and that they used earlier acquired chunks as a basis for abstracting language rules. A study of learners of Japanese found a similar tendency of a formulaic-creative continuum, where abstraction and lexical bundles affected each other reciprocally towards higher proficiency (Sugaya and Shirai 2009). Ellis et al. (2015, 357) comment on the apparent contradiction that learners use less prefabricated language than natives, and at the same time seem to rely on lexical chunks from early on in the process of acquiring grammar. A question to ask in this
context is whether the acquisition of formulaic language is highly individual, or subject to other variables. There is a need for more controlled corpora in terms of level of proficiency, register, and genres, in order to explore these questions more rigorously.
Ebeling and Hasselgård (2015b) call for longitudinal corpora and for corpora differentiated by register and proficiency level, and they recommend expanding learner corpora to cover lower levels of proficiency. Myles (2015) and Gilquin (2015) express the desire for more corpora of spoken language, as the immediacy of spoken language is better suited for documenting the internal structures SLA aims to prove (cf. section 2.3). To be suitable when testing generalised hypotheses, corpora need to be varied enough to encompass many different types of output, some of which may be infrequent and hard to bring out in experimental situations. Learners from a wide variety of L1 backgrounds, especially non-Indo-European, are desirable for the same reason (Myles 2015, 329). A better division of proficiency levels is desirable, since school year does not necessarily coincide with proficiency level. Gilquin (2015, 28-32) calls for a more standardised notation of metadata, and more rigorous and preferably external control of proficiency levels, and she welcomes the possibilities offered by new technology in collecting different types of data with less intrusive methods.
While the present study leaves many of these needs unanswered, the material for this study does include longitudinal data, as well as learner language of a lower proficiency level than what has usually been considered in Learner Corpus Research. (The material is presented in section 3.1.)
2.3 Second Language Acquisition
Second Language Acquisition (SLA) is defined by Ortega as “the scholarly field of inquiry that investigates the human capacity to learn languages other than the first, during late
childhood, adolescence or adulthood,” (2009, 1). While the teaching of second languages and that of foreign languages are sometimes thought of as very different situations (see section 1.4), SLA by Ortega’s definition covers both second and foreign language acquisition (FLA). Research does not support a clear distinction between a second and a third (or subsequent) language learnt once a learner reaches adolescence (Cook 2008, 12 and 170-193).
2.3.1 Interlanguage
Ortega (2009, 110) recounts how the establishment of SLA as an independent research field was closely linked to the emergence of the concept of ‘interlanguage’. ‘Interlanguage’ refers to “the language system that each learner constructs at any given point in development”
(ibid., 110). A mental entity, interlanguage by this definition is a representation of a learner’s competence, as opposed to performance, which is the language a learner actually produces.
Sylviane Granger, on the other hand, in her revision of the Contrastive Interlanguage Analysis (2015a, section 3.2.1), seems to use the terms ‘interlanguage’ and ‘learner language’ interchangeably, referring to the learner output, i.e. the performance of a language learner. The term ‘interlanguage’ will in the following refer to the mental entity, while ‘learner language’ or L2 will be used to refer to the learner output, except when the term ‘interlanguage’ appears in quotations.
Central to the development of the concept of interlanguage was the observation that different language learners seemed to go through the same stages when acquiring the L2, independently of their native language (L1) background, situation of learning (classroom vs.
immersion), and even independently of whether the learner was a child or an adult (Cook 2008, 27). Cook (ibid., 29-30) describes for example six stages of acquisition for word order in English. In stage 1, a learner is only able to produce one-word utterances, or complete formulaic expressions. In stage 2, learners acquire the standard word order of the target language. Questions or negations, which occasion a shift in word order in English, are accommodated into this pattern, so that the standard SVO order of English is used here as well, resulting in learner output of the type No me live here (ibid., 29).
Interlanguage is characterised by the four processes ‘simplification’, ‘overgeneralisation’,
‘restructuring’, and ‘U-shapes’. Simplification, “a process that is called upon when messages must be conveyed with little language” (Ortega 2009, 116), leads to language production with a less complex grammatical structure than the target language possesses, such as all nouns in Norwegian being given the indefinite masculine article en. Overgeneralisation involves the application of rules in contexts where they do not apply, such as the overuse by learners of the -ing-form in English. Restructuring of rules governing the learner output takes place continuously as learning happens, leading to potentially big shifts in learner output. It does not, however, imply a one-way development towards increased accuracy. The ‘U’ in ‘U-shape’ is a visualisation of the process in which a learner initially uses target-like expressions correctly, but without processing them, then abandons them in the stage where emerging processing takes place, and finally re-establishes the target-like production. Inverted U-shapes can also be found.
Ellis and Barkhuizen (2005) identify a list of premises central to interlanguage theory.
Interlanguage is a system in its own right, in the same way that the L1 grammar is a system.
The interlanguage is in a constant process of restructuring and subject to change over time as learning takes place. The interlanguage constitutes mostly implicit knowledge and is the product of general learning strategies. If the interlanguage stops developing, it may fossilise, which means that some language elements remain at a learner stage and are not replaced by target-like structures.
While not a defining trait, the term crosslinguistic influence is often mentioned in connection with interlanguage. It describes a set of characteristics of learner language induced by the L1 or other languages the learner already knows. This includes, but is not limited to, transfer, which involves translating directly or otherwise using in the target language traits which are correct in the source language. Transfer can lead to learner language idiosyncrasies, or to correct language output (Cook 2008, 31-39; Paquot 2013).
2.3.2 Second Language Acquisition and Learner Corpus Research
The relationship between Second Language Acquisition (SLA) and Learner Corpus Research (LCR) has been less intimate than one would expect. Historically, SLA has focussed on language learning (the process), and LCR on learner language (the product). SLA has studied the development of language often by close observation of individual learners. The data sets have been small, often based on one or a few individuals, data collection methods at times artificial or intrusive, the main independent variable being time (Meunier 2015, 394). Central to SLA has been the idea of “developmental errors” – stages every language learner goes through, independently of L1 (see previous section on interlanguage).
LCR, on the other hand, has used data from a great number of learners as the main resource, with limited focus on individual development and often with different L1s as the main independent variable. Where LCR mostly has been content with describing similarities, differences, and patterns, SLA researchers have shown a keener interest in explaining the phenomena they observe. There are obvious synergy effects obtainable if these two traditions can be
reconciled. LCR can provide the large sets of naturally occurring language data necessary to make broad generalisations, and the sophisticated statistical tools to make sense of it. SLA can provide a theoretical framework, ask better questions, and offer more systematic interpretations and explanations (Myles 2015, 309).
A couple of interesting observations to come out of the combined efforts of SLA and LCR are that syntax generally seems to develop before morphology (Bonilla 2015), and that children learning a second language in school are aided by literacy and a better working memory in learning vocabulary and grammar. These traits are associated with a higher age, leading the researchers to conclude that the youngest school children (5-6 years old) are at a
disadvantage when learning language in a classroom setting. The same study showed that
“[f]requency in the input is the single most important factor for vocabulary learning,” (Myles and Mitchell 2012, 2). The present study, while mainly a corpus-driven LCR investigation, does look to SLA as well, as will become apparent (section 5.2).
2.4 Lexis
Although the present study focusses on phraseology, phrases are made up of words, and many results to come out of the research on words and vocabulary also hold true for phraseology. The two related research fields will therefore be considered in order. This section defines ‘lexis’, ‘lexical frequency profiling’, ‘lexical diversity’, ‘lexical “teddy bears”’,
‘register confusion’, and ‘keywords’, and while doing so reports on a few interesting results to come out of the research in this field. ‘Lexis’ is defined by McEnery and Hardie as “words and other meaningful units … stored in a kind of mental dictionary, the lexicon,” (2012, 246, original emphasis).
Tom Cobb and Marlise Horst (2015) discuss what it means to know a word, observing that there is a cline from understanding the meaning of a word you hear or see in context, to being able to use it correctly in speech or writing. Learner corpora display the latter, i.e.
what the learners are able to produce. By comparing to reference corpora, a measure of what is missing, what they do not produce, and what they over-, under-, and misuse, is obtainable.
‘Lexical frequency profiling’ is a way of measuring the lexical richness, that is the lexical complexity and sophistication, of texts. The method is presented in a 1995 article by Laufer and Nation. They showed the frequency of ‘advanced words’ in learner texts to be a reliable measure of level of proficiency. The higher the number of advanced words in a text, the higher the proficiency of the learner. The frequency of a word in a target language reference corpus was used to determine which words were considered ‘basic’, i.e. most frequent, and which were considered ‘advanced’, i.e. less frequent. The measure proved reliable, provided the learner texts were at least 200 words long and of similar genres.
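The distinction between ‘basic’ and ‘advanced’ words can be illustrated with a minimal sketch. It is not Laufer and Nation's actual procedure: the function name `frequency_profile`, the single cut-off `basic_size`, and the toy reference counts are assumptions made for illustration only, and no lemmatisation or grouping into word families is attempted.

```python
from collections import Counter


def frequency_profile(tokens, reference_counts, basic_size=1000):
    """Return the share of tokens inside and outside a 'basic' word band.

    The 'basic' band is assumed to be the `basic_size` most frequent
    words in the reference corpus; everything else counts as 'advanced'.
    """
    if not tokens:
        raise ValueError("the text must contain at least one token")
    # Rank reference words by frequency and keep the top band.
    basic = {w for w, _ in Counter(reference_counts).most_common(basic_size)}
    lowered = [t.lower() for t in tokens]
    n_basic = sum(1 for t in lowered if t in basic)
    share_basic = n_basic / len(lowered)
    return {"basic": share_basic, "advanced": 1 - share_basic}


# Toy reference counts (hypothetical numbers, not taken from any corpus).
reference = {"the": 100, "cat": 50, "sat": 30, "perched": 2, "feline": 1}
profile = frequency_profile(["The", "feline", "perched"], reference, basic_size=3)
```

In this toy case only one of the three tokens falls inside the basic band, so two thirds of the text counts as ‘advanced’. With a real reference corpus the band would be far larger, and several bands rather than a single cut-off could be used.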
The opposite, corresponding observation, that learners of low proficiency produce texts with little lexical richness, has been used to predict and explain the success, or failure, of pupils and students in education. A study by Morris and Cobb (2004) was able to predict drop-outs (failed exams, failure to complete the training programme, or failure to stay in the
profession) from English as a Second Language (ESL) teacher training based on which of the teacher trainees failed to produce essays containing words outside the list of the 1000 most frequent words in English. Possessing a wide vocabulary coincided with success as an ESL teacher. Similarly, studies in Canada and the Netherlands (Roessingh and Elgie 2009;
Verhoeven and Vermeer 2006) have shown that immigrant children employ a much narrower vocabulary than native children when retelling stories. With a finer-grained list of word frequencies adapted for children, native Canadian children were shown to produce words from a set of 2500 different words, while immigrant children rarely produced words outside of the 250 most frequent words, and then only from lists up to the 750 most frequent words. These findings went a long way towards explaining why many immigrant children in Canada struggled with reading comprehension, and consequently with achieving higher education (Roessingh and Elgie 2009, 40).
‘Lexical diversity’ or variation is another way of measuring lexical richness. It is in its simplest form a type-token measure, where all the different words (types) in a text are counted and divided by the total number of words (tokens). This measure is highly sensitive to text length, leading to various attempts at working around this obstacle, the more
comprehensible of which include dividing the number of word types by the square root of the number of tokens, or calculating the ratio of grammatical to content words, or excluding the grammatical words, only calculating the type-token ratio of the content words. Ellis and Barkhuizen (2005) suggest the alternative Mean Segmental Type-Token Ratio, which involves
dividing the text into equal segments, calculating the type-token ratio for each segment and finding the mean of these ratios for a score for the whole text. An interesting and somewhat counterintuitive study by Ong and Zhang (2010) showed that Chinese learners of English produced more lexically rich texts when they had less time and less access to assistance.
Although outside the scope of this thesis, it would be interesting to see whether this study could be reproduced with learners from other L1 backgrounds.
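The diversity measures described above can be sketched in a few lines. The function names and the default segment length are assumptions made for illustration; in particular, how a final incomplete segment should be treated is not specified above, and the sketch simply discards it.

```python
import math


def ttr(tokens):
    """Type-token ratio: number of distinct words divided by number of words."""
    if not tokens:
        raise ValueError("the text must contain at least one token")
    return len(set(tokens)) / len(tokens)


def root_ttr(tokens):
    """Types divided by the square root of the number of tokens."""
    if not tokens:
        raise ValueError("the text must contain at least one token")
    return len(set(tokens)) / math.sqrt(len(tokens))


def msttr(tokens, segment_length=50):
    """Mean Segmental Type-Token Ratio over equal-sized segments.

    A trailing segment shorter than `segment_length` is discarded
    (an assumption of this sketch).
    """
    last_start = len(tokens) - segment_length
    segments = [tokens[i:i + segment_length]
                for i in range(0, last_start + 1, segment_length)]
    if not segments:
        raise ValueError("the text is shorter than one segment")
    return sum(ttr(segment) for segment in segments) / len(segments)


text = "the cat sat on the mat the dog sat on the rug".split()
whole_text_ttr = ttr(text)                    # 7 types / 12 tokens
segmented_ttr = msttr(text, segment_length=6)  # mean of 5/6 and 5/6
```

The toy text illustrates the sensitivity to length: each six-word half has a ratio of 5/6, yet the ratio for the whole text drops to 7/12 because words recur across the halves. The segmental mean stays at 5/6.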
In an often-cited article from 1994, Angela Hasselgren shows how language learners tend to rely on a limited number of trusted words in the target language, repeating them for a range of situations where a native speaker would vary, and use different words. Hasselgren calls these much-used words ‘lexical teddy bears’. These ‘teddy bears’ are employed in a range of sentence environments where a native speaker would use more precise vocabulary, suited to the genre and topic of writing. Another often-made observation is that learners tend to rely on vocabulary associated with spoken language even when writing, a phenomenon referred to as ‘register confusion’ (Lorenz 1999; Paquot 2010).
The genre and topic of writing affect which vocabulary items the learners need. When deciding what vocabulary to teach learners, one method is the ‘keyword’ tool in corpus software such as AntConc and WordSmith. This tool compares the vocabulary of two (sets of) texts, and identifies which words are over- or under-represented in one relative to the other. If a target language text includes specific words that are under-represented in learner texts, these words may be the focus of instruction. Lee and Chen (2009) take this a step further, identifying ‘key-keywords’ by measuring the dispersion of keywords across texts, establishing which keywords have a widely distributed under- or overuse by the learners.
The keywords tool is highly sensitive to subject matter, so ensuring that the reference corpus is comparable in terms of genres and topics is critical if the tool is to be useful for developing teaching material.
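The comparison performed by a keyword tool can be sketched as follows. The text above does not specify which statistic is used; the log-likelihood statistic is one that such software commonly offers, and it is assumed here. The function names and the frequencies in the example are hypothetical.

```python
import math


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood keyness of a word occurring freq_a times in a corpus
    of size_a tokens and freq_b times in a corpus of size_b tokens."""
    total = freq_a + freq_b
    # Expected frequencies if the word were equally common in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def direction(freq_a, freq_b, size_a, size_b):
    """Report whether the word is over- or under-represented in corpus A."""
    rate_a, rate_b = freq_a / size_a, freq_b / size_b
    if rate_a == rate_b:
        return "equal"
    return "over" if rate_a > rate_b else "under"


# Hypothetical counts: a word occurring 80 times in 1000 learner tokens
# and 20 times in 1000 reference tokens.
keyness = log_likelihood(80, 20, 1000, 1000)
use = direction(80, 20, 1000, 1000)
```

The statistic only measures how unexpected the difference is; the direction of the difference has to be read off the relative frequencies, which is why the sketch reports it separately.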
While the keywords tool is sensitive to topic and genre, there are other ways to compare the vocabulary of learners and natives, somewhat independently of topic, and to some extent, genre. One way of doing so is by way of association scores (see sections 2.6 and 3.2.5).
2.5 Phraseology
2.5.1 Words & co.
While research on lexis typically involves the study of single words, ‘phraseology’ concerns the co-occurrence of words. A central idea in phraseology is that words do not behave as isolated items. Words have friends. They prefer the company of some to the company of others that on paper appear to have equal characteristics. Knowing a word implies knowing its friends, or as J. R. Firth stated it: “You shall know a word by the company it keeps,” (1957, 11). Conversely, if you do not know the preferred company of a word, you do not really know that word. Several studies (Ellis 2002; 2008; 2012; Jiang and Nekrasova 2007; Conklin and Schmitt 2008; Ellis et al. 2008; Ellis and Simpson-Vlach 2009) have shown that both learners and natives process formulaic language quicker than non-formulaic language, supporting the recommendation that learners be taught phraseological expressions.
One of the major challenges facing learners of a language is choosing the right company for words they use. Failing to do so might result in language that sounds ‘foreign’ or ‘clumsy’.
Pawley and Syder refer to this problem as “[t]he puzzle of nativelike selection” (1983, 192).
For language learners, the puzzle is how to achieve nativelike selection. For linguists, it is how to understand it, and for teachers, how to teach it. Researchers have been faced with the additional puzzle of how to talk about it, one of many solutions being for each
researcher to develop their own vocabulary, a practice that has led to prolific terminology, and may have resulted in different researchers researching different phenomena (N. Ellis et al. 2015, 372). Ebeling and Hasselgård (2015b, 225-226) emphasise the indisputable need for standardisation of terminology in the field of phraseology.
2.5.2 Two traditions
There are arguably two main traditions in phraseology. One is referred to as the Eastern European tradition, the Russian lexicological tradition, or the phraseological tradition, and the items of interest for phraseological research are defined by degree of idiomaticity.
Another more recent tradition, alternately referred to as the ‘statistical’ approach, the
‘distributional’ approach, the ‘probabilistic’ approach, or the ‘frequency-based’ approach, employs statistical measures to define these items of phraseological interest. Howarth
(1998) builds on Russian lexicological tradition, describing a continuum of relations between words, from free combinations, via restricted collocations and figurative idioms, to pure idioms. Nicholas Groom (2017) criticises this ‘taxonomic approach’, associated with the generativist view of language. Some problems listed by Groom are the arbitrariness of the phraseological categories in the traditional approach, and the dependency on native speaker acceptability judgements, which he deems unreliable and which cannot account for
language change. An example from Groom is the idiomatic expression to kick the bucket (a euphemism for to die), which, according to the traditional approach, should not be possible to divide or restructure, nor to use in a passive construction. Encountering example 1 online, Groom observes that it undermines the traditional definition of phraseological units as
“fixed expressions”:
Example 1: I didn’t kick the bucket… The bucket was well and truly kicked by the time it got to me!!
(Groom 2017, 34)
The main alternative to this traditional approach is the statistical or, in Groom’s terms,
‘probabilistic’ approach. In the probabilistic approach, phraseological units are defined by statistical methods. In a collection of texts such as a corpus, computer software can count all the words and determine how likely they are to occur in a certain order. Several types of units can be identified, depending on the research interest and the method of calculation, and this is the origin of much of the terminological disparity in the research field. There are problems also with the ‘probabilistic approach’. N. Ellis et al. (2015) observe that a “blind”
measure of frequency will include sequences such as and of the, which intuition dismisses as a possible phraseological unit, the point being that these statistical measures are not enough to understand what is going on. A requirement of semantic unity for phraseological units is a possible response. Groom (2017) solves the problem by redefining the field of phraseology as a sociolinguistic discipline, concluding that what constitutes a fortuitous phraseological expression depends entirely on the genre and register of the text.
While Groom’s criticism is certainly valid, it does not imply that the study of phraseology is fruitless. Granger and Paquot (2008) attempt to reconcile the two approaches. They give an account of the terminology of both traditions, and comment that the researchers from the two traditions seem not to appreciate each other’s achievements. By way of reconciliation, Granger and Paquot (2008) suggest keeping some terminology from both traditions, but for distinct and different usage. The terminology from the traditional, ‘phraseological’ approach
is to be reserved, they suggest, for descriptions of linguistic function, sorted into three categories of
‘phrasemes’ (referential, textual, and communicative). Terminology from the more recent,
‘distributional’ approach is to be tied to the method of retrieval, divided into n-gram or ‘cluster’ analysis for continuous sequences of words, and ‘co-occurrence’ for discontinuous ones (Granger and Paquot 2008, 39-40; and see section 2.5.3). They suggest avoiding the term ‘collocation’ when referring to the method of retrieval, for which they argue the term ‘co-occurrence’ is sufficient, retaining the term ‘collocation’ for the functional, traditional approach. As the terms from the two traditions describe the phraseological items from different perspectives, i.e. that of linguistic function for the traditional terms, and that of retrieval method for the statistical terms, Granger and
Paquot’s (2008) point is that the two traditions can co-exist. While the retrieval can be done statistically, the resulting phraseological units can still be analysed in terms of linguistic function, using terminology from the traditional approach.
As the statistical approach offers an objective measure of what can be considered a phraseological unit, it is well suited for non-native speakers researching phraseology, and will form the basis of the present study.
2.5.3 Definitions of phraseological items
As noted, the field of phraseology is rich in terminology. Some statistically identified phraseological items are explained in this section.
‘n-grams’ is a collective term for sequences of adjacent words, that is, words not separated by other words. A 2-gram is a set of two words, that is, a ‘bigram’. A set of three words is a ‘trigram’. The n can be any numeral, so there can be 4-grams, 5-grams and so on. Corpus software such as AntConc and WordSmith can generate lists of n-grams in a text or collection of texts. Already a source of confusion, n-grams are sometimes referred to as ‘clusters’ or ‘lexical bundles’ (McEnery and Hardie 2012, 240-247). These terms are defined differently e.g. in AntConc, which requires one of the words in a ‘cluster’ to be defined (Anthony 2019).
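How a list of n-grams is obtained from a text can be shown with a short sketch. The function name `ngrams` and the whitespace tokenisation are assumptions made for illustration; corpus software applies more careful tokenisation and does not necessarily work this way internally.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every sequence of n adjacent tokens, in order of occurrence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "a fishing boat and a fishing rod".split()
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))
most_frequent_bigram = bigram_counts.most_common(1)[0]
```

In this seven-word example there are six bigrams and five trigrams, and the only sequence occurring more than once is the bigram ‘a fishing’.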
A ‘collocation’ is a word that appears together with another word. They ‘collocate’, or, as previously observed, they are friends. Collocations need not be continuous, the way n-grams are. Corpus software can identify collocations within a defined ‘span’, e.g. 3 places to the left
or right. If one word is the point of departure, and collocations are identified for that word, it is considered a ‘node’, and the identified collocations its ‘collocates’. A probability of words being collocations can be calculated by statistical methods, the result referred to as an
‘association measure’ or an ‘association score’ (Gries 2015). Words with a high probability of occurring together will have a high score, and hence constitute items of interest from a phraseological point of view. There are different ways of calculating this probability, such as log-likelihood, t-score, z-score, mutual information (MI), and gravitational G (cf. section 3.2.5). Bestgen and Granger (2018) have proposed the term ‘collgram’ for a bigram with an assigned association score.1 Granger and Paquot (2008) propose reserving the term
‘collocation’ for descriptions of linguistic function, and suggest ‘co-occurrence’ be used instead (cf. section 2.5.2).
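Two of the measures listed above, MI score and t-score, can be sketched as follows. The formulas are those commonly given in the corpus-linguistic literature, where the expected frequency E of a word pair is the product of the two word frequencies divided by the corpus size, MI is the base-2 logarithm of observed over expected frequency, and the t-score is the difference between observed and expected frequency divided by the square root of the observed frequency. The exact variant used in the present study is presented in section 3.2.5 and may differ in detail; the function names and the frequencies below are hypothetical.

```python
import math


def expected_frequency(freq_word1, freq_word2, corpus_size):
    """Frequency of the pair expected if the two words were independent."""
    return freq_word1 * freq_word2 / corpus_size


def mi_score(observed, freq_word1, freq_word2, corpus_size):
    """Mutual information: log2 of observed over expected pair frequency."""
    if observed <= 0:
        raise ValueError("the pair must occur at least once")
    expected = expected_frequency(freq_word1, freq_word2, corpus_size)
    return math.log2(observed / expected)


def t_score(observed, freq_word1, freq_word2, corpus_size):
    """t-score: (observed - expected) divided by the square root of observed."""
    if observed <= 0:
        raise ValueError("the pair must occur at least once")
    expected = expected_frequency(freq_word1, freq_word2, corpus_size)
    return (observed - expected) / math.sqrt(observed)


corpus_size = 1_000_000
# A pair of fairly frequent words that co-occur 16 times.
frequent_pair = (mi_score(16, 1000, 1000, corpus_size),
                 t_score(16, 1000, 1000, corpus_size))
# A pair of rare words that co-occur 4 times.
rare_pair = (mi_score(4, 10, 10, corpus_size),
             t_score(4, 10, 10, corpus_size))
```

In this constructed example the rare pair receives the higher MI score and the frequent pair the higher t-score, which illustrates why the two measures tend to single out different kinds of word pairs.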
‘Colligation’ is a term for the co-occurrence of a word with a specific grammatical category, for example a verb usually occurring in the passive, or a verb taking an infinitive complement rather than an -ing-form (Aarts 2014).
As collocations need not be continuous, they can form ‘collocational frameworks’ or
‘frames’, discontinuous sets of words with one or more open slots for words to fit into.
These sets often consist of grammatical words, such as too + ADJ/ADV + to, and be + ADJ + to, as in be able to and too late to (Renouf and Sinclair 1991). When frameworks occur with specific grammatical constructions, they are sometimes referred to as ‘colligational
frameworks’ (Hasselgård 2016).
When discussing phraseological phenomena in general, without committing to one particular definition, or when referring to several types at once, researchers refer to them alternately as e.g. ‘lexical chunks’, ‘ready-made chunks’, ‘phraseological items’,
‘phraseological units’, ‘multi-word units’, ‘prefabricated units’, ‘PUs’, or ‘prefabs’. As the present study does not aspire to sort out the complicated and sometimes intertwined relationship between the different expressions, the respective authors’ original choice of terminology is retained as much as possible in discussions and references. Consequently, lexical bundles, ready-made chunks, and prefabricated units are all reported on, without attempting to specify the exact reference of these expressions.
1 The present study investigates the same type of items as those labelled ‘collgrams’ by Bestgen and Granger (2018). However, the term ‘bigrams’ is used in this study, following the terminology in e.g. Durrant and Schmitt (2009) and Granger and Bestgen (2014).
2.5.4 Previous studies on phraseology in learner language
Pawley and Syder (1983) observed how phraseology was an area where learner language differed markedly from native language. Simultaneously, prefabricated stretches of language are central to language learning. Ortega (2009, 114) describes the process by which
formulaic language may be learnt wholesale early on, and then later undergo the processes associated with the development of interlanguage.
Studies revealing under- and overuse by learners include Hasselgård (forthcoming), who found learners to overuse a limited set of lexical bundles, while the dispersion of bundles was broader in native texts. She called the overused bundles “phraseological teddy bears”, echoing the terminology of Hasselgren (1994). Parallel to the observation that learners overuse features of spoken language in writing (section 2.4), Chen and Baker (2016) found that lower-level Chinese learners of English used more lexical bundles associated with conversation, while higher-level learners used bundles typical of academic prose.
Combining manual screening with searching for specific words, morphemes, or POS-tags (see section 3.2.7) has shown learners’ underuse, overuse, and misuse of verb-noun combinations, delexical constructions, and phrasal verbs. Vyatkina and Cunningham (2015) mention that learners overuse intensifiers. Studies by Hasselgren (1994), Granger (1998), and Lorenz (1998) have shown native speakers to employ a wider range of adverb
intensifiers than do learners, the latter relying on general purpose intensifiers, such as very, completely, and totally (cf. results in this study, section 4.5).
While some studies have shown learners to produce fewer and less varied bundles than do native speakers, a study by Ebeling and Hasselgård (2015a) found discourse field to be more significant than L1 background when analysing novice academic writing. One conclusion drawn from this study was that the proficiency level of Norwegian learners of English is advanced (cf. section 1.4).
Studies focussing on the development of phraseological proficiency include Thewissen (2008; 2013), Crossley and Salsbury (2011), and de Haan and van der Haagen (2013), who show a well-established link between levels of proficiency and the production of nativelike phraseology. Learners produce fewer phraseological errors, develop broader repertoires of phraseological expressions, and frequency and distribution become more nativelike as they become more proficient in the target language. Idioms are rare in everyday speech or
writing and are consequently acquired slowly. As with everything else in language, the learners can recognise more formulaic expressions than they can produce.
Bartning and Forsberg (2006) studied prefabricated language produced by Swedish learners of French. They found that while learner language still differed greatly from native language even at advanced stages, there was a clear development of prefabricated language as the learners became more proficient, especially from intermediate to advanced levels. Different functional categories of prefabricated language behaved differently and were developed at different stages. They also concluded that the development of prefabricated language is subject to individual differences and sensitive to input.
A word of warning from N. Ellis et al. (2015, 362): Ready-made does not necessarily mean nativelike. The learner language may contain “chunks” that are the result of L1 transfer, and
“correct” chunks can be subject to register confusion, resulting in non-nativelike
phraseology. Different types of transfer from L1 to L2 affect phraseology, and can lead to misuse, overuse, but also to nativelike performance. Paquot (2013) found that bundles, grammar, colligational patterns, and cultural preferences can be transferred. In a study of prepositions used by native speakers of English and German learners of English, Reshöft and Gralla (2013) found conceptual transfer, that is, transfer of the way events are perceived and described, e.g. whether to use a verb of perception or a verb of existence.
Laufer and Waldman (2011) found evidence of transfer in verb + noun collocations used by Israeli learners, pointing out that it was often the verb that was chosen that differed between the natives and the learners. The study investigated learner language from three groups of Israeli learners of English at different levels and compared them to LOCNESS.
Interestingly, this study showed a lower rate of collocational errors in a group of learners with a ‘basic’ proficiency, than in the intermediate and advanced groups who, on the whole, produced many more collocations, more or less successfully, leading Laufer and Waldman to conclude that “learners who attempt to produce more collocations are likely to err more often,” (2011, 665). (For a discussion of similar findings in this study, see section 5.2.)
2.5.5 How to teach phraseology
Some studies of phraseology in learner language have produced useful results with regard to teaching. Combining co-occurrence with a lexical bundle approach, Groom (2009) showed that spending time in a target language country (‘immersion’) had a positive effect on learners’ ability to produce nativelike collocations. (This study is also discussed in section 3.2.5.) Serrano et al. (2012) measured the effect of time and attitude on the development of oral and written proficiency, finding that a few months abroad had more effect on oral than on written production.
McEnery et al. (2006) explore the word sweet in the British National Corpus (BNC), doing a case study on dictionary entries. They compare the collocations for sweet picked out by four different calculation procedures (z-score, MI score, MI3 score, and log-log, see section 3.2.5), arguing that the “cubed” MI score, that is MI3, is superior to the two others in terms of identifying “collocates […] useful for second language learners at beginning and intermediate levels,” (McEnery et al. 2006, 217). They base this on the observation that MI3 score favours more frequent words, while MI and z-score tend to pick out rare words. While it is easy to agree that peas, smell, and tooth (picked out by MI3 score) are more useful collocations for sweet than Afton, nothings, and marjoram (picked out by MI score), McEnery et al. (ibid., 217) seem to take it for granted that a higher frequency is equal to more usefulness for pedagogical purposes, without providing additional support for this argument.
This is not to say that such evidence is not available, though it may not have been at the time of writing for McEnery et al. (2006). Natives’ and learners’ processing speed has been shown to be quicker for prefabricated stretches of language than for free combinations (cf. section 2.5.1). Ellis et al. (2008) have shown that what determines the processing speed differs between natives and learners. While the natives’ processing speed depends on the MI score (see section 3.2.5), the learners’ processing speed depends on the frequency of the expressions in the target language, which suggests that learners require a lot of exposure to the target language and a high number of repetitions in order to acquire a nativelike L2 output.
Myles and Mitchell (2012) found frequency of the input to be the decisive factor for learners’
acquiring vocabulary (cf. section 2.3.2).
Nesselhauf (2005) researched verb-noun collocations used by German learners of English, leading to the recommendation that collocations be taught explicitly to learners, focussing on the verb rather than on the noun, which had previously been the norm (cf. Laufer and Waldman 2011; section 2.5.4). Ebeling and Hasselgård (2015b) support the recommendation for teaching lexical bundles, while cautioning that the classroom environment is not ideally suited to creating the sense of immersion that has been shown to be successful in teaching nativelike phraseology to learners. When deciding on which bundles to teach, it is necessary to consider teachability, learnability, frequency in the target language, and the learners’ needs and goals.
2.6 Three Previous Studies
The inspiration for the present study came from three related research papers, all conducted within the tradition of statistical phraseology. They have inspired the methodology of the present study, as well as the research questions that are asked. The exploration of MI score and t-score is shared by all these studies.
2.6.1 Durrant and Schmitt 2009
Reacting to a claim made by Kjellmer (1990) that adult L2 learners rely on single words when acquiring a second language (coined the ‘open-choice principle’ by Sinclair 1991), Durrant and Schmitt (2009) set out to investigate the presence, or absence, of formulaic language in advanced L2 learner output. They manually extracted premodifier+noun bigrams from advanced learner corpora and from English L1 writing, and assigned frequency information and association scores (MI and t-score) to these bigrams based on their frequency in the BNC. A minimum frequency of 5 in the BNC was set for assigning association scores. The scores were sorted into 6-7 categories of collocational strength.
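The scoring step described above can be sketched as follows. This is a minimal illustration, not Durrant and Schmitt’s implementation: it assumes the commonly used formulas MI = log2(O/E) and t = (O − E)/√O, with E = f(w1) × f(w2) / N, and it treats any bigram below the reported minimum frequency of 5 as unscored. The function name and the counts in the usage example are hypothetical.

```python
import math
from typing import Optional, Tuple

MIN_FREQ = 5  # minimum reference-corpus frequency reported in the text


def association_scores(
    bigram_freq: int, f_w1: int, f_w2: int, corpus_size: int
) -> Optional[Tuple[float, float]]:
    """Return (MI, t-score) for a bigram, or None if it is below threshold.

    Assumes the standard formulations; the original study may have used
    a variant of these formulas.
    """
    if bigram_freq < MIN_FREQ:
        return None  # too rare in the reference corpus to score reliably
    exp = f_w1 * f_w2 / corpus_size           # expected frequency under independence
    mi = math.log2(bigram_freq / exp)         # strength of association
    t = (bigram_freq - exp) / math.sqrt(bigram_freq)  # certainty of association
    return mi, t


# Hypothetical counts, chosen so that the expected frequency is exactly 1.
scores = association_scores(bigram_freq=100, f_w1=10_000, f_w2=10_000,
                            corpus_size=100_000_000)
below = association_scores(bigram_freq=4, f_w1=10_000, f_w2=10_000,
                           corpus_size=100_000_000)
print(scores, below)
```

Because the t-score grows with the raw frequency of the pair while MI grows with the ratio of observed to expected frequency, the two measures tend to highlight different bigrams: frequent combinations score high on t, and rarer but exclusive combinations score high on MI.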
Durrant and Schmitt examined each learner text individually and compared the results using inferential statistics. They found a higher proportion of rare combinations (below threshold, BT, i.e. fewer than 5 tokens in the BNC) in the native texts than in the learner texts. With regard to high t-scoring bigrams, they found that the learners produce about as many as the native speakers, but with a more limited repertoire, and a higher degree of repetition of a few trusted items (cf. lexical and phraseological ‘teddy bears’, Hasselgren 1994, Hasselgård forthcoming, section 2.5.4). When comparing MI scores, Durrant and Schmitt found a consistent underuse of bigrams with an MI score ≥7 in the learner texts, compared to native writers. They conclude that while Kjellmer’s observation that “there is something missing” (Durrant and Schmitt 2009, 174) from learner writing was not wrong, this does not necessarily support the idea that adult L2 learners construct their writing from single words alone. What is missing is not multi-word units altogether, but rather low frequency collocations, characterised by high MI scores, while high frequency collocations, identified by a high t-score, are used by learners just as much as by natives.
2.6.2 Granger and Bestgen 2014
In their 2014 study, Granger and Bestgen aimed to extend Durrant and Schmitt’s (2009) methodology to learner corpora with a control for proficiency level. They wanted to see if the patterns observed in Durrant and Schmitt’s study could also be found in the output of intermediate vs. advanced learners. Granger and Bestgen had texts from the German, French, and Spanish parts of the International Corpus of Learner English (ICLE, see section 3.1.5 a) assessed and marked for proficiency level with reference to the Common European Framework of Reference for Languages (CEFR, Council of Europe 2001). The texts were then grouped into two sub-corpora, one intermediate (B) and one advanced (C). The bigrams were extracted automatically by the use of POS-tags (see section 3.2.7). Granger and Bestgen extracted word class specific bigrams of noun+noun (NN), adjective+noun (JN), and adverb+adjective (AJ). In addition, they extracted a category called “All”, with all bigrams in the text. Following Durrant and Schmitt, the bigrams were extracted from each learner text separately, and assigned MI scores and t-scores based on the BNC. The scores were grouped into 4 categories and a group of “below threshold” (see section 3.2.10).
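The per-text grouping of scores can be sketched as below. This is an illustration under stated assumptions, not Granger and Bestgen’s procedure: the cut-off values and band labels are hypothetical placeholders (the actual categories are described in section 3.2.10), and the function names and example bigram scores are invented. The sketch shows how the same text yields different proportions depending on whether bigram tokens or bigram types are counted.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical upper bounds for three bands; scores at or above the last
# bound fall into a fourth band. These are NOT the cut-offs of the study.
BANDS = [(3.0, "low"), (5.0, "mid"), (7.0, "high")]


def band(score: Optional[float]) -> str:
    """Map a score to a band label; None means the bigram was unscored."""
    if score is None:
        return "below threshold"
    for upper, label in BANDS:
        if score < upper:
            return label
    return "very high"


def band_proportions(
    bigrams: List[str],
    scores: Dict[str, Optional[float]],
    by_types: bool = False,
) -> Dict[str, float]:
    """Proportion of one text's bigrams in each band, by tokens or by types."""
    items = sorted(set(bigrams)) if by_types else bigrams
    # Bigrams missing from the score table are treated as below threshold.
    counts = Counter(band(scores.get(b)) for b in items)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# One invented learner text, with one repeated bigram, and invented MI scores.
text_bigrams = ["good idea", "good idea", "good idea", "fishing boat", "car motor"]
mi_scores = {"good idea": 4.2, "fishing boat": 8.1, "car motor": None}

tokens = band_proportions(text_bigrams, mi_scores)
types = band_proportions(text_bigrams, mi_scores, by_types=True)
print(tokens)
print(types)
```

In this invented text the repeated bigram dominates the token-based proportions but counts only once in the type-based ones, which is why results for types and tokens are reported separately.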
Granger and Bestgen report “a smaller proportion of lower-frequency, but strongly-associated, collocations [attested by MI] and a larger proportion of high-frequency collocations [attested by t-score] in the intermediate learner texts than in the advanced learner texts” (2014, 238). They found these results to hold true for both types and tokens of bigrams in most of the examined categories of bigrams, though some categories showed the pattern more clearly (All) than others (NN, AJ). Based on their findings, Granger and Bestgen