
2.8 Definitions of collocation

2.8.1 Differences between several types of MWEs

Within a broad perspective of phraseology, there are three types of units: free combinations, collocations and idioms.

Multiword expression (MWE) is the hypernym which encompasses units such as multiword lexical unit, collocation, idiom, compound noun, lexical bundle, verb-particle construction, verbal expression and proverb (Seretan, 2011, 2013). In this thesis, collocations are understood as a subclass of MWE, in harmony with Baldwin and Kim (2010) and Seretan (2011, 2013).

Evert (2009, 1213-1214) explains a key difference between collocation and MWE: the former has a Neo-Firthian sense that alludes to lexical units of a semi-compositional and lexically determined nature, whereas the latter has become the preferred term in the fields of computational linguistics and natural language processing.

MWEs are defined by Baldwin and Kim (2010, 269), based on Sag et al. (2002), as “lexical items that: (a) can be decomposed into multiple lexemes; and (b) display lexical, syntactic, semantic, pragmatic and/or statistical idiomaticity”.

Sag et al. (2002, 197) themselves reserve the term collocation “to refer to any statistically significant co-occurrence, including all forms of MWE as described above and compositional phrases which are predictably frequent”.

Their definition is not entirely adequate for this work because I take into account the linguistic features of specialized collocations, not only their statistical significance.

All of these subclasses of MWEs exhibit different features and perform different functions. Figure 2.1 illustrates the subclasses of MWEs, the place that specialized collocations occupy in relation to other MWEs, their location with regard to terminology and phraseology, and how specialized collocations stand in the midst of both disciplines, as indicated by the smaller inner hexagon.

Over the years, several names have been used to refer to this variety of multiword types. Within the field of NLP, researchers employ the term n-grams to refer to strings of two or more consecutive words calculated by means of statistical AMs.
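The idea of scoring consecutive word pairs with an association measure can be sketched as follows. This is a minimal illustration, not the procedure used in this thesis: it assumes pointwise mutual information (PMI) as the measure, since the text does not name a specific AM here, and the function name bigram_pmi, the min_count threshold and the sample sentence are all hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs (bigrams) with pointwise mutual information.

    PMI is only one possible association measure; min_count is an
    illustrative threshold, not a value taken from the source.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = max(n_tokens - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs that are too rare to score reliably
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Invented sample text, used only to exercise the function.
tokens = (
    "the parties shall adopt safeguard measures and "
    "the parties may apply safeguard measures"
).split()
scores = bigram_pmi(tokens)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

A positive score indicates that the two words occur together more often than their individual frequencies would predict. On real corpora, PMI is known to favour rare pairs, which is why a frequency threshold such as min_count is commonly applied.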

Figure 2.1: A diagram representing the subclasses of MWEs and how specialized collocations are related to terminology and phraseology

Biber et al. (1999, 58) offer some clues to distinguish multiword lexical units from collocations and from lexical bundles. According to these authors, a multiword lexical unit is a lexicalized “sequence of word forms which functions as a single grammatical unit”, e.g. look into, which is used in much the same way as investigate. Biber et al. (1999) group phrasal verbs (e.g. point out), prepositional verbs (e.g. appear on), complex prepositions (e.g. except for, aside from), correlative coordinators (e.g. both ... and, either ... or, neither ... nor) and complex subordinators (e.g. as far as, given that) as different types of multiword lexical units.

2.8.1.1 Lexical bundles

Lexical bundles are sequences of three or more words that tend to co-occur statistically in a register, irrespective of their idiomaticity and whether or not the sequence constitutes a grammatical unit (Biber et al., 1999; Cortes, 2004). In contrast, collocations consist of two or more lexical words with a tendency to co-occur. A lexical bundle is therefore a type of adjacent MWE considered as an extended collocation.

Cortes (2004) mentions two patterns, among others, that typically form lexical bundles in English: Preposition + Determiner + Noun + Preposition and Determiner + Noun + Verb + Determiner. Thus, lexical bundles can provide valuable information about the lexis of a particular genre and its formulaic language, but they differ from collocations and idioms in several respects: lexical bundles perform a grammatical and cohesive function, are adjacent MWEs and are syntactically fixed (Benson, 1985; Casares, 1992).
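Because lexical bundles are defined by recurrence rather than by idiomaticity or grammatical completeness, their extraction can be sketched as a simple count of adjacent word sequences. The sketch below is a hedged illustration: the function name lexical_bundles, the thresholds min_freq and min_texts, and the three sample sentences are assumptions made for the example, not values or data from Biber et al. (1999) or Cortes (2004).

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=3, min_freq=2, min_texts=2):
    """Return adjacent n-word sequences that recur across several texts.

    min_freq and min_texts are illustrative cut-offs (assumptions);
    a real study would set them relative to corpus size.
    """
    freq = Counter()
    spread = defaultdict(set)  # sequence -> ids of the texts containing it
    for text_id, text in enumerate(texts):
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            spread[gram].add(text_id)
    return {
        gram: count
        for gram, count in freq.items()
        if count >= min_freq and len(spread[gram]) >= min_texts
    }


# Invented sample sentences, used only to exercise the function.
texts = [
    "as a result of the increase in imports",
    "as a result of the measures adopted",
    "the increase in imports was temporary",
]
bundles = lexical_bundles(texts, n=4)
for gram, count in sorted(bundles.items()):
    print(" ".join(gram), count)
```

Note that the sequences are kept whether or not they form a complete grammatical unit, which mirrors the definition given above: a string such as a result of the is retained purely because it recurs.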

2.8.1.2 Differences between collocations and idioms

The criteria used to distinguish collocations from other types of MWEs are not clear-cut; they are sometimes vague, confusing or contradictory from one researcher to another. Evert (2004) even holds that “the distinction between collocations and non-collocations is ultimately based on the intuition of a lexicographer, for instance, in contrast to the formal and unambiguous definitions that linguistic research aims for”, which complicates the picture further.

Some authors (Thomas, 1993; Manning and Schütze, 1999) blur the line that separates idioms from collocations by using the two terms interchangeably. However, idioms differ from collocations and are either ‘pure’ phraseological units or relatively frozen expressions which exhibit distinct linguistic features. The most salient features that differentiate idioms from collocations are their degree of morphosyntactic fixedness, idiomaticity (also known as semantic opaqueness or fossilization) and non-compositionality. In contrast to idioms, collocations can be semantically transparent and semi-compositional.

Manning and Schütze (1999) list non-compositionality, non-substitutability and non-modifiability as criteria for the linguistic treatment of collocations. However, accepting this view would contradict phraseologists, who assign the same features to idioms.

According to Saeed (2003), collocations can undergo a fossilization process until these lexical units become fixed expressions. Bahns (1993, 57) contrasts collocations with idioms and with free combinations. In his view, the “main characteristics of collocations are that their meanings reflect the meaning of their constituent parts (in contrast to idioms) and that they are used frequently, spring to mind readily, and are psychologically salient (in contrast to free combinations)”. Figure 2.2 illustrates the degree of fixedness of free combinations or units, collocations and idioms, with total flexibility on the left and less possibility of flexibility on the right.

Figure 2.2: A diagram representing free combinations or units, collocations and idioms

Collocations are not as syntactically fixed or semantically opaque as idioms but are non-predictable (Biber et al., 1999) and are found in a “transitional area approaching idiom” (Cruse, 1986, 41). Collocations, being more flexible, admit some transformations or operations while idioms, due to their fixedness and rigidity, only admit these morphosyntactic processes in exceptional cases. These are some examples taken from the FTA corpus (Patiño, 2013) to illustrate how the collocational relation is kept despite morphosyntactic changes: aplicación de medidas no arancelarias, ‘application of non-tariff measures’, adoptar medidas arancelarias, ‘adopt tariff measures’, aplicar medidas de salvaguardia, ‘apply safeguard measures’, adoptar medidas provisionales oportunas, ‘take / adopt prompt interim measures’, adoptar medidas tributarias, ‘adopt taxation measures’. These examples are different morphosyntactic realizations of a collocation found through a Google search: adopción de medidas tributarias, ‘adoption of taxation measures’, medidas tributarias adoptadas, ‘adopted taxation measures’. In these cases, the collocational relation is still kept among the intervening constituents, even though some morphologically-related constituents occupy different grammatical categories, for example the deverbal noun adopción and the verb adoptar. To sum up, in addition to their semi-compositionality and frequency, collocations are found in a continuum, amidst free combinations and idioms.
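The persistence of a collocational relation across morphosyntactic variants can be made concrete by reducing each variant to a pair of base forms. The sketch below is purely illustrative: BASE_FORMS is a small hand-built table (an assumption standing in for a real lemmatizer and derivational lexicon), and collocation_key is a hypothetical helper, not a tool used in this thesis.

```python
# Hand-built, hypothetical table mapping inflected or derived word forms
# to a shared base form. A real system would need a lemmatizer plus a
# derivational lexicon to relate adopción to adoptar.
BASE_FORMS = {
    "adoptar": "adoptar",
    "adopción": "adoptar",
    "adoptadas": "adoptar",
    "aplicar": "aplicar",
    "aplicación": "aplicar",
    "medida": "medida",
    "medidas": "medida",
}


def collocation_key(phrase, node="medida"):
    """Return (collocate base, node) for a phrase, or None if not found."""
    bases = [BASE_FORMS[w] for w in phrase.lower().split() if w in BASE_FORMS]
    if node not in bases:
        return None
    collocates = [b for b in bases if b != node]
    return (collocates[0], node) if collocates else None


# Variants quoted in the text above.
variants = [
    "adoptar medidas tributarias",
    "adopción de medidas tributarias",
    "medidas tributarias adoptadas",
]
keys = {collocation_key(v) for v in variants}
print(keys)
```

All three variants reduce to the same pair, which is one way of expressing that the verb, the deverbal noun and the participle realize a single collocational relation with medida.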

2.8.1.3 Differences between collocations and free combinations

Koike (2001) presents several features as the most salient ones to distinguish collocations from free combinations. According to Koike, collocations exhibit the following features:

1. Frequent co-occurrence of lexical units.

2. Combinatory restrictions imposed by traditional use (sharp distinction and trenchant analysis form collocations, whereas trenchant knife is an anti-collocation).

3. Formal compositionality which allows for a certain formal flexibility. For example, adoption of taxation measures and taxation measures adopted hold the same collocational relation.

4. Semantic precision of the combination. For example, safeguard measure, where the adjective adds semantic precision to the type of measure being adopted.

2.9 A look at collocations from different