
3 Methods

3.5 Validity and reliability

Several considerations were made to ensure that the present study produced results that were both valid and reliable (Dörnyei, 2007).

Validity concerns whether a study’s research methods accurately measure the phenomenon in question and whether the ensuing results can be generalised beyond the confines of the given research context.

Reliability refers to whether the research methods produce consistent results that can be replicated in future studies. By combining text analysis and interview methods, this partially mixed methods design aims to offer a more complete understanding of metadiscourse in upper secondary essay writing across the educational contexts studied (Bryman, 2006).

Validity is often divided into two main types: internal and external.

Internal validity is considered to be realised when “the outcome is a function of the variables that are measured, controlled or manipulated in the study” (Dörnyei, 2007, p. 52). In order to obtain internal validity, the taxonomy of metadiscourse was adapted to the content of the essays. In order to test the taxonomy, two second raters and I analysed a sub-set of 10 essays and interrater reliability was calculated (Burke Johnsen & Christensen, 2017). Each genre and each country was represented in the sub-sample, but essays were otherwise randomly selected. Prior to analysing the sub-set of essays, a preliminary sub-set of three essays was analysed to pilot the methods. This was done to ensure that both parties agreed upon the criteria for identifying metadiscourse features belonging to each (sub-)category. For some sub-categories, a low level of agreement was obtained. In those cases, we discussed our coding practices in order to identify the criteria or search terms that were the source of these disagreements.

One second rater was employed to analyse the majority of the sub-categories. However, after completing the analysis, several additions were made to the taxonomy. More specifically, five sub-categories were added to account for features that were observed during the analysis of the full corpus. These were the four attitude sub-categories and the topic shift sub-category (see section 3.3.2). For practical reasons, the original second rater was unavailable to test these sub-categories, so another second rater was hired. These sub-categories were tested using the same 10 essays. The comparisons between the second raters’ analyses and my own were used to calculate Cohen’s kappa statistic (Hallgren, 2012). The relevant statistics, alongside the main considerations that arose during the discussions, are reported in each article.
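
As an illustration of the statistic itself, Cohen’s kappa compares the observed proportion of agreement between two raters with the agreement that would be expected by chance. The following is a minimal sketch in Python and not the procedure used in this study: the function name, the yes/no labels and the ten codings are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return Cohen's kappa for two equally long sequences of category labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must code the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: proportion of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels used by either rater.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so perfect agreement is reported by convention (an assumption here).
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical codings: does each of ten candidate items function as metadiscourse?
first_rater = ["yes"] * 5 + ["no"] * 5
second_rater = ["yes"] * 4 + ["no"] + ["no"] * 4 + ["yes"]

kappa = cohens_kappa(first_rater, second_rater)
print(round(kappa, 2))
```

In this invented example the raters agree on 8 of the 10 items (observed agreement 0.8), chance agreement is 0.5, and kappa is therefore approximately 0.6.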

In order to ensure that the interview data were internally valid, the interview guide was created by consulting previous metadiscourse-related studies that used interview methods (Aloyousef & Picard, 2011; Tavakoli et al., 2012). Of those reviewed, Hyland (2004) was the only study that reported the interview guide, which provided a useful starting point. However, since this project involved interviewing English teachers instead of authors, the interview guide had to be modified to suit this purpose. Specialised terminology was avoided in order to prevent misconceptions, and the semi-structured nature of the interviews ensured that possible misunderstandings could be clarified.

External validity refers to the extent to which findings can be generalised to “a larger group, to other contexts or to different times” (Dörnyei, 2007, p. 52). As opposed to creating a single assignment prompt for all pupils, the essays collected were written for prompts and under conditions set by English teachers, or by exam boards (e.g. AQA, 2019), which is considered to increase the generalisability of the data. Although this approach may have somewhat compromised the comparability of the essays across educational contexts and genres, the collected data represent the typical assignments and writing conditions with which pupils would usually work. While the conditions under which pupils write may vary considerably from school to school, the results of this study are thus more likely to reflect the role of metadiscourse in upper secondary essay writing beyond the present corpus. Furthermore, the teacher interviews provided insight that benefitted the interpretation of the textual data. For example, several teachers encouraged the use of boosters, which indicates that the pupils’ reliance on these features may be related to teacher advice (see section 4.3).

Dörnyei (2007, p. 50) describes reliability as “the extent to which our measurement instruments and procedures produce consistent results in a given population in different circumstances”. Although the taxonomy was adapted for the specific purposes of the present study and no other study has used a taxonomy identical to this one, the categories and sub-categories were amalgamated and adapted from previous studies.

Furthermore, the list of search terms was compiled on the basis of both previous studies and the content of the present corpus. While reliability refers to the degree to which analytical tools can produce similar results among similar populations (Dörnyei, 2007), the present approach illustrates how a metadiscourse taxonomy can be adapted to the content of the target corpus. Although similar populations writing texts belonging to a single genre are likely to use similar metadiscourse features, directly applying a taxonomy and a list of search terms used for a previous study may lead to overlooking certain metadiscourse features.

Maintaining reliability in interviews can be challenging, considering that meaning is situationally co-created between the interviewer and the interviewee (Kvale & Brinkmann, 2015). I endeavoured to maintain reliability by entering each interview with an unbiased view and by trying to create a situation in which the participants felt they could speak candidly about their teaching practices. The interviews were held at the participants’ respective schools, which are spaces with which they are familiar and which represent the contexts in which they teach. It is believed that these factors are conducive to achieving more accurate recall (Mackay & Gass, 2016).

Due to the semi-structured nature of the interviews, participants were able to digress from the interview guide as they saw fit. These measures were taken with the aim of collecting results that accurately represented the teachers’ practices and of generating findings that could potentially be transferable to similar contexts.
