
4 Methods and Research Design

4.4 Analysis

Oslo, and Helsinki were used in this thesis. In total, 270 teacher candidates, distributed across the three programs (see Table 6), completed the survey.

Table 6. Distribution of respondents

Program     N     Gender (male)
                  n       %
Stanford    72    25      34.7
Oslo        122   51      41.8
Helsinki    76    24      31.6
Total       270   100     37.0

The response rates at two of the programs were close to 100%, probably because of the distribution method. The only exception was Helsinki (23%), due to the flexibility of that program.

In Helsinki, students can decide to take specific courses in either their third or fourth year, and there are few compulsory classes. Still, nearly all candidates who were present in the class where the survey was distributed completed it. The sample is representative in terms of age and subject distribution. Had we used a digital version of the survey, we might have reached different candidates, but this would probably not have increased our sample size, as such surveys have been shown to have response rates as low as 10% to 25% (Sauermann & Roach, 2013). Across all three programs, the total group of student teachers consisted of 33% males, which is similar to the average gender distribution in the teaching population in OECD countries (OECD, 2013).

of codes for analysis of video data (Blikstad-Balas, 2016; Klette & Blikstad-Balas, 2016; Snell, 2011). A discussion of the construct validity of the dimensions and challenges of reductionism with theory-driven analysis follows in Section 4.5 on research credibility.

As a second step of data analysis, we developed a coding book (Jenset, Klette, & Hammerness, 2014) to score the dimensions. We started out with a simple coding scheme that only stated whether a category was present or not, but we ended up developing a more advanced coding book. The first attempt at such a coding book had a generic description of the scores from 1 to 4, intended to capture all eight dimensions of opportunities grounded in practice. In the end, however, we described the scores for each individual dimension. In the final protocol, each dimension is operationalized on a 1–4 scale using utterances, interaction patterns, and specific observable behaviors.

The development of the coding book was influenced by other protocols, such as the Classroom Assessment Scoring System (CLASS; Pianta & Hamre, 2009) and the Protocol for Language Arts Teaching Observations (PLATO 5.0). The score in our protocol measures quantity, or time spent, on an opportunity grounded in practice, ranging from very seldom and brief (score 1) to more frequent or constituting a main portion of the lesson (score 4). The time estimates during analysis were based upon time stamps the research assistants made in the fieldnotes, approximately every 10 minutes. The protocol also measures the quality of the opportunity, tapping how general or vague (score 1) these opportunities were, as opposed to specific, in depth, or connected to theory (score 4). We chose to privilege specific rather than vague opportunities based on similar decisions in other protocols, such as PLATO 5.0, and because other research has pointed to this qualitative difference (e.g., Little & Horn, 2007). Nevertheless, it is an empirical question whether the instructional practices described by a score of 4 do indeed contribute to better learning for the teacher candidates (and, in turn, better teaching and, finally, better learning outcomes for the pupils) than would the practices described by a score of 2. Instruments such as this coding book are still in their infancy, and further research will contribute to their development.

Appendix 2 presents the coding book, with definitions of all scores on all dimensions.
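To make the shape of the resulting data concrete, the sketch below shows one way a lesson-level record from such a coding book could be represented. It is an illustration only: the dimension names, the example values, and the decision to keep quantity and quality as separate fields are assumptions made for the example, not the actual protocol, whose definitions are those given in Appendix 2.

```python
# Illustrative sketch of a lesson-level score record (not the actual protocol).
from dataclasses import dataclass

@dataclass
class LessonScore:
    program: str     # "Stanford", "Oslo", or "Helsinki"
    lesson_id: str
    dimension: str   # one of the eight dimensions grounded in practice
    quantity: int    # 1 = very seldom and brief ... 4 = a main portion of the lesson
    quality: int     # 1 = general or vague ... 4 = specific, in depth, or connected to theory

    def __post_init__(self) -> None:
        # Both scores must lie on the 1-4 scale used in the coding book.
        if not (1 <= self.quantity <= 4 and 1 <= self.quality <= 4):
            raise ValueError("scores must be between 1 and 4")

# Hypothetical example: one lesson at Oslo, scored on one dimension.
example = LessonScore("Oslo", "lesson_03", "analyze_pupils_learning", quantity=3, quality=2)
```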

The unit of our scoring was the whole lesson. Lessons lasted from 45 to 60 minutes or more, but the duration was consistent within each program. This meant that each dimension was assigned a score in every lesson, in contrast to protocols that assign scores to intervals of 10 or 15 minutes (e.g., PLATO 5.0). We made this choice because of the current scarcity of research regarding teaching practices in teacher education, and because of an assumption that the teaching in teacher education classrooms may be less repetitive than in K-12 classrooms. Finally, the time stamps in the fieldnotes were not sufficient for a finer-grained scoring.
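As a rough illustration of how the approximately 10-minute time stamps could support a lesson-level quantity score, the sketch below maps the share of time-stamped segments in which an opportunity was observed onto the 1-4 scale. The cut-off values are assumptions made for the example; the authoritative score definitions are those in the coding book (Appendix 2).

```python
# Hypothetical mapping from fieldnote time stamps to a lesson-level quantity score.
def quantity_score(marked_segments: int, total_segments: int) -> int:
    """Map the share of ~10-minute segments in which an opportunity grounded in
    practice was observed onto the 1-4 quantity scale (assumed thresholds)."""
    share = marked_segments / total_segments
    if share < 0.2:
        return 1   # very seldom and brief
    if share < 0.4:
        return 2
    if share < 0.6:
        return 3
    return 4       # a main portion of the lesson

# A 50-minute lesson split into five ~10-minute segments, with the opportunity
# observed in two of them, would receive a quantity score of 3.
assert quantity_score(marked_segments=2, total_segments=5) == 3
```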

This PhD project relied heavily upon qualitative data, represented as qualitative excerpts in all articles. However, the data have also been represented quantitatively in Article I, which is traditionally uncommon in qualitative research (Silverman, 2006). Some qualitative researchers have denied the usefulness of measurement, based on the belief that qualitative data are naturally occurring data, not susceptible to measurement (Hammersley, 2008, p. 32). Nevertheless, qualitative researchers do make claims (e.g., about frequency and degree) that are quantitative in character, and it is thus impossible to do research well without measurement of some kind (Hammersley, 2008). We chose to represent the observation data as scores because the scoring enabled us to see patterns across the programs. Furthermore, this kind of representation made it easier to spot the highlights of our findings, and it encouraged me to look at specific aspects in more depth, as was the case when I decided on the focus for Articles II and III. The explicitness of the representation also forced me to revise my own assumptions and analyses of the data (Silverman, 2006), which enhanced the validity of my assumptions and conclusions. Finally, I would argue that this way of representing the data also facilitates communication with the public, as it makes it easier for the reader to get an overall impression of our findings (Nespor, 2006).
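As a simple illustration of the kind of cross-program pattern spotting that this representation allows, the sketch below aggregates dimension scores by program with pandas. The values and column names are invented for the example and are not the thesis data.

```python
# Illustrative aggregation of (hypothetical) lesson scores across programs.
import pandas as pd

scores = pd.DataFrame({
    "program":   ["Stanford", "Stanford", "Oslo", "Oslo", "Helsinki", "Helsinki"],
    "dimension": ["analyze_pupils_learning"] * 6,
    "score":     [3, 4, 2, 3, 2, 1],
})

# Mean score per program on one dimension: the kind of pattern that guided the
# choice of focus for Articles II and III.
print(scores.groupby("program")["score"].mean())
```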

Our conceptual framework constituted the basis of this thesis, but it was primarily used as an analytical framework in Article I. In Articles II and III, we used slightly different approaches for examining the same dataset. In these articles, I looked at two of the dimensions grounded in practice (i.e., talk about field placement and analyze pupils’ learning). The analysis was less theory-driven, and the categories partly evolved from the empirical data. In Article II, we found that the framework developed by Little and Horn (2007) closely matched our data. This framework was developed to examine in-service teachers’ talk at the workplace, and we adapted it to our context of pre-service teacher education. In Article III, we investigated the same dataset with an even more inductive approach, letting the categories emerge from the empirical data through a thematic analysis. These categories were supported by existing research (e.g., Thompson et al., 2013).

4.4.2 Analysis of survey data

To analyze the survey data in Articles II and III, we checked, for each item, whether the variances across the programs (Stanford, Oslo, and Helsinki) were similar, using Levene’s test (Field, 2009). This was the case for all three items in Article III (p > .05), which allowed us to compare the three programs through a regular ANOVA. In Article II, Levene’s test showed that the variances were not equal for item 2E, and we therefore used Welch’s F for the overall comparison and Games-Howell as a post hoc test. We replaced missing data with the series mean (Dong & Peng, 2013). Missing value analysis indicated that none of the items had 5% or more missing cases: variable 1c had the highest percentage of missing data (1.5%), and items 1g and 1i had the lowest (.4%).
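For readers who want to see these steps concretely, the sketch below re-creates the analysis pipeline in Python using scipy and pingouin. It is a sketch under assumptions: the file name and the column names ("program", "item_2e") are placeholders, and the thesis analysis was not necessarily carried out with these libraries.

```python
# Hedged re-creation of the survey analysis steps; names and file are placeholders.
import pandas as pd
import pingouin as pg
from scipy import stats

df = pd.read_csv("survey.csv")  # hypothetical: one row per respondent

# Replace missing values on an item with the series (overall) mean.
df["item_2e"] = df["item_2e"].fillna(df["item_2e"].mean())

# One array of responses per program (Stanford, Oslo, Helsinki).
groups = [g["item_2e"].to_numpy() for _, g in df.groupby("program")]

# Levene's test for homogeneity of variance across the three programs.
lev_stat, lev_p = stats.levene(*groups)

if lev_p > .05:
    # Variances comparable: regular one-way ANOVA.
    f_stat, p_val = stats.f_oneway(*groups)
else:
    # Variances unequal (as for item 2E in Article II): Welch's F for the
    # overall comparison, followed by Games-Howell post hoc comparisons.
    welch = pg.welch_anova(data=df, dv="item_2e", between="program")
    posthoc = pg.pairwise_gameshowell(data=df, dv="item_2e", between="program")
```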