
2. School leadership development programmes in Norway and England

2.5 Linking programmes, practice and outcomes: considering impact

Concern with the quality of evaluation has been clearly evident in discussions surrounding training and development programmes for school leaders. School leadership training and development programmes are, as has already been noted, becoming a more common part of national public policy reforms aimed at developing the quality of educational provision (Hallinger, 2003). Educational legislation and statutory guidance have focused more widely upon the necessity for "high quality professional development" that should improve school leadership (Guskey, 2003), which is visible in recent policy documentation in England, for example 'Every Child Matters' (UK Treasury, 2003), and, as was noted above, is beginning to appear in rudimentary form in Norway. As was also noted, this development is often linked to the contested belief that pupil learning outcomes will improve as a result of a better leadership and management skills base, particularly in the formal leader of the school (Bell et al., 2003; Bush, 2005c; Leithwood & Levin, 2005). Within a system that has traditionally focused on accountability and assessment, evaluation in the United Kingdom[45] has been characterised by even greater visibility in recent times (Gray & Jenkins, 2002). As has already been noted, there has been a general shift in public policy focus from the evaluation of the management of policy and resources to the management of outcomes.

[45] Which the authors apply to the English context.

A 'tension' exists in that little is really known about the impact of these programmes. Even amongst training providers there is great variation in the acceptance of a causal link between training and improved pupil outcomes. This appears to a great extent to reflect the way the programmes are evaluated.

As Leithwood and Levin noted,

"In a recent analysis of leadership preparation programmes across the United States, McCarthy (1999) concluded that we do not actually know whether, or the extent to which, such programmes actually achieve the goal of '…producing effective leaders who create school environments that enhance pupil learning?' (p. 133). This gap in our knowledge is not because leadership preparation programmes are never evaluated; rather, the vast majority of such evaluations do not provide the type and quality of evidence required to confidently answer questions about their organizational or pupil effects. Most evaluations are limited to assessing participants' satisfaction with their programmes and sometimes their perception of how such programmes have contributed to participants' work in schools (McCarthy, 2002)" (Leithwood & Levin, 2005: 10-11).

The commissioned reports by Leithwood and Levin (2004, 2005) further developed a model to clarify the linkage between leadership and student outcomes, despite recognising the complexity of such a process and its "methodological challenges". They note that "highly sophisticated frameworks… potentially include all of the variables at the school and classroom level that are themselves the focus of independent lines of active research with the usual debates and uncertainties about their effects on pupil learning" (Leithwood & Levin, 2005: 10). The authors recognised weaknesses in current evaluative approaches and research, which would require greater comprehensiveness in the measurement of leadership practices, the formation of "an expanded set of dependent (outcome) variables", description of how leadership influences "the condition of variables mediating their effects on pupils", and an understanding of the moderators of leadership effects, while using more varied methodological approaches (Leithwood & Levin, 2005: 4-5). Their six-step "framework to guide evaluations of leadership programs" is included below; within a hierarchy of increasing complexity, each model builds upon previous levels, working towards level 6, where the evaluative criterion is improved student outcomes:

Figure 3: Evaluation framework for leadership programmes. Source: Leithwood and Levin, 2005: 36

The models face additional challenges from variation in school type, transferability of data, changes of measures in longitudinal data, and missing data, in addition to the complexity of defining the unit of analysis (2005: 40-2).

There are, however, recognised problems with such models, which were also dealt with to some extent by Leithwood and Levin. Whatever processes are set in motion, they offer little to aid the discovery of whether a programme is 'good or bad', and are said to require greater 'effort' (Goldstein & Ford, 2002). Most evaluations focus upon trainee/participant reactions, but appear to say little about learning or improved outputs/performance (Goldstein & Ford, 2002; Guskey, 2000, 2002). Additionally, as will be seen below, these are considered short term (Bush, 2010). A vast majority of evaluation models for formal training are observed to adopt rational perspectives (Holton III & Naquin, 2005). For example, Kirkpatrick's (1998) four-level model continues to be widely adapted, despite considerable criticism (Alliger & Janak, 1989).

Kirkpatrick's (1998) four-level model is a hierarchical model ascertaining a programme's impact on participants in terms of their reactions to it, their learning, the transfer of behaviour, and the impact upon results in the workplace.

Criticism of Kirkpatrick's model in particular is that it is little more than a taxonomy of outcomes, in which the implicit causal relationships remain 'unoperationalised' (Alliger & Janak, 1989; Bates, 2004), while too many intervening variables are ignored (Holton III, 1996). Perhaps Holton's strongest criticism is that the model relies upon 'participant reaction' as a "primary outcome of training", supporting Alliger and Janak's reflections that reactions are not linearly related to learning, but may moderate or mediate it (Alliger & Janak, 1989). Holton III followed up these research findings, which demonstrate "little correlation between reactions and learning" (1996: 10), and therefore no direct link, but recognised that reactions have been shown to reinforce interest and enhance motivation, acting as a moderator (after Patrick, 1992, in Ibid.), whilst mediating other relationships (after Mathieu, Tannenbaum, & Salas, 1992). Holton's conclusion, like that reached to some extent in the educational field by Guskey (2000) and Leithwood and Levin (2005), is that less focus should be placed on reactions to the process and more on performance outcomes. Reactions should instead be considered as an evaluation measure of the learning environment, moderating motivation to learn and learning itself[46]. Evaluations frequently fail to provide the information necessary to evidence such effects. Such findings add further to the necessity for research into the underlying attitudes amongst decision makers responsible for evaluation, requiring models that moderate or adjust the more prescriptive rational models.

[46] In a meta-analysis, Alliger and Janak (1989) found an overall correlation of .07 between reactions to a training programme and learning outcomes.
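Two statistical points in the preceding paragraph may benefit from a brief, schematic illustration; the following equations are my own rendering of the standard forms involved, not formulas taken from the studies cited. First, the size of the correlation reported in footnote 46: squaring r gives the proportion of variance shared between reactions and learning,

\[
r^2 = (0.07)^2 \approx 0.005,
\]

that is, roughly half of one per cent, which underlines why reactions are so weak a proxy for learning. Second, the distinction between a moderator and a mediator: if L is learning, M motivation to learn, and R reactions, a moderating role places R in an interaction term, so that the effect of M on L varies with R,

\[
L = \beta_0 + \beta_1 M + \beta_2 R + \beta_3 (M \times R) + \varepsilon,
\]

whereas a mediating role places R on an indirect path, with part of an antecedent's effect transmitted through it:

\[
X \rightarrow R \rightarrow L.
\]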

Kirkpatrick's four-level model is still a strong influence within educational evaluation activity (Guskey, 2000), and even more widely in Human Resource Management (HRM) (Holton III & Naquin, 2005). Guskey modified Kirkpatrick's model in response to criticism that it did not "reflect training's ultimate value in terms of organization success criteria" (2000: 55). This led to the inclusion of a level focused upon "organization support and change", to investigate which factors might moderate the impact of any development initiative (2000: 83). This is the level at which the school can "support or sabotage" a professional development initiative (Bubb & Earley, 2007: 69). The information upon which organisations are claimed to base decisions is thus declared to be flawed. The greatest problem appears to be the evaluation models applied to programmes, and the conceptualisation of what the organisation is attempting to achieve. It has been recognised as fairly straightforward to analyse programme structure, "potential utility" and participant perceptions of effectiveness without ascertaining whether programmes "produce effective school principals" and without good measures of effectiveness (Cowie & Crawford, 2007: 133). Guskey's model did appear to offer a wider view of student outcomes, to include the cognitive, affective and psychomotor, where the data should be gathered by mixed and multiple methods (2000: 212ff). Commenting on this, Bubb and Earley offer an interesting note on the focus in England when they describe the cognitive as being understood in this context as "the most obvious – pupil attainment (the dreaded performance tables!)" (2007: 69).

Bush affirms the complexity of this model in ascertaining transference from programme to school (2008b: 123). He outlines two of the major problems with programme evaluations: firstly, that they rely "mainly or exclusively on self-reported evidence", seldom with an a priori element; and secondly, that they focus on "short term" impact (2008b: 114). This latter observation recognises that most changes will take place over time. These weaknesses are significant, as are Bush's further comments with regard to difficulties of attribution. He refers to Bush et al.'s (2006) adaptation of Leithwood and Levin's model when interpreting the findings of their evaluation of the NCSL New Visions programme, noting the "diminishing influence of the programme as the model moves through each phase" (2008b: 120). The authors had concluded that "[p]roving a straightforward link" between a programme and evidence of school improvement "is fraught with difficulty" (Bush et al., 2006: 197). Simkins et al. (2007, 2009) also adopted a model influenced by Leithwood and Levin's when evaluating three NCSL programmes. The authors likewise agreed that Kirkpatrick's model is too linearly focused and omits key variables, particularly contextual ones (Simkins et al., 2009: 34). As a result they focused on factors influencing participant learning in relation to "in-school components" (2009: 29), developing from programme input through intermediate to final outcomes, dependent on antecedents and moderators (2009: 35-7). Surveys were used to ascertain evidence of longer term impact, but, in line with the challenges outlined by Leithwood and Levin, data quality was weakened by poor response rates (2009: 31). The "poverty of theory in the evaluation of learning and development interventions" also creates the problem of investigating how "individual development might translate into organisational transformation" (Bush, Glover, & Harris, 2007: 15).

Møller considers another weakness of this model to be that it is grounded within the "rationalistic paradigm" and therefore ignores critical, institutional and political theories (Møller, 2006b: 35). Additionally, it is mainly based upon Anglo-American studies and is therefore limited in terms of generalizability, particularly as it ignores outcomes such as democratic and social development, key areas in Norwegian education. Nordic research has focused much more upon educational frameworks, particularly from cultural and micro-political perspectives, considering leadership from a relational perspective rather than one based upon role, but also the perception of identity amongst leaders (Møller, 2006b: 37-8). As a result there is little research on school effectiveness, and scepticism towards attempts to link school leadership and pupil outcomes, even though these areas are increasingly referred to by politicians and within government documents (Møller, 2006b: 40).

The perceived link between training and development and improved outcomes is evident across the educational spectrum, where the terminology in governmental documents highlights the necessity of ascertaining the "impact" of initiatives (e.g. OfSTED, 2004; TDA, 2007). With particular regard to educational leadership, Bush notes the importance of connecting investigations of the impact of professional development with the nature, purpose and intended outcomes of the initiative in focus (2008b: 107). Bush recognises the importance of linking programme impact with the intended outcomes of initiatives, but claims this is an area that has received only limited discussion (Ibid.). He goes on to outline how this discussion has been limited, mostly surrounding "student outcomes" and "school improvement", the latter of which he considers a "vaguer notion". For example, Flecknoe's study of a CPD programme for teachers confirmed the difficulty of ascertaining a link to direct effects upon pupil outcomes, and subsequently whether "all teachers" could be enabled to raise achievement on completion, "once the importance of the easily measurable has been exposed for its inadequacy" (Flecknoe, 2000: 455). Flecknoe further highlighted the challenges of controlling for halo effects (Flecknoe, 2002). It is an area still supported by "belief" rather than "evidence" (Bush, 2008b). Almost writing in terms of faith, González et al. declared that "[h]owever ludicrous to some and uncomfortable to others it may seem, we believe in the existence of a linkage between principal preparation programs and student achievement in schools" (2002: 265-6). The authors' purpose in studying such linkage was to refocus research onto outcome-based standards that would in turn help develop a model that could "adjust preparation programs with the intent of improving student achievement", building upon the development of internal activities (ibid.). Research in the learning and skills sector in England revealed a relationship between the type of leadership development experienced and espoused views of leadership (Muijs, Harris, Lumby, Morrison, & Sood, 2006). Although the authors recognise that this offered no proof of causality, they observed different development forms related to different styles of leadership (2006: 103).

The challenge of determining which models were most effective, however, remained.

Outcomes from leadership programmes can include "sustained" change in leadership behaviour, school conditions, processes of teaching and learning, and pupil outcomes (Simkins et al., 2009: 34). The authors affirm the protracted nature of the processes required to uncover evidence of effects. Leadership development is a long-term course of action requiring time for change to take root in others. There are additionally many other variables that will mediate and moderate the quality and timing of change. While this area has been of significant interest to the NCSL, meta-analysis of NCSL evaluations nevertheless revealed little evidence of how the impact of programmes was understood and measured (Bush et al., 2007).


Achieving outcomes, ascertaining impact of programmes

Bush (2008b) interestingly outlines the discussion over the "significance of leadership and management development". The general "purpose of leadership development is to produce more effective leaders", which implies the achievement of intended outcomes (Bush, 2008b: 108). The main reported criteria utilised in assessing the value and impact of initiatives include the improvement of pupil learning, attitudes and engagement; improved staff motivation, capability and performance; and the promotion of equity and diversity, democracy and participation. Bush outlines a series of alternatives and challenges with regard to the design components and focus of programmes, which influence their assessment and evaluation. These are presented in the table below.

Table 1: Criteria for assessment of the value and impact of leadership development programmes. Adapted from Bush, 2008b

Design components           Alternatives for programme design
Main purpose                Developing leaders        Leadership development
Underpinning                Succession planning       Meeting individual needs
Focus                       Standards-based           Holistic development
Implementation style        Content-led               Process-rich
Aims                        Specific repertoire       Contingency
Implementation context      Campus-based              Field-based
Participation and ethics    Generic[47]               Equity and diversity focused

[47] Bush raises the issue of whether leadership learning should "address issues of equity and diversity"; the alternative position could therefore possibly be seen as 'generic'.

Bush raises important issues with regard to these different components. He notes, for example, how the overall purpose of NCSL programmes has predominantly been directed towards training and developing leaders, particularly for the role of the Head teacher. There has been a movement towards more generic programmes, in line with the policy initiative of systematised "succession planning", rather than those based upon the "individual needs" of participants. The NCSL programmes are tied to the national policy initiatives, and their underlying values, that were outlined above. That these programmes are based on standards appears to constitutively highlight the importance of the technical aspects of the leadership and management of schools. These points are highly significant for this study, as policy attention appears to shift, in terms of content and subsequently assessment, away from development-based programmes more usually associated with Master's degrees. Bush notes that the apparent purpose of standards is to measure performance against an articulated "clear set of expectations for leaders", whereby successful programme completion is seen to provide "baseline competence in the leadership role" (2008b: 110-1). But, as Bush further points out, such an approach is in danger of reducing the complexity and contextualised nature of this role. This raises issues for the evaluative approach, where constitutively the "measurable" takes precedence over the less quantifiable. While these approaches apply more strictly to the nationally mandated programmes under the control of the NCSL, the wider implications of the public policy initiatives outlined earlier mean that HEIs are also increasingly expected to focus in this way. This might be because particular programmes are funded or part-funded by national bodies, or, as will be seen in Chapter 4, because the higher education field also faces parallel demands for evidence of impact, which is increasingly linked to funding. In the next section I consider examples of how the evaluation of NCSL programmes has developed, along with the ensuing debate.

Evaluation, school leadership and the NCSL

While this study focuses only indirectly upon the NCSL, I consider how it has impacted the wider field of school leadership, particularly with regard to evaluation. The rationale behind this focus is that the evaluations of NCSL programmes are often undertaken by HEI academics under tendered contract, and questions related to these exercises have formed an on-going dialogue between the College and the wider academic and research field. An example of these debates was the BELMAS/SCRELM symposium in 2004. One issue raised was that of ownership and control of the evaluation process, where the "terms of engagement" were considered "determined largely by the College" (Bush, 2005a: 35). While evaluation aims were described as "absolute", there was noted to be some flexibility over methodology and the possibility of suggesting alternative methods or designs (2005a: 34). Such a process was described as a partnership, but with controlling interests over some areas (2005a: 35).

Additionally, questions are raised over the assumptions and purpose behind the act of evaluation (Simkins, 2005b). Simkins reiterates the centrality of evaluation within NCSL programmes, but also considers how the underlying "expectations and constraints" influence choices about methodology and approach. He states, "[u]nderlying these choices will be assumptions about the kinds of knowledge that evaluations can or should generate", particularly within a wider framework of increased accountability and focus upon improving educational outcomes (Simkins, 2005b: 35). The amount of resources and time given greatly influences the process. The issue of impact, in moving beyond participant reactions, is noted again to have been a difficult area within evaluations mandated by the College. This involves taking into account "contextual complexity" in specific evaluations and "joining up" the different activities for improving design and development (Wright & Colquhoun, 2005). This complexity issue with regard to impact is reiterated by Earley (2005), who isolated some of the shortfalls in the models applied, particularly with regard to time, the use of mapping techniques and the use of more qualitative approaches. He notes that baseline data and follow-up studies are often of limited value (Earley, 2005; Earley & Evans, 2004), but that longitudinal designs fit poorly against demands for "immediate evidence" that policy is working (2005: 37).

A response from the NCSL recognised that evaluations had focused more on programme effectiveness than on a "wider critical understanding of the evaluation of school leadership development" (Conner, 2005). This issue was also related to that of control, where the College hoped that good relationships with the wider academic field could be maintained. There was, however, a sense in which the College had increased its distance from the HEI field, with less of a feeling of partnership. The NCSL was also concerned about the impact of the available time and resources upon the quality of evaluations, which had also been raised elsewhere (Southworth, 2004), but there was some suggestion of disagreement over how long this time scale should be.

The College therefore appears to be represented as having a different purpose and agenda for its programmes, and subsequently for their evaluation. This suggests a complex field and a meeting of minds that needs unravelling. Within a climate of greater control and demand for evidence, it is suggested that analysis should also be directed to the whole development of evaluation models, in particular towards the decisions that guide the choice of approach and the development of evaluation models with utilization in mind. Such engagement is considered interesting to explore further with subunit members.
