

Double Reading in Norwegian Hospital Radiology Departments

Peter Mæhre Lauritzen

Department of Diagnostic Imaging and HØKH Research Unit, Akershus University Hospital

Institute of Clinical Medicine, Faculty of Medicine, University of Oslo

Norway


© Peter Mæhre Lauritzen, 2016

Series of dissertations submitted to the Faculty of Medicine, University of Oslo

ISBN 978-82-8333-180-6 ISSN 1501-8962

Cover: Hanne Baadsgaard Utigard

Printed in Norway: 07 Media AS – www.07.no


“There is an art, it says, or rather, a knack to flying. The knack lies in learning how to throw yourself at the ground and miss.”

Douglas Adams, Life, the Universe and Everything, 1982

To my family. You are everything to me.


Preface

The starting point of this project was a series of conversations with my senior colleague Gunnar Sandbæk while on call at Aker University Hospital. Gunnar is a man who gets things done, and I liked to believe that I was too. These conversations often touched on questions of efficiency, responsibility and purposeful workflow.

We discussed how radiology could best provide the clinicians with what they need: timely and clear answers to their diagnostic questions. We were sceptical of the submission of reports and examinations for a second opinion (double reading), which appeared to be the default at some departments. We wondered whether this practice improved quality, or simply squandered resources and caused delays.

The question was simple enough. After exploring this subject for the better part of five years, I can state with great conviction: “It is complicated”.

For every step in the diagnostic imaging process, judgement is used repeatedly: the decision to refer, what to include in the referral, how to image the patient, the interpretation of the images, what to emphasize or omit in the report, how to understand the report in the clinical context, and ultimately how to manage the patient.

For every step, rights and wrongs exist, but between the two lies a grey area of judgement. What happens in this grey area is not always recorded, and it is often difficult to measure. Even the rights and wrongs are elusive, as most imaging findings are not followed by an indisputable diagnosis – the gold standard.

In the intersection between radiology and quality improvement, I have come to realize that it isn’t all about getting it done and getting it right. We also have to learn from our inevitable mistakes and improve our systems to prevent recurrences.

Although my perspective has changed since I started, I still believe that our use of judgement is an important field of enquiry. It is complicated, but neither impossible nor futile.


Acknowledgements

First I would like to sincerely thank my supervisor Pål Gulbrandsen and my co-supervisors Gunnar Sandbæk and Petter Hurlen. They have always given me all the support I needed, and without them I would never have left the starting blocks.

Gunnar gave me the opportunity to turn a question into a dissertation by giving me my first research position. His support as an experienced radiologist, researcher and manager has been invaluable. I envy Gunnar his positive attitude, contagious enthusiasm, and his habit of giving compliments.

Petter’s combined knowledge of management and computer programming has been priceless. He gives precise feedback, asks crucial questions, and has taught me to limit my scope. I admire his ability to cut to the chase.

It is impossible to list the things Pål has taught me and the things I can thank him for.

Pål has provided daily guidance in all questions big and small. It is inspiring that despite being rich in both knowledge and wisdom, Pål possesses such curiosity about everything.

Jack Gunnar Andersen, Mali Victoria Stokke, and Anne Lise Tennstrand, my collaborators at Ullevål, Drammen, and Bærum, deserve great praise. The work they have done is above and beyond the call of duty, and the project could not have been carried out without them.

By contributing their clinical experience, Thomas Hegglund, Rolf Aamodt, and Andreas Ødegaard at the department of abdominal surgery and Gisle Bjerke, Knut Stavem, and Vidar Søyseth at the department of pulmonary medicine have provided the basis for the very end point in two of the studies. Knut Stavem’s extended involvement has further improved our product and was much appreciated.

A special thanks to Fredrik Dahl for providing continuing statistical support throughout this project, and to fellow statisticians Jurate Saltyte Benth and Jonas Christoffer Lindstrøm for valuable help with calculations and spreadsheets.

Thanks to my dear colleagues Heidi Eggesbø and Erik Rud for feedback on several drafts of the questionnaires. The feedback from Ingrid Nermoen and Christofer Lundquist on the first draft of the clinical rating scale was a great help in preparing the pilot study. Thanks to Ellen Deilkås for sharing her insight into quality culture and approaches to quality improvement. Thanks to Haldor Husby for retrieving data from Ahus and for support with the document comparison software. Thanks to all my colleagues at the Health Services Research Centre for maintaining an inclusive, supportive, and diverse work environment. Thanks to everyone at the libraries at Akershus University Hospital and Oslo University Hospital for always providing such an excellent service.

Thanks to the Norwegian Medical Association for access to data from the SERUS system and for funding from the fund for Quality Improvement and Patient Safety.

Thanks to the Norwegian Society of Radiology for funding and for allowing our survey to be distributed to all its members. Thanks to all who said yes when they could have said no.


Funding

The Department of Diagnostic Imaging at Akershus University Hospital, the Department of Radiology and Nuclear Medicine at Oslo University Hospital, the Norwegian Medical Association’s fund for Quality Improvement and Patient Safety, and the Norwegian Society of Radiology funded this project.


Annotations & abbreviations

Annotations

The terms “discrepancy” and “discordance” may be used interchangeably to describe differences between interpreters in reporting their findings. Although some discrepancies constitute “errors”, this is not always the case. The term “discrepancy” is used when reporting our own findings because it describes the variation between readers more accurately than “error”. Also, our focus was the clinical consequence of such variations and not the blameworthiness of the readers involved.

The terms “quality assurance” and “quality improvement” are generally used interchangeably. “Quality assurance” refers more specifically to the detection and correction of individual errors, whereas “quality improvement” refers more generally to any endeavour to improve the quality of services.

Radiologists are also referred to as readers when it concerns their role as interpreters of examinations. Consultant radiologist refers to any radiologist employed in a senior position (Overlege). This includes acting or appointed consultants (Konstituert overlege) who have not received specialist approval.

Radiologists employed in training positions (Lege i spesialisering) are referred to as residents.

The terms “radiological examination” and “imaging examination” are used interchangeably.

“Radiology conferences” refer to any meeting in which radiologists and clinicians meet to discuss patient management in light of medical imaging results. This also includes multidisciplinary team meetings (MDTM) and daily radiology rounds.


Abbreviations

- ACR (American College of Radiology)
- Ahus (Akershus University Hospital)
- BI-RADS (Breast Imaging Reporting and Data System)
- CT (Computed Tomography)
- CTPA (Computed Tomographic Pulmonary Angiography)
- EPR (Electronic Patient Record)
- ESR (European Society of Radiology)
- FTE (Full Time Equivalent)
- JCAHO (Joint Commission on Accreditation of Healthcare Organizations)
- MRI (Magnetic Resonance Imaging)
- NCRP (Norwegian Classification of Radiological Procedures)
- NORAKO (Norsk Radiologisk Kode)
- PACS (Picture Archiving and Communication System)
- PET-CT (Positron Emission Tomography CT)
- RCR (Royal College of Radiologists)
- RIS (Radiology Information System)
- SERUS (System for Electronic Reporting of Educational Activity in Hospital Departments)
- SQL (Structured Query Language)

1 List of papers

I. Lauritzen PM, Hurlen P, Sandbæk G, Gulbrandsen P. Double reading rates and quality assurance practices in Norwegian hospital radiology departments: two parallel national surveys. Acta Radiol. 2015;56(1):78-86.

II. Lauritzen PM, Stavem K, Andersen JG, Stokke MV, Tennstrand AL, Bjerke G, Hurlen P, Sandbæk G, Dahl FA, Gulbrandsen P. Double reading of current chest CT examinations: clinical importance of changes to radiology reports. Accepted for publication in European Journal of Radiology.

III. Lauritzen PM, Andersen JG, Stokke MV, Tennstrand AL, Aamodt R, Heggelund T, Hurlen P, Sandbæk G, Dahl FA, Gulbrandsen P. Prospective double reading of abdominal computed tomography: the clinical importance of changes to radiology reports. Under review.


2 Summary

2.1 Background

Diagnostic information is extracted from imaging examinations through a process of interpretation. This process is carried out by humans and involves professional judgement and decisions made under conditions of uncertainty. Therefore variations and even errors will occur.

These variations, often referred to as discrepancies, can be uncovered by double reading. This involves having two different readers interpret the same examination.

It can be done retrospectively, which is often the case with peer review systems and audits. Applied prospectively, double reading can be used for quality assurance of radiology reports, and has been shown to reduce errors and increase sensitivity.

Double reading is routine in the training of resident radiologists. In Norwegian hospitals there is a tradition of double reading, even of radiological examinations read by consultants. The consultant may usually choose whether to finalize the report directly or submit the examination for a second reading by a colleague. The second reader reviews the examination and the preliminary report, and corrects it if necessary before it is finalized.

It is this sequential, non-independent double reading, in which both readers are consultant radiologists, that is the topic of this dissertation. This practice varies considerably between departments, and the necessity and feasibility of such double reading are much debated in the Norwegian radiological community. Particular emphasis has been placed on the delays caused and the resources consumed.

There are few reports of the extent to which double reading is practiced internationally, and none that estimate the working hours consumed. Our knowledge of the effect of the practice stems from related but not identical practices such as peer review, audits, independent double reading in mammography screening and double reading of resident radiologists.


2.2 Objectives

The objectives of this study were:

• To investigate the rates of double reading in Norwegian hospital radiology departments.

• To identify department characteristics associated with the rates of double reading.

• To investigate possible associations between double reading rates and other quality assurance practices.

• To estimate the proportion of radiology reports that were changed during double reading of Computed Tomography (CT) examinations of chest and abdomen respectively and to assess the potential clinical impact of these changes.

• To explore whether characteristics of examinations or radiologists were associated with a higher proportion of clinically important changes.

2.3 Material and methods

Quality assurance practices and rates of double reading in Norwegian hospital radiology departments were explored in two parallel nationwide surveys issued to department management and consultant radiologists respectively (paper I). Both surveys covered practice of double reading, department guidelines and quality improvement work. Management also reported staffing and perceived resource situation.

The responses of consultant radiologists grouped according to workplace were used to validate management responses about working hours consumed by double reading. Departments were categorized according to teaching status. Responses regarding different aspects of department quality improvement work were organized into three quality indices. The items in “Person index” concerned monitoring of personal performance and feedback to individuals. The items in “System index” concerned systems performance monitoring and collective feedback. The items in “Appropriateness index” concerned assurance of appropriateness of investigations.
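The scoring rule behind the index percentages is not spelled out in this summary. Purely as an illustration, the sketch below assumes that a department's score on an index is the percentage of that index's items it answered affirmatively, and that reported index scores are means across departments. The item names are hypothetical placeholders, not the actual survey items.

```python
from statistics import mean

# Hypothetical item names grouped by index (placeholders, not the real survey items).
INDICES = {
    "person": ["individual_feedback", "personal_discrepancy_statistics"],
    "system": ["system_performance_monitoring", "collective_feedback", "discrepancy_meetings"],
    "appropriateness": ["referral_guidelines", "referral_vetting"],
}


def index_score(answers: dict[str, bool], items: list[str]) -> float:
    """Assumed rule: percentage of an index's items answered 'yes' (missing counts as 'no')."""
    return 100 * sum(answers.get(item, False) for item in items) / len(items)


def mean_index_scores(departments: list[dict[str, bool]]) -> dict[str, float]:
    """Mean score per index across all departments."""
    return {
        name: mean(index_score(answers, items) for answers in departments)
        for name, items in INDICES.items()
    }
```

Under this assumption, each department contributes equally to the mean regardless of its size.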

Departments were compared with the Kruskal-Wallis test. Linear regression was used to assess whether differences in double reading rates remained significant when adjusting for department size. All correlations were tested with Spearman’s rank correlation.
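As an illustration of the two rank-based procedures named above, the following minimal sketch uses plain Python so that no statistics package is assumed. `kruskal_wallis_h` omits the correction for ties. The p-value helper relies on the fact that a chi-square distribution with 2 degrees of freedom has survival function exp(-x/2), so it applies only to a three-group comparison such as university, other teaching and non-teaching departments. The size-adjusted linear regression is not reproduced. In practice, tested library routines such as `scipy.stats.kruskal` and `scipy.stats.spearmanr` would be preferable.

```python
import math
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def kruskal_wallis_h(groups: list[list[float]]) -> float:
    """Kruskal-Wallis H statistic computed on pooled ranks, without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        total += sum(group_ranks) ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def kruskal_wallis_p_three_groups(h: float) -> float:
    """Upper-tail chi-square probability with 2 df, exp(-h/2); valid for exactly 3 groups."""
    return math.exp(-h / 2)
```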

The clinical importance of changes to radiology reports was estimated in two retrospective, cross-sectional, multicentre studies (papers II and III). In paper II we focused on chest CT examinations of patients from the departments of internal medicine. In paper III we focused on abdominal CT of surgical patients. In each study we collected pairs of preliminary and final reports from more than 1,000 consecutive double read examinations. We used document comparison software to compare the preliminary and final reports. Experienced clinicians in relevant specialties rated the clinical importance of all changes in content.
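The document comparison software is not named here. As a rough stand-in, and not the tool used in the studies, Python's standard `difflib.SequenceMatcher` can list word-level removals and additions between a preliminary and a final report. The report texts in the example are invented.

```python
import difflib


def report_changes(preliminary: str, final: str) -> list[tuple[str, str]]:
    """List word-level removals and additions from a preliminary to a final report."""
    a, b = preliminary.split(), final.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changes.append(("removed", " ".join(a[i1:i2])))
        if tag in ("insert", "replace"):
            changes.append(("added", " ".join(b[j1:j2])))
    return changes


# Invented example texts, for illustration only.
print(report_changes("No pulmonary embolism.", "Segmental pulmonary embolism."))
# [('removed', 'No'), ('added', 'Segmental')]
```

Identical reports yield an empty list. Only non-empty change lists would need a clinical rating.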


We reported classifications of clinical importance of report changes as percentages with binomial 95% confidence intervals. Exploratory analysis of associations between clinically important changes and characteristics of patients, examinations, and readers was performed with multivariate logistic regression. We also constructed two random effects models to test for clustering of clinically important report changes in separate examinations read by the same radiologist.
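The summary does not say which binomial interval was used. The sketch below computes a Wilson score interval, which is an assumption here; an exact (Clopper-Pearson) interval would give slightly different limits. It is applied to the counts of clinically important changes reported for papers II and III.

```python
import math
from statistics import NormalDist


def wilson_ci(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Clinically important changes: 91 of 1,023 (paper II) and 146 of 1,071 (paper III).
for label, k, n in [("Paper II (chest CT)", 91, 1023), ("Paper III (abdominal CT)", 146, 1071)]:
    low, high = wilson_ci(k, n)
    print(f"{label}: {k / n:.1%} (95% CI {low:.1%} to {high:.1%})")
```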

2.4 Summary of published results

In paper I we found a mean double reading rate of 33% of all exams read by consultants, consuming an estimated 20-25% of consultant working hours. The double reading rates were highest in university hospital departments (59%), intermediate in other teaching departments (30%), and lowest in non-teaching departments (11%). By modality double reading rates were highest for Magnetic Resonance Imaging (MRI) (47%) and CT (33%), intermediate for X-ray (24%) and fluoroscopy (23%), and lowest for ultrasonography (16%) and intervention (16%).

Among the three quality indices, mean scores were the highest on the “appropriateness index” (68%), intermediate on the “person index” (56%), and lowest on the “system index” (37%). There were no correlations between double reading rates and scores on any of the three quality indices.

In paper II, changes were classified as clinically important in 91 (9%) of 1,023 reports. Of these, 3 were critical, 15 were major, and 73 were intermediate. More clinically important changes were made to urgent examinations, and fewer to reports by female first readers. Chest radiologists made more clinically important changes than other second readers. The severity of the radiological findings was increased in 73 (80%) of the clinically important changes.

In paper III, changes were classified as clinically important in 146 (14%) of 1,071 reports. Of these, 3 were critical, 35 were major, and 108 were intermediate.

Clinically important changes were made less frequently when abdominal radiologists were first readers, and more frequently when they were second readers and to urgent examinations. The severity of the radiological findings was increased in 118 (81%) of the clinically important changes.

2.5 Conclusion

The practice of double reading in Norwegian hospital radiology departments is extensive, but there are large variations between departments with different teaching status and between modalities. Double reading has a major impact on workflow and output directly by consuming working hours, and probably also indirectly by generating more investigations. The rates of clinically important changes to radiology reports following double reading indicate that some quality assurance of radiological interpretation is warranted. A higher yield of discrepant interpretations may be achieved by targeting a selection of urgent examinations and examinations read by inexperienced radiologists, and using subspecialist second readers.


3 Introduction

The purpose of most radiological examinations is to provide diagnostic information, improving the basis for clinical decision-making. Radiological examinations are subject to interpretation by humans. The process of interpretation involves judgement and decision making under conditions of uncertainty. Inevitably, there are variations. The results will not be identical if the same exam is interpreted by different readers or at different times. When such variation affects the conveyed result of the interpretation, it may be referred to as a discrepancy.

When a discrepancy becomes an error is subject to opinion. Although not all discrepancies constitute errors, it is vital to acknowledge that both discrepancies and errors do occur in the practice of clinical radiology. An autopsy study of patients dying in hospital estimated that radiological misinterpretation caused 8% and contributed to another 33% of diagnostic errors in patients with relevant imaging [1].

The reports “To Err is Human: Building a Safer Health System” and, shortly after, “An organization with a memory” made clear the massive scale and consequences of preventable medical errors, which were estimated to cause more deaths than motor-vehicle accidents, breast cancer and AIDS. The reports raised awareness of quality issues and patient safety, and emphasized the need for a culture and system for error reporting in order to prevent recurrences by learning and system improvement [2, 3].

Double reading is a quality assurance practice in which two different readers interpret an imaging examination. Internationally it is routine in the training of resident radiologists, who submit virtually all their preliminary reports and examinations for a second reading by a consultant. If necessary, the consultant corrects the report before it is finalized.

In Norwegian hospitals there is a tradition of double reading, even of radiological examinations read by consultants, who may usually choose whether to finalize their report directly or submit the examination for a second reading by a colleague. This practice predates the digital transformation of radiology departments that occurred around the turn of the millennium with the implementation of Radiology Information Systems (RIS) and Picture Archiving and Communication Systems (PACS) [4]. Prior to this, images were read on film alternators, and written preliminary reports were placed next to the images. The second reading was often conducted in conjunction with a radiology conference the next morning. Although the implementation of RIS and PACS led to substantial workflow changes, many departments retained this form of sequential, non-independent double reading.

The topic of this dissertation is this prospectively applied, sequential, and non-independent form of double reading, in which both readers are consultants. Unless otherwise specified, “double reading” refers to this practice, and not to double reading of residents or other related practices.


The practice of double reading has been much debated in the Norwegian radiological community, and it has varied considerably between departments [5]. There have been concerns that double reading limits output and causes unacceptable waits, and that final reports are unduly delayed [5, 6]. It has been argued that a beneficial effect is not well established, and that radiologists become less vigilant and more prone to error when colleagues check their work [7]. Some have worried that radiologists are less willing to make decisions and assume responsibility for them, and that routine submission of examinations for double reading represents a disclaimer of that responsibility [5].

Our knowledge of the discrepancies of interpretation uncovered by double reading stems mainly from three sources: screening programmes, retrospective audits or peer review, and evaluation of resident on-call performance.

Some breast cancer screening programmes use independent double reading, in which the second reader is blinded to the interpretation of the first [8-10]. The feasibility of independent double reading depends on a limited number of options for interpretation, or ideally a categorization system such as the “Breast Imaging Reporting and Data System” (BI-RADS) used for reporting mammograms. Therefore, independent double reading is seldom used in clinical radiology. For screening mammograms the reported rates of discrepant interpretations are in the order of 5% [11]. There are several reasons why these results are not necessarily valid for clinical radiology. The population consists of screening subjects, and not patients referred for investigation of a condition or symptom. The frequency of pathology is quite low, with concordant positive interpretations in 2.1% [11]. The examination is aimed at the detection or exclusion of one diagnosis, cancer.

Information about interpretational discrepancy rates is also found in reports from peer review and audits. Peer review is a continuous process in which all colleagues (peers) take part in reviewing each other’s examinations, and rate their agreement or disagreement with the interpretation in the report. This is usually done retrospectively, by reviewing previous examinations when they are compared with the current ones being interpreted. Audits also involve retrospective review of examinations and reports, but are usually one-time or periodical rather than continuous, and may involve external expert reviewers as opposed to colleagues. In both cases the goal is performance measurement of departments and radiologists, and improvement through shared learning, rather than quality assurance of individual radiology reports.

The use of peer review is widespread in the United States of America, where a continuous random review of 5% of cases is required for credentialing by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) [12]. The most prevalent peer review system is RADPEER, which was introduced by the American College of Radiology (ACR) in 2002 in response to the report “To Err is Human” [2, 13]. The reported rates of interpretational discrepancies from audits and peer


peer review data have the strength that they usually involve a large number of examinations and readers. However, critics have raised concerns about sampling issues and underreporting, and a more recent survey reports radiologists’ conscious manipulation of review data through biased sampling and reporting [17, 18].

The third major source of interpretational discrepancy rates is the abundance of studies evaluating resident performance. These studies are heterogeneous with regard to both design and results. Agostini et al reported that the attending or resident radiologists missed one or more lesions in 71% of whole-body CT examinations of polytrauma patients, and that 37% of all lesions were missed [19]. Reported discrepancy rates for emergency angiograms of the head and neck are 10.4% for MRI and 13.6% for computed tomography [20, 21]. For plain chest radiographs the reported discrepancy rates are 1% for the presence of pneumonia and 1.9% for the presence of congestive heart failure [22, 23].

These studies show a large variation in discrepancy rates between modalities, settings, and probably also incidence of pathology. The results, however, are not necessarily valid in the context of quality assurance of consultant interpretation.

There are several factors that may contribute to higher discrepancy rates. The level of experience of the readers has been shown to influence discrepancy rates [21, 24].

Reported discrepancy rates are higher in positive than negative examinations, and one might expect a higher frequency of pathology in emergency after-hours examinations [25, 26]. Furthermore, long shifts, increasing caseload and interruptions in the form of telephone calls may all negatively affect diagnostic accuracy [24, 27, 28].

Internationally, there are few reports of the extent to which double reading is practiced. One Swedish university hospital reported double reading all examinations in 2008, while in 1991 the JCAHO requirement of reviewing 5% of cases was met by 74% of imaging groups in the USA [29, 30]. Husby et al reported that 41% of imaging examinations in Norway were double read in 2008¹ [31].

In prospective double reading, potential errors are corrected one by one. In contrast, performance measurement, as accomplished by retrospective peer review or audit, does not in itself improve quality [17]. Therefore, it is vital to couple performance measurement with a quality improvement initiative. Peer review is usually coupled with “discrepancy meetings” in which colleagues discuss discrepancies for the purpose of shared learning, but also in order to reach a consensus on the final rating of the discrepancies. Little is known about how Norwegian radiology departments couple double reading with performance evaluations and initiatives to promote shared learning.

1 At the time this report was published, work with the present survey was already well under way.


3.1 Unanswered questions

Although much debated, many aspects of double reading practices in Norwegian hospitals have not previously been reported, and it was the purpose of this study to address and explore some of these issues.

There are no estimates of consumption of working hours. The factors that contribute to variation in double reading rates between departments are not fully described.

There is little or no data on the application of other quality assurance practices and their possible association with double reading rates. The effects of prospective double reading of examinations that are read and selected for double reading by consultants are also unknown. We have neither estimates of the proportion of radiology reports that are changed following double reading nor of the proportion of such changes that are clinically important.

There are many similarities with practices such as independent double reading, peer review, audit and over-reading of residents, and our knowledge of the effect of double reading stems from these related practices. However, considerable differences in patient populations, reader experience, reading conditions, workflow and data collection limit the validity of results originating from other settings. These differences also make comparison particularly interesting, and offer opportunities for mutual learning and practice improvement.

4 Objectives

The objectives of this study were:

• To investigate the rates of double reading in Norwegian hospital radiology departments.

• To identify department characteristics associated with the rates of double reading.

• To investigate possible associations between double reading rates and other quality assurance practices.

• To estimate the proportion of radiology reports that were changed during double reading of CT examinations of chest and abdomen respectively, and to assess the potential clinical impact of these changes.

• To explore whether characteristics of examinations or radiologists were associated with a higher proportion of clinically important changes.


5 Material and methods

The two main topics covered in this dissertation are the extent of double reading and quality assurance practices in Norwegian hospital radiology departments (paper I) and the clinical importance of changes to radiology reports following double reading (papers II & III). The former was approached by two surveys, while the latter was assessed in two retrospective, cross-sectional, multicentre studies.

5.1 Two national surveys

5.1.1 Survey design and items

Quality assurance practices and rates of double reading in Norwegian hospital radiology departments were explored in two parallel nationwide surveys conducted between 27 March and 27 May 2012. The two electronic surveys were issued to department management and consultant radiologists respectively. The management survey covered staffing, perceived resource situation, practice of double reading, department guidelines and department quality improvement work (cf. appendix 1). The radiologist survey covered practice of double reading, department guidelines, and department quality improvement work (cf. appendix 2).

Some of the items in the two surveys were identical in order to obtain similar information from separate sources.

5.1.2 Participants and recruitment

The Norwegian Hospital Reform in 2002 transformed approximately 70 county-owned hospitals into 28 government-owned health corporations [32]. Many of these corporations operate at several separate locations, because merged neighbouring hospitals were not always co-located. Department management may be separate at each location, or merged and located mainly at one location. We decided to define a department as the smallest unit with an immediate supervisor to the radiologists. By this definition there were 45 separate hospital radiology departments in Norway.

Although private imaging centres accounted for 23% of all imaging exams performed in Norway in 2008, we decided not to include them, since a previous study showed that they perform double reading to a limited degree compared with hospital radiology departments [31, 33]. We decided to include the radiology departments of private hospitals owned by non-profit trusts, such as Martina Hansens Hospital, Lovisenberg Diakonale Sykehus, Diakonhjemmet Sykehus, Haraldsplass Diakonale Sykehus, and Betanien Sykehus, since they provide imaging for inpatients and constitute an integral part of public health care in Norway, whereas for-profit providers such as Unilabs, Curato, and Aleris, which serve mainly outpatients, were excluded.

The target population for the management survey was the management at the 45 hospital radiology departments. We defined management as the chief medical officer and/or the head of the department. All management invitees were contacted by telephone prior to receiving the survey. Non-responders were reminded by e-mail and again by telephone before the survey closed. At departments where the head of the department and the chief medical officer were not the same person, they chose either to submit joint or separate responses. Separate responses were merged into one department response (cf. paper I, Fig 1). When merging discrepant responses to individual survey items, priority was given according to the responsibilities of the management representatives for each survey item.
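The merging rule can be sketched as follows. Only the principle of giving priority to the representative responsible for each item comes from the text. The role names, item identifiers and the item-to-role responsibility mapping are hypothetical, as is the fallback used when the responsible representative left an item unanswered.

```python
def merge_department_response(
    responses: dict[str, dict[str, str]],
    responsible_role: dict[str, str],
) -> dict[str, str]:
    """Merge per-representative answers into one department response.

    responses maps a representative's role to that representative's answers
    ({item: answer}); responsible_role maps each item to the role whose answer
    takes priority. Both are hypothetical inputs for illustration.
    """
    merged = {}
    all_items = {item for answers in responses.values() for item in answers}
    for item in sorted(all_items):
        answers = {role: given[item] for role, given in responses.items() if item in given}
        preferred = responsible_role.get(item)
        if preferred in answers:
            # The representative responsible for this item decides.
            merged[item] = answers[preferred]
        else:
            # Assumed fallback: take the answer from the alphabetically first role.
            merged[item] = answers[min(answers)]
    return merged


# Hypothetical example with two representatives who disagree on some items.
responses = {
    "head_of_department": {"consultant_fte": "12", "double_reading_guideline": "no"},
    "chief_medical_officer": {"consultant_fte": "11", "double_reading_guideline": "yes",
                              "discrepancy_meetings": "yes"},
}
responsible_role = {"consultant_fte": "head_of_department",
                    "double_reading_guideline": "chief_medical_officer"}
print(merge_department_response(responses, responsible_role))
```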

The target population of the radiologist survey was consultant radiologists working in hospital radiology departments (excluding management). In order to reach this population, all 726 members of the Norwegian Society of Radiology were invited by e-mail to participate in the survey. Non-responders were reminded by e-mail. We decided to exclude responders not working mainly or exclusively in a hospital department as not belonging to the target population.

5.1.3 Analysis

The analysis of the surveys was conducted in an exploratory manner.

5.1.3.1 Validation of responses

Department staffing information for all teaching departments was also acquired from the SERUS² reporting system, and it served to validate management responses on staffing. The validated staffing information was used to calculate the size of the target population in the radiologist survey and the proportion of residents in the radiologist staff. Consultant radiologists' responses, grouped by workplace, were used to validate management responses about working hours consumed by double reading.

5.1.3.2 Characterization of departments

We considered several characteristics of departments, including size, teaching status, regional health authority affiliation, and presence of subdivisions. Categorization according to affiliation with the North-, Middle-, West- or South-East Regional Health Authorities was abandoned when analysis showed that the variation in practice was similar within and between regions. This indicated that regional affiliation was not a unifying factor with regard to double reading and quality assurance practices. Categorization according to presence of subdivisions was also abandoned, because it proved difficult to construct exhaustive and mutually exclusive categories.

Where present, subdivisions were based on modality, anatomic region, referring department, or a mix of these according to local conditions. Some departments reported being “partially subdivided”.

Teaching status provided distinct and formal categories. In Norway, there are two levels of formal accreditation for teaching departments. Group I departments are usually large, specialized units conducting scientific research. Residents are required to serve at least 1.5 years of their residency at a group I department. Since all group I departments but one (Drammen) are located at university hospitals, we designated this category “University hospital department” for the benefit of international readers not familiar with the Norwegian system. Group II departments are as a rule smaller units that demonstrate a capability for resident training. Residents may serve up to 3.5 years of their residency in such units. We called this group “other teaching departments”. The remaining departments are not licensed to train residents and were designated “non-teaching” departments. This categorisation plausibly reflects real differences between departments, and was used in the analysis.

A more direct measure of department size was also needed for the analysis, and we considered staffing and output to be the most relevant measures. Staffing was covered in the survey, and SERUS provided data on both staffing and department output. However, output was reported in different and sometimes undisclosed units (NORAKO³ codes, NCRP⁴ codes, number of referrals, number of examinations), and the data were only available for teaching departments. We therefore measured size as staffing, expressed in consultant radiologist full-time equivalents (FTEs).

5.1.3.3 Construction of three quality indices

In addition to requesting rates of double reading, the management survey covered quality-directed activities and guidelines. Explored separately, these items produce fragments of data that are not easily interpreted or conveyed, so some form of thematic synthesis seemed appropriate and necessary. Among the candidate items, we identified 20 that could be thematically organized into three groups, and we constructed our quality indices from these groups.

The department score on each index was the percentage of affirmative responses on index items. The first group comprised four items relating to monitoring of personal performance and feedback to individuals – “Person index”. The second group comprised nine items concerning systems performance monitoring and collective feedback – “System index”. The third group comprised seven items regarding assurance of appropriateness of investigations – “Appropriateness index” (cf. paper I, table 2).
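The scoring rule above is a plain percentage of "yes" answers per index. The sketch below shows this in Python under stated assumptions. Every item is assumed to be answered yes or no, and missing answers are not handled. The index names and item counts (4, 9 and 7) come from the text. The function names and the example department are hypothetical.

```python
from typing import Dict, List

# Number of items per index, as described in the text (4 + 9 + 7 = 20 items).
INDEX_SIZES: Dict[str, int] = {
    "Person index": 4,
    "System index": 9,
    "Appropriateness index": 7,
}


def index_score(responses: List[bool]) -> float:
    """Percentage of affirmative (True) responses among one index's items."""
    if not responses:
        raise ValueError("an index must contain at least one item")
    return 100.0 * sum(responses) / len(responses)


def department_scores(answers: Dict[str, List[bool]]) -> Dict[str, float]:
    """Score every index for one department, checking the item counts."""
    scores: Dict[str, float] = {}
    for name, n_items in INDEX_SIZES.items():
        items = answers[name]
        if len(items) != n_items:
            raise ValueError(f"{name}: expected {n_items} items, got {len(items)}")
        scores[name] = round(index_score(items), 1)
    return scores


# Hypothetical department answering yes to 3/4, 6/9 and 7/7 items.
example = {
    "Person index": [True, True, True, False],
    "System index": [True] * 6 + [False] * 3,
    "Appropriateness index": [True] * 7,
}
print(department_scores(example))
# {'Person index': 75.0, 'System index': 66.7, 'Appropriateness index': 100.0}
```

Because each index is scored as a percentage, departments can be compared across indices even though the indices contain different numbers of items.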

5.1.3.4 Statistical analysis

Management reported double reading rates for each modality in the following categories: 0%, 1–33%, 34–66%, 67–99%, or 100% (cf. appendix 1). For statistical analysis, the intervals 1–33%, 34–66%, and 67–99% were converted to their midpoints of 17%, 50%, and 83%, respectively. Departments were compared according to teaching status with the Kruskal–Wallis test. Linear regression was used to assess whether differences in rates of double reading remained significant after adjusting for size. All correlations were tested with Spearman correlation. The data were analysed using IBM SPSS Statistics⁵. All p values are two-sided, and p < 0.05 indicates statistical significance.
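The interval-to-midpoint conversion and the Spearman correlation described above can be sketched in plain Python. This is an illustrative sketch only, since the analysis itself was done in SPSS.

Some assumptions apply:
- The category labels in the hypothetical `CATEGORY_VALUE` mapping are our own shorthand for the survey options.
- The example department values are invented.
- Spearman's rho is computed as the Pearson correlation of average ranks, which handles ties.
- No p value and no Kruskal–Wallis test are computed here.

```python
from typing import Dict, List, Sequence

# Survey categories mapped to the values used in the analysis (interval midpoints).
CATEGORY_VALUE: Dict[str, float] = {
    "0%": 0.0, "1-33%": 17.0, "34-66%": 50.0, "67-99%": 83.0, "100%": 100.0,
}


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical departments: consultant FTEs and reported CT double reading category.
ftes = [5, 7, 22.5, 36, 54]
rates = [CATEGORY_VALUE[c] for c in ["0%", "1-33%", "1-33%", "34-66%", "67-99%"]]
print(round(spearman_rho(ftes, rates), 3))  # 0.975
```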

3 Norsk Radiologisk Kode (NORAKO): A Norwegian system for classification of radiological examinations and procedures.

4 Norwegian Classification of Radiological Procedures (NCRP): A similar classification system, which replaced NORAKO 31 December 2011.

5 IBM SPSS Statistics, Version 22, IBM corp, Somers, NY.


5.2 Two retrospective cross-sectional multicentre studies

5.2.1 Study design

To estimate the clinical importance of changes to radiology reports following double reading, we conducted two retrospective cross-sectional multicentre studies.

In these studies, we compared preliminary and final reports from double-read examinations for changes, and clinicians rated the clinical importance of these changes. In paper II we focused on changes to reports from chest CT examinations of patients from the departments of internal medicine. In paper III we focused on changes to reports from abdominal CT examinations of surgical patients. Apart from the differences in examination type and in the department affiliation of the patients, the two studies were similar.

5.2.2 Recruitment of departments

Candidate departments for collaboration were identified using data from the management survey. A prerequisite for comparing preliminary and final reports was that all report versions were saved in the RIS or the Electronic Patient Record (EPR). More than half of the departments reported in the survey that this was not the case, and they were therefore excluded as collaborators. Another prerequisite was that a sufficient proportion of examinations was double read. To safeguard against possibly exaggerated self-reported double reading rates, we set an arbitrary lower limit of 20% reported double reading.
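The two recruitment prerequisites amount to a simple filter over the survey responses. The Python sketch below shows this under several assumptions:
- Each department has been reduced to a hypothetical record holding a yes/no answer on report-version storage and a single reported double reading rate in percent.
- A department at exactly 20% counts as eligible. The text gives only "a lower limit of 20%".
- The department names are invented.

```python
from dataclasses import dataclass
from typing import List

# Threshold from the text; treating exactly 20% as eligible is our assumption.
MIN_REPORTED_DOUBLE_READING = 20.0


@dataclass
class SurveyedDepartment:
    """Hypothetical per-department summary of the management survey."""
    name: str
    all_versions_saved: bool        # preliminary and final reports kept in RIS/EPR
    reported_double_reading: float  # self-reported double reading rate, in percent


def candidate_collaborators(departments: List[SurveyedDepartment]) -> List[str]:
    """Names of departments meeting both prerequisites for report comparison."""
    return [
        d.name
        for d in departments
        if d.all_versions_saved
        and d.reported_double_reading >= MIN_REPORTED_DOUBLE_READING
    ]


survey = [
    SurveyedDepartment("Dept A", True, 50.0),
    SurveyedDepartment("Dept B", False, 83.0),  # versions not kept: excluded
    SurveyedDepartment("Dept C", True, 17.0),   # below the 20% limit: excluded
]
print(candidate_collaborators(survey))  # ['Dept A']
```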

Six potential collaborators met these criteria, five of which were affiliated with the South-East Regional Health Authority: Akershus University Hospital (Ahus), Oslo University Hospital – Ullevål, and the Vestre Viken hospitals in Bærum, Drammen and Ringerike. The sixth hospital (St. Olavs Hospital) was not approached, since there were already three group I hospitals in the selection and because the distance might have complicated collaboration. Characteristics of the departments of surgery and internal medicine at these five hospitals are shown in table 1, and characteristics of the radiology departments are shown in table 2.


Table 1: Characteristics of participating hospitals.

| Hospital | Catchment population, medicine | Catchment population, surgery | No of beds, medicine | No of beds, surgery | Annual dep. output¹, medicine | Annual dep. output¹, surgery |
|---|---|---|---|---|---|---|
| Ahus | 471,661 | 471,661 | 293 | 130 | 32,612 | 18,152 |
| Ullevål | 244,676* | 244,676* | 283 | 144 | 41,872 | 18,093 |
| Drammen | 156,076** | 209,072** | 104 | 75 | 9,688 | 5,930 |
| Bærum | 170,936 | 170,936 | 76 | 46 | 7,110 | 3,600 |
| Ringerike | 77,836 | 77,836 | 50 | 17 | 5,822 | 1,913 |
| Sum | 1,121,185 | 1,174,181 | 806 | 412 | 97,104 | 47,688 |

¹ Diagnosis Related Group (DRG)-weighted output (no of admissions × DRG index).

*Regional functions for a population of 2.7 million.

**Regional functions for a population of 457,844.


Table 2: Characteristics and contribution of participating departments

| Radiology department | Annual no of examinations, all modalities¹ | Annual no of CT examinations¹ | No of consultant radiologists² | No of involved consultant radiologists, paper II | No of involved consultant radiologists, paper III | No of radiology reports collected, paper II | No of radiology reports collected, paper III | Percentage of double read examinations, paper II³ | Percentage of double read examinations, paper III⁴ |
|---|---|---|---|---|---|---|---|---|---|
| Ahus | 207,365 | 42,878 | 36 | 26 | 24 | 319 | 354 | 59 | 42 |
| Ullevål | 209,796 | 43,584 | 54 | 34 | 31 | 405 | 414 | 20 | 33 |
| Drammen | 91,349 | 13,006 | 22.5 | 14 | 18 | 185 | 194 | 55 | 31 |
| Bærum | 67,284 | 12,431 | 7 | 14 | 12 | 71 | 66 | 15 | 12 |
| Ringerike | 43,274 | 5,862 | 5 | 5 | 6 | 43 | 43 | 45 | 47 |
| Sum | 619,068 | 117,761 | 124.5 | 91* | 90* | 1,023 | 1,071 | N/A | N/A |

¹ Norwegian Classification of Radiological Procedures (NCRP), 2012.

² Full-time equivalents (FTE).

³ Chest computed tomography referred from the department of internal medicine, double read by consultants.

⁴ Abdominal computed tomography referred from the department of surgery, double read by consultants.

*Two radiologists worked in two of the hospitals during this time.


5.2.3 Power calculation

Previous studies report varying discrepancy rates. Many of them involve on-call residents as first readers, and reported discrepancy rates range from 37% for polytrauma to 2.8% for CT pulmonary angiograms (CTPA) [19, 21, 25, 34, 35]. For brain CT examinations, a discrepancy rate of 1.3% has been reported between specialists and neuroradiology subspecialists [36].

RADPEER data show discrepancy rates between consultants reading CT of 1.7% for misinterpretations and 5.5% for disagreements in difficult cases [15]. We assumed that the reported discrepancy rates for consultants were the most applicable to our setting. We therefore expected changes to be made to only a small proportion of the reports.

We aimed to detect a hypothesized rate of clinically important changes of 2.5% with 95% confidence that the true population rate lies between 0.5% and 4.5%. We believed that a considerably wider confidence interval would limit the clinical relevance of our findings.

Individual examinations read by the same radiologist were not considered independent observations. We had no data from which to estimate the expected intraclass correlation coefficient (ICC)⁶. The ICC quantifies the degree of clustering due to repeated observations, and we assumed it to be 0.25. We expected approximately 100 radiologists to be involved. The number of independent observations ($N^{*}$) needed to achieve a 95% confidence interval of ±2%, given a rate of 2.5%, is estimated by the equation:

$$N^{*} = \frac{p\,(1-p)\,\left(z_{\alpha/2} + z_{\beta}\right)^{2}}{\Delta^{2}},$$

where $p$ is the rate (0.025), $\Delta = 0.02$ (2%), $z_{\beta} = 0.84$ for 80% power, and $z_{\alpha/2} = 1.96$ represents a level of significance of 5%. The resulting estimate of $N^{*} = 478$ has to be adjusted by multiplying it by the so-called variance inflation factor (VIF), estimated by the equation:

$$\mathrm{VIF} = 1 + (m - 1) \times \mathrm{ICC} = 1 + (4.78 - 1) \times 0.25 = 1.945,$$

where $m$ is the mean cluster size, i.e. the number of independent observations divided by the number of clusters (radiologists): $478/100 = 4.78$.

$$N = N^{*} \times \mathrm{VIF} = 478 \times 1.945 \approx 930$$

By our estimates we would need at least 930 observations to achieve our aim.
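The sample size arithmetic above can be reproduced with a short Python sketch. It uses the rounded quantiles quoted in the text (1.96 and 0.84) and rounds up with `math.ceil`, which reproduces the values 478 and 930. The function name and its parameter names are our own. Using unrounded normal quantiles, e.g. from `statistics.NormalDist().inv_cdf`, would give a marginally larger $N^{*}$.

```python
import math
from typing import Tuple


def clustered_sample_size(p: float, delta: float, z_alpha2: float, z_beta: float,
                          icc: float, n_clusters: int) -> Tuple[int, float, float, int]:
    """Sample size for a rate, inflated for clustering of exams within readers."""
    # Unadjusted number of independent observations N*, rounded up.
    n_star = math.ceil(p * (1 - p) * (z_alpha2 + z_beta) ** 2 / delta ** 2)
    # Mean cluster size m: independent observations per radiologist.
    m = n_star / n_clusters
    # Variance inflation factor (design effect) for the assumed ICC.
    vif = 1 + (m - 1) * icc
    # Total number of examinations needed, rounded up.
    n_total = math.ceil(n_star * vif)
    return n_star, m, vif, n_total


n_star, m, vif, n_total = clustered_sample_size(
    p=0.025, delta=0.02, z_alpha2=1.96, z_beta=0.84, icc=0.25, n_clusters=100)
print(n_star, round(m, 2), round(vif, 3), n_total)  # 478 4.78 1.945 930
```

The sketch also makes the assumption explicit that the mean cluster size is derived from $N^{*}$ rather than from the inflated $N$. Iterating the calculation with $m = N/100$ would give a larger requirement.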

Because of the hierarchical structure, with departments, radiologists, and individual examinations representing three levels, we collected from each department a number of examinations in relative proportion to its number of consultant FTEs (cf. table 2, section 5.2.2, p. 23).

6 The proportion of the variance that could be attributed to the identity of the reader.

5.2.4 Formal approval

The South-East Regional Committee for Medical Research Ethics approved the study and granted a waiver of informed consent on 18 January 2013. The study was approved by the data protection officer at Ahus on 30 January 2013, and at Oslo University Hospital and Vestre Viken on 28 March 2013. Data processor agreements with Vestre Viken and Oslo University Hospital were signed on 2 and 3 April 2013, respectively.

5.2.5 Inclusion criteria

• Patient age: 18 years or older.

• Department affiliation of patient:

  • Department of internal medicine⁷ (paper II).

  • Department of surgery⁸ (paper III).

• Examinations:

  • Chest CT (paper II): including standard contrast-enhanced CT, non-enhanced CT, whole body CT⁹, and CTPA; excluding high-resolution CT and systemic arterial CT angiography.

  • Abdominopelvic CT (paper III): including standard contrast-enhanced CT, non-enhanced CT and whole body CT¹⁰; excluding systemic arterial CT angiography, CT colonography, CT ventriculography, CT enteroclysis, CT urography, low-dose urolithiasis CT and isolated upper abdominal CT examinations.

• Primary report composed by a consultant¹¹ and finalized by another consultant (examinations read by residents were excluded).

• If there were several double-read examinations of the same patient, the first one was selected.

• Addenda made after the report was finalized were disregarded.
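The inclusion criteria above can be read as a sequence of filters followed by a "first exam per patient" rule. The Python sketch below illustrates this. It is not the procedure actually used to extract data from the RIS/EPR.

The sketch assumes the following:
- The record type and its fields are hypothetical.
- The examination type has already been normalized to a single label.
- Patient age is taken at the time of the examination.
- Department subspecialty exclusions and the handling of addenda are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class DoubleReadExam:
    """Hypothetical record of one double-read CT examination."""
    patient_id: str
    patient_age: int
    department: str          # e.g. "internal medicine" or "surgery"
    exam_type: str           # hypothetical normalized examination label
    performed_on: str        # ISO date, so string order equals time order
    first_reader: str
    final_reader: str
    first_reader_consultant: bool
    final_reader_consultant: bool


def select_exams(exams: Iterable[DoubleReadExam], department: str,
                 eligible_types: Set[str]) -> List[DoubleReadExam]:
    """Apply the inclusion criteria, keeping the first eligible exam per patient."""
    eligible = [
        e for e in exams
        if e.patient_age >= 18
        and e.department == department
        and e.exam_type in eligible_types
        # Both readers must be consultants, and they must be different people.
        and e.first_reader_consultant and e.final_reader_consultant
        and e.first_reader != e.final_reader
    ]
    first_per_patient: Dict[str, DoubleReadExam] = {}
    for e in sorted(eligible, key=lambda exam: exam.performed_on):
        first_per_patient.setdefault(e.patient_id, e)  # keeps the earliest exam
    return list(first_per_patient.values())


exams = [
    DoubleReadExam("p1", 45, "internal medicine", "chest CT", "2012-05-01", "A", "C", True, True),
    DoubleReadExam("p1", 45, "internal medicine", "chest CT", "2012-03-01", "A", "B", True, True),
    DoubleReadExam("p2", 17, "internal medicine", "chest CT", "2012-03-02", "A", "B", True, True),
    DoubleReadExam("p3", 60, "internal medicine", "HRCT", "2012-03-03", "A", "B", True, True),
    DoubleReadExam("p4", 70, "internal medicine", "chest CT", "2012-03-04", "R", "B", False, True),
    DoubleReadExam("p5", 50, "internal medicine", "chest CT", "2012-06-01", "D", "E", True, True),
]
chosen = select_exams(exams, "internal medicine", {"chest CT"})
print([(e.patient_id, e.performed_on) for e in chosen])
# [('p1', '2012-03-01'), ('p5', '2012-06-01')]
```

In this example, patient p2 is excluded by age, p3 by examination type and p4 because a resident was the first reader. The later of p1's two examinations is dropped.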

7 Internal medicine: At the Vestre Viken Hospitals the department of internal medicine was not formally subdivided, and patients in all disciplines were eligible for inclusion. At Ullevål and Ahus the departments of internal medicine are subdivided, and patients from all disciplines of internal medicine were eligible except oncology, preventive medicine, physical therapy and rehabilitation.

8 Surgery: At the Vestre Viken hospitals the department of surgery was not formally subdivided, and patients in all disciplines except orthopaedic surgery were eligible for inclusion. At Ullevål and Ahus the departments of surgery are subdivided, and only patients from the department of abdominal surgery were eligible for inclusion.

9 Whole body CT: Chest CT examinations including other anatomic regions such as abdomen (with or without the pelvis), head, neck and extremities.

10 Whole body CT: abdominopelvic CT examinations including other anatomic regions such as chest, head, neck and extremities.