Quality control of food frequency questionnaire data by follow-up interviews in a clinical trial

- The EBBA (Energy Balance and Breast Cancer Aspects) II pilot study

Master thesis in clinical nutrition

Evy Szász Nergård

Department of Nutrition, Faculty of Medicine

UNIVERSITY OF OSLO

June 2014

Supervisors: Christine Louise Parr, Anette Hjartåker, Inger Thune

Evy Szász Nergård

http://www.duo.uio.no/

Print: CopyCat Forskningsparken, Oslo

Acknowledgements

The work in this master thesis has been conducted from August 2013 to June 2014 at the Department of Nutrition, Faculty of Medicine, University of Oslo, and at Oslo University Hospital, Ullevål. All data were provided from the EBBA II pilot study.

I would first and foremost like to thank Christine Louise Parr for your time during this year, for all the long meetings and for helpful advice. Special thanks for helping me disentangle my thoughts when I got lost in some problem. I would like to thank Anette Hjartåker for your enthusiasm for the work with this thesis and for all your comments. Thanks also to Inger Thune for including me in the EBBA II study group. Even though you are a very busy woman, you engage fully in all the students you supervise.

Further, I would like to thank the research group “Dietary research and nutritional epidemiology” for warmly including the other master students and me. I appreciate that we were most welcome to ask questions and were encouraged to take part in discussions.

I have to thank my fellow master students for being such great people. A special thanks to Mathilde Enger for sitting faithfully by my side, for eating lunch with me every day and for all the joking in the reading room. Without you I would have lost my senses.

Finally, I would like to thank Håvard for your patience and understanding, and for cooking dinner for me every day.

Oslo, June 2014

Evy Szász Nergård

Abstract

Objective: Self-administered food frequency questionnaires (FFQs) are commonly used to assess habitual dietary intake, but may be returned with some questions unanswered. The present thesis investigates the completeness of FFQs from the pilot phase of the EBBA II study, a randomized clinical intervention study, and how a quality control procedure (follow-up interviews with the participants) to retrieve missing data affects estimates of dietary intake. We assess to what extent missing answers indicate no consumption and the smallest option for portion size, and how two statistical methods for filling in (i.e. imputing) missing answers compare to the complete data obtained by interview.

Subjects: Newly diagnosed female breast cancer patients included in the EBBA II pilot study (n=59). Enrolment took place between June 2011 and May 2013.

Methods: After diagnosis, but before undergoing breast cancer surgery, the patients answered a 14-page FFQ. Subsequently all FFQs were reviewed by a nutritionist, and a follow-up interview was conducted, resulting in complete FFQ data for all 59 patients. The initially missing answers were imputed with the zero value or the median answer.

Results: In total 8 % of all answers were obtained in the follow-up interview. Ten subjects (17 %) returned a complete FFQ prior to the follow-up interview. The remaining subjects (n=49) had between 1 (0.3 %) and 297 (81 %) missing answers. Imputation with “never/seldom” would be correct for 87 % and wrong for 13 % of missing frequencies. For portion size questions the second smallest option would be correct to impute more often than the smallest option; for 46 % vs. 34 % of all missing answers respectively. The highest number of missing answers was found in the food groups bread (25 %), fat used on bread (22 %) and spreads (12 %). Dietary intake was higher in the complete data than in the zero imputed data for all food groups, energy intake and nutrients. Imputation with the median answer gave a better estimation of the complete data than zero imputation.

Conclusion: The follow-up interviews should be continued in the EBBA II main study. Imputing with median gave the best estimation of the complete data.

List of tables

Table 1 Are quality control measures and method for handling missing data reported in publications?

Table 2 Overview of all questions in the EBBA II FFQ

Table 3 Illustration of how missing answers were counted

Table 4 Background characteristics of the subjects in the EBBA II pilot study

Table 5 Number and proportion of missing answers in the different question categories

Table 6 Distribution of missing answers for all questions

Table 7 Missing answers in food groups

Table 8 The twenty food items with the highest number of missing answers and the five items with the lowest number of missing answers in frequency questions

Table 9 The twenty food items with the highest proportion of missing answers in portion size questions

Table 10 Presentation of the food items with the lowest proportion of missing answers in portion size questions (all 35 items with zero missing answers)

Table 11 Response alternatives replied for missing answers in the follow-up interview

Table 12 All food items for which 100 % of the missing answers were reported to be “never/seldom” in the follow-up interview

Table 13 Mean intake of food groups (g/day) in complete, zero imputed and median imputed data; median difference between complete and zero imputed (C-0) and median imputed (C-M) data for all subjects. Min and max difference in intake

Table 14 Mean intake of nutrients in complete, zero imputed and median imputed data; median difference between complete and zero imputed (C-0) and median imputed (C-M) data for all subjects. Min and max difference in intake

Table 15 Classification of subjects according to dietary intake calculated from zero imputed data

Table 16 Classification of subjects according to dietary intake calculated from median imputed data

List of figures

Figure 1 Example of a quantitative FFQ

Figure 2 Example of an FFQ with separate questions for all food items

Figure 3 Options for handling missing answers in FFQ

Figure 4 Time line for the EBBA II pilot study and main study

Figure 5 Illustration of the data collection and computation process

Figure 6 Distribution of missing answers in frequency questions

Figure 7 Distribution of missing answers in portion size questions

Figure 8 Scatterplot showing the relationship between number of subjects that have eaten a particular food item and the percent missing for the food item

The twenty food items with the highest proportion of missing answers in portion size questions are presented in table 9. ... 37

Figure 9 Distribution of the proportion of the missing answers for frequency questions reported to be “never/seldom” in the follow-up interview

Figure 10 Scatterplot showing the relationship between the number of missing answers for frequency questions and the percent of missing answers reported to be “never/seldom” in the follow-up interview

Figure 11 Distribution of the proportion of the missing answers for portion size questions filled in with the smallest portion size option in the follow-up interview

Figure 12 a) Scatterplot showing the relationship between the number of subjects that have eaten a food item and the % of the missing answers for the food item corresponding to the lowest option for frequency of consumption in the follow-up interview. b) Scatterplot showing the relationship between the number of missing answers for a food item and the % of the missing answers corresponding to the lowest option for frequency of consumption in the follow-up interview

Figure 13 Illustration of how closely the imputation methods estimate the complete data for mean intake of energy and nutrients in terms of percent

List of appendices

I. Literature review

II. Invitation letter and consent form

III. General questionnaire

IV. The EBBA II food frequency questionnaire

V. Instructions to the participants

VI. Correction form

VII. Map of SPSS files and syntax files and their uses

List of abbreviations

EBBA Energy Balance and Breast Cancer Aspects
FFQ Food frequency questionnaire
DHQ Diet history questionnaire
MAR Missing at random
NMAR Not missing at random
CCA Complete case analysis
MI Multiple imputation
BMI Body mass index
OC Oral contraceptives
DXA Dual X-ray absorptiometry
HRT Hormone replacement therapy
WHO World Health Organization
KBS KostBeregningsSystem / Food Calculation System
RMSE Root mean square error
NHS II Nurses’ Health Study II
OUH Oslo University Hospital
AHS-2 Adventist Health Study-2
NOWAC Norwegian Women and Cancer study
IWHS Iowa Women’s Health Study

1 Introduction

The food frequency questionnaire (FFQ) is a retrospective dietary assessment method developed to assess usual diet over a designated time period. FFQs are frequently used in several study types, mainly in national food surveys and epidemiological studies, but also in clinical trials (1). Researchers are often interested in a person’s habitual diet when investigating associations with disease or other health outcomes. FFQs have been shown to measure this better than short-term records (1). In addition to the time perspective, the FFQ method is often preferred because it is less costly and time-consuming to process than other dietary assessment methods, such as 24-hour recalls or weighed records. It is also often less demanding for the participants in the study. A comprehensive FFQ makes it possible to estimate a person’s intake of energy and of a wide range of foods and nutrients.

Self-administered FFQs on paper are most commonly used, especially in large studies, although internet-based FFQs have also been developed (2). FFQs can also be administered as an interview. One disadvantage of self-administered FFQs is that questionnaires may be returned with some questions unanswered. This creates challenges in the estimation of dietary intake. How the missing answers are handled during intake calculation is rarely described in published studies (see the literature review below and (3)), but a common practice seems to be to treat unanswered questions as if the food item was not eaten by the respondents. However, this may lead to underestimation of intake if the foods were actually consumed. The option of excluding all subjects with missing values may lead to unacceptable loss of sample size and statistical power, or selection bias if missing answers are related to subject characteristics. Thus, missing values in FFQs must often be handled in a different way.

Clinical studies using FFQs often have better opportunities for quality control of the data during the collection process than large epidemiological studies, due to direct contact with participants and closer follow-up. Complete data can be achieved by obtaining information directly from the participants. This type of quality control demands extra use of staff resources and time from the patients. It is of value to find out how quality control changes (improves) calculated dietary intake. Further, knowledge about what the missing answers would have been if completed is of importance to methodological research on how to handle missing FFQ responses in other studies where the true values are unknown.

In the Energy Balance and Breast Cancer Aspects (EBBA) II pilot study (2011-2013), a randomized clinical intervention study with physical activity that included newly diagnosed breast cancer patients (stage I-II), a quality control procedure was carried out on all FFQs handed in by the participants. During a follow-up interview, missing answers were filled in and other ambiguities resolved, which resulted in fully completed FFQs from all participants. This has provided knowledge about the rate of missing answers and about which answer alternatives were reported for the missing answers.

1.1 Aims and purpose

The purpose of this master thesis is to assess the completeness of a self-administered FFQ applied at baseline in a clinical trial with breast cancer patients (EBBA II), and to investigate the effect of quality control of FFQ data by adding information from follow-up interviews with the participants. The dietary intake calculated from the complete data (reference) is then compared to the intakes that have been calculated after missing values have been filled in (i.e. imputed) using two different methods.

I will investigate:

1) How well were the FFQs completed before the follow-up interview with regard to
   - Missing answers for participants
   - Missing answers in different food items and food groups
   - Total of missing answers in the dataset (i.e. the proportion of answers that was filled in during the interview)

2) The proportion of missing answers corresponding to no consumption and smallest portion size in the follow-up interview

3) The effect of adding information from the follow-up interview on calculated intake of
   - Food groups
   - Energy
   - Selected macro- and micronutrients
   - Alcohol

4) How well two different methods for imputing missing answers for frequencies and portion sizes estimate the complete data

5) Whether questionnaire completeness is associated with participant characteristics

2 Background

2.1 Food frequency questionnaires (FFQs)

2.1.1 FFQ: The food list and design

An FFQ consists of a list of selected foods and beverages, accompanied by a frequency response section for respondents to report how often each food item is consumed. Often a standard portion size is included in the question, e.g. how often do you drink one glass (2 dl) of skimmed milk? Such questions are called semi-quantitative. There may also be a separate section to specify the portion size of the foods, as shown in figure 1; such questions are called quantitative. The FFQ used in the present study consists of a combination of semi-quantitative and quantitative questions.

Figure 1 Example of a quantitative FFQ (FFQ used in the Swedish Women’s Lifestyle and Health Cohort)

FFQs are mainly designed as grids (as in figure 1) or with separate questions for each food item (as in figure 2).

Figure 2 Example of an FFQ with separate questions for all food items (Nurses’ Health Study questionnaire, long, booklet (4))

Deciding how many foods to include in the food list is a trade-off between including enough foods to get a valid measurement of the intake, and not making the questionnaire too long, which may lead to fatigue and loss of motivation in the respondents, who might then omit or overlook questions (1). The length of the FFQ and the food items included also vary according to the aim of the study in which the FFQ is used. If only a specific food group is of interest, e.g. dairy products, the FFQ will contain food items in this group. If the goal is to assess total dietary intake, the FFQ will attempt to cover as many foods as possible in the study population’s diet that are important contributors to energy and nutrient intake. For a food item to be informative it must have three general characteristics: 1) it must be used reasonably often by an appreciable number of individuals, 2) it must have a substantial content of the nutrient(s) of interest, and 3) it must be discriminating, i.e. the use of the food must vary from person to person (5).

2.1.2 Design factors that can affect missing answers

Studies have investigated how different FFQ designs, number of questions, and closed and open-ended answering options affect the quality of the responses provided by the participants, including missing answers.

Kuskowska-Wolk et al. tested the performance of four versions of an FFQ, each with either increasing or decreasing frequencies, resulting in eight different questionnaires, with regard to response rate, completeness of responses, and food frequency responses. They found a lower number of missing answers in frequency questions when portion size questions were included (6).

Subar et al. compared the 36-page Diet History Questionnaire (DHQ) with a 16-page FFQ used in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, with regard to missing answers in frequency and portion size questions. When they developed the DHQ, they took into consideration findings from their previous study, which used cognitive interviewing to detect any problems the respondents had when filling in the DHQ (7). In addition, the DHQ uses the design in figure 2, but also with portion size questions, while the FFQ uses the grid design. There was a lower percentage of missing frequency answers in the FFQ, while there was a lower percentage of missing portion size answers in the DHQ. A higher proportion of participants evaluated the DHQ as “very easy” to answer than the FFQ (8).

Caan et al. mailed two different FFQs to a sample of members of a medical program, with different approaches to assess the frequency of consumption for food items. In the first version the frequency response was open-ended (i.e. a number had to be entered by the respondent), while in the other the respondents were supposed to check one of seven frequency categories. A higher number of subjects returned a complete FFQ when check boxes were used. The percentage of subjects filling in the FFQ entirely incorrectly, so that none of the information was usable, was also considerably lower when using check boxes (1.1 %) compared with open-ended frequency responses (15.5 %) (9). Subar et al. also found that using options for frequency, compared to open-ended frequency responses, increased clarity and reduced errors (7).

Overall, many factors, like design, length, response categories, and wording may affect the quality of the data returned from the respondents.

2.2 Missing data

Missing data may potentially occur in most studies. Missing data is a term that embodies different forms of non-response described by survey methodologists. In surveys or epidemiological studies, a representative selection of persons from a population of interest is for example mailed a questionnaire. If a person does not return the questionnaire, it is called unit non-response. A high number of unit non-responses will result in a low response rate. If a participant in a prospective study answers a questionnaire at baseline, but fails to deliver it for the next follow-up, it is a case of wave non-response. If a person has returned a questionnaire, there may still be missing answers for one or more questions in the questionnaire; such cases are called item non-response (10). In a dataset, a missing value means that a data value is not registered for a variable in an observation.

2.2.1 Missing answers in FFQs

In FFQs, item non-response implies that subjects leave questions about the frequency of consumption and/or portion size for food items unanswered. This causes computational problems in dietary intake calculations and complicates the statistical analysis. Publications regarding the magnitude and handling of missing answers in FFQs mainly come from large studies in nutritional epidemiology (6,8,9,11-13). These studies show that missing answers are common, however with a great variation in the proportion of subjects returning a complete FFQ. In a subgroup in the Norwegian Women and Cancer study (NOWAC), 6 % of respondents returned a complete FFQ (11), while in the Nurses’ Health Study II (NHS II), 34 % returned a complete FFQ (12). FFQs have been used to assess dietary intake in a number of clinical studies; however, information regarding the magnitude of missing answers and the handling of them is generally not reported in these publications. This will be further elaborated below (literature review, section 2.3.1).

2.2.2 Why are missing answers in FFQ a problem?

In dietary intake calculations, the first step is to calculate food intake in grams per day of each item in the FFQ. Every option for portion size corresponds to a predefined weight, and this weight is multiplied by the frequency. This amount is then divided by the number of days in the frequency option chosen (e.g. divided by seven if the food is consumed once per week) to find the average intake per day. If frequency or portion size is missing one cannot calculate intake per day for this particular food. This will affect the estimated intake of the food groups, foods and nutrients to which this food item is an important contributor. When calculating the aggregated intake of a food group (e.g. fruit), the intake can be underestimated if the amount for a food in this food group is missing (e.g. missing for apples). This may lead to misclassification of subjects and possibly erroneous conclusions about the associations between dietary intake and health outcomes.
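
As a rough illustration of the calculation described above, the following Python sketch converts one frequency answer and one portion size answer into grams per day and sums the items of a food group. The frequency options, portion weights and function names are hypothetical placeholders chosen for the example; they are not the values or the software used in EBBA II.

```python
from typing import Optional

# Hypothetical answer options; the actual EBBA II options and weights differ.
# Each frequency option is stored as (times consumed, days in the period).
FREQUENCY_OPTIONS = {
    "never/seldom": (0, 1),
    "1 per week": (1, 7),
    "3 per week": (3, 7),
    "1 per day": (1, 1),
}
PORTION_WEIGHT_G = {"small": 100.0, "medium": 150.0, "large": 200.0}


def grams_per_day(frequency: Optional[str], portion: Optional[str]) -> Optional[float]:
    """Average intake in g/day; None if frequency or portion size is missing."""
    if frequency is None or portion is None:
        return None
    times, days = FREQUENCY_OPTIONS[frequency]
    return PORTION_WEIGHT_G[portion] * times / days


def food_group_intake(answers: dict) -> tuple:
    """Sum g/day over the items of a food group, skipping missing items.

    Returns (total, number of items that could not be calculated).
    """
    total, n_missing = 0.0, 0
    for frequency, portion in answers.values():
        intake = grams_per_day(frequency, portion)
        if intake is None:
            n_missing += 1  # the item contributes nothing to the total
        else:
            total += intake
    return total, n_missing


# Example: the frequency answer for apples is missing.
fruit = {"apples": (None, "medium"), "bananas": ("1 per week", "medium")}
total, n_missing = food_group_intake(fruit)
```

In this example the fruit total consists of the banana intake alone, because the missing frequency for apples makes that item impossible to calculate. This is the underestimation described above.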

2.2.3 Mechanisms for missing answers

Different mechanisms for missing in all types of data collections (including FFQ data) have been defined in the statistical literature by Little and Rubin (14). These mechanisms are 1) missing completely at random (MCAR), 2) missing at random (MAR) or 3) not missing at random (NMAR). For the MCAR assumption to be fulfilled, the persons with missing answers in their FFQs must be a random subset of the whole study sample, and missing must be unrelated to other variables (14,15). This is unlikely to be the case for missing answers in FFQs. Several studies have shown that the number of missing answers is related to age, race, and level of education (9,11,13). Some imputation methods rely on the assumption that the underlying mechanism for missing is MAR, which requires that, given the observed data, the probability that a value is missing is independent of the underlying value that is missing. However, if answers for food items in an FFQ tend to be missing because the food was not eaten, this will be a violation of the MAR assumption, and the mechanism should be described as NMAR. Assumptions about these mechanisms should be made when analyzing data with missing values.
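
The three mechanisms are often written in terms of a missingness indicator. The notation below is a common textbook convention and an assumption of this sketch, not notation taken from the thesis: $R$ indicates which values are missing, and $Y_{\text{obs}}$ and $Y_{\text{mis}}$ denote the observed and the missing parts of the data.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{NMAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ depends on } Y_{\text{mis}}
\end{align*}
```

In FFQ terms, a frequency answer that tends to be left blank because the food is never eaten makes $R$ depend on the unreported answer itself, which corresponds to the NMAR case.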

2.2.4 Methods for handling missing answers during data analysis

One option for handling missing values is to exclude the subjects with missing values, called complete case analysis (CCA). This is commonly done and may be appropriate when most cases are complete and the mechanism for missing is MCAR or MAR. However, if few subjects return a complete FFQ this can reduce the sample size and statistical power substantially, and CCA might also introduce selection bias if those excluded are not a random sample of the study population. There is also a large variation in the number of missing answers between subjects, and exclusion because of one or a few missing items in an FFQ seems strict if the foods contribute little to the dietary variables of interest, such as total energy or selected nutrients.

As an alternative to CCA, statistical methods for imputation were developed to provide “complete” data sets even if there were initially missing answers. To impute means to insert a value in place of a missing value. The most commonly used imputation methods for FFQ data are so-called “single imputation methods”, meaning that a value is imputed only once for each missing answer. These can be simple methods, like imputing the zero value (16,17), the mean, the median (18) or the mode answer (11), or more complex methods like k nearest neighbour (knn) (11). To impute the zero value means to impute the lowest option for frequency (“never” or “never/seldom”), and in this thesis also to impute the smallest portion size option. The underlying assumption is that respondents have omitted answers to questions about food items they do not eat. In practice CCA and imputation are sometimes used in two steps, where respondents exceeding a limit for the “allowed” number of missing answers are excluded, and the remaining missing answers are imputed.
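
The single imputation methods and the two-step procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure implemented in the thesis: answers are assumed to be coded as integers with 0 as the lowest option, `None` marks a missing answer, the median is taken per food item across respondents, and `median_low` is used so that the imputed value is always an existing answer option. All function names are hypothetical.

```python
from statistics import median_low

# Assumed coding: answers are integers, 0 is the lowest option
# ("never/seldom" or smallest portion size), None marks a missing answer.
# Each row is one respondent, each column one food item.


def impute_zero(responses):
    """Replace every missing answer with the lowest option (0)."""
    return [[0 if a is None else a for a in row] for row in responses]


def item_medians(responses):
    """Median observed answer per item (column) across respondents."""
    medians = []
    for column in zip(*responses):
        observed = [a for a in column if a is not None]
        # median_low keeps the result on an existing answer option;
        # fall back to 0 if nobody answered the item.
        medians.append(median_low(observed) if observed else 0)
    return medians


def impute_median(responses):
    """Replace every missing answer with the median answer for that item."""
    medians = item_medians(responses)
    return [[medians[j] if a is None else a for j, a in enumerate(row)]
            for row in responses]


def exclude_then_impute(responses, max_missing):
    """Two-step approach: drop respondents with too many missing answers,
    then median-impute what is still missing among those kept."""
    kept = [row for row in responses
            if sum(a is None for a in row) <= max_missing]
    return impute_median(kept)


responses = [
    [3, None, 1],
    [2, 4, None],
    [None, 4, 1],
    [2, None, None],
]
zero_filled = impute_zero(responses)
median_filled = impute_median(responses)
two_step = exclude_then_impute(responses, max_missing=1)
```

Note that in the two-step variant the medians are computed only from the respondents who are kept, which is one of several possible design choices.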

Multiple imputation (MI) methods have also been suggested for handling missing values in FFQ data (19,20). MI is a group of probabilistic methods that use the observed data to estimate several plausible values for the missing data, ideally when data are missing at random. MI generates several data sets (e.g. 5 to 10) for the same dietary assessment data. Unlike single imputation, MI methods take the uncertainty of the imputed values into account.
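
The text does not describe how results from the several imputed data sets are combined. A common choice is Rubin's rules, sketched below as an assumption rather than as the method of any study cited here: the pooled estimate is the mean of the per-data-set estimates, and the total variance adds the between-data-set variance to the average within-data-set variance, which is how the uncertainty of the imputed values enters. The function name and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates (e.g. mean energy intake from each imputed data set)
    and their squared standard errors using Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and one variance for each")
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance across data sets (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates from three imputed data sets.
pooled, total = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

With a single imputed data set the between-data-set term cannot be estimated, which is why single imputation tends to understate uncertainty.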

Options for handling missing answers in FFQ are illustrated in figure 3.

Figure 3 Options for handling missing answers in FFQ

FFQ - food frequency questionnaire, CCA - complete case analysis, Knn - k nearest neighbour

Studies have shown that the assumption that an omitted item is not consumed is true to some extent (9,12,13,21). Fraser et al. found that to impute zero would be correct for 60 % of the missing answers, but that there were large variations between foods (13). Hansson and Galanti found similar results; 54 % of omitted food items were actually never eaten, but the variation between foods ranged from 0 to 96 % (21).
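
When the initially missing answers are later obtained, for example in a follow-up interview, the share for which zero imputation would have been correct can be computed overall and per food item. The sketch below is a minimal illustration; the data layout, the label of the lowest option and the function name are assumptions made for the example, and the data are invented.

```python
from collections import defaultdict

LOWEST_OPTION = "never/seldom"  # assumed label of the lowest frequency option


def zero_imputation_accuracy(recovered_answers):
    """recovered_answers: iterable of (food_item, answer) pairs, one pair per
    initially missing frequency answer that was later recovered.

    Returns (overall share, {food_item: share}) of answers equal to the
    lowest option, i.e. where imputing zero would have been correct.
    """
    counts = defaultdict(lambda: [0, 0])  # item -> [n_lowest, n_total]
    for item, answer in recovered_answers:
        counts[item][1] += 1
        if answer == LOWEST_OPTION:
            counts[item][0] += 1
    n_total = sum(total for _, total in counts.values())
    n_lowest = sum(lowest for lowest, _ in counts.values())
    overall = n_lowest / n_total if n_total else None
    by_item = {item: lowest / total for item, (lowest, total) in counts.items()}
    return overall, by_item


# Invented example data.
recovered = [
    ("liver", "never/seldom"),
    ("liver", "never/seldom"),
    ("bread", "2 per day"),
    ("bread", "never/seldom"),
]
overall, by_item = zero_imputation_accuracy(recovered)
```

Reporting the per-item shares alongside the overall share makes the variation between foods visible, which the overall figure alone would hide.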

2.3 Quality control of FFQ data in clinical studies

Clinical studies often provide good opportunities for quality control of FFQ data during different stages of the data collection process, due to the in-person contact and follow-up of the subjects. Instructions (written and in person) on how to fill in the FFQ and help when answering it may prevent missing answers and other errors from occurring. A review of the returned FFQs may detect errors such as missing answers, multiple answers to the same question, erased food items that are replaced with another (e.g. erasing potato and writing sweet potato in its place), and implausible answers. Quality control is of importance to ensure that collected data are as complete and accurate as possible and that studies provide valid results and conclusions.

2.3.1 Literature review

To get an overview of common practices for achieving good data quality in FFQs in clinical studies, and of how possible missing answers were handled, I conducted a search in PubMed. A description of search terms and selection criteria can be seen in appendix I, as well as the full table of publications. Only titles and abstracts of articles could be searched, so if the search terms were included only in other sections of an article, some relevant publications may have been missed. The review has nonetheless given me an overview of common practice.

Findings from the search are summarized in table 1. A total of 77 articles were identified and read. In the table the publications are categorized based on the quality control measures they described having undertaken. The described steps taken to achieve good data quality in the FFQs were to give instructions to the participants, and in one study all persons who wanted to participate went through a so-called pre-screening, where only those who filled out an FFQ and a 4-day food record satisfactorily were included in the intervention. Twenty-two studies used an interviewer-based approach to standardize the data collection. Of the studies where the participants filled in the FFQ by themselves, eleven performed a subsequent follow-up interview. In the “exclusion criteria only” category, studies inform the reader that they have excluded subjects exceeding an upper limit for missing answers, however without specifying which imputation method was used for the remaining missing values. Others have excluded subjects because of incomplete dietary information, without specifying further what this means. Three publications provided a comprehensive description of the amount of missing answers detected in their FFQs, and how they handled these. Publications in the “unclear” group mostly have a statement saying that the FFQs were controlled after collection, but they do not report what this control resulted in, or how they solved any problems found. The remaining articles did not contain any information, other than the fact that an FFQ was used to assess diet. A publication may be in more than one category.

Table 1 Are quality control measures and method for handling missing data reported in publications? Results from a search in the literature on clinical studies using FFQ as dietary assessment method. 77 publications of interest were found, published in 1992-2013. Search performed 18 March 2014

Reported in methods section in article                      No. of publications (n=77)      %
Prior to filling in FFQ
  Pre-screening                                                          1                  1.3
  Instructions to subjects                                               6                  7.8
During
  Interview-based                                                       22                 28.6
After
  Follow-up interview                                                   11                 14.3
  Exclusion criteria only                                                6                  7.8
  Comprehensive description of handling of missing values                3                  3.9
  Unclear*                                                              10                 13.0
No description                                                          24                 31.2

*Articles in the “unclear” group mostly have a statement saying that the FFQs were controlled after collection, but they do not report what this control resulted in, or how they solved any problems found

In conclusion, only three out of 77 articles provided a comprehensive description of the level of missing answers and the handling of these during data analysis. These three articles were all published before the year 2000, which leaves the impression that the methodological descriptions are not improving with time (22-24). Twenty-two of the 77 studies (28.6 %) used the interview-based approach to collect FFQ data, perhaps indicating that a smaller study sample makes researchers use this approach more often than in the larger studies.

2.3.2 FFQ validity and missing answers

The FFQ method has been extensively validated (25). Validity is an expression of the degree to which a method gives a true and accurate measure of what it is supposed to measure. The present study does not concern the validity of FFQs per se, but when undertaking quality control steps, the goal is to achieve a measurement as valid as possible.

The FFQ method is criticized for measurement error, and validation studies find that e.g. energy intake is underestimated (25,26). Missing answers may introduce additional measurement error and make the validity appear worse than it actually is. Validation studies have also used zero imputation for missing answers (17,27), including two of the validation studies of the FFQ that the EBBA II FFQ has been modified from (28,29) (Monica H. Carlsen, personal communication, June 2014). When the validation studies have missing answers in the FFQs, it is difficult to anticipate how data from our complete data set would perform when it comes to validity. However, it might be fair to assume that complete data sets will not decrease validity.

A review found that validation studies using interview-based FFQs report somewhat higher correlation coefficients between intake from FFQ and reference method, than self-administered FFQs, at least for fat (0.55 vs 0.50), energy (0.55 vs 0.46) and vitamin A (0.47 vs 0.37) (1). One study found that the validity of an FFQ, using 4-day food records as the reference method, was improved by conducting a follow-up interview after the subjects had filled in the FFQ on their own (30).

Possible explanations for why interview and follow-up interview show a higher validity might be aspects like a more similar understanding of the questions and better control of the plausibility of the answers given. In addition it is reasonable to expect a lower number of missing answers when a trained interviewer is responsible for registering the answers. In follow-up interviews the FFQs may be reviewed at different levels: the review may only ensure that the FFQ is correctly filled in, with no missing answers or double answers, or a more thorough approach can be pursued, e.g. asking the participants about any answers that may seem improbable and using a checklist to see if any key food items of interest have been forgotten. Missing answers are hence only one factor contributing to the explanation of the higher validity.

2.4 Breast cancer

Breast cancer is the most common cancer among women worldwide. In 2012 there were an estimated 1.7 million new cases, which is 25 % of all cancers in women. The incidence rates are highest in Western Europe and North America, where 43 % of new cases in 2012 occurred. Developing countries generally have lower incidence rates for breast cancer; the lowest are found in East Asia (31). Also in Norway, breast cancer is the cancer diagnosis with the highest incidence rate among women, and it is the most common cause of death among women aged 45-60 years (32). In 2012, 2956 new cases were registered (33). Since 1950 there has been a steady increase in the incidence rate for breast cancer. From about 2005 the incidence rate started to level off, and in the last five-year period (2008-2012) there has been a slight decrease compared to the previous five-year period. Five-year relative survival has increased from 68 % in 1973-77 to 89 % in 2008-12 (33).

Breast cancer is a heterogeneous disease with 20 different subtypes recognized by the World Health Organization Classification of Tumours of the Breast (34). The etiology of breast cancer is multifactorial. Approximately 20-30 % of breast cancers are familial breast cancers (i.e. family history of breast cancer) (35). For a small proportion of patients (5-10 %), two genes with high penetrance in particular have been identified: BRCA1 and BRCA2 (36). The remaining familial breast cancers are proposed to be caused by interactions between intermediate or low risk gene variants and environmental and lifestyle factors (35).

Reproductive and hormonal factors that are known to increase the risk of developing breast cancer are nulliparity, having the first child after the age of 30, early menarche and late menopause (37,38). The use of oral contraceptives (OC) and hormone replacement therapy (HRT) during menopause increases risk (37). Breastfeeding may have a protective effect; for every 12 months of breastfeeding, the relative risk of breast cancer has been observed to be reduced by 4.3 % (39). Obesity is known to increase the risk of developing breast cancer (31,40), while physical activity has been found to have a convincing or probable protective effect (41). The best established dietary risk factor for breast cancer is alcohol (42,43). This association has also been found in the European Investigation into Cancer and Nutrition (EPIC) with ten countries (including Norway) (44), and when looking separately at the Norwegian EPIC cohort (45). Schütze et al. estimated that 5 % of all breast cancer cases in Europe could be attributed to alcohol consumption (46). Several other dietary factors have also been investigated in relation to breast cancer risk, including fat, fiber, fruit and vegetables, and soy products. However, for these factors there are too few studies to conclude or findings have been inconsistent (42,43).

Only patients with primary operable tumors are included in the EBBA II study, which means that all women undergo breast cancer surgery. Other treatments can include radiotherapy, chemotherapy, and hormone therapy, depending on characteristics of the cancer. The treatments give different side-effects and are of varying duration. The women are hence facing different treatment regimens (47). Having a cancer diagnosis and facing treatment is still expected to be perceived as a burden for all individuals.


3 Subjects and methods

This master thesis is based on data collected in the EBBA II pilot study. A description of the pilot study as well as the main study follows below.

3.1 The EBBA II main study – design & aim

The Energy Balance and Breast Cancer Aspects (EBBA) II study is a randomized clinical intervention study. The aim of the EBBA II study is to investigate the effect of a 1-year physical activity intervention, during the first year after surgery, on metabolic profile after 1 year, and breast cancer prognosis and recurrence during 10 years of follow-up. Further, the study aims to investigate how diet prior to diagnosis, during the intervention period, and in the years after the intervention influences metabolic profile and breast cancer recurrence and prognosis. During the intervention period the women both in the intervention group and in the control group receive standard treatment for breast cancer, including surgery and one or more of the following: radiation therapy, chemotherapy, and endocrine therapy. The timeline of the study and the data collected can be seen in figure 4. Dietary data is collected at several time points and with two different dietary assessment methods. A validated FFQ is administered at baseline before surgery, and again after the intervention period, while 7-day food diaries are completed before and at the end of the intervention period. The EBBA II main study is in the startup phase and aims to recruit 600 patients.

EBBA II is a national multicenter study, with main site Oslo University Hospital (OUH). The study is a Norwegian Breast Cancer Group (NBCG) study (NBCG-14), and is part of the work of the research group Translational Research on Energetics and Cancer (TREC), rooted in the Division of Surgery and Cancer Medicine at OUH. The study is approved by the Regional Ethics Committee and the Data Inspectorate.

Figure 4 Time line for the EBBA II pilot study and main study

FFQ – Food frequency questionnaire, QOL – Quality of life, DEXA - Dual-energy X-ray absorptiometry

3.1 The EBBA II pilot study

This master thesis is based on data collected at baseline in the EBBA II pilot study. The baseline measurements were undertaken in the period from when the patients had received the breast cancer diagnosis and agreed to participate in the study until the day of operation. The time from diagnosis to operation was 1-3 weeks. The main aim of the pilot study was to test the feasibility of the study, particularly considering inclusion criteria and adherence to the physical activity intervention. The recruitment for the EBBA II pilot study was carried out between June 2011 and May 2013.

3.2 Subjects

The breast cancer patients in the pilot study received cancer treatment at the Cancer Center at OUH (n=50), St. Olavs Hospital in Trondheim (n=5), and at Drammen Hospital (n=4). The study was initiated in Oslo, and the two other centres were included later; therefore a higher number of patients are from Oslo.

To be included, the women had to be between 35 and 75 years of age and be newly diagnosed with histologically verified breast cancer stage I-II (48). They needed to be able to participate in a 1-year physical activity intervention, and have good Norwegian speaking and writing skills. Exclusion criteria were verified heart disease (myocardial infarction, valve disease, reduced heart function), difficult-to-control diabetes or thyroid disorders, body mass index (BMI) < 18.5 kg/m² or > 40 kg/m², surgical treatment of obesity, and travel time to site of treatment exceeding 1.5 hours.

Invitation to participate in the study was given both orally and in writing by trained medical doctors and study nurses at each hospital. The invitation letter and consent form are enclosed in appendix II. The invited women were called by the same medical doctor or nurse the next day so that the women could ask any questions they might have and decide whether they wanted to participate. The women who chose to participate signed the consent form.

The enrolment of patients to the EBBA II pilot study continued until 60 subjects were eligible for randomization. Of the sixty women included in the study, 59 women completed the FFQ at baseline and were available for analyses in this master thesis.

3.3 Methods

3.3.1 Assessment of background variables and anthropometry

A general questionnaire posed questions about socio-economy, health and lifestyle factors giving information on age at menarche, parity, use of OC and HRT, menopause, years of education, smoking habits and leisure time activity (see appendix III).

Anthropometric measurements in the EBBA II pilot study were height, weight, and hip and waist circumference. Body composition was measured with dual-energy X-ray absorptiometry (DXA). Only weight and height were used in this thesis. Weight was measured in the fasted state to the closest 0.1 kg with light clothing on a scale (VETEK, TI-1200, Vaddo, Sweden). To correct for the clothing, 0.5 kg was subtracted. Height was measured to the closest 0.5 cm with a stadiometer (Hyssna, Sweden), standing upright without shoes with heels touching the wall. BMI was calculated by dividing weight by the squared height (kg/m²).
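The BMI calculation with the clothing correction can be sketched as follows. This is a minimal illustration in Python, not the procedure used in the study; the function name and the example values are assumptions chosen for illustration only.

```python
def bmi(weight_kg_clothed, height_cm, clothing_kg=0.5):
    """Return BMI (kg/m²) after subtracting a fixed clothing allowance."""
    weight_kg = weight_kg_clothed - clothing_kg  # correct for light clothing
    height_m = height_cm / 100.0                 # cm -> m
    return weight_kg / height_m ** 2


# Illustrative values only: 71.1 kg measured with clothing, 167.4 cm tall
print(round(bmi(71.1, 167.4), 1))  # 25.2
```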

3.3.2 The EBBA II FFQ

The FFQ at baseline assesses habitual diet over the past year, i.e. the diet before the breast cancer diagnosis. The FFQ has been developed at the Department of Nutrition, University of Oslo, and adapted to the EBBA II study. The FFQ has been validated in a number of studies, including energy intake using doubly labeled water (26); 7-day weighed food records and an energy expenditure monitor (28); fruit and vegetable intake using carotenoid and flavonoid biomarkers (29); and fish and n-3 supplement intake using very-long-chain n-3 fatty acids as biomarkers (49). The FFQ is enclosed in appendix IV.

The FFQ is 14 pages long and asks about frequency of consumption and/or portion size for 256 food items. The FFQ is structured as smaller blocks or a grid covering the entire page, with similar food items listed together. The term food item refers to foods, drinks and dietary supplements. There were also two open spaces where the subjects could list any additional dietary supplements they used or food items they ate that were not already asked about in the FFQ.

For the purpose of this master thesis the questions in the FFQ are divided into four categories: frequency questions, portion size questions, type question, and “other questions”. All questions in the FFQ are listed in table 2 in the same order as they are listed in the FFQ. The column “frequency questions” indicates how many food items are asked about for each food group and the column “portion size questions” indicates whether there are portion size questions for the food items. For instance, there are questions about five different breads, without a separate portion size question, and 47 different dinner dishes, here with a separate portion size question for each dish.

There are 8-14 frequency options for food items, and 5 for dietary supplements. The frequency options for food items range from “never/seldom” to a maximum of 4+ times per day and for supplements from “never/seldom” to 6-7 times per week. In the portion size questions, a standard unit (e.g. slice, cup, dl) is given, and one can choose between 4-6 options for the amount (e.g. 0.5 to 3+ dl for breakfast cereals).

All frequency questions should be answered by all subjects. The questions about sweetening in tea and coffee, and servings of fruit and vegetables usually eaten per day are portion size questions that stand alone, and should also be answered by all subjects. The remaining portion size questions should only be answered by those who report to eat the food item in question (i.e. frequency higher than “never/seldom”). The question about type of fat used for cooking (type question) and all questions in the category “other questions” should be answered by all subjects.

Table 2 Overview of all questions in the EBBA II FFQ

| Food group | Frequency questions | Portion size questions | Type question | Other questions |
|---|---|---|---|---|
| Bread | 5 | | | |
| Fat on bread | 11 | 1 | | |
| Spreads | 30 | | | |
| Breakfast cereals | 8 | 8 | | |
| Jam/sugar on breakfast cereals | 2 | 2 | | |
| Milk | 8 | | | |
| Yoghurt | 5 | 5 | | |
| Cold beverages | 12 | 12 | | |
| Alcoholic beverages | 8 | 8 | | |
| Hot beverages | 10 | 10 | | |
| Milk, sugar, sweetening in coffee/tea | | 6 | | |
| Dinner dishes | 47 | 47 | | |
| Potatoes, vegetables, rice, pasta | 25 | 25 | | |
| Sauce, dressing | 17 | 17 | | |
| Type of fat used for cooking | | | 1 | |
| Fruit | 17 | 17 | | |
| Additional question about amount of fruit and vegetables eaten | | 2 | | |
| Dessert, snacks, cake | 27 | 27 | | |
| Dietary supplements | 17 | 17 | | |
| Meals | | | | 4 |
| Snacking | | | | 1 |
| Recent change in dietary habits | | | | 1 |
| Weight, height | | | | 2 |
| Total | 249 | 204 | 1 | 8 |

The patients were given instructions face-to-face by a study nurse on how to fill in the FFQ and other questionnaires according to protocol (the instructions can be read in appendix V). The nurse informed them that the FFQ covers diet in the past year, and that all questions should be answered, including those about foods not consumed (except portion size). If the patients had any questions about the FFQ, they could call or email the EBBA staff. They were also told that they would be contacted after handing in the questionnaires to obtain additional information if necessary (referred to as follow-up interview in this thesis).

3.3.3 Follow-up interview

The patients were asked to return the FFQs within one week, which for the majority meant that they brought it to their next hospital visit. If someone forgot to bring the FFQ they were reminded to bring it as soon as possible, and they did so most often within a day. Within one week after the FFQ was returned, the FFQ was reviewed and a telephone follow-up interview was conducted if necessary, by a trained nutrition professional. In this interview any additional information that was needed from the participant was retrieved. This could be more detailed information about foods or supplements listed in the open spaces, clarification of multiple answers to the same question and completion of missing answers. All food items and supplements to be asked about in the interview were written down in a correction form, and all answers retrieved in the interview were registered in the form. In addition the answers in the FFQs were subsequently corrected and completed according to the subjects’ answers. In this way it was clear from the correction form what was changed by the interviewer. The correction form can be seen in appendix VI.

The responses were not systematically checked for improbable responses, like unusually high consumption of a food, discrepancy between the sum of bread slices reported to be eaten and the sum of bread slices eaten with the different spreads, or total number of dinner dishes eaten per month. However, if any obvious improbable responses were detected, the patients were asked if the reported answer was deliberate.


3.3.4 Definition of missing answers

For the EBBA II FFQ, the definition of missing values was based on seven situations with missing answers in different question categories. Situations 1-4 are illustrated in table 3. In situation 1, information from the follow-up interview was used to determine the number of missing answers.

1. Both frequency of consumption and portion size are unanswered for a food item
   a. If the subject in the follow-up interview claims that the correct consumption frequency is “never/seldom”, one missing answer is counted (one for frequency questions, none for portion size questions)
   b. If the subject in the follow-up interview claims to eat the food item, i.e. more frequently than “never/seldom”, and thus should have answered both frequency and portion size, two missing answers are counted (one for frequency questions and one for portion size questions)
2. Frequency of consumption is higher than “never/seldom”, but portion size is not answered
   a. Counted as one missing for portion size questions
3. The portion size question is answered, but the frequency of consumption is not answered
   a. Counted as one missing for frequency questions
4. A stand-alone question for frequency or portion size is not answered
   a. Counted as one missing for frequency questions
   b. Counted as one missing for portion size questions
5. The type question is not answered
   a. Counted as one missing for the type question
6. One of the “other questions” is not answered
   a. Counted as one missing for the “other questions”

Table 3 Illustration of how missing answers were counted for frequency and portion size questions in the EBBA II pilot study

| Situation | FFQ response: Frequency | FFQ response: Portion | Interview response: Frequency | Missing answers counted: Frequency | Missing answers counted: Portion |
|---|---|---|---|---|---|
| 1a | missing | missing | Never/seldom | 1 | 0 |
| 1b | missing | missing | > Never/seldom | 1 | 1 |
| 2a | answered | missing | | 0 | 1 |
| 3a | missing | answered | | 1 | 0 |
| 4a | missing | | | 1 | |
| 4b | | missing | | | 1 |
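The counting rules for a food item that has both a frequency and a portion size question (situations 1-3) can be sketched as below. This is an illustrative Python sketch and not the SPSS syntax used in the thesis; the function name, the use of `None` for an unanswered question, and the example answers are assumptions.

```python
NEVER = "never/seldom"


def count_missing(ffq_freq, ffq_portion, interview_freq=None):
    """Return (missing frequency answers, missing portion answers) for one item.

    ffq_freq, ffq_portion: answers on the returned FFQ, None if unanswered.
    interview_freq: frequency stated in the follow-up interview (situation 1).
    """
    if ffq_freq is None and ffq_portion is None:
        if interview_freq is None:
            raise ValueError("situation 1 requires the interview answer")
        # 1a: food not eaten, so the portion size was never required
        # 1b: food eaten, so both answers were required
        return (1, 0) if interview_freq == NEVER else (1, 1)
    if ffq_freq is None:
        return (1, 0)  # situation 3: portion answered, frequency missing
    if ffq_portion is None and ffq_freq != NEVER:
        return (0, 1)  # situation 2: eaten, but portion size missing
    return (0, 0)      # nothing missing, or portion size not required


print(count_missing(None, None, NEVER))         # (1, 0)
print(count_missing(None, None, "1 per week"))  # (1, 1)
print(count_missing("1 per week", None))        # (0, 1)
```

Stand-alone questions, the type question and the “other questions” (situations 4-6) each simply contribute one missing answer when left unanswered.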

3.3.5 Imputation methods

In this thesis, two different methods for imputing missing answers were compared to the complete data set (reference) after the follow-up interview. The methods are referred to as zero imputation and median imputation.

Imputation with zero

Imputation with the zero value here means to impute the lowest response alternative (“never/seldom”) for food items with missing frequency answer. The underlying assumption for this imputation method is that these foods have not been consumed. When imputing “never/seldom” the intake of this food will count as 0 g/day and hence it will not contribute to calculated intake of any food group, energy or nutrients. In this thesis zero imputation also includes imputation of the smallest portion size option for missing portions.

Imputation with the median answer

The median values used for imputation were calculated from the subjects who had initially provided an answer to the frequency or portion size question for a food item. Thus, the sample size used to calculate the median varied: from 38 to 59 for frequency, 57-58 for the eight stand-alone portion size questions and 1-54 for the remaining portion sizes. The median values were rounded to the nearest response option before imputation.
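The two imputation methods can be sketched for a single question as follows. This is a hedged illustration in Python, not the SPSS or KBS syntax used in the thesis. It assumes that the response options are represented by ordinal indices (0 for the lowest option, i.e. “never/seldom” or the smallest portion), that `None` marks a missing answer, and that a median falling exactly between two options is rounded up; the text does not state how such ties were handled.

```python
import math
from statistics import median


def zero_impute(answers):
    """Replace missing answers (None) with the lowest option (index 0)."""
    return [0 if a is None else a for a in answers]


def median_impute(answers):
    """Replace missing answers with the median of the observed answers."""
    observed = [a for a in answers if a is not None]
    if not observed:
        raise ValueError("no observed answers to calculate a median from")
    # Assumption: a median of x.5 is rounded up to the next response option.
    fill = math.floor(median(observed) + 0.5)
    return [fill if a is None else a for a in answers]


# Hypothetical answers from seven subjects to one frequency question
freq = [0, 2, None, 3, 1, None, 2]
print(zero_impute(freq))    # [0, 2, 0, 3, 1, 0, 2]
print(median_impute(freq))  # [0, 2, 2, 3, 1, 2, 2]
```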

3.3.6 Processing of FFQs and correction forms

The FFQs were first optically read by a scanner into the software Cardiff TeleForm version 10.5.1 (DataScan Oslo, Norway). In this program the reading performed by the scanner was verified by the operator and subsequently translated into an SPSS file (original SPSS file). In the SPSS file each participant is listed in one row and each question in the FFQ is in a column (462 variables in total). Each response alternative is designated a letter, e.g. the lowest frequency option is called A, the next lowest frequency option is called B and so on. The smallest portion size option is also called A, the next smallest portion size is called B and so on. From the original file, a Notepad file is made that can be imported into the dietary intake calculation program KBS (described in section 3.3.7).

An illustration of the data collection and computation process can be seen in figure 5.

Figure 5 Illustration of the data collection and computation process in the EBBA II pilot study

To be able to calculate dietary intakes from the FFQs as they were before the follow-up interview (with missing answers), the original SPSS file needed to be converted into a file looking like it would have if the FFQs had been scanned with missing answers present. The conversion process was done by using the information from the correction forms to write a syntax file (syntax1) in SPSS to restore missing answers in the original SPSS file. One command was made for each person, containing one code for each missing answer (this resulted in the equivalent of 12 A4 pages of commands). The new file with missing answers is named the restored file. A map of files, syntaxes and which file was used to calculate which results can be seen in appendix VII.

The restored file was used to count missing answers for subjects and food items. To be able to run analyses on missing data, the letters (string variables) in the file were converted into numbers (numeric variables) by syntax2. Not all subjects had to answer all portion size questions; permitted missing values were therefore coded as 99 with syntax3 and syntax4 to prevent them from being counted as missing.
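The recoding steps can be illustrated with a short Python sketch. It is not a translation of syntax2-4, which are not reproduced in the thesis text; the mapping A = 1, B = 2 and so on, the use of `None` for a missing answer and the function names are assumptions made for the illustration. Only the code 99 for permitted missing values is taken from the text.

```python
import string

PERMITTED_MISSING = 99  # code for portion sizes that did not need an answer


def letter_to_number(answer):
    """Convert 'A' -> 1, 'B' -> 2, ...; blank or None stays missing (None)."""
    if answer is None or answer.strip() == "":
        return None
    return string.ascii_uppercase.index(answer.strip().upper()) + 1


def code_portion(freq_code, portion_code):
    """Flag an unanswered portion size as permitted when the food is not eaten."""
    if portion_code is None and freq_code == 1:  # 1 = 'A' = never/seldom
        return PERMITTED_MISSING
    return portion_code


def count_missing_codes(codes):
    """Count true missing answers; the 99s are not counted."""
    return sum(1 for c in codes if c is None)


freq = letter_to_number("A")     # subject does not eat the food
portion = letter_to_number("")   # portion size left blank
print(code_portion(freq, portion))              # 99
print(count_missing_codes([1, None, 99, None]))  # 2
```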

The original SPSS file was used to determine how many subjects reported eating each food item.

Syntax1 was used to determine which answer was given for all missing answers in the follow-up interview. These results were subsequently used to calculate the proportion of missing answers that corresponded to the response “never/seldom” and the smallest portion size option for subjects and food items.

The complete file was used for calculation of the dietary intake for complete data. The restored file was the starting point for calculating dietary intake for the zero imputed data and for calculating the median answer used for median imputation. The median answer was imputed into a third file, the median imputed file. Syntax5 imputes the median answer, syntax6 recodes the 99s for permitted missing values back into missing values, and syntax7 converts the numbers back into letters. The median imputed file was then ready for calculation of dietary intake.

During the work with the results all numbers were double-checked to see if any of the syntaxes contained errors.

3.3.7 Dietary intake calculations

The software KBS 7.1 (KostBeregningsSystem/Food Calculation System) developed at the Department of Nutrition, University of Oslo, was used for calculation of food and nutrient intake. The database in KBS (AE-10) is based on the Norwegian food composition table from 2006 (50) and expanded with estimated, calculated and borrowed values (e.g. from food manufacturers and other food composition tables).

A syntax written by an IT professional for this particular FFQ was used to convert the answers from the FFQs in the SPSS file (with letters) into food item codes and food weights (g/d). The food item codes and weights appear in a Notepad file which can be imported into KBS.

The syntax codes any missing answers in the FFQ into the lowest frequency category, “never/seldom”, and the smallest portion size category. Hence, the imputation of zero did not require any active imputing or coding by me. Dietary intake calculations were performed with dietary supplements included. Food items or dietary supplements written in the open spaces in the FFQ were coded manually.

The food groups presented in table 12 are output by default from KBS, and are not the same as the grouping in the FFQ, which is used to present missing in food groups in table 12.

3.3.8 Statistical analyses

Selected background characteristics of the study participants are presented as mean and standard deviation (SD), median and 25th and 75th percentiles (P25, P75) or number (n) and proportion (%) of subjects. Frequency tables and histograms were used to present the distribution of missing answers. Since the distributions were skewed, both mean (SD) and median (P25, P75) are presented. Extreme values were defined as observations more than 3 interquartile ranges from P75 (IBM SPSS Statistics v. 20).

The limits for being defined as an extreme value (hereafter referred to as outlier) were 75 missing answers for all questions, 65 missing answers for frequency questions, and 28 missing answers for portion size questions. These outliers were included in most analyses as one aim of the thesis was to assess the effect of the quality control on the entire study sample (n=59). Some analyses were also conducted without the outliers to be more comparable to other studies or to see if the results changed.
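The stated limits follow from the definition of an extreme value when the quartiles reported in table 5 are inserted. The Python sketch below reproduces the arithmetic; the function name is an assumption, and the quartiles are the (P25, P75) values for the number of missing answers in table 5.

```python
def extreme_limit(p25, p75, k=3):
    """Upper limit: P75 plus k interquartile ranges (k = 3 for extreme values)."""
    return p75 + k * (p75 - p25)


print(extreme_limit(3, 21))  # all questions: 75
print(extreme_limit(1, 17))  # frequency questions: 65
print(extreme_limit(0, 7))   # portion size questions: 28
```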

The proportion of missing answers in the FFQ (%) was calculated by dividing the number of missing answers by the maximum possible number, which was not the same for all individuals.

Intakes of energy and nutrients were found to be normally distributed, but some food groups, such as cakes, milk, cream, ice cream, butter, margarine, and beverages, were skewed to the right. Daily intake is presented as means. Assumption of normality was checked by visual inspection of Q-Q plots and histograms. Differences between complete data and the imputed data were not normally distributed and therefore the median difference is presented. Non-parametric tests to compare imputed intakes with complete intakes were not performed because of a high number of ties. The number of subjects with a difference in intake between complete data and imputed data is presented, along with the minimum and maximum differences for these subjects.

Spearman rank correlation coefficients (rs) with 95 % confidence intervals (CI) and p-values were calculated in all correlation analyses. All these values were calculated in Prism (GraphPad Prism version 6.0e for Mac, GraphPad Software, La Jolla, California, USA).

The root mean square error (RMSE) was calculated as a summary measure to determine which of the two imputation methods best estimated the complete data. RMSE is calculated by taking the square root of the sum of the squared residuals (complete intake, ŷi – imputed intake, yi) divided by the number of subjects (equation 1 (51)).

Subjects were divided into tertiles of intake of energy, fat, saturated fat, fiber, and alcohol based on the complete data, zero imputed data and median imputed data. The classification of subjects based on the complete data was considered correct, and crosstabs were used to compute the proportion of subjects correctly classified into the same tertile or misclassified into ± 1 and ± 2 tertiles after imputation.
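The cross-classification can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPSS procedure used in the thesis: tertiles are assigned by rank into three groups of as equal size as possible, ties are broken by input order, and the function names and intake values are hypothetical.

```python
def tertiles(values):
    """Assign each value to tertile 1, 2 or 3 by rank (ties by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    result = [0] * n
    for rank, i in enumerate(order):
        result[i] = rank * 3 // n + 1
    return result


def classification_shift(complete, imputed):
    """Percent of subjects shifted by 0, 1 or 2 tertiles after imputation."""
    counts = {0: 0, 1: 0, 2: 0}
    for ref, imp in zip(tertiles(complete), tertiles(imputed)):
        counts[abs(ref - imp)] += 1
    n = len(complete)
    return {shift: 100.0 * c / n for shift, c in counts.items()}


# Hypothetical intakes for six subjects
complete = [1, 2, 3, 4, 5, 6]
imputed = [6, 2, 3, 4, 5, 1]
print(classification_shift(complete, imputed))
```

In this example four of the six subjects stay in the same tertile and two move by two tertiles.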

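$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \qquad \text{(Equation 1)}$$

where $\hat{y}_i$ is the complete intake and $y_i$ the imputed intake for subject $i$, and $n$ is the number of subjects.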
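As a minimal sketch of the RMSE calculation described in the text, the following Python function takes the complete and the imputed intakes for the same subjects in the same order. The function name and the example values are assumptions for illustration.

```python
import math


def rmse(complete, imputed):
    """Root mean square error between complete and imputed intakes."""
    if len(complete) != len(imputed) or not complete:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(c - i) ** 2 for c, i in zip(complete, imputed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical intakes (g/day) for four subjects
print(rmse([3, 3, 3, 3], [1, 1, 1, 1]))  # 2.0
```

A lower RMSE means that the imputed data lie closer to the complete data.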

An independent samples t-test was used to test for differences in age, BMI and years of education between the group of subjects returning a complete FFQ and the group of outliers for missing answers.

P-values < 0.05 were considered statistically significant.

All figures in the results chapter are made with Prism. IBM SPSS Statistics for Windows v. 20 was used for all other statistical analyses.

3.3.9 My contributions in the EBBA II pilot study

When I became part of the EBBA II study group all FFQs from baseline were already collected and the Oslo FFQs were also scanned. My fellow master student in the EBBA II pilot study, Mathilde Enger, and I were supposed to participate in the collection and quality control (follow-up interview) of the FFQs answered by the patients participating in the EBBA II main study, and were trained for this task. The start of the main study was, however, delayed, and no subjects were included during the time I worked on my master thesis.

The contributions made by Enger and me consisted of

- Scanning and proofreading baseline FFQs from Trondheim and Drammen, and FFQs from 1st and 2nd year follow-up that were handed in during fall 2013
- Coding of food items and dietary supplements written in the open spaces in the FFQ
- Preparing data files for import into KBS
- Collection of FFQ and follow-up interview with patients after finishing the thesis work

During my year as a master student I was part of the research group “Dietary research and nutritional epidemiology” led by Anette Hjartåker and Lene Frost Andersen, and the EBBA II study group led by Inger Thune. Being part of these two groups has given me valuable insight on how both epidemiological and clinical research is conducted.


4 Results

The EBBA II pilot study included 60 women, and 59 returned the FFQ and were available for analysis in this thesis. A description of the breast cancer patients in the study is presented in table 4. The age of the participants ranged from 39 to 70 years with a mean of 55.7 years. According to the WHO classification of BMI (52), 59.3 % of the participants were normal weight (BMI 18.5-25), 25.4 % were overweight (BMI 25-30) and 15.3 % were obese class I (BMI 30-35), results not shown. Years of education ranged from 8 to 24 with a mean of 15.8 years. To exemplify, a person with 16 years of education in Norway may have studied four years in college or university. About one third of the subjects reported doing mainly sedentary activities in their leisure time in the year prior to diagnosis, indicating little regular physical activity or exercise.

Table 4 Background characteristics of the subjects in the EBBA II pilot study (n=59)

| Subject characteristics | n* | Mean (SD) | Median (P25, P75) | n (%) |
|---|---|---|---|---|
| Age at randomization, years | 59 | 55.7 (7.8) | | |
| Height, cm | 59 | 167.4 (5.8) | | |
| Weight, kg | 59 | 70.6 (11.6) | | |
| BMI, kg/m² | 59 | 25.2 (3.5) | | |
| Age at menarche, years | 58 | 13.1 (1.4) | | |
| Ever given birth | 59 | | | 41 (69.5) |
| Ever used OC | 59 | | | 49 (83.1) |
| Ever used HRT | 59 | | | 21 (35.6) |
| Postmenopausal | 59 | | | 41 (69.5) |
| Age at menopause, years | 39 | 48.9 (6.0) | | |
| Years of education | 57 | 15.8 (3.6) | | |
| Current smokers | 59 | | | 14 (24) |
| Leisure time activity | 58 | | | |
| - sedentary | | | | 17 (28.8) |
| - moderate intensity > 4h/week | | | | 32 (54.2) |
| - high intensity > 4h/week | | | | 7 (11.9) |
| - vigorous activity regularly | | | | 2 (3.4) |
| Breast cancer stage | | | | |
| Stage I | 55 | | | 41 (74.5) |
| Stage II | 55 | | | 14 (25.5) |

*Number of women values were available for. BMI – body mass index, OC - oral contraceptives, HRT - hormone replacement therapy


4.1 Completeness of the FFQ

This section first presents descriptive data on the mean and median number and proportion of missing answers per subject for the different question types in the EBBA II FFQ. Then results for missing answers for food groups and single food items follow, with separate results for frequency questions and portion size questions.

Mean and median number and percent of missing answers per subject for all questions, frequency questions and portion size questions are shown in table 5. The data on number of missing answers are strongly skewed to the right. The skewness, including the statistical outliers with a substantially higher number of missing answers than the majority of the subjects, results in large differences between means and medians.

The proportions of missing answers for frequency questions (8.0 %) and portion size questions (7.6 %) were similar.

Table 5 Number and proportion of missing answers in the different question categories in the FFQ in the EBBA II pilot study (n=59)

| Question category | No. of questions (range*) | Mean no. of missing answers (SD) | Median no. of missing answers (P25, P75) | Mean % missing answers (SD) | Median % missing answers (P25, P75) |
|---|---|---|---|---|---|
| All questions | 462 (289-394) | 27.0 (55.2) | 11 (3, 21) | 8.0 (16.5) | 3.5 (0.9, 5.9) |
| Frequency questions | 249 | 20.6 (45.1) | 6 (1, 17) | 8.3 (18.1) | 2.4 (0.4, 6.8) |
| Portion size questions | 204 (32-137) | 6.3 (15.4) | 2 (0, 7) | 7.2 (16.3) | 2.1 (0.0, 7.7) |

* Number of required answers for portion size questions is individual for each subject, and ranged from 32-137. For all questions the number of questions that should have been answered thus ranged from 289-394.

Mean (SD) number of missing answers for subjects, without the outliers, was for all question types 11.3 (10.3), frequency questions 7.6 (8.8), and portion size questions 3.1 (4.1). The sample size without outliers was 54 for all questions and frequency questions and 56 for portion size questions.

There was only one missing answer for the type question, and two missing answers for the “other questions”; both were for “How often do you eat an evening meal?” All participants reported their weight, height, if they had changed their diet recently, snacking habits and how often they eat breakfast, lunch and dinner.

4.1.1 Missing answers for subjects

All questions

Among the 59 participants, 17 % (n=10) returned a complete FFQ with no missing answers, 32 % (n=19) had from 1 to 10 missing answers, and 42 % (n=25) had from 11 to 40 missing answers (table 6). Thus, the majority of the subjects (75 %) returned an FFQ with 1-40 missing answers, while 8.5 % of the subjects (n=5) had 156, 158, 176, 196, and 297 missing answers and were defined as statistical outliers. There was a gap in the distribution of missing answers with no subjects in the range 41 to 155. The highest number of missing answers corresponds to leaving 81 % of the questions in the FFQ unanswered (297 out of the 366 questions this subject should have answered).

Table 6 Distribution of missing answers for all questions* in the EBBA II FFQ (n=59)

| No. of missing | No. of subjects | Proportion of subjects (%) |
|---|---|---|
| 0 | 10 | 16.9 |
| 1-10 | 19 | 32.3 |
| 11-20 | 14 | 23.7 |
| 21-40 | 11 | 18.6 |
| 156-297 | 5 | 8.5 |
| Total | 59 | 100 |

*The question categories included in “all questions” are shown in table 2.

Frequency questions

Among the 59 subjects, 80 % (n=47) returned the FFQ with at least one missing answer to a frequency question (figure 6). The majority of the subjects (71.2 %) returned an FFQ with 1-38 missing frequency answers, while 8.5 % (n=5) of the subjects had 109-201 missing answers and were defined as outliers. These five subjects are the same as the outliers for “all questions”. The number of missing frequency answers ranged from 0 to 201 out of the total of 249 (0-81 %). There was a gap in the distribution of missing

