
3 Methodological perspectives and considerations

3.1 Study 1

3.1.1 Preregistration/data collection

The meta-analysis was preregistered in Prospero, and the record is available at https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=75281. Study 1 describes most of the deviations from the registration. The literature search combined key words for bilingualism (“bilingual”, “second language learner” and “dual language learner”) with terms related to EF (“inhibition”, “attention”, “working memory”, “switching” and “executive function”). The databases Eric, Medline, ProQuest dissertations, PsychInfo and Web of Science were searched for articles published between 1980 and December 2017. Studies were included if they reported measures of EF skills for bilingual learners and a monolingual control group, with participants aged 0-18 years. See Figure 1 in article 1 for a flow diagram of the search and a description of the subsequent data extraction process.
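The way the two term groups were crossed can be illustrated with a minimal sketch. The Boolean operators (`OR` within a group, `AND` between groups) and the quoting are assumptions, since the exact syntax, truncation and field tags used in each database are not given here; `or_group` and `build_query` are hypothetical helper names.

```python
def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(group_a, group_b):
    """Cross two term groups: any term of A combined with any term of B."""
    return or_group(group_a) + " AND " + or_group(group_b)


# Search terms as listed in the text above.
bilingualism_terms = ["bilingual", "second language learner", "dual language learner"]
ef_terms = ["inhibition", "attention", "working memory", "switching", "executive function"]

query = build_query(bilingualism_terms, ef_terms)
print(query)
```

Each database would still need its own date limits (1980 to December 2017) and any syntax adjustments applied on top of a string like this.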

It should be added that one of the anonymous reviewers questioned why the EF domains of monitoring and planning were not included as search terms in the literature search. The reviewer pointed out that this may have led to missing studies on these domains. To compensate for this, an additional search was performed in the databases Web of Science, Eric and Medline using the search words “bilingual” and “benefit” or “advantage” crossed with “executive function” and “planning” or “monitoring”. Once the first search was performed, it was evident that the search words “planning” and “monitoring” generated very different papers than the search words used in the submitted version of the paper.

In this second search, most of the identified articles were related to the medical field. To reduce the number of extracted papers, new exclusion terms were added to the search to filter out studies on cancer, sickness, alcohol, drugs, medicine, and medical health. The exclusion of participants with learning disabilities from the original search was retained. The new search yielded 450 articles across the different databases. These were assessed against the inclusion and exclusion criteria for Study 1, and no new articles were identified.

Twenty percent of the first round of the full search was randomly selected for data extraction by a second author. The inter-rater agreement of the authors was α = 0.891. To ensure coding reliability, 20% of the dataset was double-coded. The inter-coder correlation (Pearson’s) for the main constructs was 0.993. In the first round of the revision, all the extracted articles were checked again to ensure that all EF measures were coded, followed by a check of the coding. At this point, 100% of the data for the main constructs were double-checked to ensure coding reliability.

In all, three authors were involved in quality checking the dataset.
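For readers unfamiliar with the inter-coder correlation reported above, the following minimal sketch shows how a Pearson correlation between two coders' values is computed. The values in `coder_1` and `coder_2` are hypothetical and are not data from Study 1; `pearson_r` is an illustrative helper, not the software routine used in the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)


# Hypothetical effect sizes coded independently by two coders.
coder_1 = [0.12, 0.45, -0.20, 0.80, 0.33]
coder_2 = [0.12, 0.47, -0.20, 0.78, 0.33]

r = pearson_r(coder_1, coder_2)
print(round(r, 3))
```

A correlation close to 1 indicates that the two coders ranked and scaled the coded values almost identically. It does not by itself rule out a constant offset between coders.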

3.1.2 Analysis

The data were coded and analysed first in the Comprehensive Meta-Analysis (CMA) software (Borenstein, Rothstein, & Cohen, 2005) and then in the robumeta package for R (Fisher & Tipton, 2015; Tipton & Pustejovsky, 2015).

Study 1 presents only the results from the analysis in R, where dependency in the data structure was controlled for by using robust variance estimation (RVE; Tanner-Smith, Tipton, & Polanin, 2016). Initially, control groups with multiple bilingual comparison groups were coded so that the sample size of the monolingual group was divided across the number of bilingual comparison groups. At the request of an anonymous reviewer, this was changed to include all monolingual children in every comparison with the bilingual samples (RVE handles this kind of dependency). Nevertheless, the overall results from the analysis in CMA and from the two different datasets analysed with RVE are very similar to those in the published version of the article. The same can be said for the corrections for the small study effect, indicating that the results remain stable across different analytical approaches. There were, however, some differences in the moderator analyses, where fewer of the examined moderators were significantly related to overall means of EF when using robust variance estimates.
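The practical difference between the two coding schemes for shared control groups can be illustrated with a small worked example. It assumes the standard large-sample variance of a standardized mean difference, v = (n1 + n2) / (n1 n2) + d² / (2 (n1 + n2)). The effect-size metric is not restated in this section, so the formula is an assumption, and the sample sizes and the value of d are hypothetical.

```python
def smd_variance(d, n_bilingual, n_monolingual):
    """Large-sample variance of a standardized mean difference d."""
    n_total = n_bilingual + n_monolingual
    return n_total / (n_bilingual * n_monolingual) + d ** 2 / (2 * n_total)


# Hypothetical study: one monolingual control group (n = 40) compared with
# two bilingual groups (n = 30 each); d = 0.30 is an illustrative value.
d = 0.30
n_bilingual = 30
n_monolingual = 40
n_comparisons = 2

# First coding scheme: control sample divided across the comparisons.
v_split = smd_variance(d, n_bilingual, n_monolingual // n_comparisons)

# Revised coding scheme: full control sample reused in every comparison.
v_full = smd_variance(d, n_bilingual, n_monolingual)

print(round(v_split, 4), round(v_full, 4))
```

Under these assumptions, dividing the control sample yields a larger sampling variance for each comparison than reusing the full control group. The same comparison would therefore tend to receive less weight in the first dataset than in the final one.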

With CMA, non-verbal IQ was a significant moderator of overall EF, and TASK was a significant moderator for the domain of switching (R = 0.38). The differences in the moderator analyses could have two causes.

First, CMA handles dependency in the data structure more crudely than RVE does. In CMA, multiple effect sizes must either be aggregated to the study level or one effect size must be selected over another for use in the analysis. This could have left the CMA analysis with too little information to fully investigate the true relationship between the overall outcome and the moderator. Second, the degrees of freedom in RVE are adjusted to handle small sample sizes (Tanner-Smith et al., 2016). In RVE, the degrees of freedom depend both on the number of studies and on the features of the covariate. With a very skewed distribution (most values within a narrow range, with one value that deviates from the others) or an imbalanced number of studies in the different categories (e.g., 5 in one category and 25 in another), the degrees of freedom are reduced. As a result, the power of some moderator analyses in RVE is surprisingly low. In the present data, the number of studies was unbalanced across the categories of the moderator TASK.
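The loss of information caused by study-level aggregation can be sketched as follows. The sketch uses a simple unweighted mean per study, which is an assumption made for illustration and not necessarily the aggregation rule applied in CMA. The study labels, task labels and effect sizes are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (study id, task type, effect size).
effects = [
    ("study_A", "task_1", 0.40),
    ("study_A", "task_2", 0.10),
    ("study_B", "task_1", 0.30),
]


def aggregate_to_study_level(rows):
    """Collapse all effect sizes from a study into one unweighted mean."""
    by_study = defaultdict(list)
    for study, _task, effect_size in rows:
        by_study[study].append(effect_size)
    return {study: mean(values) for study, values in by_study.items()}


study_means = aggregate_to_study_level(effects)
print(study_means)
```

After aggregation, `study_A` contributes a single value, and the difference between its two task types is no longer visible to a moderator analysis. Keeping all rows, as RVE allows, preserves that within-study variation.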

The other deviation from the results presented in article 1 relates to the control for the small study effect in the domain of switching. These results “jumped” slightly back and forth across the different datasets analysed. In the analysis of the first dataset in R, which contained smaller sample sizes for some of the monolingual control groups, switching skills were equal across language groups when controlling for the small study effect. However, the analysis of the small study effect in the final dataset, as well as the analysis in CMA using trim and fill adjustments, detected a bilingual advantage in switching. These inconsistencies across analyses suggest that these results are less robust than the rest of the results in this article. Apart from these analyses, the three different data analysis procedures produced the same outcomes.