Assessing the Validity of Evaluation Research by Means of Meta-Analysis

Statistical conclusion validity refers to the numerical accuracy, reliability and representativeness of the results of a study or set of studies. One of its criteria (S9) is the robustness of the average result of a set of studies with respect to how it is estimated. A criterion of external validity is the stability of the results of a set of studies across study contexts (details of the context must be specified on a case-by-case basis).

Feyerabend says that "what counts as evidence, or as an important result, or as 'sound scientific procedure' depends on attitudes and judgments that change with time, profession and sometimes even from one research group to the next". He further claims that there is no one "scientific method" but a great deal of opportunism: anything goes, that is, anything that tends to advance knowledge as understood by a particular researcher or research tradition. The widespread belief that knowledge grows and is refined as research progresses is dismissed by Feyerabend as unfounded. The development of knowledge is not a well-planned and smooth-running process; it is wasteful and full of errors, and it needs many ideas and procedures to keep it going.

The multiplicity of concepts of validity

The validity framework developed by Cook and Campbell is certainly the most elaborate currently available in social research. By internal validity, Cook and Campbell refer to the possibility of inferring a causal relationship between two or more variables. The fourth and final type of validity discussed by Cook and Campbell is external validity.

Cook and Campbell's validity framework is very comprehensive and includes all aspects of validity discussed by other authors (Black and Champion 1976, Hellevik 1977, Carmines and Zeller 1979).

The concept of objective knowledge

The main advantage is that the system for judging the validity of evaluation research itself becomes objective, by (1) having a clearly defined empirical reference (i.e. the set of documented studies dealing with a topic), (2) relying on explicitly stated criteria (a list of clearly defined validity criteria and a system for scoring studies according to these criteria), and (3) becoming testable, in the sense that agreement between researchers in the use of the validity criteria can be determined experimentally. This argument for basing the assessment of the validity of evaluation research on formally stated validity criteria and a scoring system for those criteria is elaborated in the next chapter. It is not always possible to argue that confirmation bias may have influenced the interpretation of research findings in the manner illustrated above.

In informal research syntheses, belief in the law of small numbers involves giving equal weight to all studies, regardless of the sample size on which they are based.
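The effect of giving equal weight to all studies can be illustrated with a minimal sketch. The effect estimates and sample sizes below are hypothetical, and weighting by sample size is only one of several possible schemes; it is used here purely to show how far an unweighted mean can drift when small studies report large effects.

```python
def unweighted_mean(effects):
    """Mean effect when every study counts equally, whatever its size."""
    return sum(effects) / len(effects)


def sample_size_weighted_mean(effects, sample_sizes):
    """Mean effect when each study is weighted by its sample size."""
    total = sum(sample_sizes)
    return sum(e * n for e, n in zip(effects, sample_sizes)) / total


# Hypothetical data: two small studies with large effects, one large study
# with a small effect (effects expressed as proportional change in accidents).
effects = [-0.40, -0.35, -0.05]
sample_sizes = [20, 30, 950]

equal_weight_result = unweighted_mean(effects)
weighted_result = sample_size_weighted_mean(effects, sample_sizes)

print(f"Equal weights:       {equal_weight_result:.3f}")
print(f"Sample-size weights: {weighted_result:.3f}")
```

With these made-up numbers the equal-weight mean is dominated by the two small studies, while the weighted mean stays close to the result of the large study.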

Overview

Statistical conclusion validity

Therefore, the larger the sample, the greater the statistical validity of the results of a study or set of studies. Examples of inaccurate measurements attributable to measuring instruments can also be found in road safety evaluation studies, for example in the discussion of the accuracy of speed measurements in the report by Vaa (1995). To give the reader an impression of the variety of definitions that exist, Table 2 lists some of the dependent variables commonly found in road safety evaluation studies.

The robustness of the average result of a set of studies (S9) refers to how sensitive the average result based on a sample of studies is to the technique used to estimate it.
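As a rough illustration of what such a robustness check might look like, the sketch below estimates the mean of the same set of results with two common techniques: an inverse-variance fixed-effect mean and a DerSimonian-Laird random-effects mean. The choice of these two techniques, the function names and the data are assumptions made for illustration; the text does not say which techniques should be compared.

```python
def fixed_effect_mean(estimates, variances):
    """Inverse-variance weighted mean (fixed-effect model)."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


def between_study_variance(estimates, variances):
    """DerSimonian-Laird moment estimate of the between-study variance."""
    weights = [1.0 / v for v in variances]
    mean_fe = fixed_effect_mean(estimates, variances)
    q = sum(w * (y - mean_fe) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    return max(0.0, (q - df) / c)


def random_effects_mean(estimates, variances):
    """Weighted mean with the between-study variance added to each variance."""
    tau2 = between_study_variance(estimates, variances)
    weights = [1.0 / (v + tau2) for v in variances]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


# Hypothetical study results (e.g. log effect estimates) and their variances.
estimates = [-0.5, -0.1, 0.2]
variances = [0.04, 0.01, 0.09]

mean_fixed = fixed_effect_mean(estimates, variances)
mean_random = random_effects_mean(estimates, variances)

print(f"Fixed-effect mean:   {mean_fixed:.4f}")
print(f"Random-effects mean: {mean_random:.4f}")
print(f"Difference:          {mean_random - mean_fixed:.4f}")
```

A large difference between the two means would suggest that the average result is not robust to the choice of technique and that the choice needs to be justified.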

Table 1: Operational criteria of validity in evaluation studies

Theoretical validity

This means that, ideally, a meta-analysis should apply all techniques and test the sensitivity of the mean result with respect to the choice of technique. If the estimated mean differs depending on the technique used to estimate it, then the choice of technique should be discussed in more detail and justified in terms of the properties of the data set.

3. Propose hypotheses that describe relationships between variables, including (a) which variables are related, (b) the direction of the relationship, and (c) the strength of the relationship.

The first criterion, T1, refers to how well the theoretical framework for a study is developed in terms of the four points listed above.

Internal validity

An example would be: "The greater the reduction in driving speed, the greater the reduction in the number and severity of accidents". If changes in the expected direction of the dependent variable are found only in the target group of the intervention, this supports causal inference. The observed changes in the number of accidents in this study are shown in Table 3.

It can be seen that the greatest percentage change in the number of accidents occurred in the target group of the intervention: student drivers riding motorcycles with an engine displacement of more than 125 cc.

External validity

This pattern in the study's results is consistent with what would be expected if the policy effort only affected the target group, or at least had a greater effect within the target group than for other groups. Context elements for road safety evaluation studies may include the basic rules of the road in a country (such as driving on the left versus driving on the right), the level of motorization (number of cars per capita), and the accident reporting rules (the precise definition of reportable accidents). The exact elements of context considered relevant in the assessment of external validity will need to be specified on a case-by-case basis.

The relationship between types of validity

However, research problem 3 as formulated in the paper (Can the evidence from evaluation studies be trusted?) focuses on the validity of evaluation studies that have been conducted in relation to guardrails and crash cushions. Criterion S8, the shape of the distribution of results, is discussed informally on the basis of funnel plots (in terms of skewness and possible outlier bias). Inspection of the results obtained in evaluation studies confirms that this is indeed the case.

It builds on previous criticisms that have been made of the validity of studies that have assessed the effects of road lighting. By predictive validity is meant the accuracy of predictions of the effects of future uses of a measure based on the results of currently available evaluation studies. The actual level of predictive validity depends on the strengths of the effects of the various factors influencing it.

In other words, the evaluation studies are robust with respect to the definition of the dependent variable used in these studies (criterion S6 in Table 1). It is found that the results of the studies are very robust in terms of study design. Criterion S7 concerns the sensitivity of the standard error of the mean to the presence of correlated results in a sample of results.

Although none of the attached papers include this variable, it is possible in a meta-analysis to assess the validity of the studies in relation to the choice of analysis technique. Turning to internal validity, most of the criteria listed in Table 1 were used to assess the validity of the road safety evaluation studies in the attached papers.

Figure 4: Dose-response pattern in effects of median guardrails

Conclusions

Provided that widely accepted formal criteria of validity can be established, meta-analysis is the best approach to assess the extent to which research meets these criteria. To what extent is it possible to assess the validity of evaluation research by conducting meta-analysis of evaluation research studies? The other approach is to code studies according to formal validity criteria and use meta-analysis to evaluate studies according to these criteria.

Informal research syntheses were discussed in Chapter 7, and formal validity criteria designed for use in meta-analysis were presented in Chapter 8. A total of twenty validity criteria, intended for assessing the validity of evaluation research by means of meta-analysis, were proposed. In principle, all twenty validity criteria can be used in a meta-analysis to formally assess the validity of a study.

This study also revealed some problems and limitations in using meta-analysis to assess the validity of evaluation research. The main conclusion of the study, broadly formulated, is that it is to some extent possible to assess the validity of evaluation research by means of meta-analysis. But it is probably too optimistic to believe that using meta-analysis to assess the validity of evaluation research will resolve all the controversies surrounding such research.

It is much more difficult to use meta-analysis to assess the validity of theoretical models. It is not obvious how to even assess the validity of these models, let alone how to use meta-analysis to do so.

Future prospects and research needs

Some aspects of study validity can be formally assessed through meta-analysis; others are less amenable to formal assessment. Somehow, most of us place more confidence in a paper when the authors are clearly aware of the limitations of their research and point them out, than in an otherwise similar paper presented in a less self-critical way. There is a need for the development of multivariate techniques of meta-analysis adapted to different weighting schemes.

The analyses in the attached papers are carried out by stratifying the data set according to the variables of interest. Multivariate analysis techniques are clearly superior to stratification techniques, but no description of such techniques developed for the log odds method of meta-analysis could be found in the literature. It is desirable to develop a general measure of validity that summarizes all aspects of the concept in the form of an overall rating.
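A stratified analysis of this kind might be sketched as follows. The sketch assumes that the log odds method pools log odds ratios with inverse-variance weights, which the text does not spell out; the layout of the accident counts, the field names and the data are hypothetical.

```python
import math
from collections import defaultdict


def log_odds_ratio(a, b, c, d):
    """Return (log odds ratio, variance) for four accident counts.

    Assumed layout: a = treated group after, b = treated group before,
    c = comparison group after, d = comparison group before.
    """
    estimate = math.log((a / b) / (c / d))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return estimate, variance


def pooled_odds_ratio(results):
    """Inverse-variance weighted mean of log odds ratios, as an odds ratio."""
    weights = [1.0 / variance for _, variance in results]
    mean_log = sum(w * est for w, (est, _) in zip(weights, results)) / sum(weights)
    return math.exp(mean_log)


def stratified_odds_ratios(studies, key):
    """Pool results separately within each level of the variable `key`."""
    strata = defaultdict(list)
    for study in studies:
        strata[study[key]].append(log_odds_ratio(*study["counts"]))
    return {level: pooled_odds_ratio(results) for level, results in strata.items()}


# Hypothetical studies, stratified by study design.
studies = [
    {"design": "before-after", "counts": (40, 50, 100, 100)},
    {"design": "before-after", "counts": (80, 100, 200, 200)},
    {"design": "case-control", "counts": (90, 100, 100, 100)},
]

for design, odds_ratio in stratified_odds_ratios(studies, "design").items():
    print(f"{design}: pooled odds ratio = {odds_ratio:.2f}")
```

Stratification handles one variable at a time, which is the limitation the paragraph above points to: with several variables of interest the strata quickly become too small to give stable estimates.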

For many problems, there is a choice of technique for meta-analysis: several techniques can be used, and it is not always obvious which one is best. There is a need to test the sensitivity of the results of meta-analyses with respect to the choice of technique. It may discredit meta-analysis if the results of such analyses turn out to be highly sensitive to the choice of technique and if this choice is essentially arbitrary.
