
On the comparative method

In document taming of inequality retirement (sider 171-188)

Method and design

4.3 On the comparative method

There is no agreement in the literature about the proper definition and status of a specific comparative method. You might very well argue that all empirical research is comparative in nature, as it attempts to learn from observed differences between social units.162 However, to equate the comparative method with the social scientific method in general is not very helpful, as it tends to make the term "comparative" redundant.

As already suggested, the comparative method is here taken to encompass research strategies that exploit variation between macro units (read: country cases) in attempts to support (or reject) causal claims about macro-level variables. I recognize that in particular the second part of the statement is controversial. Over the last three decades there has been a continuing debate about the role of comparative analyses and the possible status of a particular comparative method in the social sciences. To identify the purpose of the comparative method as causal inference signals a leaning towards what is sometimes called the "variable-oriented" as opposed to the "case-oriented" approach to comparative analysis (Ragin, 1987).

162 The term "comparative" could thus be taken to refer to the use of "natural experiments" as opposed to the "controlled experiment" associated with the natural sciences.

This methodological debate can be associated with the classical Popperian distinction between nomothetic and ideographic approaches to the study of human society (Popper, 1957). The nomothetic philosophy is characterized by a belief that the social world is (at least partly) governed by general laws that can be discovered through the application of scientific (statistical) methods, while the ideographic philosophy rejects the existence of any law-like relationships in human society (or at least the search for them), holding that explanations for (or interpretations of) social phenomena must always be specific to the unit and the historical context.

One main position in the debate on comparative methods, the variable-oriented approach, is situated well within the nomothetic camp. It insists that the purpose of comparative analysis is causal inference, and that the logic of comparative analysis is similar to the logic of all causal analysis, where empirical regularities - the pattern of covariation between independent and dependent variables - are taken to be the only valid source of inference (for some of the most important contributions in this tradition see Przeworski and Teune, 1970; Smelser, 1976; Lieberson, 1992; 1994; King, Keohane and Verba, 1994; Janoski and Hicks, 1994; Goldthorpe, 1997). According to this view, it is the special nature of the units of analysis that makes comparative analysis a distinct (and in fact a rather problematic) branch of social scientific inquiry. Given the natural limitation in the number of country cases available, comparative analyses can at best be imperfect approximations to the ideal of multivariate statistical analysis:

Despite its limitations in terms of numbers of cases, the logic of systematic comparative illustration is identical to the [statistical] methods just reviewed in that it attempts to develop explanations by the systematic manipulation of parameters and operative variables. (Smelser, 1976:158).

The other main position that can be identified in current debates among social scientists - the case-oriented approach - holds that the comparative method is (or at least can be) distinct in both its purpose and logic from variable-oriented statistical analysis (see for instance Skocpol, 1984; Ragin, 1987; Rueschemeyer, 1991; Rueschemeyer and Stephens, 1997). Scholars within this tradition tend to take a middle ground between a nomothetic and ideographic philosophy of social science.163 On the one hand they deny that the only purpose of comparative analysis is to support causal claims, and in particular they tend to be skeptical about the potential for generalization. On the other hand they do not completely reject the ambition to provide (conditional) explanations for macro-social phenomena.164 Proponents of this position insist on the advantages that can flow from a holistic - non-statistical - treatment of each country case or macro-social event. They claim that macro-social phenomena should preferably be studied in their dynamic, historical context, and that the information on each individual case cannot and should not be reduced to scores on a set of pre-defined variables, as required for the application of statistical techniques.

In the following discussion of the comparative method, I shall rely heavily on the first of these traditions and in particular on the interpretation laid out by King, Keohane and Verba (1994), KKV for short.

The logic of comparative analysis

KKV take as the point of departure the model of causality and causal analysis developed by Holland (1986) and Holland and Rubin (1983) (the Holland-Rubin model).

According to the Holland-Rubin model, the purpose of causal analysis is to estimate the effect of a given cause (treatment) rather than to try and trace the causes for a given effect, and a causal claim is always a claim about a clearly specified counterfactual experiment (Holland, 1986:959).

Following Holland (1986), KKV start their exposition by defining a causal effect for a particular unit at a particular point in time as the difference between the (expected) outcome that obtains given that the unit has been exposed to a specific treatment (X1) and the (expected) outcome that would have obtained if the unit had instead - under otherwise identical conditions - been exposed to a specific alternative treatment (X0). Note that this definition of a causal effect avoids any reference to regularity across units in space and time. It is open to the possibility that there exists a unique relation between treatment and outcome for each unit that could have been exposed to either X1 or X0, and for each social/historical situation in which the unit might be situated.165

163 Many historians take the more radical ideographic position, and the historical sociologists who represent the case-oriented approach are forced to wage a battle on two fronts, trying to maintain their identity as social scientists, while at the same time rejecting the mechanical, positivist excesses of the proponents of the variable-oriented approach.

164 Proponents of the case-oriented approach typically adhere to a deterministic view of causality, in contrast with a probabilistic conception of causality that is inherent in the statistical, variable-oriented approach (see Ragin (1987) for a confession to a deterministic conception of causality and Lieberson (1992) for a critique of this position).

However, since in practice we can only observe units that have, at any given moment, been exposed either to the treatment X1 or to the alternative treatment X0,166 the causal effect is unobservable, and hence we face what Holland (1986) calls the "fundamental problem of causal inference".
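The definition and the observability problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, the normal distributions and the numerical values are all hypothetical and do not come from KKV or Holland (1986). Each simulated unit carries two potential outcomes, one under X1 and one under X0, but the record available to the researcher contains only one of them.

```python
import random


def simulate_units(n, seed=0):
    """Create n hypothetical units with both potential outcomes.

    All distributions and parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(10.0, 2.0)      # outcome if exposed to X0
        effect = rng.gauss(3.0, 1.0)   # unit-specific causal effect
        y1 = y0 + effect               # outcome if exposed to X1
        treated = rng.random() < 0.5   # which treatment actually occurred
        observed = y1 if treated else y0
        units.append(
            {"y0": y0, "y1": y1, "treated": treated, "observed": observed}
        )
    return units


def true_effects(units):
    """Unit-level causal effects; computable only inside a simulation."""
    return [u["y1"] - u["y0"] for u in units]


def observable_record(unit):
    """What a researcher can see: one treatment status, one outcome."""
    return {"treated": unit["treated"], "observed": unit["observed"]}
```

Calling `observable_record` on any unit returns only the treatment status and the realized outcome. The counterfactual outcome, and with it the unit-level effect, never appears in the observable data.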

While the strong nomothetic disposition of KKV is not reflected in this initial definition of causality, it is introduced, or if you like "smuggled in", as a set of fundamental assumptions needed to make possible the inference of causal effects from empirical data. As KKV convincingly argue, it is absolutely necessary to assume that there exists some regularity across units in time and/or space, if there is to be any hope of ever uncovering causal effects on the basis of cross-case/cross-time comparisons. If causal effects were assumed always to be unique to the specific historical units/events, then they could never be observed, not even indirectly, with the help of external comparisons.

Given the fundamental problem of causal inference, the best we can do is to try to gain indirect information on the "average expected causal effect" across a set of units, but this requires in turn that we make certain fundamental assumptions: the Unit Homogeneity assumption and the assumption about Conditional Independence (KKV: 94).

The Unit Homogeneity assumption is the most fundamental. It implies that the (expected) causal effect stays constant across units that are separated by time and space, and it allows the researcher to draw inferences about the typical causal effect from available empirical information on different units. Conditional Independence implies that values are assigned to the independent/treatment variable independently of the values taken by the dependent variable.

Weaker or stronger versions of these assumptions are necessary in non-experimental settings to get around the fundamental problem of causal inference. They allow the researcher to use variation in the dependent variable across units with different values on the treatment variable as an indirect source of information on the unobservable counterfactual outcomes connected with each individual case. One can say that cases with different values on the explanatory variables act as stand-in counterfactuals for each other.

165 Hence, this definition is in itself rather ecumenical in spirit. It is capable of accommodating a wide range of positions in the classical schism between ideographic and nomothetic approaches to the study of human society.

166 Or, for that matter, to some third alternative treatment, in which case one would have two counterfactual situations to consider.

Under ideal conditions, where these assumptions are fully satisfied, the causal effect can simply be estimated by comparing the average scores of units that have been exposed to the treatment (X1) with the average scores of units that have been exposed to the reference treatment (X0). Of course, this is not very realistic outside the realm of controlled experiments where the researcher can make sure that the treatment is being randomly assigned.
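The comparison of group averages described above can be sketched as follows. The example assumes a constant effect across units (Unit Homogeneity) and coin-flip assignment (Conditional Independence). The function names and all numerical values are hypothetical and chosen for illustration only.

```python
import random
from statistics import mean


def difference_in_means(outcomes, treated):
    """Average outcome of treated units minus average of untreated units."""
    y1 = [y for y, t in zip(outcomes, treated) if t]
    y0 = [y for y, t in zip(outcomes, treated) if not t]
    return mean(y1) - mean(y0)


def simulate_randomized(n, effect, seed=1):
    """Hypothetical data where both assumptions hold by construction.

    The same effect is added for every treated unit, and treatment is
    assigned by a coin flip that ignores the outcome.
    """
    rng = random.Random(seed)
    treated = [rng.random() < 0.5 for _ in range(n)]
    outcomes = [
        rng.gauss(10.0, 2.0) + (effect if t else 0.0) for t in treated
    ]
    return outcomes, treated
```

With a large simulated sample, `difference_in_means(*simulate_randomized(20000, 3.0))` should land close to the built-in effect of 3.0. With only 20 units the same estimator is far noisier, which anticipates the small-N problem.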

While some minimal satisfaction of these two assumptions is needed, it would be utterly naive to simply assume that they are fully satisfied in the natural experiments we encounter at the micro as well as the macro-level. Various statistical models and statistical techniques are available to handle a wide range of modifications to the Conditional Independence assumption and even some deviations from the assumption of Unit Homogeneity. The simplest deviation is represented by the existence of variables that exercise a linear influence on the dependent variable while they happen at the same time to be (linearly) correlated with the treatment variable. The standard statistical solution to this problem is of course to "control" for the influence of these factors through the application of multivariate techniques.
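The simplest deviation and its standard remedy can be illustrated with a hand-written least-squares sketch. Everything here is an assumption made for the example: the variable names, the coefficients (0.8 and 2.0) and the normal errors are hypothetical. A third variable z influences the outcome y and is correlated with the treatment variable x, so the bivariate slope is biased, while the slope from a regression that also includes z is not.

```python
import random
from statistics import mean


def _centered(values):
    m = mean(values)
    return [v - m for v in values]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def bivariate_slope(y, x):
    """OLS slope of y on x alone (with intercept)."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def controlled_slope(y, x, z):
    """Coefficient on x in an OLS regression of y on x and z (with intercept).

    Solves the 2x2 normal equations for the centered variables.
    """
    xc, yc, zc = _centered(x), _centered(y), _centered(z)
    sxx, szz, sxz = _dot(xc, xc), _dot(zc, zc), _dot(xc, zc)
    sxy, szy = _dot(xc, yc), _dot(zc, yc)
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det


def simulate_confounded(n, effect=1.0, seed=2):
    """Hypothetical data: z drives both the treatment x and the outcome y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
    y = [effect * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return y, x, z
```

In a large simulated sample the bivariate slope settles well above the built-in effect of 1.0, while the controlled slope recovers it. The control only works because z is observed and enters linearly.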

This is the fundamental logic behind all statistical analysis, and it is the claim of KKV that the comparative method should be understood as the application of this logic to the study of variation between macro units. Of course both KKV and other advocates of the variable-oriented approach immediately recognize that the application of statistical styles of analysis to samples of macro units is particularly problematic, first and foremost because of the natural limitation in the number of cases that are available for analysis (see the citation from Smelser (1976) above).

The main argument that can be made in favor of this interpretation of the comparative method is that it is built around a consistent and meaningful interpretation of the notion of causation and causal effects. You might very well choose to reject the Unit Homogeneity and the Conditional Independence assumptions even in their weakest form,167 and hence argue that the application of statistical models is fundamentally misplaced. However, it seems that this position would imply that all attempts at causal inference are impossible - or at least that the comparison between country cases cannot play any role in efforts to support or reject causal claims.

167 Such a claim could either be made with reference to macro-social units only, or it could be made with reference to human society more generally.

Problems of statistical inference based on cross-national data

From the point of view of the variable-oriented approach, the problems facing comparative analyses are basically the same as the problems facing attempts at causal inference based on micro-data with, for instance, individuals as units of analysis. What distinguishes the analysis of macro units is primarily the small number of cases available, and this will often imply that the statistical methods that might be available for larger samples of replicable data cannot be applied efficiently.

The main caveat of comparative analysis is the small-N problem - both in its own right and because it aggravates a range of other problems that haunt attempts at causal inference based on natural experiments. In the following I shall describe five of the most important of these problems and discuss their significance with respect to the present study: 1) measurement error and lack of comparability, 2) omitted variables and endogeneity, 3) dependency and contamination between country cases, 4) multiple and conjunctural causation, 5) asymmetrical causal relationships.

1) Measurement error and lack of comparability. The problem of low-quality data (measurement error) is likely to be particularly serious in cross-national research, where agencies and routines for data collection and data verification have been slow to develop. It cannot be denied that much quantitatively oriented, comparative research is flawed by an uncritical use of low-quality data, which often have been collected by international organizations for other purposes than research.

Whenever serious measurement error is present in a cross-sectional data-set, the implications are particularly serious, due to the small sample sizes. Systematic measurement error, and random error in the measurement of independent variables, is always a serious problem, as it will lead to biased estimates. Random error in the measurement of the dependent variable will not cause bias but only reduce the efficiency of estimation, and hence it can be considered a minor problem as long as sample sizes are big enough. But when sample sizes are as small as in typical cross-national data-sets, any loss in efficiency is of course a very serious problem.
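The contrast between the two kinds of random error can be made concrete with a repeated-sampling sketch. All settings are hypothetical assumptions made for the example: a true slope of 1.0, standard normal x, and measurement error with the same variance as x. Under these assumptions error in x pulls the estimated slope toward 0.5, while error in y leaves the average estimate at 1.0 but widens its spread across samples of 20 cases.

```python
import random
from statistics import mean, pstdev


def slope(y, x):
    """OLS slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def replicate(n, error_in, reps=2000, true_slope=1.0, seed=3):
    """Mean and spread of slope estimates over many samples of size n.

    error_in is "x", "y" or None and says which variable, if any, is
    measured with added random noise. All values are illustrative.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [true_slope * xi + rng.gauss(0, 0.5) for xi in x]
        if error_in == "x":
            x = [xi + rng.gauss(0, 1) for xi in x]  # noisy regressor
        elif error_in == "y":
            y = [yi + rng.gauss(0, 1) for yi in y]  # noisy outcome
        estimates.append(slope(y, x))
    return mean(estimates), pstdev(estimates)
```

Comparing `replicate(20, None)`, `replicate(20, "x")` and `replicate(20, "y")` shows the pattern described in the text: bias in the first kind of error, loss of efficiency in the second.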

The issue of comparability can concern variables as well as the cases themselves. Comparability of variables is primarily an issue of measurement. It is a basic requirement that variables should - as far as possible - be measured in an equivalent way in different countries. However, equivalence is not always maximized by a rigid replication of measurement procedures across country cases (Przeworski and Teune, 1970:107ff). Sometimes equivalence can only be approached by letting measurement be standardized according to certain particular aspects of the national context. The issue of equivalent measurement arises in many places in the present thesis. Take as an example the calculation of replacement rates offered by public pension systems. In order to do such a calculation you need to decide on a set of "typical workers/retirees" in terms of family relations and labor market histories. However, what is typical in one country is not necessarily typical in another, and the question arises whether to do the calculation for similar types of individuals across countries or instead to let the definition of the typical worker/retiree vary.

Also the very comparability of country cases can be put into question. Can, for instance, small and homogeneous nation states like the Scandinavian countries be compared with a large and heterogeneous political entity like the US? Proponents of the variable-oriented approach are inclined to respond that the issue of comparability between cases can often be translated into a question of omitted variables (KKV and Goldthorpe, 1997): When we say that two countries are too different to be compared, we often mean there are important differences between the cases that could by themselves be responsible for the observed difference in outcome variables.

2) Omitted variables and endogeneity. Variables that influence the dependent variable and are correlated with the treatment variable will bias the estimation of the treatment effect if they are excluded from the analysis. The standard statistical remedy to this fundamental problem of all non-experimental data analysis is to include all variables that are suspected to influence the dependent variable (and be correlated with the treatment variable) in multivariate analyses. However, this standard solution of statistically controlling for all variables that could be a source of bias is almost impossible to apply to the typical cross-national data-set due to the small number of cases that are available for analysis. Unless you have more cases than variables it is logically impossible to identify the effect of each independent variable, and of course the practical requirements are much tougher than this. As a very rough rule of thumb you need at least ten cases per independent variable in order for the estimates of a multivariate regression equation to stabilize (Hox, 1994). With a sample of 20 cases, for instance, you can at the very best control for one variable in addition to the treatment variable. Furthermore, when the sample is this small, the linear dependence between regressors need not be very strong before multi-collinearity becomes a serious problem, and of course it is when the linear dependence between regressors is significant that we need multivariate control the most (Lieberson, 1985).
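The arithmetic behind the rule of thumb, and the way collinearity compounds the small-N problem, can be sketched as follows. The function names are hypothetical. The variance inflation factor 1 / (1 - r^2) is the textbook expression for a regression with two regressors correlated at r, and treating sampling variance as roughly proportional to that factor divided by the number of cases is a simplifying assumption made for the example.

```python
def max_control_variables(n_cases, cases_per_variable=10):
    """Controls affordable beyond the treatment variable under the rule."""
    total_variables = n_cases // cases_per_variable
    return max(total_variables - 1, 0)


def variance_inflation_factor(r):
    """Inflation of a slope's sampling variance with two regressors
    correlated at r, relative to uncorrelated regressors."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


def relative_slope_variance(n_cases, r):
    """Rough index of slope sampling variance: inflation factor per case.

    Only ratios between values of this index are meaningful.
    """
    return variance_inflation_factor(r) / n_cases
```

With 20 cases, `max_control_variables(20)` leaves room for exactly one control. On the rough index used here, a correlation of 0.9 between two regressors multiplies the sampling variance by a little more than five, which is about what would be lost by shrinking the sample to a fifth of its size.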

The problem - in relation to the present study - is that we cannot a priori rule out the potential influence on the income distribution among retirees of other social and economic factors that might even be systematically related to the observed variation in public pension systems. The list of potentially important factors can be made very long, but I will argue that the degree of income stratification prevailing in the general population is likely to be particularly important.

The application of linear controls for all possible observable covariates is no guarantee for the satisfaction of the assumption about Conditional Independence. Endogeneity of treatment variables represents a further, more subtle, violation. It refers to a situation where the process of assigning values on the treatment variable is somehow correlated with the dependent variable - perhaps due to some unobservable characteristic of the cases involved in the analysis, or to some type of causal feed-back (simultaneity). It is possible to imagine, for instance, that the divergence in public pension systems across the OECD area could to some extent be the effect of, as well as the cause for, variation in the coverage and scope of occupational pension schemes. There are statistical remedies available also for these groups of problems, but they tend to be highly demanding of the data (the most efficient solutions require longitudinal data) and/or to be dependent upon strong secondary assumptions (for a discussion of models with sample selection and endogenous treatment effects see Winship and Mare (1992)).

3) Dependency and contamination between (groups of) country cases.

Standard statistical techniques for cross-sectional data assume that each case represents an independent natural experiment, where the forces responsible for the outcome are internal to each case - the structural part as well as the stochastic part. Put in more technical terms, it is assumed that the error terms of the dependent variable are not correlated across cases,168 and more fundamentally it is assumed that the effect of explanatory

168 In time-series analysis this problem is known as auto-correlation, and when it appears in cross-sectional data it is sometimes called "spatial auto-correlation" (Johnston,

