
Health Worker Labor Supply, Absenteeism, and Job Choice

3. Ways Forward

Most existing studies for developing countries that analyze health worker labor choices are descriptive in nature and provide limited insight into causal relationships.35 Two valuable exceptions are the Rwanda study on pay for performance (Basinga et al. 2011) and the Zambia study on motivation (Ashraf, Bandiera, and Jack 2012). To contrast the strengths and weaknesses of descriptive and causal analysis, it is useful to compare observational with experimental data.

From Descriptive to Causal Analysis

Descriptive Analysis

The most common way to generate quantitative microdata is through survey methods. The analysis of these data can provide excellent insights, but typically cannot assess causal relationships—although econometric techniques can help address this in some instances, as will be discussed in the next section.

Survey data have both strengths and weaknesses. Surveys can cover a broad range of issues, produce results that are representative of a wider population, and offer strong opportunities for comparison across settings (jobs, facilities, sectors, countries), especially for straightforward factual information collected with standard questions. Surveys can also draw on a rich and long tradition that offers examples and templates. The data can provide excellent descriptions of reality and allow researchers to test the strength of relationships. In some instances causal analysis is feasible. Survey data can also generate hypotheses for further work.

The discussions in the previous sections provide ample illustration of how the analysis of survey data can shed light on the behavior of health workers.

Survey data can reveal the incidence and scope of labor supply, absenteeism, and occupational choice, and describe the relative importance of correlates such as remuneration and other job attributes, as well as workplace, household, and individual characteristics. This type of analysis creates clear descriptive overviews of who works where, which attributes seem to matter most in a given context, and which individual characteristics stand out. It also provides insight into the equilibrium career path for health workers under prevailing conditions.

When data over time are available, movements and transitions in the labor market can be observed. A transition matrix showing movements between different positions, as well as into and out of the labor force, can contribute to a better understanding of labor market dynamics. The data to build such a matrix can be obtained by including recall questions in a cross-section survey or, more reliably, by repeating surveys over time, ideally with the same respondents. Although panel data provide many advantages, insights can also be gained from repeated cross-section surveys using cohort analysis. Of particular interest is the collection of long-term panel data to reveal long-run career paths.
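As a minimal sketch, a transition matrix can be computed from a two-wave panel by cross-tabulating each respondent's state in the first wave against the second and normalizing each row, so that entry (i, j) is the share of workers starting in state i who end up in state j. The four states and the five respondents below are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical labor market states; the categories are illustrative only.
STATES = ["public", "private", "dual", "out"]

def transition_matrix(wave1, wave2, states=STATES):
    """Row-normalized transition matrix from the paired states of the same
    respondents in two survey waves: entry [i][j] is the share of those in
    state i at wave 1 who are in state j at wave 2."""
    if len(wave1) != len(wave2):
        raise ValueError("waves must list the same respondents in the same order")
    counts = Counter(zip(wave1, wave2))
    matrix = []
    for origin in states:
        row_total = sum(counts[(origin, dest)] for dest in states)
        if row_total == 0:
            matrix.append([0.0] * len(states))  # nobody started in this state
        else:
            matrix.append([counts[(origin, dest)] / row_total for dest in states])
    return matrix

# Hypothetical panel of five health workers observed in two waves.
wave1 = ["public", "public", "public", "private", "dual"]
wave2 = ["public", "dual", "out", "private", "public"]
for state, row in zip(STATES, transition_matrix(wave1, wave2)):
    print(state, [round(p, 2) for p in row])
```

With recall data instead of a true panel, the two lists would come from the same interview (current and remembered state), which is why recall-based matrices are considered less reliable.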

Surveys also have weaknesses. For example, they are often not good at measuring sensitive issues or issues bordering on illegality. Dual job holding can be difficult to investigate with general surveys in settings where it is illegal. In these contexts, qualitative techniques may provide an alternative.

Surveys are also subject to measurement error. Although understanding of how accurately surveys measure key variables remains limited, the issue is receiving increasing attention in survey methods in general and in the measurement of labor issues in particular.36

Surveys are at their strongest when they are combined with information on an exogenous event, such as a change in regulation, and when they collect baseline and endline data for both treatment and control groups. But even with such data, important identification challenges often remain. With respect to labor supply, one concern is whether the reported figures should be taken as a proxy for the demand for or the supply of labor, since they actually reflect the intersection of supply and demand. In the absence of exogenous variation, and perhaps especially in the presence of a large monopsony employer (the public sector), how to interpret the results of labor supply analysis based on survey data often remains unclear.

Causal Analysis

From an analytical and econometric perspective, a frequent weakness of observational data is that they do not allow the identification of an unbiased causal effect, because unobserved variables may be important. Panel data offer a partial solution because they permit controlling for (or differencing out) time-invariant individual effects. Combining panel data with changes in the environment that are exogenous to the individual health worker, such as a change in regulation or law, can shed further light and, with the appropriate econometric technique, allows the causal effect of this environmental change to be estimated. A discussion of the most frequently used techniques, including difference-in-differences (DID) analysis, instrumental variable (IV) estimation, regression discontinuity design (RDD), and propensity score matching (PSM), falls outside the scope of this work; see existing work for an in-depth treatment.37
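To illustrate what differencing out an individual effect means, the sketch below builds a hypothetical two-period panel in which an unobserved, time-invariant trait (labeled ability) raises both the explanatory variable x (for example, wages) and the outcome y (for example, hours). The true effect of x on y is assumed to be 0.5. A pooled regression that ignores ability is biased, while a regression on first differences recovers the assumed effect because the trait cancels out. All numbers are illustrative assumptions.

```python
def ols_slope(x, y):
    """Slope coefficient from a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

TRUE_EFFECT = 0.5           # hypothetical causal effect of x on y
ability = [0, 1, 2, 3]      # hypothetical unobserved individual effect a_i
x_change = [1, 2, 1, 2]     # hypothetical within-person change in x

# Two-period panel: x depends on ability; y depends on x and on ability.
x0 = list(ability)
x1 = [a + d for a, d in zip(ability, x_change)]
y0 = [TRUE_EFFECT * x + a for x, a in zip(x0, ability)]
y1 = [TRUE_EFFECT * x + a for x, a in zip(x1, ability)]

# Pooled OLS omits ability, which is correlated with x, so the slope is biased.
pooled = ols_slope(x0 + x1, y0 + y1)

# First differences remove a_i because it is identical in both periods.
dx = [b - a for a, b in zip(x0, x1)]
dy = [b - a for a, b in zip(y0, y1)]
first_diff = ols_slope(dx, dy)

print(f"pooled OLS: {pooled:.3f}, first differences: {first_diff:.3f}")
```

With these assumed data the pooled slope is more than twice the true value, while the first-difference slope equals it. The approach only helps with traits that do not change over time; time-varying unobservables still require exogenous variation.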

Although panel data can provide insights on causal relationships, a more robust approach is to carry out a randomized controlled trial (RCT).38 RCTs have both strengths and weaknesses, which often remain underappreciated. Even though RCTs now increasingly start from a theory of change, they frequently do not allow the identification of the channel through which change occurs.39 Another weakness is the problem of generalizability: because RCTs typically estimate a local average treatment effect (LATE), the results often have weak external validity.

Environmental dependence is seen by some as the major shortcoming of this method. RCTs also say little about general equilibrium effects, or about what happens when agents, aware of the treatment, behave strategically. Moreover, the method crucially depends on careful design and implementation, because small deviations may lead to biased results.

RCTs also have many strengths, the most important being that they allow causal effects to be estimated. Their increased use has brought to the fore the challenges of identification and causal inference in social science. From this perspective, RCTs provide a useful starting point for thinking about the relationship of interest as well as the method of analysis.

To identify a causal relationship, three questions are key: (a) What is the counterfactual? (b) Can the observed change be attributed to this causal factor? (c) Through what channel did change occur? Thinking in terms of an experiment often helps to address these questions because it makes explicit which channels are to be tested and which variables need to be controlled for. The following example clarifies this. To identify the causal effect of health worker pay on labor supply, one can consider what experimental setup would be needed to study this effect. The RCT would vary wages exogenously for some workers (the treatment group) but not for others (the control group), and then compare the change in labor supply before and after treatment between the two groups.
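The comparison described above amounts to a difference-in-differences on the experimental sample: the mean change in labor supply among treated workers minus the mean change among controls. The sketch below shows a reproducible random assignment and the estimator; the worker IDs, the seed, and the hours figures are hypothetical.

```python
import random
from statistics import mean

def assign_treatment(worker_ids, seed=2024):
    """Randomly assign half of the workers (rounded down) to the wage-increase arm."""
    rng = random.Random(seed)            # fixed seed so the draw can be reproduced
    shuffled = list(worker_ids)
    rng.shuffle(shuffled)
    treated = set(shuffled[: len(shuffled) // 2])
    return {w: (w in treated) for w in worker_ids}

def diff_in_diff(baseline, endline, treated):
    """Mean change in the outcome for treated workers minus mean change for controls."""
    change_t = [endline[w] - baseline[w] for w in baseline if treated[w]]
    change_c = [endline[w] - baseline[w] for w in baseline if not treated[w]]
    return mean(change_t) - mean(change_c)

workers = ["A", "B", "C", "D"]
print(assign_treatment(workers))   # one random split of the four workers

# Hypothetical weekly hours, evaluated for a draw that treated A and B.
treated = {"A": True, "B": True, "C": False, "D": False}
baseline = {"A": 40, "B": 38, "C": 42, "D": 36}
endline = {"A": 44, "B": 41, "C": 43, "D": 37}
print(diff_in_diff(baseline, endline, treated))
```

Here treated workers add 3.5 hours on average and controls add 1, giving an estimated effect of 2.5 hours. The control group's change absorbs anything that affected everyone between baseline and endline.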

Classic theory would expect labor supply to go up (as discussed earlier, because leisure is considered to be a normal good). A behavioral approach is more agnostic and argues that the sign of the effect depends on the reference point.
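The role of the reference point can be illustrated with a simple income-targeting specification, in the spirit of reference-dependent preference models. The sketch compares a worker whose utility is linear in income (no income effect, so only the substitution effect operates) with one who values income relative to a daily target and weighs shortfalls more heavily than gains. The functional forms and every parameter value are assumptions made for illustration.

```python
# Hypothetical parameters; none of these values come from the source.
K = 0.5               # scale of the effort cost h**2 / (2 * K)
TARGET = 100.0        # daily reference (target) income
LOSS_AVERSION = 3.0   # weight on shortfalls below the target relative to gains

def effort_cost(hours):
    return hours ** 2 / (2 * K)

def standard_utility(income, hours):
    """No reference point: each unit of income is valued the same."""
    return income - effort_cost(hours)

def reference_utility(income, hours):
    """Income valued relative to a target; shortfalls weigh LOSS_AVERSION times more."""
    gap = income - TARGET
    gain_loss = gap if gap >= 0 else LOSS_AVERSION * gap
    return gain_loss - effort_cost(hours)

def best_hours(wage, utility, max_hours=24, steps_per_hour=100):
    """Grid search over daily hours for the choice that maximizes utility."""
    grid = [i / steps_per_hour for i in range(max_hours * steps_per_hour + 1)]
    return max(grid, key=lambda h: utility(wage * h, h))

for wage in (10.0, 12.5):
    print(wage, best_hours(wage, standard_utility), best_hours(wage, reference_utility))
```

With these assumed parameters, raising the wage from 10 to 12.5 increases hours from 5 to 6.25 in the standard case but reduces them from 10 to 8 for the target earner, who stops once daily income reaches 100.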

Considering this RCT design (without implementing it) underlines a number of key messages, including the need for a control group and for exogenous variation in wages. It also prompts further thinking about the channel through which the effect works. This thought exercise also helps when looking for natural experiments that mimic this situation. The Canadian study mentioned earlier (Kantarevic, Kralj, and Weinkauf 2008) provides an example: it exploits a change in the law that is exogenous to the health workers to study the causal effect of changes in pay on labor supply.

Reasoning in terms of an experiment, even one that is never implemented, clarifies why observational data often cannot provide a good answer to causal questions: there is no independent variation in the explanatory (right-hand) variable of interest, in the earlier example wages. Concerns such as reverse causality, where the explained (left-hand) variable also causes the explanatory (right-hand) variable, or simultaneity, where both the explained (left-hand) and explanatory (right-hand) variables are caused by a third unobserved variable, are not addressed. This assumes that all key variables are observed; if they are not, there is an additional omitted variable bias as well.
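The bias from an unobserved third variable can be made concrete with the textbook omitted variable bias formula. If hours depend on wages x and an unobserved trait z, and z is correlated with x, the slope from regressing hours on wages alone equals β₁ + β₂·cov(x, z)/var(x) rather than β₁. The sketch checks this identity on hypothetical, noise-free data; both coefficients and all data values are assumptions.

```python
from statistics import mean

def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

BETA_WAGE = 0.5    # hypothetical true effect of wages on hours
BETA_TRAIT = 2.0   # hypothetical effect of the unobserved trait on hours

trait = [0.0, 1.0, 2.0, 3.0, 4.0]   # unobserved by the researcher
wage = [1.0, 1.5, 3.5, 3.0, 5.0]    # positively correlated with the trait
hours = [BETA_WAGE * w + BETA_TRAIT * z for w, z in zip(wage, trait)]

# Regressing hours on wages alone attributes part of the trait's effect to wages.
naive_slope = cov(wage, hours) / cov(wage, wage)
predicted = BETA_WAGE + BETA_TRAIT * cov(wage, trait) / cov(wage, wage)
print(round(naive_slope, 3), round(predicted, 3))
```

The naive slope matches the formula and is several times the assumed true effect of 0.5, because the trait both raises hours and is positively correlated with wages.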

As discussed earlier, a number of econometric approaches try to build a counterfactual and, using exogenous variation, identify the causal relationship. Only a few studies apply these techniques to health worker behavior. Apart from the Canadian study on the effect of a change in the regulation of physician remuneration, which uses instrumental variable estimation, another example is the study of health worker job choice in Tanzania that uses propensity score matching to construct a counterfactual (Kolstad 2010).
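Propensity score matching constructs a counterfactual by pairing each treated individual with the untreated individual whose estimated probability of treatment is closest, then averaging the outcome differences. The sketch below implements one-to-one nearest-neighbor matching with replacement on propensity scores that are assumed to have been estimated already (for example, with a logit). The scores, outcomes, and the meaning of treatment are hypothetical; this is not a reconstruction of Kolstad (2010).

```python
from statistics import mean

def att_nearest_neighbor(treated, controls):
    """Average treatment effect on the treated via one-to-one nearest-neighbor
    matching on the propensity score, with replacement.
    Each unit is a (propensity_score, outcome) pair."""
    effects = []
    for score_t, outcome_t in treated:
        # Match to the control whose propensity score is closest.
        _, outcome_c = min(controls, key=lambda c: abs(c[0] - score_t))
        effects.append(outcome_t - outcome_c)
    return mean(effects)

# Hypothetical units: (estimated propensity score, monthly hours worked).
treated = [(0.80, 190), (0.60, 180), (0.40, 175)]
controls = [(0.75, 180), (0.55, 176), (0.35, 170), (0.10, 160)]
print(att_nearest_neighbor(treated, controls))
```

Each treated unit is matched to the control 0.05 below its score, giving effects of 10, 4, and 5 hours and an average of about 6.3. Applied work would also check common support and covariate balance after matching.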

The use of RCTs in the study of health worker behavior also remains scarce. The two studies mentioned earlier, by Bjorkman and Svensson (2009) and Basinga et al. (2011), stand out; they look at the effects of community monitoring and of pay for performance, respectively, on different aspects of health worker behavior, including absenteeism (as well as on patient health outcomes).

Data and Measurement Concerns

A key question facing applied and operational research on health worker behavior is whether additional data need to be collected or whether existing data can be used. Although the decision ultimately depends on the research question, it is clear that existing data suitable for properly investigating labor supply, absenteeism, and the occupational choice of health workers are severely limited. Given this dearth of data, the emphasis here is on self-collected data. A final section discusses in more depth the potential of building more and better administrative data.

Using Existing Data

Two types of existing data can be used for health worker labor analysis: administrative data and survey data. Administrative data are often limited to descriptive information that informs policy decisions at a rudimentary level; furthermore, they are available only for the public sector. They are also typically scattered across ministries: the ministry of health, for instance, keeps information such as the target number of health workers by occupation, gender, and facility, while payroll information is typically housed at the ministry of finance. These different data are often difficult to merge at the individual level. Moreover, they are not collected with a research question in mind and typically lack information on key dimensions of interest.
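The merging problem can be illustrated with a small sketch: two hypothetical extracts, a ministry of health staff list and a ministry of finance payroll, are joined on a shared worker identifier, and the records that fail to match on either side are reported. The identifiers, field names, and salary figures are assumptions; in practice a reliable common identifier may not exist at all, which is precisely the difficulty noted above.

```python
def merge_records(staff, payroll):
    """Inner-join two record sets keyed by worker ID and report unmatched keys."""
    matched = {
        wid: {**staff[wid], **payroll[wid]}
        for wid in staff.keys() & payroll.keys()
    }
    only_staff = sorted(staff.keys() - payroll.keys())
    only_payroll = sorted(payroll.keys() - staff.keys())
    return matched, only_staff, only_payroll

# Hypothetical extracts; real systems may use different (or no) common IDs.
moh_staff = {
    "HW001": {"occupation": "nurse", "facility": "District Hospital A"},
    "HW002": {"occupation": "clinical officer", "facility": "Health Center B"},
    "HW003": {"occupation": "midwife", "facility": "Health Center C"},
}
mof_payroll = {
    "HW001": {"monthly_salary": 420},
    "HW003": {"monthly_salary": 390},
    "HW009": {"monthly_salary": 510},
}
matched, only_staff, only_payroll = merge_records(moh_staff, mof_payroll)
print(len(matched), only_staff, only_payroll)
```

Reporting unmatched records on both sides, rather than silently dropping them, shows how much of each source is lost in the merge and whether the losses are concentrated in particular occupations or facilities.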

Existing survey data suffer from similar shortcomings. Because no dedicated surveys of hospitals and their staff exist (as there are, for instance, for farms or manufacturing firms in many countries), survey data are typically limited to specific data collected by researchers with particular research interests in mind.

Combining data across surveys is fraught with difficulties because the measurement of labor supply, earnings, occupation, and so on typically differs across the surveys.

Some good examples do exist of what can be learned from combining data. Fujisawa and Lafortune (2008) provide one for OECD countries, presenting a descriptive analysis of the remuneration of doctors in 14 OECD countries. Using data for general practitioners and medical specialists, the study finds large variations across countries in the remuneration levels of general practitioners and even greater variations for specialists, whose earnings have increased more rapidly than those of general practitioners in nearly all countries over the past decade.40

McCoy et al. (2008) follow a similar approach to generate insights on earnings of health workers in the public sector across four African countries (Burkina Faso, Ghana, Nigeria, and Zambia). They conclude that pay structures and levels of income vary widely across and within countries; they also underline that accurate and complete data are scarce.

Existing national household surveys, such as a labor force survey (LFS) or a Living Standards Measurement Study (LSMS) survey, may also shed light. However, because they aim to be nationally representative, they typically yield small samples of health workers, leaving limited degrees of freedom to analyze variation within professions. In Ethiopia, for instance, a nationally representative survey of the workforce (an LFS) yielded a sample of fewer than 200 health workers spread across a range of occupations, too small for a meaningful analysis. At the same time, useful basic insights can be gained from these types of data, especially across countries, if the questions are uniform. This is illustrated by the WHO's efforts to combine LFSs across countries, which have yielded initial insights into the age, gender, and geographical distribution of health workers across professions.41
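A back-of-the-envelope calculation shows why a sample of fewer than 200 health workers leaves little room for analysis by profession. Assuming simple random sampling (which ignores the survey's clustering and weights, so real margins would be wider) and a sample split evenly across a hypothetical five occupations, the 95 percent margin of error for a proportion within each occupation is about ±15.5 percentage points.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation 95% confidence interval for a
    proportion, assuming simple random sampling (no clustering or weights)."""
    return z * math.sqrt(p * (1 - p) / n)

total_sample = 200   # rounded up from the "fewer than 200" cited for Ethiopia
occupations = 5      # hypothetical number of occupations, assumed equally sized
per_occupation = total_sample // occupations
moe = margin_of_error(0.5, per_occupation)  # p = 0.5 gives the widest interval
print(per_occupation, round(moe, 3))
```

With only 40 respondents per occupation, a reported share of 50 percent is compatible with anything from roughly 35 to 65 percent, before any further disaggregation by sector or region.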

Other creative approaches can be used to gain insights from existing data. Ensor, Serneels, and Lievens (2013) use survey data to assess whether the distribution of health workers across the public and private sectors has changed over time. Because demand for health worker labor is derived from the demand for health care, the authors compare patient spending, formally called “private expenditure,” from surveys for 39 countries in Sub-Saharan Africa over a five-year period using annual data. Although the results need to be interpreted with care, they suggest that while private sector growth was strong over the period studied, the share of private spending in total spending declined, signaling strong public sector growth.42
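Strong private growth and a falling private share are compatible because the share falls whenever public spending grows faster than private spending. The sketch below makes the arithmetic explicit with hypothetical starting levels and growth rates; these are not the study's data.

```python
def private_share(private, public):
    """Private spending as a share of total (private + public) spending."""
    return private / (private + public)

# Hypothetical figures, not the study's data.
private0, public0 = 40.0, 60.0               # spending in the first year
private_growth, public_growth = 0.08, 0.15   # annual growth rates
years = 5

private5 = private0 * (1 + private_growth) ** years
public5 = public0 * (1 + public_growth) ** years
print(f"private spending grew {private5 / private0 - 1:.0%}")
print(f"private share: {private_share(private0, public0):.1%} -> "
      f"{private_share(private5, public5):.1%}")
```

In this example private spending rises by roughly 47 percent over five years, yet its share of the total falls from 40 percent to about 33 percent.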

Collecting New Data

When planning to collect new data, surveyors face a number of issues. A decision must be made about whether to conduct surveys only or to implement a full RCT that includes a baseline survey, an intervention, and an endline survey. What group is being researched must also be determined: is it current or prospective health workers, and which professions?

To answer these questions, the causal relationship of interest must first be defined. Thinking in terms of the ideal experiment that would address the research question often helps: what intervention is needed and what change is expected? Answering this requires defining (a) the treatment (that is, the key right-hand variable), such as a change in pay or other job attributes; and (b) the outcome (the left-hand variable), such as labor supply, absenteeism, or occupational choice. A theory of change then sets out how this effect takes place, clarifies the dimensions along which the treatment should vary, and determines which other variables need to be included in the analysis and therefore collected.

Collecting new data also raises the issue of measurement error, starting with the left-hand variable. Whether one opts for an RCT or a survey, the outcome (left-hand) variable is typically measured through surveys, although in some cases the outcome is determined through direct observation (for example, surprise visits to measure absenteeism). The extent to which surveys introduce measurement error and bias estimation results is the subject of a small but growing literature. Although random (classical) measurement error in a continuous left-hand variable is of limited concern because it does not bias estimation results (though it may reduce precision), the worry is greater with discrete dependent variables such as labor force participation, where measurement error may bias point estimates (see Hausman 2001; Hausman, Abrevaya, and Scott-Morton 1998).
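The contrast can be seen in a linear probability model. If participation is misreported with a false-positive rate α₀ and a false-negative rate α₁ that do not depend on the covariates, the probability of reported participation is α₀ + (1 − α₀ − α₁)·P(y = 1 | x), so every slope is scaled toward zero by the factor (1 − α₀ − α₁). The sketch applies this to a hypothetical true model; all coefficients and misreporting rates are assumptions.

```python
# Hypothetical values chosen for illustration only.
TRUE_INTERCEPT, TRUE_SLOPE = 0.2, 0.1   # true linear probability model
FALSE_POS, FALSE_NEG = 0.05, 0.10       # misreporting rates, assumed independent of x

def true_p(x):
    """True probability of labor force participation at covariate value x."""
    return TRUE_INTERCEPT + TRUE_SLOPE * x

def reported_p(x):
    """Probability that participation is reported: non-participants are
    misreported with prob. FALSE_POS, participants with prob. FALSE_NEG."""
    return FALSE_POS + (1 - FALSE_POS - FALSE_NEG) * true_p(x)

def slope(x0, x1, f):
    """Slope of f between two covariate values (exact here, since f is linear)."""
    return (f(x1) - f(x0)) / (x1 - x0)

print(slope(0, 5, true_p), slope(0, 5, reported_p))
```

A regression on reported participation would therefore estimate a slope of about 0.085 instead of 0.1. Adding zero-mean noise unrelated to x to a continuous outcome, by contrast, leaves the expected slope unchanged.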

However, in both cases, structural estimates in models of the left-hand variable of interest may be biased because of particular characteristics of the survey instruments used. Different surveys may, for instance, use distinct screening questions early in the questionnaire to define labor force participation. These differences may lead to different categorizations of subjects, resulting in different subsamples on which estimations are carried out and thus introducing selection bias.

Another problem may arise when respondents systematically differ across survey methods. Certain types of respondents, such as proxy respondents (often the household head), may actively or strategically try to guess the true value of a variable about which they have imperfect information, such as their spouse's income or labor supply, thereby introducing systematic errors. A recent study of labor statistics and estimates of labor supply in Tanzania, for instance, finds that both the type of questions and the type of respondent affected the resulting labor force participation, labor supply, and occupational categorization (see Bardasi et al. 2011). Follow-up analysis finds that structural estimates, such as returns to education, can also be affected by survey method. In the case of returns to education, it is the type of questionnaire used, but not the type of respondent, that introduces the bias (see Serneels et al. 2016). These results indicate that care needs to be taken when designing labor supply and occupational choice studies.

Existing studies on absenteeism also provide insights on measurement error. Absenteeism in developing countries is typically measured in one of two ways: through spot checks, that is, surprise visits to the facility to verify the presence of the health worker (Chaudhury et al. 2006), or through frequent or semipermanent observation (Banerjee and Duflo 2006). Both methods provide more reliable data than employer- or facility-based registers or self-reported data, which are often used in the general literature on absence from work but are likely to be downward biased.43 Although the incidence of absenteeism is the most widely used measure in studies of health worker absenteeism, more careful measurement using permanent observation can also capture its duration.44
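With spot checks, the absence rate is the share of visit-worker observations in which a worker expected to be on duty was not found at the facility; averaging over several unannounced visits also gives a per-worker rate. The sketch below computes both from hypothetical visit records; the record layout and worker IDs are assumptions.

```python
from collections import defaultdict

# Hypothetical spot-check records: (worker_id, visit_number, present_at_facility).
visits = [
    ("N1", 1, True), ("N1", 2, False), ("N1", 3, True),
    ("N2", 1, False), ("N2", 2, False), ("N2", 3, True),
    ("D1", 1, True), ("D1", 2, True), ("D1", 3, True),
]

def absence_rates(records):
    """Overall and per-worker absence rates from unannounced visits."""
    checks = defaultdict(int)
    absences = defaultdict(int)
    for worker, _visit, present in records:
        checks[worker] += 1
        if not present:
            absences[worker] += 1
    overall = sum(absences.values()) / sum(checks.values())
    per_worker = {w: absences[w] / checks[w] for w in checks}
    return overall, per_worker

overall, per_worker = absence_rates(visits)
print(round(overall, 3), per_worker)
```

Such records capture incidence only. Measuring duration would require observation records with the times at which each absence started and ended.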

Measurement of occupational choice raises specific issues. As mentioned earlier, in the absence of incentive-compatible study designs, two types of methods have been applied to measure job choice: contingent valuation and discrete choice methods. The former elicits the reservation wage required to work in one job rather than another, while discrete choice methods concentrate on the trade-offs between sets of attributes associated with different jobs. Although each of these approaches has its strengths, both require strict implementation to obtain reliable data.45
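In a discrete choice experiment analyzed with a conditional logit model, the probability of choosing a job is proportional to the exponential of its utility, and the ratio of an attribute's coefficient to the salary coefficient gives the salary change that leaves a worker indifferent to that attribute. The sketch uses hypothetical coefficients and job profiles; nothing here is estimated from any study.

```python
import math

# Hypothetical conditional logit coefficients (utility per unit of each attribute).
COEFS = {"salary": 0.02, "rural": -1.0, "housing": 0.5}

def utility(job):
    """Deterministic utility V_j as a linear index of job attributes."""
    return sum(COEFS[attr] * value for attr, value in job.items())

def choice_probabilities(jobs):
    """Conditional logit: P(j) = exp(V_j) / sum_k exp(V_k)."""
    exp_v = [math.exp(utility(job)) for job in jobs]
    total = sum(exp_v)
    return [v / total for v in exp_v]

def salary_equivalent(attribute):
    """Salary change keeping utility constant when the attribute rises by one unit
    (positive: compensation required; negative: salary a worker would give up)."""
    return -COEFS[attribute] / COEFS["salary"]

urban_post = {"salary": 100, "rural": 0, "housing": 0}
rural_post = {"salary": 150, "rural": 1, "housing": 1}
probs = choice_probabilities([urban_post, rural_post])
print([round(p, 3) for p in probs])
print(salary_equivalent("rural"), salary_equivalent("housing"))
```

Under these assumed coefficients a rural posting requires 50 salary units of compensation, while housing is worth giving up 25; this trade-off logic is what distinguishes discrete choice methods from contingent valuation, which asks for the reservation wage directly.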

The concern that measurement error in right-hand side variables may bias estimation is the subject of a richer literature. Shields (2004) and Antonazzo et al. (2003) discuss the standard issues related to the measurement error of wages that are relevant for health workers. A central concern is whether wages might be driven by characteristics that remain unobserved by the researcher (the focus is on “unobserved ability”). This can be addressed with the use of panel data and random allocation of health workers to jobs. Serneels et al. (2016), for instance, focusing on Ethiopia, make use of a job lottery to obtain predicted wages of health workers in alternative occupations. Similar concerns about biased estimates can be raised for both monetary and nonmonetary benefits.

One issue receiving heightened attention is how to measure health worker motivation. Although there is increasing interest in formally testing the role of intrinsic motivation, calibrating this concept is challenging. Contemporary work has measured it in various ways, through survey questions (see Serneels et al. 2007; Serneels et al. 2010) and experimental games (see, for instance, Serneels et al. 2016; Serra, Serneels, and Barr 2011). Ashraf, Bandiera, and Jack (2012) use an RCT to compare the effort of health workers under different reward schemes, contrasting performance under high financial rewards with performance under low financial rewards combined with social rewards (stars). They interpret the latter as effects of intrinsic motivation. Whether either of these is a good measure remains open to debate.
