
The dependent variable in this paper is a count variable. The yearly battle deaths between central governments and rebel groups, which constitute the dependent variable, count the deaths that have occurred at a given location and time. Comparable count variables include the number of patients who died at a hospital within the last month, the number of doctor visits, days of absence from a workplace or a school, and the number of terrorist attacks against a country. In statistics, count data are observations that take only nonnegative integer values, ranging from zero to some greater undetermined value (Hilbe 2014, 2). A potential problem with count data that applies to conflict severity, and specifically to the dependent variable of this paper, is overdispersion. Here it arises from a large number of zeros combined with high counts of battle deaths in other instances. Battle deaths for a country may vary enormously from one year to the next: they may be zero for country n in year t and in the thousands for the same country in year t+1. In other words, overdispersion refers to excess variability in the data. For count data, it occurs when the variance of the data is greater than its mean (Hilbe 2014, 9). More generally, overdispersion is present when the observed variance under a model is greater than its expected variance. It is caused by a positive correlation between responses or by excess variation between response probabilities or counts (Hilbe 2014, 82).
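A quick descriptive check for the condition described above is to compare the sample variance of the counts with their mean. The Python sketch below is only an illustration: the yearly counts are made up (they are not this paper's data) and the helper name `dispersion_ratio` is hypothetical. A ratio well above 1 means the variance exceeds the mean.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Return sample variance divided by sample mean for a list of counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m

# Hypothetical yearly battle-death counts for one country (illustrative only):
# many zero years mixed with a few very deadly years.
deaths = [0, 0, 0, 25, 0, 1200, 0, 30, 0, 3400]

ratio = dispersion_ratio(deaths)
print(f"mean = {mean(deaths):.1f}, variance/mean = {ratio:.1f}")
```

This is a raw, unconditional check. It says nothing about whether the overdispersion would remain after conditioning on explanatory variables.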

When the data exhibit greater variability than is predicted by the implicit mean-variance relationship, the problem of overdispersion emerges. According to Hilbe (2014), overdispersion occurs when:

- The model omits important explanatory predictors,
- The data include outliers,
- The model fails to include needed interaction terms,
- The model fails to transform a predictor to another scale,
- The assumed linear relationship between the response and the link function and predictors is mistaken.

Failing to account for this problem may therefore lead to underestimated standard errors and misleading inference about the regression parameters. Studies similar to this paper, whose dependent variables are counts such as the number of fatalities, civilian deaths, terrorist attacks directed against a country, or battle deaths, have used techniques that account and correct for the problems associated with overdispersion.

Hultman & Peksen (2017), whose dependent variable is the number of fatalities in battle-related violence, use negative binomial regression as their estimation technique. Fjelde & Hultman (2014), whose dependent variable is civilian deaths, also use negative binomial regression. The choice of this estimator is motivated by its ability to account for overdispersion in the dependent variable. Negative binomial regression has an extra parameter that adjusts to accommodate the extra variability, or heterogeneity, in the data (Hilbe 2014, 10). This extra parameter gives the distribution of counts a wider shape than is allowed under the Poisson distribution, which has a single parameter to be estimated and treats the mean and the variance of the distribution as equal. Under the Poisson distribution, the higher the mean, the greater the variance in the data, because the two are constrained to be the same. This assumption is referred to as equidispersion. The challenge is that, when modelling real data, the equidispersion assumption is rarely satisfied (Hilbe 2014, 9). The extra parameter of the negative binomial model relaxes this constraint and allows the variance of the data to exceed its mean.
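The contrast between the two mean-variance relationships can be written out explicitly. The sketch below assumes the NB2 parameterization of the negative binomial model, which is a common choice but is not stated in the text; the symbols \mu (conditional mean) and \alpha (overdispersion parameter) are notation introduced here for illustration.

```latex
\begin{align*}
\text{Poisson:} \quad
  & \operatorname{E}(y \mid x) = \mu, \qquad \operatorname{Var}(y \mid x) = \mu \\
\text{Negative binomial (NB2):} \quad
  & \operatorname{E}(y \mid x) = \mu, \qquad \operatorname{Var}(y \mid x) = \mu + \alpha \mu^{2}, \quad \alpha > 0
\end{align*}
```

As \alpha approaches zero the negative binomial variance approaches the Poisson variance, while any positive \alpha implies a variance larger than the mean.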

Dependent variables that are discrete rather than continuous violate basic assumptions of OLS, which requires the dependent variable to be continuous and the residuals to be normally distributed. Count variables violate this normality assumption. Stata provides a practical command that graphs the observed proportions along with the Poisson and negative binomial probabilities for a count variable. The Poisson probabilities are computed using an estimate of the Poisson mean. The negative binomial probabilities use the same mean and an estimate of the overdispersion parameter. Figure 4.1 illustrates how well the Poisson and negative binomial probabilities fit the observed proportions. The difference between the two sets of probabilities in Figure 4.1 is not enormous, but the negative binomial probabilities fit the observed proportions better than the Poisson probabilities do.

Figure 4.1: The observed proportions versus Poisson and negative binomial probabilities
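The kind of comparison plotted in Figure 4.1 can be sketched in a few lines of Python. This is not the Stata command used for the figure. It is a minimal illustration that assumes the NB2 form of the negative binomial and assumes that the overdispersion value reported with the figure (15.03) is the NB2 parameter alpha. The observed counts are made up, and the function names are hypothetical.

```python
import math
from collections import Counter

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson distribution with mean mu."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def negbin_pmf(k, mu, alpha):
    """P(Y = k) under NB2, where variance = mu + alpha * mu**2."""
    r = 1.0 / alpha                   # shape of the gamma mixing distribution
    p = 1.0 / (1.0 + alpha * mu)      # probability term implied by mu and alpha
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def observed_proportions(counts, max_k):
    """Share of observations equal to each k in 0..max_k."""
    tally = Counter(counts)
    n = len(counts)
    return [tally[k] / n for k in range(max_k + 1)]

# Hypothetical counts, for illustration only (not the paper's data).
counts = [0, 0, 0, 0, 0, 0, 1, 2, 5, 9, 40, 300, 2500]

# Mean and overdispersion as reported with Figure 4.1 (alpha is an assumption).
mu, alpha = 621.3, 15.03

print(" k  observed  neg binom   poisson")
for k, obs in enumerate(observed_proportions(counts, 10)):
    print(f"{k:2d}  {obs:8.3f}  {negbin_pmf(k, mu, alpha):9.4f}  "
          f"{poisson_pmf(k, mu):9.2e}")
```

With a mean this large, the Poisson probability of a zero is essentially nil, whereas the negative binomial still places substantial probability on zero.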

A critical assumption of the Poisson distribution is that events are independent (Long 1997, 219). This means that the occurrence of an event does not affect the probability of the event occurring in the future. Long (1997) illustrates this with the example of scientists publishing articles. The assumption of independence implies that when a scientist publishes a paper, that scientist's rate of publication does not change. In other words, past success in publishing does not affect future success. This problem is also highlighted by Fjelde & Hultman (2014), who study a count variable as the dependent variable. Part of their reasoning is that negative binomial regression is suitable when there is contagion in the dependent variable. In their case, contagion means that the rate at which civilians are killed in one location is not independent of how many civilians have already been killed in that location in the same year. A similar feature may also be present in this paper: the battle deaths in one country are not independent of how many battle deaths there have already been in that country in the same year. Hultman, Kathman & Shannon (2013) employ a count variable, the number of civilians killed in a conflict month, highlight the problem of contagion, and use a negative binomial model to estimate the results.

[Figure 4.1 plot: vertical axis, Proportion (0 to .6); horizontal axis, k (0 to 10); series: observed proportion, neg binom prob, poisson prob; note: mean = 621.3; overdispersion = 15.03]

Figure 4.1 shows that the fitted Poisson distribution under-predicts zeros. This pattern of under-prediction is argued to be characteristic of fitting a count model that does not take into account heterogeneity among sample countries in their rate µ (Long & Freese 2001, 228). The univariate Poisson distribution assumes that all conflicts have the same rate of severity, which is clearly unrealistic. The negative binomial regression model addresses this failure of the Poisson distribution by adding a parameter that reflects unobserved heterogeneity among observations. Negative binomial regression adds an error term that is assumed to be uncorrelated with the independent variables. Thus, negative binomial regression provides a more suitable technique for estimating over-dispersed count variables. In addition, because observations within the same country may be correlated, this paper estimates all models with robust standard errors clustered on the countries that have observations of battle deaths.
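The link between unobserved heterogeneity and the under-prediction of zeros can be illustrated with a small simulation. The Python sketch below assumes the standard gamma-mixture construction behind the negative binomial (each observation's rate is the common rate multiplied by a gamma-distributed factor with mean 1 and variance alpha). The values of `mu` and `alpha` and all function names are hypothetical choices for illustration, not estimates from this paper.

```python
import math
import random

def poisson_zero_prob(mu):
    """P(Y = 0) when every observation shares the same Poisson rate mu."""
    return math.exp(-mu)

def mixed_zero_prob(mu, alpha, draws=20000, seed=0):
    """Monte Carlo estimate of P(Y = 0) when each observation has its own rate.

    Each rate is mu * delta, with delta drawn from a gamma distribution
    that has mean 1 and variance alpha.
    """
    rng = random.Random(seed)
    shape, scale = 1.0 / alpha, alpha        # mean = shape * scale = 1
    total = 0.0
    for _ in range(draws):
        delta = rng.gammavariate(shape, scale)
        total += math.exp(-mu * delta)       # Poisson P(0) at this rate
    return total / draws

def negbin_zero_prob(mu, alpha):
    """Closed-form P(Y = 0) for the NB2 negative binomial."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)

mu, alpha = 3.0, 2.0   # hypothetical values chosen for illustration

same_rate = poisson_zero_prob(mu)
varying_rate = mixed_zero_prob(mu, alpha)
closed_form = negbin_zero_prob(mu, alpha)

print(f"P(0), common rate (Poisson):         {same_rate:.3f}")
print(f"P(0), heterogeneous rates (sim.):    {varying_rate:.3f}")
print(f"P(0), negative binomial closed form: {closed_form:.3f}")
```

Holding the average rate fixed, letting rates vary across observations raises the share of zeros, which is why a single-rate Poisson fit tends to predict too few of them.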