
The descriptive statistics for my variables are presented in Table 5 below. Rather than relying only on Stata's regular summary command, I first use the panel summary command xtsum, which reports the within and between standard deviations of each variable. This shows how the variation in each variable is split between differences across countries and changes over time.

Table 6: Within and between variation

Variable                         Between st. deviation   Within st. deviation
FDI inflow                       2.490                   1.379
Corruption (TI)                  2.041                   0.430
Bureaucratic corruption          1.028                   0.545
Political corruption             0.930                   0.459
Democracy                        6.225                   2.020
Human Development Index          0.161                   0.022
Bureaucratic Quality             0.981                   0.185
Political Stability              0.968                   0.304
Quality of Rule of Law           0.979                   0.175
GDP growth                       4.666                   4.968
GDP                              1.28                    3.27
Extractive sector/GDP            14.252                  5.476
Taxes                            11.816                  4.576
Export and Import/GDP            43.755                  16.84
Inflation rate volatility (%)    16.400                  33.391

The table shows that for most of the variables, most of the variation lies between countries. A fixed effects model of the effects of these variables on FDI would control out the between variation and estimate from the within variation only. Some of the independent variables nevertheless vary considerably over time.
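The split between the two components can be illustrated with a minimal Python sketch. It follows the general idea behind xtsum (the between component is the spread of country means, the within component is the spread of deviations from each country's own mean), but the function name, the toy panel and the degrees-of-freedom conventions are assumptions made for illustration, not a reproduction of Stata's output.

```python
from statistics import mean, stdev


def within_between_sd(panel):
    """Return (between_sd, within_sd) for a dict mapping unit -> list of values."""
    all_values = [v for values in panel.values() for v in values]
    grand_mean = mean(all_values)
    unit_means = {unit: mean(values) for unit, values in panel.items()}

    # Between component: spread of the unit means around their own mean.
    between_sd = stdev(list(unit_means.values()))

    # Within component: deviations from each unit's own mean, re-centred on
    # the grand mean so the series keeps its original level.
    recentred = [
        v - unit_means[unit] + grand_mean
        for unit, values in panel.items()
        for v in values
    ]
    within_sd = stdev(recentred)
    return between_sd, within_sd


# Hypothetical two-country panel: a large gap between countries and only
# small changes over time within each country.
toy_panel = {"A": [1.0, 2.0, 3.0], "B": [11.0, 12.0, 13.0]}
between_sd, within_sd = within_between_sd(toy_panel)
print(f"between = {between_sd:.3f}, within = {within_sd:.3f}")
```

In this toy panel the between component is far larger than the within component. A fixed effects estimator works with the deviations from country means, so only the smaller within spread would remain available for estimation.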

Market potential, measured as GDP growth, and market size, measured as GDP, naturally have the most variation over time. Interestingly, they also have surprisingly large between variation. The within variation dominates because of the nature of the variables: an economy grows over time, and that growth shows up in the within component. Economic stability, measured as the change in inflation rates, also draws most of its variation from the within component, which indicates that inflation rates have been unstable over time.

[...] observations, while HDI has 1208.⁴³ This can cause a high number of missing observations in the regression. There is also large variation in FDI inflow. This is useful, because little variation would make it difficult to measure the impact of the independent variables. As mentioned above, FDI has been logged, and the negative values dropped. Values between zero and one become negative when logged, which is unproblematic; the problem arises only for values that are negative before the transformation, because their logarithm is undefined. The minimum value is therefore still negative, -11.5, while the maximum is 12.6. In addition, corruption shows much variation, with observations across the whole zero-to-ten scale and a total of 2429 observations. In general, this dataset has many observations, which is one of the benefits of panel data. The exception is the two variables from IPD, which are used in only one model and have 200 observations each.

⁴³ Initially, the number of observations for the independent variables was slightly higher. However, as will be discussed in sections 5.1 and 5.6, I lag the independent variables to address reverse causality and simultaneity, and to reflect the theoretical expectation that changes take time to have an effect rather than being instantaneous.
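The behaviour of the log transformation can be checked with a short sketch. The inflow values below are hypothetical, chosen only so that the logged extremes land near the reported -11.5 and 12.6. Dropping zeros along with negative values is an assumption, made because the logarithm of zero is also undefined.

```python
import math


def log_positive(values):
    """Return natural logs of the strictly positive values, dropping the rest."""
    return [math.log(v) for v in values if v > 0]


# Hypothetical raw inflows: one negative, one zero, two between 0 and 1,
# one exactly 1 and one large value.
raw_inflows = [-3.0, 0.0, 0.00001, 0.5, 1.0, 300000.0]
logged = log_positive(raw_inflows)

for value in logged:
    print(round(value, 1))
```

The negative and zero inflows disappear from the logged series, while the two values between zero and one are kept and simply become negative numbers.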

Further, the Polity IV measure of democracy shows much variation, as does the Human Development Index, with observations covering nearly the entire scale and relatively large standard deviations. However, the Human Development Index has relatively few observations compared to the other variables. This is reflected in the regression model with the HDI as an explanatory variable, where the total n drops to 1025. The quality of political institutions variables also display solid variation, with over 2600 observations. Their coverage is very similar because they come from the same dataset. That variables such as these, which vary little over time (within), still display such overall variation reflects how diverse the sample is in terms of country coverage. This is a desirable quality in a dataset.

One downside of my dataset is missing observations. When I order the dataset by country and year in Stata, I am notified that the dataset is unbalanced. This is mostly because I have merged several datasets whose raw coverage varies. Missing observations are usually not an issue as long as they are not systematic. If they are systematic, the reason they are missing is somehow correlated with the dependent variable (Verbeek 2004, 381). I see no correlation between foreign direct investment and the fact that, for example, the values for foreign direct investment inflow are missing for Afghanistan in 1997 and 1998. Because my time series starts in 1995 (1996 with the lag) and ends in 2012, I avoid the most common repercussion of missing observations, namely the heavy underrepresentation of less developed and/or non-democratic countries. If the time series had started in 1970, developed and democratic countries would have been heavily overrepresented, giving us misleading coefficients for a general relationship. Values can be imputed manually to decrease the number of missing values.

However, this requires that we have some data to justify our guess. The large majority of the missing values in my dataset come from countries that simply have no coverage in one of the raw datasets, so we have no starting point to estimate from.⁴⁴ In addition, many missing observations come from countries that are covered later than others, such as Afghanistan. Of the control variables included in every regression, taxes has relatively few observations and causes many missing observations. In fact, over 600 observations are lost when the taxes variable is added to the regression. I therefore run my baseline regressions without that variable, but report results with it in a separate column because it has proved significant in other studies.
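The loss of observations from adding a sparsely covered control can be sketched as follows. Regression routines typically keep only rows that are complete on every included variable, so each extra variable can only shrink the estimation sample. The rows, the variable names and the helper `complete_cases` are all hypothetical and only illustrate the mechanism, not the actual dataset.

```python
def complete_cases(rows, variables):
    """Keep the rows that have a non-missing value for every listed variable."""
    return [
        row for row in rows
        if all(row.get(name) is not None for name in variables)
    ]


# Hypothetical country-year rows; None marks a missing value.
rows = [
    {"fdi": 1.2, "corruption": 3.1, "taxes": 14.0},
    {"fdi": 0.4, "corruption": 2.0, "taxes": None},
    {"fdi": None, "corruption": 5.5, "taxes": 20.0},
    {"fdi": 2.2, "corruption": 7.9, "taxes": None},
]

baseline = complete_cases(rows, ["fdi", "corruption"])
with_taxes = complete_cases(rows, ["fdi", "corruption", "taxes"])
lost = len(baseline) - len(with_taxes)
print(len(baseline), len(with_taxes), lost)
```

Comparing the two sample sizes before estimating shows how much of the sample a single control costs, which is the comparison behind reporting the taxes specification separately.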

Because the missing observations are not systematic, there is no particular danger in employing the unbalanced dataset.
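Whether a panel is balanced, and where the gaps are, can be checked with a small sketch like the one below. The country-year pairs are hypothetical (the Afghanistan gap mirrors the example mentioned earlier, and the second country and the year range are invented), and the helper names are assumptions for illustration.

```python
def years_by_country(observations):
    """Group the observed years by country from (country, year) pairs."""
    grouped = {}
    for country, year in observations:
        grouped.setdefault(country, set()).add(year)
    return grouped


def missing_cells(observations):
    """List the (country, year) cells that a balanced panel would contain but are absent."""
    grouped = years_by_country(observations)
    all_years = set().union(*grouped.values())
    return sorted(
        (country, year)
        for country, years in grouped.items()
        for year in all_years - years
    )


# Hypothetical coverage: one country observed in every year, one with a gap.
observed = [("Norway", y) for y in range(1996, 2000)]
observed += [("Afghanistan", 1996), ("Afghanistan", 1999)]

gaps = missing_cells(observed)
balanced = not gaps
print(balanced, gaps)
```

A list of the missing cells is a convenient starting point for judging whether the gaps look systematic, for example whether they cluster in particular countries or periods.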