
9. METHODOLOGY

9.2 PROBIT AND LOGIT

Binary response variables require binary outcome models. When modelling a binary outcome, the dependent variable 𝑦 is restricted to the values 0 or 1:

$$y = \begin{cases} 1 & \text{if yes} \\ 0 & \text{if no} \end{cases}$$

The probability that 𝑦 = 1 is estimated as a function of the independent variables. In general, the response probability is assumed to be (Wooldridge, 2014, p. 202):

𝑃(𝑦 = 1|π‘₯) = 𝛽0+ 𝛽1π‘₯1+ β‹― + π›½π‘˜π‘₯π‘˜ = 𝛽0+ π‘₯𝛽

where π‘₯𝛽 = 𝛽0+ 𝛽1π‘₯1+ β‹― + π›½π‘˜π‘₯π‘˜ .

Although the linear probability model (LPM) is easy to estimate and its results easy to interpret, it has some limitations. First, the LPM does not restrict the probability of the dependent variable to lie between zero and one. Relatedly, the independent variables cannot be linearly related to the probability over their entire range: since the partial effects of the explanatory variables are constant, the predicted probability would eventually exceed one or fall below zero, which makes little sense mathematically. The model is nevertheless useful when the values of the independent variables are close to the sample averages. Second, the residuals in the LPM are heteroskedastic (Wooldridge, 2014, p. 205), although this can be addressed by using heteroskedasticity-robust standard errors.
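To make these limitations concrete, the following is a minimal sketch (on simulated data, with illustrative coefficients) of estimating an LPM by OLS with heteroskedasticity-robust standard errors; note that the fitted values can fall outside [0, 1].

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
y = (0.3 * x[:, 0] - 0.5 * x[:, 1] + rng.standard_normal(n) > 0).astype(float)

X = sm.add_constant(x)
lpm = sm.OLS(y, X).fit(cov_type="HC1")  # robust to the LPM's inherent heteroskedasticity
print(lpm.params)                        # constant partial effects on P(y = 1|x)
print(lpm.predict(X).min(), lpm.predict(X).max())  # fitted "probabilities" may leave [0, 1]
```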

Instead, nonlinear binary response models, probit or logit, overcome some of these drawbacks. The interest lies in the response probability

𝑃(𝑦 = 1|π‘₯) = 𝑃(𝑦 = 1|π‘₯1, π‘₯2, … , π‘₯π‘˜)

to analyze the dependent variable (Wooldridge, 2014, pp. 459-460).

The nonlinear binary response model is assumed to be:

𝑃(𝑦 = 1|π‘₯) = 𝐺(𝛽0+ 𝛽1π‘₯1+ β‹― + π›½π‘˜π‘₯π‘˜) = 𝐺(𝛽0+ π‘₯𝛽)

Where π‘₯𝛽 = 𝛽1π‘₯1+ β‹― + π›½π‘˜π‘₯π‘˜ and G is a function that, for all real numbers 𝑧, strictly varies between zero and one: 0 < 𝐺(𝑧) < 1.

A standard normal distribution is assumed when using a probit model, and a logistic distribution when using a logit model. The probit model uses the standard normal cumulative distribution function, expressed as an integral:

𝐺(𝑧) = πœ™(𝑧) = ∫ πœ™(𝑣) 𝑑𝑣

𝑧

βˆ’βˆž

where

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right)$$

is the standard normal density function. Here, 𝐺 returns a value between zero and one.

The probit and logit models are both increasing in π‘₯𝛽. They increase most rapidly around π‘₯𝛽 = 0, while at extreme values of π‘₯𝛽 the effect on 𝐺 tends to zero. As 𝑧 approaches infinity, 𝐺(𝑧) approaches one; as 𝑧 approaches negative infinity, 𝐺(𝑧) approaches zero (Wooldridge, 2014, p. 461).
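A small sketch of the two link functions, using SciPy's standard normal and logistic distributions: both map any real 𝑧 into (0, 1), approach zero as 𝑧 β†’ βˆ’βˆž and one as 𝑧 β†’ ∞, and change fastest near 𝑧 = 0.

```python
from scipy.stats import logistic, norm

for z in (-3.0, 0.0, 3.0):
    # probit link G(z) = Ξ¦(z); logit link G(z) = exp(z) / (1 + exp(z))
    print(z, norm.cdf(z), logistic.cdf(z))
# norm.pdf(z) evaluates the density Ο†(z) = (1/√(2Ο€)) exp(-zΒ²/2) defined above
```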

Probit and logit models give us the opportunity to use a latent variable approach to analyze the dependent variable. A latent variable model is traditionally used for estimating parameters of interest when the dependent variable is not fully observed, for example the motivation behind bribing or the culture embedded in a country. If 𝑦𝑖𝑑* is a latent, or unobserved, variable, then we suppose that

π‘¦π‘–π‘‘βˆ— = 𝛽0+ 𝛽π‘₯𝑖𝑑+ 𝑒𝑖 + 𝑒𝑖𝑑, 𝑦𝑖𝑑 = 1[π‘¦π‘–π‘‘βˆ— > 0]

In the binary outcome model, 𝑦𝑖𝑑 is assumed to equal one if 𝑦𝑖𝑑* > 0 and zero if 𝑦𝑖𝑑* ≀ 0. Further, the error term 𝑒𝑖 is assumed to be independent of π‘₯ and either standard logistically distributed or standard normally distributed. In either case, the error is symmetrically distributed around zero, so that 1 βˆ’ 𝐺(βˆ’π‘§) = 𝐺(𝑧) for all real numbers 𝑧. For logit and probit, the direction of the effect of π‘₯𝑗 on 𝐸(𝑦*|π‘₯) = 𝛽0 + π‘₯𝛽 is always the same as its effect on 𝑃(𝑦 = 1|π‘₯) = 𝐺(𝛽0 + π‘₯𝛽). However, as the latent variable 𝑦* is not necessarily measurable in meaningful units, the magnitude of each 𝛽𝑗 is not by itself very useful (Wooldridge, 2014, pp. 461-462).
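The latent-variable formulation can be illustrated with a short simulation; the sketch below (cross-sectional for simplicity, with hypothetical coefficients) draws 𝑦* from the index plus a standard normal error and observes only 𝑦 = 1[𝑦* > 0], which yields a probit response probability.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
beta0, beta1 = -0.25, 0.8              # hypothetical coefficients

y_star = beta0 + beta1 * x + rng.standard_normal(n)  # latent, unobserved
y = (y_star > 0).astype(int)                          # observed binary outcome

print(y.mean())          # sample share of y = 1
print(norm.cdf(beta0))   # model-implied P(y = 1 | x = 0) = G(beta0)
```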

Maximum likelihood estimation (MLE) is the usual method of estimation for the probit and logit models. Because MLE is based on the distribution of 𝑦 given π‘₯, the heteroskedasticity in π‘‰π‘Žπ‘Ÿ(𝑦|π‘₯) is automatically accounted for (Wooldridge, 2014, p. 463). MLE chooses the parameters of the model that maximize the log-likelihood function, which is built up from the log-likelihood for each observation 𝑖. The density of 𝑦 given π‘₯𝑖 is

𝑓(𝑦|π‘₯𝑖; 𝛽) = [𝐺(π‘₯𝑖𝛽)]𝑦[1 βˆ’ 𝐺(π‘₯𝑖𝛽)]1βˆ’π‘¦, 𝑦 = 0,1

The intercept is, for simplicity, absorbed into the vector π‘₯𝑖. When 𝑦 equals one or zero, the density reduces to 𝐺(π‘₯𝑖𝛽) or 1 βˆ’ 𝐺(π‘₯𝑖𝛽), respectively. The log-likelihood function for observation 𝑖 is obtained by taking the log of this density:

$$\ell_i(\beta) = y_i \log[G(x_i\beta)] + (1 - y_i)\log[1 - G(x_i\beta)]$$

Thereafter, for a sample of size 𝑛, the log-likelihood is obtained by summing over all observations:

$$\mathcal{L}(\beta) = \sum_{i=1}^{n} \ell_i(\beta)$$

The estimate 𝛽̂ maximizes the log-likelihood through an iterative process, and it is consistent and asymptotically normal (Wooldridge, 2010, p. 568).
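As a sketch of how MLE works mechanically, the snippet below (self-contained, on simulated data) minimizes the negative probit log-likelihood with a generic numerical optimizer; in practice a packaged estimator would be used, but the objective is exactly the β„’(𝛽) defined above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept absorbed into X
beta_true = np.array([-0.25, 0.8])
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(float)

def neg_loglik(beta):
    p = np.clip(norm.cdf(X @ beta), 1e-12, 1 - 1e-12)  # guard the logs numerically
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")  # iterative maximization
print(res.x)  # close to beta_true in large samples
```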

9.2.1 Interpretation of regressors

Because the probit and logit models are non-linear functions, their interpretation differs from the LPM. Only the direction of a coefficient can be interpreted directly; the partial effect of a change in an independent variable must be calculated. A positive coefficient implies a higher probability of 𝑦𝑖 = 1, and a negative coefficient a lower probability of 𝑦𝑖 = 1 (Wooldridge, 2009). The marginal effects of the explanatory variables are calculated either at the means of the explanatory variables or as average marginal effects. This thesis uses the average marginal effect, as many of the explanatory variables are ordinal and their means therefore have no meaningful interpretation.
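A minimal sketch of computing average marginal effects with statsmodels (simulated data, placeholder variables); `at="overall"` averages the marginal effect over the sample rather than evaluating it at the means.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=(n, 2))
y = ((0.8 * x[:, 0] - 0.5 * x[:, 1] + rng.standard_normal(n)) > 0).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
print(res.get_margeff(at="overall").summary())  # average marginal effects
```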

It is unproblematic to include standard functional forms among the explanatory variables (Wooldridge, 2014, p. 463), provided one is aware of the difference in interpretation of the partial effects. The partial effect of π‘₯𝑗 on 𝑃(𝑦 = 1|π‘₯) depends on all π‘₯, and 𝐺(𝛽̂0) is the predicted probability of success when each π‘₯𝑗 is set to zero (Wooldridge, 2010, p. 561). The partial effects relevant to the thesis are discussed below.

Dummy variables

If π‘₯𝑗 is a binary (dummy) variable, then the partial effect is obtained by holding all others fixed and changing π‘₯𝑗 from zero to one.

𝐺(𝛽0+ 𝛽1+ 𝛽2π‘₯2+ β‹― + π›½π‘˜π‘₯π‘˜) βˆ’ 𝐺(𝛽0+ 𝛽2π‘₯2+ β‹― + π›½π‘˜π‘₯π‘˜)

Discrete variables

Similarly, for a discrete variable π‘₯π‘˜, the effect on the probability of increasing π‘₯π‘˜ from π‘π‘˜ to π‘π‘˜ + 1 is

$$G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k (c_k + 1)) - G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k c_k)$$
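Both discrete-change formulas reduce to a difference of 𝐺 evaluated at two index values; a small probit sketch with hypothetical coefficients:

```python
from scipy.stats import norm

b0, b1, b2 = -0.25, 0.8, 0.4   # hypothetical (beta0, beta1, beta2)

# dummy x1 switching 0 -> 1, holding x2 fixed at 1
print(norm.cdf(b0 + b1 * 1 + b2 * 1) - norm.cdf(b0 + b1 * 0 + b2 * 1))

# discrete x2 rising from c_k to c_k + 1, holding x1 fixed at 1
c_k = 2.0
print(norm.cdf(b0 + b1 * 1 + b2 * (c_k + 1)) - norm.cdf(b0 + b1 * 1 + b2 * c_k))
```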

Continuous variables

If π‘₯𝑗 is a continuous variable, the partial effect on 𝑝(π‘₯) = 𝑃(𝑦 = 1|π‘₯) is obtained from the partial derivative:

πœ•π‘(π‘₯)

πœ•π‘₯𝑗 = 𝑔(𝛽0+ π‘₯𝛽)𝛽𝑗

where 𝑔(𝑧) = 𝑑𝐺(𝑧)/𝑑𝑧 is the probability density function. As 𝐺(𝑧) is a strictly increasing cumulative distribution function, 𝑔(𝑧) > 0 for all 𝑧. The partial effect of π‘₯𝑗 on 𝑃(𝑦 = 1|π‘₯) depends on π‘₯ via the positive quantity 𝑔(𝛽0 + π‘₯𝛽); hence, the partial effect always has the same sign as 𝛽𝑗 (Wooldridge, 2014).
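The derivative formula can be checked numerically; a sketch for a probit with hypothetical coefficients, comparing 𝑔(𝛽0 + π‘₯𝛽)𝛽𝑗 to a finite-difference approximation:

```python
import numpy as np
from scipy.stats import norm

beta0, beta = -0.25, np.array([0.8, 0.4])  # hypothetical coefficients
x = np.array([0.5, 1.0])
index = beta0 + x @ beta

analytic = norm.pdf(index) * beta[0]       # g(beta0 + x*beta) * beta_1
h = 1e-6                                   # a small step in x_1 shifts the index by beta_1 * h
numeric = (norm.cdf(index + beta[0] * h) - norm.cdf(index)) / h
print(analytic, numeric)                   # the two agree to numerical precision
```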

Logarithms

When an explanatory variable enters in logarithmic form, its effect on the response probability is presumed to decline as the value of the variable increases. For an independent variable included as a logarithm, the marginal effect is (Wooldridge, 2010, p. 567):

πœ•π‘ƒ(𝑦 = 1|𝑧)

πœ•log (𝑧𝑗) = 𝑔(π‘₯𝛽)𝛽𝑗

where the change in 𝑃(𝑦 = 1|𝑧) given a 1 percent increase in 𝑧𝑗 is approximately 𝑔(π‘₯𝛽)(𝛽𝑗

100).
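Numerically this is a one-line computation; a sketch with hypothetical values of the index and coefficient:

```python
from scipy.stats import norm

index, beta_j = 0.4, 0.6                 # hypothetical x*beta and coefficient on log(z_j)
print(norm.pdf(index) * beta_j / 100)    # approx. change in P(y = 1|z) per 1% rise in z_j
```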

Interaction terms

According to Norton et al. (2004), the full interaction effect for an interaction term π‘₯1π‘₯2 is given by

$$\frac{\partial^2 \Phi(u)}{\partial x_1 \partial x_2} = \beta_{12}\,\Phi'(u) + (\beta_1 + \beta_{12} x_2)(\beta_2 + \beta_{12} x_1)\,\Phi''(u)$$

where Ξ¦ is the standard normal cumulative distribution function and 𝑒 is the index 𝛽1π‘₯1 + 𝛽2π‘₯2 + 𝛽12π‘₯1π‘₯2 + π‘₯𝛽. The differences from the linear interaction effect have several implications for nonlinear models. First, the interaction effect can differ from zero even if 𝛽12 = 0. Additionally, a t-test on the coefficient of the interaction term is not sufficient to establish its statistical significance; the statistical significance of the entire cross derivative has to be calculated. As with continuous variables and logarithms, the interaction effect is conditional on the independent variables. Since it depends on two additive terms that can each be positive or negative, the interaction effect may have different signs for different values of the covariates. Hence, the sign of 𝛽12 does not necessarily indicate the sign of the interaction effect (Norton et al., 2004).
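A sketch of the cross derivative for a probit, using Ξ¦β€²(𝑒) = Ο†(𝑒) and Ξ¦β€³(𝑒) = βˆ’π‘’Ο†(𝑒); the coefficients and covariate values are illustrative, and the remaining index π‘₯𝛽 is omitted for brevity.

```python
from scipy.stats import norm

b1, b2, b12 = 0.8, -0.5, 0.3
x1, x2 = 1.0, 2.0
u = b1 * x1 + b2 * x2 + b12 * x1 * x2   # the index (other covariates omitted)

phi = norm.pdf(u)                        # Ξ¦'(u)
phi_prime = -u * phi                     # Ξ¦''(u)
effect = b12 * phi + (b1 + b12 * x2) * (b2 + b12 * x1) * phi_prime
print(effect)  # its sign need not match the sign of b12
```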

9.2.2 Specification tests

The Lagrange Multiplier test, the likelihood ratio (LR) test and the Wald test are the tests commonly used for exclusion restrictions in probit and logit models. In running my regressions, I make use of the last two.

The Wald test requires estimation of the unrestricted model, and thereafter tests restrictions against the unrestricted (base) case. It uses an asymptotic chi-square distribution, with 𝑑𝑓 equal to the number of restrictions being tested (Wooldridge, 2014).
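A minimal sketch of a Wald test of joint exclusion restrictions in statsmodels (simulated data; "x1" and "x2" are placeholder names): the unrestricted probit is fitted once, and the restrictions are then tested against it.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 1000
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = ((0.8 * df["x1"] + rng.standard_normal(n)) > 0).astype(int)

res = smf.probit("y ~ x1 + x2", data=df).fit(disp=0)
print(res.wald_test("(x1 = 0), (x2 = 0)"))  # chi-square statistic with df = 2
```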

The LR test is based on the difference in log-likelihood between an unrestricted and a restricted model. As MLE maximizes the log-likelihood function, dropping variables will generally lower the log-likelihood value (Wooldridge, 2014). We therefore test whether this drop in log-likelihood is large enough to conclude that the dropped variables are important.

The likelihood ratio statistic is given by:

$$LR = 2(\mathcal{L}_{ur} - \mathcal{L}_{r})$$

Where β„’π‘’π‘Ÿ is the log-likelihood of the unrestricted model and β„’π‘Ÿ is the log-likelihood for the restricted model. As the log-likelihood is always negative, this is a nonnegative and strictly

positive number as β„’π‘’π‘Ÿ β‰₯ β„’π‘Ÿ. The LR is also chi-square distributed with q degrees of freedom equal to the number of restrictions. The null hypothesis is rejected when LR exceeds a critical value (Wooldridge, 2014).

𝐿𝑅~π‘Ž

Ο‡

π‘ž2