9. METHODOLOGY
9.2 PROBIT AND LOGIT
Binary response variables necessitate analyzing the data with binary outcome models. When modelling a binary outcome, the dependent variable y is limited to taking the values 0 or 1:

y = 1 if yes, 0 if no
The probability that y = 1 is estimated as a function of the independent variables. In general, the response probability is assumed to be (Wooldridge, 2014, p. 202):

P(y = 1|x) = β₀ + β₁x₁ + … + βₖxₖ = β₀ + xβ

where xβ = β₁x₁ + … + βₖxₖ.
Although the linear probability model (LPM) is easy to estimate and its results are easy to interpret, it has some limitations. First, the LPM does not restrict the predicted probability of the dependent variable to lie between zero and one. Relatedly, the independent variables cannot all be linearly related to the probability over their entire ranges: since the partial effects of the explanatory variables are constant, the predicted probability will eventually exceed one or fall below zero, which makes little sense mathematically. The model is nevertheless useful when the values of the independent variables are close to the sample averages. Second, the residuals in the LPM are heteroskedastic (Wooldridge, 2014, p. 205), although this can be addressed by computing heteroskedasticity-robust standard errors.
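As a small illustration of this boundary problem, consider an LPM with hypothetical coefficients: because the partial effect is constant, the fitted "probabilities" eventually leave the unit interval.

```python
# Hypothetical LPM coefficients, chosen only for illustration.
b0, b1 = 0.3, 0.1

def lpm(x):
    # Constant partial effect b1: the fitted value rises without bound in x.
    return b0 + b1 * x

print(lpm(2))    # near the sample average: a sensible probability of about 0.5
print(lpm(8))    # exceeds one
print(lpm(-4))   # falls below zero
```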
Nonlinear binary response models, probit or logit, overcome some of these drawbacks. The interest lies in the response probability

P(y = 1|x) = P(y = 1|x₁, x₂, …, xₖ)

which is used to analyze the dependent variable (Wooldridge, 2014, pp. 459-460).
The nonlinear binary response model is assumed to be:

P(y = 1|x) = G(β₀ + β₁x₁ + … + βₖxₖ) = G(β₀ + xβ)

where xβ = β₁x₁ + … + βₖxₖ and G is a function that, for all real numbers z, takes values strictly between zero and one: 0 < G(z) < 1.
A standard normal distribution is assumed in the probit model, and a logistic distribution in the logit model. The probit model uses the standard normal cumulative distribution function, expressed as an integral:

G(z) = Φ(z) = ∫₋∞ᶻ φ(v) dv

where

φ(z) = (1/√(2π)) exp(−z²/2)

is the standard normal density function. The logit model instead uses the logistic function, G(z) = exp(z)/[1 + exp(z)]. In both cases, G returns a value strictly between zero and one.
Both the probit and logit G functions are increasing in xβ. They increase most rapidly around xβ = 0, while at extreme values of xβ the effect on G tends to zero. As z approaches infinity, G(z) approaches one; as z approaches negative infinity, G(z) approaches zero (Wooldridge, 2014, p. 461).
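These limit properties are easy to check numerically. The sketch below evaluates the probit G, written with the standard library's error function, and the logistic G at a few points:

```python
import math

def probit_G(z):
    # Standard normal CDF, Phi(z), written via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def logit_G(z):
    # Logistic CDF.
    return 1 / (1 + math.exp(-z))

for G in (probit_G, logit_G):
    print(G(-10), G(0), G(10))   # near 0, exactly 0.5, near 1
```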
Probit and logit models give us the opportunity to use a latent variable approach to analyze the dependent variable. A latent variable model is traditionally used to estimate parameters of interest when the dependent variable is not fully observed, for example the motivation behind bribing or the culture embedded in a country. If yᵢₜ* is a latent, or unobserved, variable, we suppose that

yᵢₜ* = β₀ + βxᵢₜ + eᵢ + uᵢₜ,  yᵢₜ = 1[yᵢₜ* > 0]

In the binary outcome model, yᵢₜ equals one if yᵢₜ* > 0 and zero if yᵢₜ* ≤ 0. Further, the error term eᵢ is assumed to be independent of x and either standard logistically distributed or standard normally distributed. In either case, the error is symmetrically distributed around zero, so 1 − G(−z) = G(z) for all real numbers z. For logit and probit, the direction of the effect of xⱼ on E(y*|x) = β₀ + xβ is always the same as its effect on P(y = 1|x) = E(y|x) = G(β₀ + xβ). However, as the latent variable y* is not necessarily measurable, the magnitude of each βⱼ is not in itself informative (Wooldridge, 2014, pp. 461-462).
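The latent variable mechanism can be sketched by simulation. The coefficients and the single regressor value below are hypothetical; a standard normal error gives the probit case, so the simulated share of ones should approach G(β₀ + β₁x) = Φ(β₀ + β₁x).

```python
import math, random

random.seed(42)
b0, b1 = -0.5, 1.0   # hypothetical coefficients
x = 1.0              # a fixed value of the regressor

def draw_y(x):
    y_star = b0 + b1 * x + random.gauss(0, 1)   # latent index plus N(0,1) error
    return 1 if y_star > 0 else 0               # y = 1[y* > 0]

share_ones = sum(draw_y(x) for _ in range(100_000)) / 100_000
Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(share_ones, Phi(b0 + b1 * x))   # simulated share close to Phi(0.5), about 0.69
```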
Maximum likelihood estimation (MLE) is the standard method of estimation for the probit and logit models. Since the MLE is based on the distribution of y given x, the heteroskedasticity in Var(y|x) is automatically accounted for (Wooldridge, 2014, p. 463). MLE chooses the parameter values that maximize the log-likelihood function. We need the log-likelihood function for each observation i. The density of y given xᵢ is
f(y|xᵢ; β) = [G(xᵢβ)]ʸ[1 − G(xᵢβ)]¹⁻ʸ,  y = 0, 1

The intercept is, for simplicity, absorbed into the vector xᵢ. When y equals one or zero, the density equals G(xᵢβ) or 1 − G(xᵢβ), respectively. The log-likelihood function for observation i is obtained by taking the log of this density:

ℓᵢ(β) = yᵢ log[G(xᵢβ)] + (1 − yᵢ) log[1 − G(xᵢβ)]

Thereafter, for a sample of size n, the log-likelihood is obtained by summing over all observations:

ℒ(β) = Σᵢ₌₁ⁿ ℓᵢ(β)
The estimate β̂ that maximizes the log-likelihood is found through an iterative process, and the MLE is consistent and asymptotically normal (Wooldridge, 2010, p. 568).
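As a sketch of this iterative maximization, the following fits a logit slope (one regressor, no intercept, to keep the example one-dimensional) to a tiny made-up data set by Newton-Raphson on the log-likelihood above. The data and starting value are purely illustrative.

```python
import math

# Made-up illustrative data.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

G = lambda z: 1 / (1 + math.exp(-z))   # logistic CDF

def loglik(b):
    return sum(y * math.log(G(b * x)) + (1 - y) * math.log(1 - G(b * x))
               for x, y in zip(xs, ys))

b = 0.0
for _ in range(50):
    # Score and Hessian of the logit log-likelihood in the single slope b.
    score = sum(x * (y - G(b * x)) for x, y in zip(xs, ys))
    hess = -sum(x * x * G(b * x) * (1 - G(b * x)) for x in xs)
    b -= score / hess

print(round(b, 3), round(loglik(b), 3))   # the MLE of the slope and its log-likelihood
```

The log-likelihood is globally concave for logit, so Newton-Raphson converges quickly from any reasonable starting value.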
9.2.1 Interpretation of regressors
Because the probit and logit models are nonlinear, their interpretation differs from the LPM. Only the direction of a coefficient can be interpreted directly; the partial effect of a change in an independent variable must be calculated. A positive coefficient implies a higher probability that yᵢ = 1, and a negative coefficient a lower probability that yᵢ = 1 (Wooldridge, 2009). The marginal effects of the explanatory variables are calculated either at the means of the explanatory variables or as average marginal effects. This thesis uses average marginal effects, as many of the explanatory variables are ordinal and their means therefore lack a meaningful interpretation.
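The difference between the two summaries can be sketched as follows, using hypothetical probit estimates and a small made-up sample: the average marginal effect averages g(β₀ + β₁x)β₁ over the observations, while the effect at the mean evaluates it once at x̄.

```python
import math

g = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # standard normal pdf

b0, b1 = -0.2, 0.8            # hypothetical probit estimates
xs = [0, 0, 1, 2, 4]          # made-up sample of one regressor

ame = sum(g(b0 + b1 * x) * b1 for x in xs) / len(xs)   # average marginal effect
x_bar = sum(xs) / len(xs)
mem = g(b0 + b1 * x_bar) * b1                          # marginal effect at the mean

print(round(ame, 4), round(mem, 4))   # the two summaries generally differ
```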
It is unproblematic to include standard functional forms of the explanatory variables (Wooldridge, 2014, p. 463), as long as one is aware of the difference in interpretation of the partial effects. The partial effect of xⱼ on P(y = 1|x) depends on all of x, and G(β̂₀) is the predicted probability of success when each xⱼ is set to zero (Wooldridge, 2010, p. 561). The partial effects relevant to the thesis are discussed below.
Dummy variables
If x₁ is a binary (dummy) variable, the partial effect is obtained by holding all other variables fixed and changing x₁ from zero to one:

G(β₀ + β₁ + β₂x₂ + … + βₖxₖ) − G(β₀ + β₂x₂ + … + βₖxₖ)
Discrete variables
Similarly, for a discrete variable xₖ, the effect on the probability of going from cₖ to cₖ + 1 is

G(β₀ + β₁x₁ + β₂x₂ + … + βₖ(cₖ + 1)) − G(β₀ + β₁x₁ + β₂x₂ + … + βₖcₖ)
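Both discrete cases reduce to a difference between two evaluations of G. A small sketch with hypothetical probit coefficients, flipping a dummy x1 from 0 to 1 while holding a control fixed:

```python
import math

Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))   # probit G

b0, b1, b2 = -0.5, 0.7, 0.3   # hypothetical coefficients
x2 = 1.0                      # control variable held fixed

# Effect of flipping the dummy x1 from 0 to 1, everything else fixed.
effect = Phi(b0 + b1 * 1 + b2 * x2) - Phi(b0 + b1 * 0 + b2 * x2)
print(round(effect, 4))   # about 0.27 here
```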
Continuous variables
If xⱼ is a continuous variable, the partial effect on p(x) = P(y = 1|x) is obtained from the partial derivative:

∂p(x)/∂xⱼ = g(β₀ + xβ)βⱼ

where g(z) = dG(z)/dz is the probability density function. As G(z) is a strictly increasing cumulative distribution function, g(z) > 0 for all z. The partial effect of xⱼ on P(y = 1|x) depends on x through the positive quantity g(β₀ + xβ). Hence, the partial effect always has the same sign as βⱼ (Wooldridge, 2014).
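The analytic partial effect g(β₀ + xβ)βⱼ can be checked against a numerical derivative of G. The coefficients below are hypothetical, with a negative slope chosen to show the sign property:

```python
import math

Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))       # G
g = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # g = G'

b0, b1 = 0.1, -0.6   # hypothetical; note the negative slope
x = 0.8

analytic = g(b0 + b1 * x) * b1   # the partial effect formula
h = 1e-6
numeric = (Phi(b0 + b1 * (x + h)) - Phi(b0 + b1 * (x - h))) / (2 * h)

print(analytic, numeric)   # agree; both share the sign of b1
```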
Logarithms
When an explanatory variable enters as a logarithm, its effect is normally presumed to diminish as the value of the variable increases. If the independent variable is a logarithm, the marginal effect is (Wooldridge, 2010, p. 567):

∂P(y = 1|z)/∂log(zⱼ) = g(xβ)βⱼ

where the change in P(y = 1|z) given a 1 percent increase in zⱼ is approximately g(xβ)(βⱼ/100).
Interaction terms
According to Norton et al. (2004), the full interaction effect for an interaction term x₁x₂ is given by

∂²Φ(u)/∂x₁∂x₂ = β₁₂Φ′(u) + (β₁ + β₁₂x₂)(β₂ + β₁₂x₁)Φ″(u)

where Φ is the standard normal cumulative distribution function and u is the index β₁x₁ + β₂x₂ + β₁₂x₁x₂ + xβ. The difference from the linear interaction effect therefore has several implications for nonlinear models. First, the interaction effect can be different from zero even if β₁₂ = 0. Additionally, a t-test on the coefficient of the interaction term is not sufficient to test the statistical significance of the interaction effect; the statistical significance of the entire cross derivative has to be calculated. As with continuous variables and logarithms, the interaction effect is conditional on the independent variables. Since the interaction effect depends on two additive terms that can each be positive or negative, it may have different signs at different values of the covariates. Hence, the sign of β₁₂ does not necessarily indicate the sign of the interaction effect (Norton, et al., 2004).
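A quick numerical sketch of this point, with hypothetical coefficients and the interaction coefficient deliberately set to zero: the full cross derivative is still nonzero. Here Φ′ is the standard normal density and Φ″(u) = −uΦ′(u).

```python
import math

pdf = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # Phi'
pdf2 = lambda z: -z * pdf(z)                                   # Phi''

b1, b2, b12 = 0.5, 0.4, 0.0   # hypothetical; interaction coefficient is zero
x1, x2 = 1.0, 2.0

u = b1 * x1 + b2 * x2 + b12 * x1 * x2
cross = b12 * pdf(u) + (b1 + b12 * x2) * (b2 + b12 * x1) * pdf2(u)
print(round(cross, 4))   # nonzero even though b12 = 0
```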
9.2.2 Specification tests
The Lagrange Multiplier Test, the Likelihood Ratio (LR) test and the Wald test are the tests commonly used to test exclusion restrictions in probit and logit models. In running my regressions, I make use of the last two.
The Wald test requires estimation of only the unrestricted model, and thereafter tests the restrictions against this unrestricted (base) case. The test statistic follows an asymptotic chi-square distribution, with degrees of freedom (df) equal to the number of restrictions being tested (Wooldridge, 2014).
The LR test is based on the difference in log-likelihoods between an unrestricted and a restricted model. Since MLE maximizes the log-likelihood function, dropping variables can only lower, or at best leave unchanged, the maximized log-likelihood (Wooldridge, 2014). We therefore test whether this drop in the log-likelihood is large enough to conclude that the dropped variables are important.
The likelihood ratio statistic is given by:

LR = 2(ℓᵤᵣ − ℓᵣ)

where ℓᵤᵣ is the log-likelihood of the unrestricted model and ℓᵣ is the log-likelihood of the restricted model. Because the unrestricted maximization is over a larger parameter space, ℓᵤᵣ ≥ ℓᵣ, so the LR statistic is nonnegative. It is asymptotically chi-square distributed, with degrees of freedom q equal to the number of restrictions:

LR ∼ χ²(q)

The null hypothesis is rejected when LR exceeds a critical value (Wooldridge, 2014).
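As a sketch, the following computes an LR statistic for a toy logit with a single slope, testing the restriction that the slope is zero. The data are made up, and the unrestricted slope is fitted by Newton-Raphson; with one restriction, the statistic is compared against the chi-square(1) critical value.

```python
import math

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]   # made-up data
ys = [0, 0, 1, 0, 1, 1]

G = lambda z: 1 / (1 + math.exp(-z))     # logistic CDF

def loglik(b):
    return sum(y * math.log(G(b * x)) + (1 - y) * math.log(1 - G(b * x))
               for x, y in zip(xs, ys))

# Unrestricted MLE of the slope via Newton-Raphson.
b = 0.0
for _ in range(50):
    score = sum(x * (y - G(b * x)) for x, y in zip(xs, ys))
    hess = -sum(x * x * G(b * x) * (1 - G(b * x)) for x in xs)
    b -= score / hess

LR = 2 * (loglik(b) - loglik(0.0))   # restricted model: slope = 0
print(round(LR, 3))   # compare with the chi-square(1) 5% critical value, 3.84
```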