
9. METHODOLOGY

9.2 PROBIT AND LOGIT

Binary response variables require binary outcome models. When modelling a binary outcome, the dependent variable 𝑦 is restricted to the values 0 or 1:

$$y = \begin{cases} 1 & \text{if yes} \\ 0 & \text{if no} \end{cases}$$

The probability that 𝑦 = 1 is estimated as a function of the independent variables. In general, the response probability is assumed to be (Wooldridge, 2014, p. 202):

𝑃(𝑦 = 1|π‘₯) = 𝛽0+ 𝛽1π‘₯1+ β‹― + π›½π‘˜π‘₯π‘˜ = 𝛽0+ π‘₯𝛽

where π‘₯𝛽 = 𝛽0+ 𝛽1π‘₯1+ β‹― + π›½π‘˜π‘₯π‘˜ .

Although the linear probability model (LPM) is easy to estimate and its results easy to interpret, it has some limitations. First, the LPM does not restrict the probability of the dependent variable to lie between zero and one. Relatedly, the independent variables cannot be linearly related to the probability over their entire range: since the partial effects of the explanatory variables are constant, the predicted probability would eventually exceed one or fall below zero, which makes little sense mathematically. The model is nevertheless useful when the values of the independent variables are close to the sample averages. Second, the residuals in the LPM are heteroskedastic (Wooldridge, 2014, p. 205), although this can be addressed by using heteroskedasticity-robust standard errors.
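To make these limitations concrete, the following is a minimal sketch (on simulated data, with illustrative coefficients) of estimating an LPM by OLS with heteroskedasticity-robust standard errors; note that the fitted values can fall outside [0, 1].

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
y = (0.3 * x[:, 0] - 0.5 * x[:, 1] + rng.standard_normal(n) > 0).astype(float)

X = sm.add_constant(x)
lpm = sm.OLS(y, X).fit(cov_type="HC1")  # robust to the LPM's inherent heteroskedasticity
print(lpm.params)                        # constant partial effects on P(y = 1|x)
print(lpm.predict(X).min(), lpm.predict(X).max())  # fitted "probabilities" may leave [0, 1]
```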

Instead, nonlinear binary response models, probit or logit, overcome some of these drawbacks. The interest lies in the response probability

𝑃(𝑦 = 1|π‘₯) = 𝑃(𝑦 = 1|π‘₯1, π‘₯2, … , π‘₯π‘˜)

to analyze the dependent variable (Wooldridge, 2014, pp. 459-460).

The nonlinear binary response model is assumed to be:

𝑃(𝑦 = 1|π‘₯) = 𝐺(𝛽0+ 𝛽1π‘₯1+ β‹― + π›½π‘˜π‘₯π‘˜) = 𝐺(𝛽0+ π‘₯𝛽)

Where π‘₯𝛽 = 𝛽1π‘₯1+ β‹― + π›½π‘˜π‘₯π‘˜ and G is a function that, for all real numbers 𝑧, strictly varies between zero and one: 0 < 𝐺(𝑧) < 1.

A standard normal distribution is assumed when using a probit model, and a logistic distribution when using a logit model. The probit model uses the standard normal cumulative distribution function, expressed as an integral:

𝐺(𝑧) = πœ™(𝑧) = ∫ πœ™(𝑣) 𝑑𝑣

𝑧

βˆ’βˆž

where

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right)$$

is the standard normal density function. Here, 𝐺 returns a value between zero and one.

The probit and logit models are both increasing in π‘₯𝛽. They increase most rapidly around π‘₯𝛽 = 0, while at extreme values of π‘₯𝛽 the effect on 𝐺 tends to zero. As 𝑧 approaches infinity, 𝐺(𝑧) approaches one; as 𝑧 approaches negative infinity, 𝐺(𝑧) approaches zero (Wooldridge, 2014, p. 461).
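A small sketch of the two link functions, using SciPy's standard normal and logistic distributions: both map any real 𝑧 into (0, 1), approach zero as 𝑧 β†’ βˆ’βˆž and one as 𝑧 β†’ ∞, and change fastest near 𝑧 = 0.

```python
from scipy.stats import logistic, norm

for z in (-3.0, 0.0, 3.0):
    # probit link G(z) = Ξ¦(z); logit link G(z) = exp(z) / (1 + exp(z))
    print(z, norm.cdf(z), logistic.cdf(z))
# norm.pdf(z) evaluates the density Ο†(z) = (1/√(2Ο€)) exp(-zΒ²/2) defined above
```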

Probit and logit models give us the opportunity to use a latent variable approach to analyze the dependent variable. A latent variable model is traditionally used for estimating parameters of interest when the dependent variable is not fully observed, for example the motivation behind bribing or the culture embedded in a country. If 𝑦𝑖𝑑* is a latent, or unobserved, variable, then we suppose that

π‘¦π‘–π‘‘βˆ— = 𝛽0+ 𝛽π‘₯𝑖𝑑+ 𝑒𝑖 + 𝑒𝑖𝑑, 𝑦𝑖𝑑 = 1[π‘¦π‘–π‘‘βˆ— > 0]

In the binary outcome model, 𝑦𝑖𝑑 is assumed to equal one if 𝑦𝑖𝑑* > 0 and zero if 𝑦𝑖𝑑* ≀ 0. Further, the error term 𝑒𝑖 is assumed to be independent of π‘₯ and either standard logistically distributed or standard normally distributed. In either case, the error is symmetrically distributed around zero, so that 1 βˆ’ 𝐺(βˆ’π‘§) = 𝐺(𝑧) for all real numbers 𝑧. For logit and probit, the direction of the effect of π‘₯𝑗 on 𝐸(𝑦*|π‘₯) = 𝛽0 + π‘₯𝛽 is always the same as its effect on 𝑃(𝑦 = 1|π‘₯) = 𝐺(𝛽0 + π‘₯𝛽). However, as the latent variable 𝑦* is not necessarily measurable in meaningful units, the magnitude of each 𝛽𝑗 is not by itself very useful (Wooldridge, 2014, pp. 461-462).
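The latent-variable formulation can be illustrated with a short simulation; the sketch below (cross-sectional for simplicity, with hypothetical coefficients) draws 𝑦* from the index plus a standard normal error and observes only 𝑦 = 1[𝑦* > 0], which yields a probit response probability.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
beta0, beta1 = -0.25, 0.8              # hypothetical coefficients

y_star = beta0 + beta1 * x + rng.standard_normal(n)  # latent, unobserved
y = (y_star > 0).astype(int)                          # observed binary outcome

print(y.mean())          # sample share of y = 1
print(norm.cdf(beta0))   # model-implied P(y = 1 | x = 0) = G(beta0)
```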

Maximum likelihood estimation (MLE) is the usual method of estimation for the probit and logit models. Because MLE is based on the distribution of 𝑦 given π‘₯, the heteroskedasticity in π‘‰π‘Žπ‘Ÿ(𝑦|π‘₯) is automatically accounted for (Wooldridge, 2014, p. 463). MLE chooses the parameters of the model that maximize the log-likelihood function, which is built up from the log-likelihood for each observation 𝑖. The density of 𝑦 given π‘₯𝑖 is

𝑓(𝑦|π‘₯𝑖; 𝛽) = [𝐺(π‘₯𝑖𝛽)]𝑦[1 βˆ’ 𝐺(π‘₯𝑖𝛽)]1βˆ’π‘¦, 𝑦 = 0,1

The intercept is, for simplicity, absorbed into the vector π‘₯𝑖. When 𝑦 equals one or zero, the density reduces to 𝐺(π‘₯𝑖𝛽) or 1 βˆ’ 𝐺(π‘₯𝑖𝛽), respectively. The log-likelihood function for observation 𝑖 is obtained by taking the log of this density:

$$\ell_i(\beta) = y_i \log[G(x_i\beta)] + (1 - y_i)\log[1 - G(x_i\beta)]$$

Thereafter, for a sample of size 𝑛, the log-likelihood is obtained by summing over all observations:

$$\mathcal{L}(\beta) = \sum_{i=1}^{n} \ell_i(\beta)$$

The estimate 𝛽̂ maximizes the log-likelihood through an iterative process, and it is consistent and asymptotically normal (Wooldridge, 2010, p. 568).
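As a sketch of how MLE works mechanically, the snippet below (self-contained, on simulated data) minimizes the negative probit log-likelihood with a generic numerical optimizer; in practice a packaged estimator would be used, but the objective is exactly the β„’(𝛽) defined above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept absorbed into X
beta_true = np.array([-0.25, 0.8])
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(float)

def neg_loglik(beta):
    p = np.clip(norm.cdf(X @ beta), 1e-12, 1 - 1e-12)  # guard the logs numerically
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")  # iterative maximization
print(res.x)  # close to beta_true in large samples
```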

9.2.1 Interpretation of regressors

Because the probit and logit models are non-linear functions, their interpretation differs from the LPM. Only the direction of a coefficient can be interpreted directly; the partial effect of a change in an independent variable must be calculated. A positive coefficient implies a higher probability of 𝑦𝑖 = 1, and a negative coefficient a lower probability of 𝑦𝑖 = 1 (Wooldridge, 2009). The marginal effects of the explanatory variables are calculated either at the means of the explanatory variables or as average marginal effects. This thesis uses the average marginal effect, as many of the explanatory variables are ordinal and their means therefore have no meaningful interpretation.
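A minimal sketch of computing average marginal effects with statsmodels (simulated data, placeholder variables); `at="overall"` averages the marginal effect over the sample rather than evaluating it at the means.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=(n, 2))
y = ((0.8 * x[:, 0] - 0.5 * x[:, 1] + rng.standard_normal(n)) > 0).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
print(res.get_margeff(at="overall").summary())  # average marginal effects
```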

It is unproblematic to include standard functional forms among the explanatory variables (Wooldridge, 2014, p. 463), provided one is aware of the difference in interpretation of the partial effects. The partial effect of π‘₯𝑗 on 𝑃(𝑦 = 1|π‘₯) depends on all π‘₯, and 𝐺(𝛽̂0) is the predicted probability of success when each π‘₯𝑗 is set to zero (Wooldridge, 2010, p. 561). The partial effects relevant to the thesis are discussed below.

Dummy variables

If π‘₯𝑗 is a binary (dummy) variable, then the partial effect is obtained by holding all others fixed and changing π‘₯𝑗 from zero to one.

𝐺(𝛽0+ 𝛽1+ 𝛽2π‘₯2+ β‹― + π›½π‘˜π‘₯π‘˜) βˆ’ 𝐺(𝛽0+ 𝛽2π‘₯2+ β‹― + π›½π‘˜π‘₯π‘˜)

Discrete variables

Similarly, for a discrete variable π‘₯π‘˜, the effect on the probability of increasing π‘₯π‘˜ from π‘π‘˜ to π‘π‘˜ + 1 is

$$G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k (c_k + 1)) - G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k c_k)$$
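Both discrete-change formulas reduce to a difference of 𝐺 evaluated at two index values; a small probit sketch with hypothetical coefficients:

```python
from scipy.stats import norm

b0, b1, b2 = -0.25, 0.8, 0.4   # hypothetical (beta0, beta1, beta2)

# dummy x1 switching 0 -> 1, holding x2 fixed at 1
print(norm.cdf(b0 + b1 * 1 + b2 * 1) - norm.cdf(b0 + b1 * 0 + b2 * 1))

# discrete x2 rising from c_k to c_k + 1, holding x1 fixed at 1
c_k = 2.0
print(norm.cdf(b0 + b1 * 1 + b2 * (c_k + 1)) - norm.cdf(b0 + b1 * 1 + b2 * c_k))
```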

Continuous variables

If π‘₯𝑗 is a continuous variable, the partial effect on 𝑝(π‘₯) = 𝑃(𝑦 = 1|π‘₯) is obtained from the partial derivative:

πœ•π‘(π‘₯)

πœ•π‘₯𝑗 = 𝑔(𝛽0+ π‘₯𝛽)𝛽𝑗

where 𝑔(𝑧) = 𝑑𝐺(𝑧)/𝑑𝑧 is the probability density function. As 𝐺(𝑧) is a strictly increasing cumulative distribution function, 𝑔(𝑧) > 0 for all 𝑧. The partial effect of π‘₯𝑗 on 𝑃(𝑦 = 1|π‘₯) depends on π‘₯ via the positive quantity 𝑔(𝛽0 + π‘₯𝛽); hence, the partial effect always has the same sign as 𝛽𝑗 (Wooldridge, 2014).
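The derivative formula can be checked numerically; a sketch for a probit with hypothetical coefficients, comparing 𝑔(𝛽0 + π‘₯𝛽)𝛽𝑗 to a finite-difference approximation:

```python
import numpy as np
from scipy.stats import norm

beta0, beta = -0.25, np.array([0.8, 0.4])  # hypothetical coefficients
x = np.array([0.5, 1.0])
index = beta0 + x @ beta

analytic = norm.pdf(index) * beta[0]       # g(beta0 + x*beta) * beta_1
h = 1e-6                                   # a small step in x_1 shifts the index by beta_1 * h
numeric = (norm.cdf(index + beta[0] * h) - norm.cdf(index)) / h
print(analytic, numeric)                   # the two agree to numerical precision
```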

Logarithms

When an explanatory variable enters in logarithmic form, its effect on the response probability is presumed to decline as the value of the variable increases. For an independent variable included as a logarithm, the marginal effect is (Wooldridge, 2010, p. 567):

πœ•π‘ƒ(𝑦 = 1|𝑧)

πœ•log (𝑧𝑗) = 𝑔(π‘₯𝛽)𝛽𝑗

where the change in 𝑃(𝑦 = 1|𝑧) given a 1 percent increase in 𝑧𝑗 is approximately 𝑔(π‘₯𝛽)(𝛽𝑗

100).
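Numerically this is a one-line computation; a sketch with hypothetical values of the index and coefficient:

```python
from scipy.stats import norm

index, beta_j = 0.4, 0.6                 # hypothetical x*beta and coefficient on log(z_j)
print(norm.pdf(index) * beta_j / 100)    # approx. change in P(y = 1|z) per 1% rise in z_j
```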

Interaction terms

According to Norton et al. (2004), the full interaction effect for an interaction term π‘₯1π‘₯2 is given by

$$\frac{\partial^2 \Phi(u)}{\partial x_1 \partial x_2} = \beta_{12}\,\Phi'(u) + (\beta_1 + \beta_{12} x_2)(\beta_2 + \beta_{12} x_1)\,\Phi''(u)$$

where Ξ¦ is the standard normal cumulative distribution function and 𝑒 is the index 𝛽1π‘₯1 + 𝛽2π‘₯2 + 𝛽12π‘₯1π‘₯2 + π‘₯𝛽. The differences from the linear interaction effect have several implications for nonlinear models. First, the interaction effect can differ from zero even if 𝛽12 = 0. Additionally, a t-test on the coefficient of the interaction term is not sufficient to establish its statistical significance; the statistical significance of the entire cross derivative has to be calculated. As with continuous variables and logarithms, the interaction effect is conditional on the independent variables. Since it depends on two additive terms that can each be positive or negative, the interaction effect may have different signs for different values of the covariates. Hence, the sign of 𝛽12 does not necessarily indicate the sign of the interaction effect (Norton et al., 2004).
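A sketch of the cross derivative for a probit, using Ξ¦β€²(𝑒) = Ο†(𝑒) and Ξ¦β€³(𝑒) = βˆ’π‘’Ο†(𝑒); the coefficients and covariate values are illustrative, and the remaining index π‘₯𝛽 is omitted for brevity.

```python
from scipy.stats import norm

b1, b2, b12 = 0.8, -0.5, 0.3
x1, x2 = 1.0, 2.0
u = b1 * x1 + b2 * x2 + b12 * x1 * x2   # the index (other covariates omitted)

phi = norm.pdf(u)                        # Ξ¦'(u)
phi_prime = -u * phi                     # Ξ¦''(u)
effect = b12 * phi + (b1 + b12 * x2) * (b2 + b12 * x1) * phi_prime
print(effect)  # its sign need not match the sign of b12
```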

9.2.2 Specification tests

The Lagrange Multiplier test, the likelihood ratio (LR) test and the Wald test are the tests commonly used for exclusion restrictions in probit and logit models. In running my regressions, I make use of the last two.

The Wald test requires estimation of the unrestricted model, and thereafter tests restrictions against the unrestricted (base) case. It uses an asymptotic chi-square distribution, with 𝑑𝑓 equal to the number of restrictions being tested (Wooldridge, 2014).
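A minimal sketch of a Wald test of joint exclusion restrictions in statsmodels (simulated data; "x1" and "x2" are placeholder names): the unrestricted probit is fitted once, and the restrictions are then tested against it.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 1000
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = ((0.8 * df["x1"] + rng.standard_normal(n)) > 0).astype(int)

res = smf.probit("y ~ x1 + x2", data=df).fit(disp=0)
print(res.wald_test("(x1 = 0), (x2 = 0)"))  # chi-square statistic with df = 2
```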

The LR test is based on the difference in log-likelihood between an unrestricted and a restricted model. As MLE maximizes the log-likelihood function, dropping variables will generally lower the log-likelihood value (Wooldridge, 2014). We therefore test whether this drop in log-likelihood is large enough to conclude that the dropped variables are important.

The likelihood ratio statistic is given by:

$$LR = 2(\mathcal{L}_{ur} - \mathcal{L}_{r})$$

Where β„’π‘’π‘Ÿ is the log-likelihood of the unrestricted model and β„’π‘Ÿ is the log-likelihood for the restricted model. As the log-likelihood is always negative, this is a nonnegative and strictly

positive number as β„’π‘’π‘Ÿ β‰₯ β„’π‘Ÿ. The LR is also chi-square distributed with q degrees of freedom equal to the number of restrictions. The null hypothesis is rejected when LR exceeds a critical value (Wooldridge, 2014).

𝐿𝑅~π‘Ž

Ο‡

π‘ž2