
Stata: Logistic Regression


Academic year: 2022


(1)


3 h

Hein Stigum

Presentation, data and programs at:

https://www.med.uio.no/helsam/forskning/aktuelt/arrangementer/andre/2022/stata-course-uio.html

(2)

DAG: Physical activity and CHD

• CHD analysis

– Binary outcome

– Plots by physical activity

– Compare proportions

– Logistic regression

(3)

Agenda

• Purpose

• Workflow

• Syntax

• Testing assumptions

• Influence of outliers

(4)

BACKGROUND

(5)

Logistic model and assumptions

• Logistic model

• Assumptions of the standard model

– Independent residuals

– Linear effects (on the log-odds scale)

– No interactions

linear predictor: xb
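The model formula on this slide was lost in extraction. Assuming it showed the standard logistic model, written in the notation the slides use elsewhere (coefficients b, linear predictor xb), it reads:

```latex
\ln\frac{p}{1-p} = b_0 + b_1 x_1 + \dots + b_k x_k = xb,
\qquad p = \Pr(y = 1 \mid x) = \frac{1}{1 + e^{-xb}}
```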

(6)

Association measure, Odds ratio

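Model: $\ln(\text{odds}) = b_0 + b_1 x$

Start with: $\ln \text{odds}(x+1) - \ln \text{odds}(x) = [b_0 + b_1 (x+1)] - [b_0 + b_1 x] = b_1$

Hence: $\text{OR} = \dfrac{\text{odds}(x+1)}{\text{odds}(x)} = e^{b_1}$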
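The derivation can be checked numerically. Here is a minimal Python sketch (Python rather than Stata, so the arithmetic can be run anywhere) with hypothetical coefficients b0 and b1. The odds ratio for a one-unit increase comes out as e^b1 whatever the starting value of x.

```python
import math

def expit(z):
    """Inverse logit: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients for logit(p) = b0 + b1*x
b0, b1 = -2.0, 0.5

def or_one_unit(x):
    """Odds ratio comparing x + 1 with x."""
    return odds(expit(b0 + b1 * (x + 1))) / odds(expit(b0 + b1 * x))

# The OR column matches exp(b1) at every x
for x in (0, 5, 10):
    print(x, round(or_one_unit(x), 4), round(math.exp(b1), 4))
```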

(7)

Short: need to know

• Binary outcome

• Assume

– Linear effects on the log-odds scale

• Association measure

– $\text{OR} = e^{b}$, $b$ = coefficient

• Scale

– Multiplicative: being exposed to both $x_1$ and $x_2$ gives $\text{OR}_1 \cdot \text{OR}_2$
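A similar Python check of the multiplicative scale, using hypothetical coefficients in a model without an interaction term. The OR for being exposed to both x1 and x2 equals OR1·OR2.

```python
import math

def expit(z):
    """Inverse logit: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients for logit(p) = b0 + b1*x1 + b2*x2 (no interaction term)
b0, b1, b2 = -3.0, 0.7, 0.4

def risk(x1, x2):
    return expit(b0 + b1 * x1 + b2 * x2)

or1 = odds(risk(1, 0)) / odds(risk(0, 0))      # exposed to x1 only
or2 = odds(risk(0, 1)) / odds(risk(0, 0))      # exposed to x2 only
or_both = odds(risk(1, 1)) / odds(risk(0, 0))  # exposed to both

print(round(or1, 3), round(or2, 3), round(or_both, 3), round(or1 * or2, 3))
```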

(8)

Purpose of regression

• Estimation

– Estimate the effect of an exposure on the outcome, adjusted for other covariates

– Example: estimate the effect of smoking on lung cancer

– Focus: DAGs, bias, precision

• Prediction

– Predict the outcome from exposures

– Example:

1. Estimate a model (air pollution and distance from roads)

2. Predict air pollution in a new dataset using distance from roads

– Focus: predictive power, model fit, R²

(9)

Syntax

• Estimation

– logistic y x1 x2   logistic regression

– logistic y i.smoke c.age   categorical smoke, continuous age

– logistic y i.smoke##c.age   interaction, 3 terms

• Manage models

– estimates store m1   save model

– est table m1, eform   show ORs

• Post estimation

– predict yf, pr   predict probability

– margins, over(ageI) predict(xb)   linearity on the log-odds scale
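What these commands produce can be mimicked in a few lines of Python. This is an illustration only, not Stata. For a 0/1 smoke variable, i.smoke##c.age expands to three terms: 1.smoke, age, and their product. predict(xb) returns the linear predictor, and the pr option of predict applies the inverse logit to it. The helper function and the coefficients are hypothetical.

```python
import math

def expand_smoke_by_age(smoke, age):
    """Hypothetical helper: the three terms of i.smoke##c.age for a 0/1 smoke
    variable with base level 0, namely 1.smoke, age and 1.smoke#c.age."""
    d = 1.0 if smoke == 1 else 0.0
    return [d, float(age), d * age]

# Hypothetical coefficients for _cons, 1.smoke, age, 1.smoke#c.age
coefs = [-5.0, 0.8, 0.05, 0.01]

def xb(smoke, age):
    """Linear predictor (log-odds), the quantity predict(xb) refers to."""
    cols = [1.0] + expand_smoke_by_age(smoke, age)
    return sum(b * c for b, c in zip(coefs, cols))

def pr(smoke, age):
    """Predicted probability, the quantity the pr option refers to: invlogit(xb)."""
    return 1.0 / (1.0 + math.exp(-xb(smoke, age)))

for smoke, age in [(0, 40), (1, 40), (1, 55)]:
    print(smoke, age, expand_smoke_by_age(smoke, age),
          round(xb(smoke, age), 3), round(pr(smoke, age), 3))
```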

(10)

Workflow

• DAG

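Confounders: age, educ → adjust

Risk factors: sex → include*

* Including sex gives a sex-specific (conditional) estimate; otherwise the estimate is a population (marginal) one. For odds ratios the two differ even without confounding, because the OR is non-collapsible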

• Bivariate analysis

• Regression

– Model fitting

• Exposure

• + Confounders

– Test of assumptions

• Independent errors

• Linear effects (on the log odds scale)

• Interactions

– Influence of outliers

(Daniel et al. 2020)
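The footnote about sex-specific versus population estimates reflects the non-collapsibility of the odds ratio (Daniel et al. 2020; Robinson and Jewell 1991). Below is a small Python sketch with a hypothetical model. Sex affects CHD but is independent of physical activity, so it is not a confounder. Within each sex the OR for phys is e^b1, yet the population OR, obtained by averaging risks over sex, is closer to 1.

```python
import math

def expit(z):
    """Inverse logit: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Hypothetical true model: logit P(CHD) = b0 + b1*phys + b2*sex,
# with sex independent of phys (a risk factor, not a confounder)
b0, b1, b2 = -1.0, 1.0, 2.0
p_sex = 0.5  # assumed proportion with sex = 1

def marginal_risk(phys):
    """Population risk: sex-specific risks averaged over the sex distribution."""
    return (1 - p_sex) * expit(b0 + b1 * phys) + p_sex * expit(b0 + b1 * phys + b2)

conditional_or = math.exp(b1)  # the same within each sex
marginal_or = odds(marginal_risk(1)) / odds(marginal_risk(0))
print(round(conditional_or, 3), round(marginal_or, 3))
```

With these values the conditional OR is about 2.72 and the marginal OR about 2.23. Neither is biased; they answer different questions.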

(11)

Syntax

“Descriptive Analysis”

(12)

Physical activity and CHD, example

 

21 percentage points lower risk (risk difference)

0.22 times the risk (risk ratio); 0.17 times the odds (odds ratio)
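The three effect measures on this slide follow directly from the two risks. Here is a Python sketch using hypothetical risks, not the course data:

```python
def effect_measures(p_exposed, p_unexposed):
    """Risk difference, risk ratio and odds ratio from two risks (proportions)."""
    rd = p_exposed - p_unexposed
    rr = p_exposed / p_unexposed
    odds_ratio = (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))
    return rd, rr, odds_ratio

# Hypothetical risks: 0.10 among the active, 0.30 among the inactive
rd, rr, or_ = effect_measures(0.10, 0.30)
print(f"RD = {rd:.2f}, RR = {rr:.2f}, OR = {or_:.2f}")
```

For a protective exposure the OR lies further from 1 than the RR, since OR = RR·(1 − p_unexposed)/(1 − p_exposed). This is consistent with 0.17 versus 0.22 above.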

(13)

Syntax

“Regression Analysis”

(14)

ASSUMPTIONS

(15)

Assumptions of the standard model

1. Independent residuals: discuss

2. Linear effects on the log-odds scale: add splines

3. No interactions: add interactions

Dependent residuals? When will the heart disease of one person depend on the heart disease of another?

Siblings, twins

If many siblings: logistic …, vce(cluster m_id) (clusters by mother's id), or use mixed models

(16)

Non-linear effects

(17)

Smoothers in regressions

• Polynomials

– $x, x^2, x^3$

• Splines

– cubic

– linear

• Fractional polynomials (2 of 8 powers)

– $x^{-2}, x^{-1}, x^{-0.5}, \log(x), x^{0.5}, x, x^2, x^3$
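The phrase "2 of 8" can be made concrete. An FP2 model uses two powers chosen from the eight listed, with repetition allowed, which gives 36 candidate pairs. Below is a Python sketch of the transforms under the usual Royston and Altman conventions: power 0 means log x, and a repeated power p gives x^p and x^p·log x. It assumes x > 0. In Stata this search is automated, for example by the fp prefix.

```python
import math
from itertools import combinations_with_replacement

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)  # power 0 stands for log(x)

def fp_term(x, p):
    """One fractional-polynomial transform of x > 0."""
    return math.log(x) if p == 0 else x ** p

def fp2_terms(x, p1, p2):
    """The two FP2 terms; a repeated power p gives x^p and x^p * log(x)."""
    t1 = fp_term(x, p1)
    t2 = t1 * math.log(x) if p1 == p2 else fp_term(x, p2)
    return t1, t2

# All candidate power pairs for an FP2 model
fp2_models = list(combinations_with_replacement(FP_POWERS, 2))
print(len(fp2_models))
```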

[Figure: y against x fitted by a polynomial (coefficients c1, c2: estimates) and by a spline (knots: only plots)]

Polynomials: global
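Here is a Python sketch of a restricted cubic spline basis, the kind of variables mkspline creates with the cubic option. With 4 knots it gives x plus 2 nonlinear terms, which matches the 3 variables cs1 to cs3 used later. The sketch follows Harrell's formula, with nonlinear terms scaled by (t_k − t_1)². It assumes Stata builds an equivalent basis; the exact scaling and default knot placement should be checked against the mkspline documentation. The knot positions are hypothetical.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's form) for one value x.

    Returns k - 1 terms for k knots: x itself plus k - 2 nonlinear terms,
    each divided by (t_k - t_1)^2.
    """
    t = sorted(knots)
    k = len(t)

    def pos3(u):
        # truncated power function (u)_+^3
        return max(u, 0.0) ** 3

    denom = t[k - 1] - t[k - 2]
    scale = (t[k - 1] - t[0]) ** 2
    terms = [float(x)]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / denom
                + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / denom)
        terms.append(term / scale)
    return terms

knots = [20.0, 35.0, 50.0, 65.0]  # hypothetical knot positions (4 knots)
print(rcs_basis(42.0, knots))     # 3 terms, like cs1-cs3
```

Below the first knot the nonlinear terms are zero. Beyond the last knot the cubic and quadratic parts cancel, so the fitted curve is restricted to be linear in the tails.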

(18)

Syntax

“Non-linear effect”

(19)

INTERACTION

Effect modification

(20)

Interaction

• Interaction: combined effect of two variables

• Example

$y = b_0 + b_1 x + b_2\,\mathrm{sex}$   effect of x does not depend on sex

$y = b_0 + b_1 x + b_2\,\mathrm{sex} + b_3\,x \cdot \mathrm{sex}$   effect of x depends on sex (interaction)

• Test

– Interaction if $b_3 \neq 0$

• Scale

– Linear models: additive

– Logistic, Poisson, Cox: multiplicative

– Interaction is scale dependent

If both variables have an effect, no interaction on the additive scale implies interaction on the multiplicative scale, and vice versa
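A Python sketch with hypothetical risks illustrating scale dependence. Here x raises the risk by 0.10 in both sexes, so there is no interaction on the additive scale. Yet the risk ratio and the odds ratio for x differ between the sexes, so there is interaction on the multiplicative scales.

```python
def odds(p):
    return p / (1.0 - p)

# Hypothetical risks r[(x, sex)]: x adds 0.10 to the risk in both sexes
r = {(0, 0): 0.10, (1, 0): 0.20, (0, 1): 0.30, (1, 1): 0.40}

# Additive scale: interaction contrast (0 means no additive interaction)
additive_contrast = r[1, 1] - r[1, 0] - r[0, 1] + r[0, 0]

# Multiplicative scales: effect of x in sex=1 divided by effect of x in sex=0
rr_ratio = (r[1, 1] / r[0, 1]) / (r[1, 0] / r[0, 0])
or_ratio = (odds(r[1, 1]) / odds(r[0, 1])) / (odds(r[1, 0]) / odds(r[0, 0]))

print(round(additive_contrast, 10), round(rr_ratio, 3), round(or_ratio, 3))
```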

(21)

Interaction

Is the effect of physical activity on heart disease the same for low and high education?

Syntax:

logistic chd c.age c.phys##i.educ

Terms:

… c.phys i.educ   main effects

c.phys#i.educ   interaction effect

Effect of physical activity for low and high education:

educ=0: exp(_b[phys] + 0*_b[1.educ#c.phys])

educ=1: exp(_b[phys] + 1*_b[1.educ#c.phys])
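The two expressions for the effect of physical activity are simple arithmetic on the coefficients. Below is a Python sketch with hypothetical estimates standing in for _b[phys] and _b[1.educ#c.phys]. In Stata, lincom _b[phys] + _b[1.educ#c.phys], eform should give the educ=1 OR together with its confidence interval.

```python
import math

# Hypothetical estimates standing in for _b[phys] and _b[1.educ#c.phys]
b_phys = -0.10
b_phys_by_educ1 = 0.04

or_educ0 = math.exp(b_phys + 0 * b_phys_by_educ1)  # effect of phys, low education
or_educ1 = math.exp(b_phys + 1 * b_phys_by_educ1)  # effect of phys, high education

print(round(or_educ0, 3), round(or_educ1, 3))
# The ratio of the two ORs is exp(b3), the interaction on the OR scale
print(round(or_educ1 / or_educ0, 3))
```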

(22)

Syntax

“Interaction”

(23)

INFLUENCE

Measures of influence of outliers

(24)

Measures of influence

• Measure change in:

– Coefficients (beta)

• Delta beta

Remove obs 1, see change; remove obs 2, see change; and so on

[Figure: influence (delta-beta) plotted against observation Id]

One delta-beta per observation (observations with the same covariate pattern share a value), summarizing the change in all coefficients
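The following Python sketch makes "remove obs 1, see change" concrete. It refits a one-covariate logistic model n times, leaving out one observation each time, and records the change in the slope. This is exact leave-one-out refitting of a single coefficient on hypothetical data. Stata's predict dBeta, db instead reports Pregibon's (1981) one-step approximation: one value per covariate pattern, combining all coefficients. The numbers are therefore not expected to match.

```python
import math

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x (intercept + one covariate)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for b0
            g1 += (yi - p) * xi     # score for b1
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det  # inverse(information) @ score
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

def delta_betas(x, y):
    """Exact leave-one-out change in the slope: b1(all) - b1(without obs i)."""
    _, b1_all = fit_logistic(x, y)
    deltas = []
    for i in range(len(x)):
        _, b1_i = fit_logistic(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        deltas.append(b1_all - b1_i)
    return b1_all, deltas

# Hypothetical data (not the course data)
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

b1_all, deltas = delta_betas(X, Y)
for i, d in enumerate(deltas, start=1):
    print(i, round(d, 3))
```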

(25)

Syntax

“Influence”

(26)

MARGINS

Predictions from the model

(27)

Margins

• Helpful to predict the probability of the outcome over exposure.

• "margins" handles interactions and non-linearities

• "margins" can be followed by "marginsplot"

Predicting from the model does not make this a “prediction model”. 

Our DAG-based modeling strategy makes this an estimation model.

(28)

Margins, Examples

• Model

mkspline cs=age, cubic nk(4)   3 spline variables: cs?

logistic chd c.phys##c.cs? i.educ sex   includes phys*cs? interactions

• Margins examples

margins   overall risk

margins educ   risk by educ (categorical)

margins, at(sex=(0 1))   risk by sex (categorical or continuous)

margins, at(phys=(3 6 9 12))   table of risks

• Conditional vs marginal

logistic y x a b c   model

margins, dydx(x) at(a=1)   effect of x on y, conditional on a, marginal over b and c
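What margins computes after logistic can be mimicked by hand. The Python sketch below uses hypothetical coefficients and a hypothetical four-observation sample for a model like logistic y x a b c, with c left out to keep it short. margins, at() sets a covariate to the same value for everyone and averages the predicted risks. margins, dydx(x) at(a=1) averages the derivative of the risk with respect to x, which for logistic regression is bx·p·(1 − p).

```python
import math

def expit(z):
    """Inverse logit: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted model: logit P(y=1) = _cons + bx*x + ba*a + bb*b
coef = {"_cons": -2.0, "x": 0.3, "a": 0.5, "b": -0.4}

# Hypothetical sample: (x, a, b) for each observation
sample = [(1.0, 0, 2.0), (3.0, 1, 0.5), (2.0, 0, 1.0), (4.0, 1, 3.0)]

def risk(x, a, b):
    return expit(coef["_cons"] + coef["x"] * x + coef["a"] * a + coef["b"] * b)

def margins_at(**fixed):
    """Average predicted risk with some covariates fixed for everyone
    (like margins, at(...)); other covariates keep their observed values."""
    total = 0.0
    for x, a, b in sample:
        vals = {"x": x, "a": a, "b": b}
        vals.update(fixed)
        total += risk(**vals)
    return total / len(sample)

def ame_x_at_a1():
    """Average marginal effect of x on the risk with a fixed at 1
    (like margins, dydx(x) at(a=1)): mean of bx * p * (1 - p)."""
    total = 0.0
    for x, _, b in sample:
        p = risk(x, 1, b)
        total += coef["x"] * p * (1 - p)
    return total / len(sample)

print([round(margins_at(x=v), 4) for v in (1, 2, 3, 4)])
print(round(ame_x_at_a1(), 4))
```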

(29)

Margins plot

• Pr(CHD) by age for low and high phys

margins, at(phys=(1 15)) over(ageI)   ageI: age as integer

marginsplot, xdim(ageI)   age on the x-axis

(30)

Syntax

“Margins”

(31)

Summing up 1

• Build model

logistic chd phys   crude model

est store m1   store

logistic chd phys age educ   full model

est store m2   store

est table m1 m2, eform   compare ORs

• Non-linearity (cubic spline)

mkspline cs=phys, cubic nk(4)   spline in phys: cs1, cs2, cs3

logistic chd cs? age educ   regression with spline

margins, over(physI) predict(xb)   predict on the log-odds scale

marginsplot

(32)

Summing up 2

• Interaction

– logistic chd c.phys##i.sex   test interaction

• Influence of outliers

– predict dBeta, db   delta-beta (common)

– scatter dBeta p, jitter(10)   delta-beta by p = Pr(outcome)

• Predictions from the model

– margins educ, at(phys=(1(1)15))

– marginsplot

(33)

References

• Binder H, Sauerbrei W, Royston P. 2013. Comparison between splines and fractional polynomials for multivariable model building with continuous covariates: A simulation study with continuous response. Stat Med 32:2262-2277.

• Daniel R, Zhang J, Farewell D. 2020. Making apples from oranges: Comparing noncollapsible effect estimators and their standard errors after adjustment for different covariate sets. Biom J.

• Govindarajulu US, Malloy EJ, Ganguli B, Spiegelman D, Eisen EA. 2009. The comparison of alternative smoothing methods for fitting non-linear exposure-response relationships with Cox models in a simulation study. Int J Biostat 5.

• Kahan BC, Rushton H, Morris TP, Daniel RM. 2016. A comparison of methods to adjust for continuous covariates in the analysis of randomised trials. BMC Med Res Methodol 16.

• Pregibon D. 1981. Logistic regression diagnostics. Ann Stat 9:705-724.

• Robinson LD, Jewell NP. 1991. Some surprising results about covariate adjustment in logistic-regression models. Int Stat Rev 59:227-240.

(34)

EXTRA SLIDES

(35)

Generalized Linear Models, GLM

[Figure: example fits for three GLMs. Linear regression: birth weight (gram) vs gestational age (days); logistic regression: risk vs age; Poisson regression]

(36)

Syntax

“Binary regression, OR, RR and RD effect measures”

(37)

The end

(38)

Stata regression commands

(39)

Regression with simple error structure

– regress   linear regression (also heteroskedastic errors)

– nl   nonlinear least squares

GLM

– logistic   logistic regression

– poisson   Poisson regression

– binreg   binary outcome, OR, RR, or RD effect measures

Conditional logistic

– clogit   for matched case-control data

Categorical outcome (>2 categories)

– mlogit   multinomial logit (not ordered)

– ologit   ordered logit

Regression with complex error structure
