What are frailty models for survival data, and can I run the analyses myself?
TRON ANDERS MOGER
Department of Health Management and Health Economics
Institute of Health and Society University of Oslo
Introduction
• Talk borrows a lot of material from this paper:
What are frailty models?
• Frailty models are random effects models for time-to-event data
• A random term (the frailty term) is included in the model, following a distribution chosen by the analyst
• The frailty term can be interpreted as the amount of heterogeneity in time to event caused by unobserved factors
• The effects of observed covariates are therefore conditional: they compare two individuals with the same frailty value
What are frailty models cont’d:
• Frailty models are commonly used in situations where there is dependence in the data
• The frailty term is then constructed so that it models this dependence
• The strength of the dependence can then be estimated and reported in various ways
• I will cover some of the simplest models for this; we will see later that frailty models can be extended to very complicated data
Mathematical definition of frailty models
• Multiplicative frailty model:
Individual hazard = Z·λ(t)
• Z is a frailty variable; it measures unobserved heterogeneity
• A high value of Z implies a high risk of disease
• λ(t) is a basic hazard, often common to all individuals
• Survival function, where g is the density of Z, L is the Laplace transform of Z and Λ(t) is the cumulative basic hazard:
$$S(t) = \int_0^\infty \exp\bigl(-z\,\Lambda(t)\bigr)\, g(z)\, dz = L\bigl(\Lambda(t)\bigr), \qquad \Lambda(t) = \int_0^t \lambda(s)\, ds$$
• Covariates (observed heterogeneity) are often included in the model, e.g. as a Cox term:
Individual hazard = Z·exp(βx)·λ0(t)
Illustrations of frailty effect on population hazard
From Haugen et al., "Frailty modeling of bimodal age-incidence curves of nasopharyngeal carcinoma in low-risk populations", Biostatistics 2009
From Aalen, "Effects of frailty in survival analysis", Statistical Methods in Medical Research 1994
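The figures from these two papers are not reproduced here. As a rough numerical sketch of the same kind of effect, the snippet below assumes a gamma frailty with mean 1 and variance theta, for which the population survival is (1 + theta·Λ(t))^(-1/theta) and the population hazard is λ(t) / (1 + theta·Λ(t)). The basic hazard λ(t) = t² and the value theta = 1 are illustrative assumptions only, not taken from either paper.

```python
import math

THETA = 1.0  # assumed frailty variance (illustrative, not an estimate)

def basic_hazard(t):
    """Assumed individual basic hazard lambda(t) = t^2 (steadily increasing)."""
    return t ** 2

def cumulative_basic_hazard(t):
    """Lambda(t), the integral of t^2 from 0 to t."""
    return t ** 3 / 3.0

def population_survival(t, theta=THETA):
    """Laplace transform of a gamma(mean 1, variance theta) frailty at Lambda(t)."""
    return (1.0 + theta * cumulative_basic_hazard(t)) ** (-1.0 / theta)

def population_hazard(t, theta=THETA):
    """Hazard seen at population level, i.e. among those still at risk."""
    return basic_hazard(t) / (1.0 + theta * cumulative_basic_hazard(t))

for t in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"t={t:4.1f}  individual (Z=1): {basic_hazard(t):7.3f}  "
          f"population: {population_hazard(t):6.3f}")
```

Every individual hazard keeps increasing, yet the population hazard rises and then falls, because the high-frailty individuals leave the risk set first.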
Simplest model for creating dependence
• The shared frailty model
• All observations within a cluster (e.g. a family) share the same value of the frailty term
• This creates dependence between the times to event within a cluster
• The most common distribution for the frailty is then the gamma distribution (frailties will often have a skewed distribution)
• Other choices could be lognormal or log-t
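To make the dependence concrete, here is a minimal simulation sketch of the shared gamma frailty model. It assumes clusters of size two, a constant basic hazard of 1 and a frailty variance theta = 1; all of these are illustrative choices. For gamma frailty the theoretical Kendall's tau within a cluster is theta / (theta + 2), so the empirical value should land near 1/3.

```python
import random

def simulate_pairs(n_clusters, theta, seed=1):
    """Draw one shared gamma frailty per cluster, then two event times per cluster."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_clusters):
        # gammavariate(shape, scale): mean = 1, variance = theta
        z = rng.gammavariate(1.0 / theta, theta)
        # Given z, both members have constant hazard z (basic hazard assumed to be 1)
        pairs.append((rng.expovariate(z), rng.expovariate(z)))
    return pairs

def kendall_tau(pairs):
    """Empirical Kendall's tau between first and second member across clusters."""
    concordant = discordant = 0
    n = len(pairs)
    for i in range(n):
        a1, a2 = pairs[i]
        for j in range(i + 1, n):
            b1, b2 = pairs[j]
            sign = (a1 - b1) * (a2 - b2)
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

theta = 1.0
tau_hat = kendall_tau(simulate_pairs(1000, theta))
print("empirical tau:", round(tau_hat, 3), " theoretical:", round(theta / (theta + 2), 3))
```

Censoring and covariates are left out on purpose; the point is only that sharing Z is enough to make event times within a cluster positively associated.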
Can I do frailty analysis myself?
• Yes, in the sense that the shared frailty model is implemented in all standard software:
– option frailty in coxph in R
– option frailty in streg in Stata
– option random in proc phreg in SAS
• These options only need a cluster id variable (i.e. numbers that are shared for observations within a cluster)
• One then gets a p-value on whether the frailty is significant, among other things
Usefulness of frailty models
• If there is some structure in your data that could create dependence in times to events, the standard option might be to run a standard Cox regression anyway, with some option creating robust (or unbiased) standard errors for the regression coefficients
• What is then the usefulness of quantifying the dependence?
• What can you get in addition from frailty models compared with simpler alternatives?
• This is the topic for the remainder of my talk
Background on insurance data
• Do insurance companies use statistical methods? Yes!
• E.g. logistic regression is popular
• In Norway, the actuarial education is a specialization within statistics at university level
• Internationally there are numerous journals dedicated to research within insurance – also development of new methods
Background cont’d
• Would like to introduce new methods (for insurance companies) that can be used on a daily basis by actuaries
• That is: no programming involved, only commands/methods implemented in standard software like R or SAS
• Motivated by the fact that insurance companies also have vast amounts of time-to-event data
• Got access to a data set from Gjensidige on time-to-lapse for company car fleet customers
Frailty options in R
• The paper includes references to source articles for most of the different frailty options in R
• Only methods that are implemented in the standard survival library in R are used in the paper
• However, there is also a package called frailtypack in R with even more options – e.g. you get estimates of standard errors of frailty parameters
• The example also shows that it is not a problem to run analyses on large data sets (provided access to a supercomputer)
Motivation
• Corporate customers of car policies often hold multiple contracts, and this might give dependence between the lapsing times of the single policies
• Methods for analyzing data that are both time-to-event and dependent are not well established in the insurance industry
• Such data can be analyzed by frailty models, which was the focus of Marion’s PhD
Aim:
• Using data that are routinely stored by most insurance companies in the world:
• What are the characteristics of customers who are most likely to leave the company (at the single car policy level)?
• In presence of dependence in the data: How wrong will results from methods disregarding the dependence be?
• How large is the effect of unobserved factors causing the dependence?
Data
• Corporate customers in a car insurance portfolio for small- and medium-sized companies in Gjensidige 1999-2007
• 48,040 unique customers and 108,274 unique cars
• Outcome: Time to lapse of single car policies
• Covariates: premium, bonus, discounts, number of covers, area, car brand, usage, indicator of less than 30 days until due date (some are time dependent)
Info on covariates:
Alternative ways of analyzing data:
• Standard: Cox regression w/time-dependent covariates. Assumes cars are independent in time-to-lapse also within a single company
• New: Shared frailty model w/time-dependent covariates. Is a random effect model for survival data where dependence is modeled by letting cars in the same company have the same value of the random effect (frailty) variable. Use gamma distribution for random effect.
Descriptives
• 59% of customers had only one car contract, while 98% had at most 10 cars insured.
• 70,751 (65%) cars left Gjensidige during the study period; 46% of the customers had only one lapse and 96% of the customers had at most five lapses.
• Single car policies entered and left the sample, and lapses occurred without the customer leaving the company.
• Among the customers holding at least two car contracts (41% of the customers), 17% lapsed all the contracts within a maximum of two months, 49% lapsed some of the contracts within a maximum of two months and 34% had only one lapse during the study period.
• The yearly turnover in the data was approximately 21%.
Results of modelling
• The dependence was significant
• As a measure of the correlation between times-to-lapse within companies, a frequently used measure for frailty models is Kendall’s tau (can be between -1 and +1)
• This is 0.34 for the final frailty model, so the correlation should be expected to be large enough to matter
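For a gamma shared frailty with mean 1 and variance theta, Kendall's tau and the frailty variance are linked by tau = theta / (theta + 2). The sketch below uses that standard relation to translate the reported tau of 0.34 into the frailty variance it implies; the variance itself is not quoted on the slides, so the number printed is a derived illustration and the function names are made up for this example.

```python
def tau_from_gamma_frailty_variance(theta):
    """Kendall's tau implied by a gamma frailty with variance theta."""
    if theta < 0:
        raise ValueError("theta must be non-negative")
    return theta / (theta + 2.0)

def gamma_frailty_variance_from_tau(tau):
    """Invert tau = theta / (theta + 2); only defined for 0 <= tau < 1."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("gamma frailty only gives tau in [0, 1)")
    return 2.0 * tau / (1.0 - tau)

theta = gamma_frailty_variance_from_tau(0.34)
print(round(theta, 2))  # 1.03
```

Note that under gamma frailty tau cannot be negative, so only the non-negative part of the -1 to +1 range is reachable with this model.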
Examples of regression coefficients from frailty and Cox models
Covariate values for four illustration cars
Illustration of effects of continuous covariates:
Estimated frailty values in data
In R, the actual estimated frailty values are stored under "frail" in the coxph object
Illustration of difference between Cox and frailty models
Individual survival curves can be constructed by using the basehaz function in R
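The arithmetic behind such a curve is short: basehaz returns a cumulative baseline hazard Λ0(t), and under the multiplicative model an individual with linear predictor βx and frailty value z has survival exp(-z·exp(βx)·Λ0(t)). The sketch below shows only that step. The baseline values, the linear predictor and the frailty values are hypothetical numbers chosen for illustration, not output from the fitted model.

```python
import math

# Hypothetical (time, cumulative baseline hazard) pairs, standing in for basehaz output
baseline = [(0.5, 0.10), (1.0, 0.22), (2.0, 0.50), (4.0, 1.10)]

def survival_curve(baseline, linear_predictor, frailty=1.0):
    """S(t | x, z) = exp(-z * exp(beta*x) * Lambda0(t)) at each baseline time point."""
    multiplier = frailty * math.exp(linear_predictor)
    return [(t, math.exp(-multiplier * h0)) for t, h0 in baseline]

# Same covariates, two different (assumed) frailty values
for z in (0.5, 2.0):
    curve = survival_curve(baseline, linear_predictor=0.3, frailty=z)
    print(f"frailty {z}:", [(t, round(s, 3)) for t, s in curve])
```

Comparing curves with the same covariates but different frailty values is one way to show how much the unobserved cluster effect moves the predicted time-to-lapse.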
Illustration of effect of having another car in the cluster cancelling the policy
Conditional survival curves are also constructed using the basehaz function in R
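One standard way to get such a conditional curve under gamma frailty is sketched below; it may differ from the exact construction used in the paper. If Z is gamma with mean 1 and variance theta, then after observing D lapses and a total accumulated hazard A = Σ exp(βx)·Λ0(t) in the cluster, Z is again gamma with shape 1/theta + D and rate 1/theta + A. The predicted survival of a policy that accumulates a further hazard H is then ((1/theta + A) / (1/theta + A + H))^(1/theta + D). All numbers in the example are hypothetical.

```python
def conditional_survival(extra_hazard, theta, events_seen=0, hazard_seen=0.0):
    """Predicted survival of one policy given what was observed in its cluster.

    extra_hazard: exp(beta*x) * Lambda0 accumulated over the prediction window
    events_seen:  number of lapses already observed in the cluster (D)
    hazard_seen:  total exp(beta*x) * Lambda0 already accumulated in the cluster (A)
    """
    shape = 1.0 / theta + events_seen   # posterior gamma shape
    rate = 1.0 / theta + hazard_seen    # posterior gamma rate
    return (rate / (rate + extra_hazard)) ** shape

theta = 1.0  # assumed frailty variance
no_lapse = conditional_survival(0.5, theta, events_seen=0, hazard_seen=0.3)
one_lapse = conditional_survival(0.5, theta, events_seen=1, hazard_seen=0.3)
print("no other lapse in cluster:", round(no_lapse, 3))
print("one other car has lapsed: ", round(one_lapse, 3))
```

With the same exposure, a lapse elsewhere in the cluster raises the expected frailty and therefore lowers the predicted survival of the remaining policies.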
Goodness of fit
• This is a more difficult topic for frailty models; few options are implemented
• One option is to compare the estimated model with Kaplan-Meier curves for various covariate values (not shown here)
• However, the frailty term models dependence, and the dependence may not be constant over time
• Different distributions for the frailty capture different types of dependence: gamma distribution captures late dependence, so late events are more important than early events
• Not much is discussed on this point in the paper
Conclusion on data
• Have seen that the correlation structure in the data matters and should not be ignored
• Can make many useful presentations of the effect of the frailty variable, even though it is just a random effect
• This should hopefully make insurance companies (and others) more aware that they have many types of data with time-to-event outcomes and dependence, and that these should be analyzed in a proper way
• And one may use it to differentiate between customers
Conclusion on frailty models
• The frailty model is, in its simplest form, a straightforward extension of the standard Cox model
• Have shown that it is possible to make very useful illustrations of the frailty effect, using information that is stored in the estimated model
• Parts of this information (estimates of frailty values and baseline hazard) are not normally used by the novice frailty modeler
• But it can be used with interesting results!