
Academic year: 2022


(1)

What are frailty models for survival data, and can I run the analyses myself?

TRON ANDERS MOGER

Department of Health Management and Health Economics

Institute of Health and Society, University of Oslo

(2)

Introduction

• Talk borrows a lot of material from this paper:

(3)

What are frailty models?

Frailty models are random effect models for time to event data

Means that you include a random term (the frailty term) in the model, following some distribution chosen by you

The frailty term can be interpreted as the amount of heterogeneity in time to events caused by unobserved factors

Means that the interpretation of effect of observed covariates in the model is conditional on comparing two individuals with same frailty value

(4)

What are frailty models cont’d:

• Frailty models are commonly used in situations where there is dependence in the data

• The frailty term is then constructed in a way that models this dependence

• The strength of dependence can then be estimated and reported in various ways

• I will cover some of the simplest models for this, but will see later that frailty models can be expanded to very complicated data

(5)

Mathematical definition of frailty models

Multiplicative frailty model:

Individual hazard =Z*λ(t)

Z is a frailty variable; measures unobserved heterogeneity

High value of Z implies high risk of disease

λ(t) is a basic hazard; often common to all individuals

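Survival function (population level), where Λ(t) = ∫_0^t λ(s) ds is the cumulative basic hazard, g(z) is the density of Z and L_Z is the Laplace transform of Z:

$$S(t) = \int_0^\infty \exp\bigl(-z\,\Lambda(t)\bigr)\, g(z)\, dz = L_Z\bigl(\Lambda(t)\bigr)$$

Covariates (observed heterogeneity) often included in the model, e.g. as a Cox term:

Individual hazard =Z*exp(βx)*λ0(t)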
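As a minimal sketch of how the frailty changes the population hazard, assume a gamma frailty with mean 1 and variance θ (an assumption at this point; the distribution is a modelling choice). The population survival function then has the closed form (1 + θΛ(t))^(-1/θ), where Λ(t) is the cumulative basic hazard, and the population hazard is λ(t) / (1 + θΛ(t)). The function names below are hypothetical, and the numerical integral is only there to check the closed form against the definition of the survival function as an average over Z.

```python
import math


def gamma_frailty_survival(cum_hazard, theta):
    """Population survival (1 + theta * Lambda(t)) ** (-1 / theta).

    Assumes Z is gamma distributed with mean 1 and variance theta.
    """
    return (1.0 + theta * cum_hazard) ** (-1.0 / theta)


def population_hazard(base_hazard, cum_hazard, theta):
    """Population hazard lambda(t) / (1 + theta * Lambda(t))."""
    return base_hazard / (1.0 + theta * cum_hazard)


def survival_by_integration(cum_hazard, theta, upper=60.0, steps=60000):
    """Midpoint-rule approximation of the integral of exp(-z * Lambda) * g(z) dz."""
    shape, scale = 1.0 / theta, theta  # gamma with mean 1, variance theta
    norm = math.gamma(shape) * scale ** shape
    dz = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * dz
        density = z ** (shape - 1.0) * math.exp(-z / scale) / norm
        total += math.exp(-z * cum_hazard) * density * dz
    return total


# Constant individual basic hazard of 1, so Lambda(t) = t.
for t in (0.0, 1.0, 2.0, 4.0):
    print(t, population_hazard(1.0, t, theta=0.5))
```

Even though every individual has a constant hazard in this example, the population hazard declines over time, because individuals with high Z tend to experience the event early and leave the risk set.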

(6)

Illustrations of frailty effect on population hazard

From Haugen et al., "Frailty modeling of bimodal age-incidence curves of nasopharyngeal carcinoma in low-risk populations", Biostatistics 2009

From Aalen, "Effects of frailty in survival analysis", Statistical Methods in Medical Research 1994

(7)

Simplest model for creating dependence

• The shared frailty model

• All observations within a cluster (e.g. family) will share the same value of the frailty term

• This obviously will create dependence in the time to events

• Most common distribution for the frailty is then the gamma distribution (frailties will often have a skewed distribution)

• Other choices could be lognormal or log-t
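The dependence created by a shared frailty can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the talk: clusters of two observations, a gamma frailty with mean 1 and variance θ, and a constant basic hazard. The function names and parameter values are hypothetical.

```python
import random


def simulate_pairs(n_clusters, theta, base_rate=1.0, seed=1):
    """Simulate two event times per cluster that share one gamma frailty."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_clusters):
        # Gamma with shape 1/theta and scale theta: mean 1, variance theta.
        z = rng.gammavariate(1.0 / theta, theta)
        rate = z * base_rate  # individual hazard = Z * basic hazard
        pairs.append((rng.expovariate(rate), rng.expovariate(rate)))
    return pairs


def kendall_tau(pairs):
    """Empirical Kendall's tau from all pairs of clusters (O(n^2))."""
    concordant = discordant = 0
    n = len(pairs)
    for i in range(n):
        x1, y1 = pairs[i]
        for j in range(i + 1, n):
            x2, y2 = pairs[j]
            sign = (x1 - x2) * (y1 - y2)
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


pairs = simulate_pairs(1000, theta=1.0)
print(round(kendall_tau(pairs), 2))
```

Given the frailty, the two times in a cluster are independent; the positive association appears only because the frailty is unobserved and shared.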

(8)

Can I do frailty analysis myself?

• Yes, in the sense that the shared frailty model is implemented in all standard software:

option frailty in coxph in R

option frailty in streg in Stata

option random in proc phreg in SAS

• These options only need a cluster id variable (i.e. numbers that are shared for observations within a cluster)

• One then gets a p-value on whether the frailty is significant, among other things

(9)

Usefulness of frailty models

If there is some structure in your data that could create dependence in times to events, the standard option might be to run a standard Cox regression anyway with some option creating robust (or unbiased) standard errors for the regression coefficients

What is then the usefulness of quantifying the dependence?

What can you get in addition from frailty models compared with simpler alternatives?

This is the topic for the remainder of my talk

(10)

Background on insurance data

• Do insurance companies use statistical methods? Yes!

• E.g. logistic regression is popular

• In Norway, the actuarial education is a specialization within statistics at University level

• Internationally there are numerous journals dedicated to research within insurance – also development of new methods

(11)

Background cont’d

Would like to introduce new methods (for insurance companies) that can be used on a daily basis by actuaries

That is: No programming involved, only use commands/methods implemented in standard software like R or SAS

Based on the fact that insurance companies also have vast amounts of time to event data

Got access to a data set from Gjensidige on time-to-lapse for company car fleet customers

(12)

Frailty options in R

The paper includes references to source articles for most of the different frailty options in R

Only methods that are implemented in the standard survival library in R are used in the paper

However, there is also a package called frailtypack in R with even more options – e.g. you get estimates of standard errors of frailty parameters

Example also shows that it is not a problem to run analyses on large data sets (provided access to a supercomputer)

(13)

Motivation

• Corporate customers of car policies often hold multiple contracts and this might give dependence between the lapsing times of the single policies

• Methods for analyzing data that are both time-to-event and are not independent are not so established in the insurance industry

• This can be analyzed by frailty models, which was the focus of Marion’s PhD

(14)

Aim:

• Using data that are routinely stored by most insurance companies in the world:

• What are the characteristics of customers who are most likely to leave the company (at the single car policy level)?

• In presence of dependence in the data: How wrong will results from methods disregarding the dependence be?

• How large is the effect of unobserved factors causing the dependence?

(15)

Data

Corporate customers in a car insurance portfolio for small- and medium-sized companies in Gjensidige 1999-2007

48,040 unique customers and 108,274 unique cars

Outcome: Time to lapse of single car policies

Covariates: premium, bonus, discounts, number of covers, area, car brand, usage, indicator of less than 30 days till due date (some are time dependent)

(16)

Info on covariates:

(17)

Alternative ways of analyzing data:

Standard: Cox regression w/time-dependent covariates. Assumes cars are independent in time-to-lapse also within a single company

New: Shared frailty model w/time-dependent covariates. Is a random effect model for survival data where dependence is modeled by letting cars in the same company have the same value of the random effect (frailty) variable. Use gamma distribution for random effect.

(18)

Descriptives

59% of customers had only one car contract while 98% had at most 10 cars insured.

70,751 (65%) cars left Gjensidige during the study period, 46% of the customers only had one lapse and 96% of the customers had at most five lapses.

Single car policies entered and left the sample, lapses occurred without the customer leaving the company.

Among the customers holding at least two car contracts (41% of the customers), 17% lapsed all the contracts within a maximum of two months, 49% lapsed some of the contracts within a maximum of two months and 34% only had one lapse during the study period.

The yearly turnover in the data was approximately 21%.

(19)

Results of modelling

• The dependence was significant

• As a measure of the correlation between time-to-lapse within companies, one frequent measure for frailty models is Kendall’s tau (can be between -1 and +1)

• This is 0.34 for the final frailty model, hence should expect the correlation to be large enough to matter
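For the shared gamma frailty model with frailty variance θ, Kendall's tau has the closed form τ = θ / (θ + 2). The sketch below uses this standard result to translate between the two scales; the frailty variance it prints is implied by the reported τ = 0.34 under that formula and is not a figure quoted in the talk. The function names are hypothetical.

```python
def tau_from_theta(theta):
    """Kendall's tau implied by a shared gamma frailty with variance theta."""
    return theta / (theta + 2.0)


def theta_from_tau(tau):
    """Frailty variance implied by Kendall's tau (inverse of tau_from_theta)."""
    return 2.0 * tau / (1.0 - tau)


theta = theta_from_tau(0.34)
print(round(theta, 2))  # frailty variance implied by tau = 0.34
```

Under this model τ ranges from 0 (no frailty variance, independence) towards 1 as the variance grows, so the shared gamma frailty can only describe positive dependence.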

(20)

Examples of regression coefficients from frailty and Cox models

(21)

Covariate values for four illustration cars

(22)

Illustration of effects of continuous covariates:

(23)

Estimated frailty values in data

In R, the actual estimated frailty values are stored under «frail» in the coxph-object
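To give some intuition for what an estimated frailty value represents, the following sketch computes the posterior mean of a gamma frailty for one cluster. It assumes a gamma frailty with mean 1 and variance θ, in which case the frailty given the cluster's data is again gamma distributed, with shape 1/θ + D and rate 1/θ + H, where D is the number of events in the cluster and H is the sum of exp(βx)·Λ0(t) over its members. This is an illustration of the idea only; it is not claimed to reproduce the exact values that coxph stores, and the function name and inputs are hypothetical.

```python
def posterior_mean_frailty(theta, n_events, cluster_cum_hazard):
    """Posterior mean of a gamma frailty (prior mean 1, variance theta).

    n_events           -- number of observed events (lapses) in the cluster
    cluster_cum_hazard -- sum over cluster members of exp(beta * x) * Lambda0(t)
    """
    shape = 1.0 / theta + n_events
    rate = 1.0 / theta + cluster_cum_hazard
    return shape / rate


# A cluster with more events than its accumulated hazard predicts
# gets a frailty above 1, and one with fewer gets a frailty below 1.
print(posterior_mean_frailty(theta=1.0, n_events=2, cluster_cum_hazard=0.5))
print(posterior_mean_frailty(theta=1.0, n_events=0, cluster_cum_hazard=1.0))
```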

(24)

Illustration of difference between Cox and frailty models

Individual survival curves can be constructed by using the basehaz function in R

(25)

Illustration of effect of having another car in the cluster cancelling the policy

Conditional survival curves are also constructed using the basehaz function in R
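The idea behind such conditional curves can be sketched for a gamma frailty with mean 1 and variance θ (an assumption made here for illustration). Given D observed events in the cluster and an accumulated cluster hazard H so far, the frailty is gamma distributed with shape 1/θ + D and rate 1/θ + H, and the probability that a car still insured accumulates a further cumulative hazard ΔΛ = exp(βx)·(Λ0(t) − Λ0(s)) without lapsing is (rate / (rate + ΔΛ))^shape. The function name and the numbers are hypothetical; the cumulative baseline hazard would in practice come from the fitted model.

```python
def conditional_survival(extra_cum_hazard, theta, n_events, cluster_cum_hazard):
    """Probability of no lapse over a further cumulative hazard, given cluster history.

    extra_cum_hazard   -- exp(beta * x) * (Lambda0(t) - Lambda0(s)) for the car of interest
    n_events           -- lapses observed in the cluster so far
    cluster_cum_hazard -- cumulative hazard accumulated by all cluster members so far
    """
    shape = 1.0 / theta + n_events
    rate = 1.0 / theta + cluster_cum_hazard
    return (rate / (rate + extra_cum_hazard)) ** shape


# Same car, same accumulated hazard, with and without another car having lapsed.
no_lapse = conditional_survival(1.0, theta=1.0, n_events=0, cluster_cum_hazard=0.5)
one_lapse = conditional_survival(1.0, theta=1.0, n_events=1, cluster_cum_hazard=0.5)
print(no_lapse, one_lapse)
```

With no events and no accumulated hazard the expression reduces to the population survival function (1 + θΛ)^(-1/θ), and each additional lapse in the cluster lowers the survival curve of the remaining cars.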

(26)

Goodness of fit

This is a more difficult topic for frailty models, few options are implemented

One option is to compare the estimated model with Kaplan-Meier curves for various covariate values (not shown in here)

However, the frailty term models dependence, and the dependence may not be constant over time

Different distributions for the frailty capture different types of dependence: gamma distribution captures late dependence, so late events are more important than early events

Not much is discussed on this point in the paper

(27)

Conclusion on data

Have seen that the correlation structure in the data matters, and should not be ignored

Can make many useful presentations of the effect of the frailty variable, even though it is just a random effect

This should hopefully make insurance companies (and others) more aware of both the fact that they have many types of data with time-to-event outcomes, and dependence that should be analyzed in a proper way

And one may use it to differentiate between customers

(28)

Conclusion on frailty models

• Is in its simplest case a straightforward extension to a standard Cox model

• Have shown that it is possible to make very useful illustrations of the frailty effect, using information that is stored in the estimated model

• Parts of this information (estimates of frailty values and baseline hazard) are not normally utilized by the novice frailty modeler

• But it can be used with interesting results!
