by
0rnulf Borgan
by
¢rnulf Borgan
Institute of Mathematics University of Oslo
January 1990
PREFACE
The first draft of these lecture notes was written the autumn
semester 1984, when I gave a course in Life history analysis at the Institute of Mathematics, University of Oslo. In the preparation of the notes I then to some extent used earlier hand-written notes by Jan M. Hoem. The notes were revised in 1986.
It has since then been my intention to revise the notes more thoroughly and extend them wit~ a number of practical examples.
However, in the competition fo~ my time, other projects have been given higher priority, and the plan of a substantial revision of the notes has never materialized. Therefore, I have now decided to let the notes appear as a Statistical Memoir almost in the form they had in 1986. The hope for a more thorough .. revision in the future is still alive, however; .:· ... :_
···v ,, -..
. '" ·-. ~-
: ~ ' ' ..... ~ .I
Oslo, January l;9;9Q
¢n ~u ~ t '".' ~~J;g~~
,_:;-:·
. -· ' :' t-1 .:r! .. ~. l . •
-· ' -- ._.. .... ·'
:··,·--·; j~ -~ ;. .
. .., . -.,_-~--..• · -· . . .
. . • ;: :···_:-:,_If''.: ·=··.
: ;·_ ._·: :~:-, .:; -::.. ''" ~:;-. ;..
CONTENTS
1. INTRODUCTION
2. PARAMETRIC INFERENCE IN SURVIVAL ANALYSIS 2.1 Basic results
2.2 Fixed censoring
A. Probability distribution
B. Maximum likelihood estimation
c.
ML-estimation for the exponential distribution D. ML-estimation in the general caseE. Piece-wise constant intensity 2.3 Random censoring
A. Model and ML-estimation
B. Asymptotic properties in the exponential case
c.
Asymptotic properties for the general case 2.4 A general resultA. The general case
B. Piecewise constant intensity
3. PARAMETRIC INFERENCE FOR LIFE HISTORY MODELS 3.1 Markov chains
3.2 The multiple decrement model A. Probabilistic properties
B. ML-estimation of constant intensities
c.
ML-estimation for the general case 3.3 The disability modelA. Fixed censorship, the general case B. Fixed censorship, constant intensities 3.4 A general result
A. The general case
B. Piecewise constant intensities 4. INCOMPLETE OBSERVATIONAL PLANS
4.1 Incomplete observations from an exponential distribution 4.2 Working life tables
A. Probabilistic results
B. Estimation of piecewise constant intensities 5. GRADUATION OF RAW ESTIMATES
5.1 Analytic graduation
5.2 Graduation by moving averages 6. NONPARAMETRIC METHODS·
6.1 The Kaplan-Meier estimator 6.2 The Nelson-Aalen estimator 6.3 Nonparametric tests
A. A one-sample test B. The log-rank test
7. REGRESSION MODELS
7.1 General introduction
7.2 Parametric regression models
7.3 Cox's semi-parametric regression model 7.4 Extensions
8. PURGED AND PARTIAL MARKOV CHAINS 8.1 Partial Markov chain models 8.2 Purged Markov chains
REFERENCES
. ' .. :: .
._: ..
. ,J
1. INTRODUCTION
Life history analysis is a field of statistics concerned with inference in simple stochastic processes, often Markov processes with a finite state space. Especially one has in mind situations
where each individual sample path represents a segment of the life of a person, an animal, a technical component or whatever one is studying. The states of the processes correspond to the various statuses considered (e.g. unmarried, married, married before), while the transitions between the states correspond to the events of interest (e.g. marriage, divorce). The study of the simplest situation, in which there is only the two states "alive" and "dead"
(or "functioning" and "not functioning"), is often called life- table analysis, survival analysis, or failure time analysis. The statistical theory of life history analysis finds applications in many fields, e.g. actuarial science, biostatistics, demography and reliability analysis.
Life-table analysis has been studied for centuries by
actuaries and demographers. Also other life-history models, like the disability model much used in insurance, have a history going back to the start of our century. In spite of this long history, life history analysis has been one of the most active fields for research in statistics the last 10-20 years, and new important contributions are steadily appearing.
In these lecture notes we will only be able to cover a small part of the theory of life history analysis. The interested
readers may find more material in the text-books and review papers by Kalbfleisch and Prentice (1980), Miller (1981 ), Hoem and Jensen
(1982), Andersen and V~th (1984), Cox and Oakes (1984), and Andersen and Bergan (1985).
Finally some words about notation.
(i) The normal distribution with mean ~ and variance o 2 will be denoted N(~,o2), and the p-dimensional multivariate normal distribution with mean
£
and covariance matrix ,... E will be deno- ted Np(~,f). If for a sequence ~1
,~2
, ... of random vectorsf:n(X -~) D + N (O,E), as n+~, we say that X is asymptotically
-n - p - - -n
multinormally distributed with mean ~ and asymptotic covariance matrix -E. 1
n-
(ii) For a function h:~+R, we use the short-hand notation 0 h (~
0
) forox.
J(iii) If
o I
pox. h(~) x=x ' ~'~OER ·
J -
-o
u(x)/v(x) is bounded (tends to zero), as write u(x)
=
O(v(x)) (u(x) = o(v(x))), as x+L.x+L, we
(iv) Within a subsection, an example, figure, etc. is refered to by its number. · When refered to from another subsection, we e.g.
write Example 2.1 .3 for Example 3 of Subsection 2.1.
2. PARAMETRIC INFERENCE IN SURVIVAL ANALYSIS
Before we turn to general life history models in the next chapter, we wi l l here in some det ails consider parametric inference for the survival analysis set-up.
2.1. Basic results.
We consider a positive random variable U with absolutely continu- ous distribution function F(u;~)
=
P(U~u), depending on a parame- ter ~ = (a1, ... ,ap)'Ee
f
RP, and density f(u;£) = F' (u;£). This random variable may represent the future age of death of a newly born individual, time from diagnosis of cancer to death for a pati- ent with cancer, time to breakdown for a technical component, or some such quantity. We will mainly denote U an age of death, but i t should be kept in mind that U may also have other interpreta- tions.In survival analysis one usually takes an optimistic attitude and considers the survival distribution
S(u;~)
=
1-F(u;~)=
P(U>u) ( 2 • 1 )instead of the distribution function itself. Furthermore, we in- troduce the death intensity or the hazard rate function, defined as
=
f(u;a)/S(u;a)-
... ( 2. 2)for u>O such that S(u;a)>O.
-
This quantity is also denoted the force of mortality. To get an alternative, and more directly interpretable, expression for ~(u;~), note thatP(u<U(u+~ujU>u) • [s(u;e)-s(u+~u;e)]/S(u;e).
- - -
Hence, using (2.1) and (2.2), we get
~(u7~) = -S' (u;~}/S(u;~) = lim P(u<U(u+~ujU>u}/~u.
~u.j-0
( 2. 3 )
This shows that for small values of ~u, ~{u;~)~u equals approx- imately the probability of dying in (u,u+~u] for an individual who is still alive at age u. In this respect ~(·;~) measures the instantaneous risk of dying.
To express the survival distribution and the density in terms of the death intensity, we see that (2.3} yields
~(u;9)
=-
d --d !nS(u;9) .- u -
The well known formula
u
-j~(v;9)dv 0 -
S(u;~)
=
eresults. Differentiating (2.4) we get
f(u;~)
=
u
-j~(v;9}dv 0 -
~(u;~)e
( 2. 4)
( 2. 5}
Actuaries and demographers often use a special notation for the situation we consider. This notation will only occationally be used in these lecture notes. Introduce
P
=
P(U>x+ujU>x),U X
and note that p - S(u·9}
u 0 - ·- . By (2.4), we immediately get
'·
x+u
J
l!(v;9)dv-
X ( 2. 6)
which is an important formula. Another important quantity is
It is common to write and for the one-year probabilities, i.e. and when age is measured in years.
Example 1 - Exponential distribution.
The exponential distribution is much used in life testing of technical components. Here we have
2 =
!!>0, andF(u;~)
=
1-e -!!U ; f(u;~)=
IJ.e -IJ.U ;S(u;~)
=
e -!!U and IJ.(U;~)-
IJ.for u>O.
Example 2 - Weibull distribution.
The Weibull distribution is a generalization of the exponential.
Its main application is in life testing in engineering settings.
For the Weibull distribution we have with ~
=
(a,IJ.)'; a,IJ.>O;F(u;2)
f(u;~) S(u;~)
and
for u>O.
= 1-e -IJ.U
= aiJ.U a-1 a
=
e -IJ.U i=
al!u a-1 a;
a e -IJ.U
Example 3 - Gompertz-Makeham distribution.
The Gompertz-Makeham distribution is much used for modelling human mortality, especially for adult ages. It is given by having the intensity 1-4(u;e)
- =
a:+~c u •2.2. Fixed censoring.
A. Probability distribution.
In applications in survival analysis, one may often not be able to observe the exact value of a life time. Instead one is only able to observe the censored survival time
T
=
min ( U, z),for some fixed censoring time z, along with an indicator for death/censoring:
if U(z;
i f U>z.
This type of censorship is often called Type I or fixed censoring.
We will encounter other forms of censoring below.
The distribution function of (T,D) is now given by
F(t;~) for O<t<z, d = 1 : F(z;£) for t>z, d = 1 :
G(t,d;~) = P(T(t,D=d) =
0 for O<t<z, d = 0;
1-F(z;e) for t>z, d = 0.
Thus, the elementary probability function (i.e. the density w.r.t.
a suitable measure) is given as
d .
dtG(t,d;~) • f(t;~) for 0 < t < z , d
=
1 ;g(t,d;~) a: G(t,d;9)
...
-G(t- ,d;9)...
= 1-F(t;9) for t=z,d=O;0 otherwise.
This is a mixture of an ordinary density funct ion (i.e. w.r.t.
Lebesgue measure) and a discrete distribution function. Using ( 2 • 1 ) , ( 2 • 4) and ( 2 • 5) we get
g(t,d;~} =
!-l(t;9}e
...
-t j~(v;9}dv 0 ...
e t
- /1J.(v;9)dv 0 ...
This may be written in a closed form as t
-fiJ.(V;~)dv
g(t,d;~) = IJ.(t;~)d e 0
B. Maximum likelihood estimation.
for O<t<z, d = 1;
for t
=
z, d=O.( 2. 7)
Let (T. ,D.); i = l, . . . ,n; be independent pair of random variables,
~ ~
all distributed as (T,D) in Subsection A above. We want to derive the maximum likelihood estimator (ML-estimator) for
... e .
as
By (2.7), the likelihood corresponding to
D. ~
A.(9) = g(T.,D.;9) = 1J.(T.;9)
~ .... ~ ~ ~
...
eT. ~
-f
IJ.(v;~)dv 0and the total likelihood is
A(~)
=
T. ~
-f
IJ.(V;~)dvn D.
II { 1J. (T. ; 9) ~ e 0 } . i=l ~ ...
(T. ,D.)
~ ~ is given
( 2. 8)
The ML-estimator ~ for 9 is the value of ~ which maximizes
(2.8), or equivalently maximizes
n n T. ~
.R.nA(~)
= L
D.!n~(T.;S)-L f
~(v;S)dv.i=l ~ ~ - i=l 0 -
( 2 • 9 )
(Note that we with an abuse of notation use the symbol
2
both for the true parameter value and for the argument in the likelihood function.) Now-
~ is found by solving the set of equations= 0; j = l, .•• ,p;
and checking that the solution of these equations yields a maximum of ( 2. 9) • Thus, t:..e ....
0 (':..
~~(T.;e)
n oo. ~ -
L
D. ~i=l ~ ~ (T. ; e)
~ ....
is the solution of the set of equations T. ~ 0 (':..
b
oej~(v;~)dv = 0; j = 1, ..• ,p ni=l
I
c.
ML-estimation for the exponential distribution.(2.10)
The exponential distribution is important in life testing. We will in this subsection give a fairly detailed study of ML-estimation based on a Type I censored sample from an exponential distribution with parameter ~· By (2.8) and the results of Example 2.1 .1 we have the likelihood
n D. -~T.
A(~) = IT {~ ~e ~} = ~De-~T, i=l
n n where D =
L
D.. 1 ~
~=
is the observed number of deaths and T =
L
T.i=l ~ is the total exposure for all the individuals counted together.
Thus
and
D - T,
~
Hence the ML-estimator 1\ ~ is given by
i.e. i t is the occurrence/exposure rate
1\ ~
=
D/T. (2.11)The exact distribution of 1\ ~ is very complicated (Hoern,
1969a), and we shall not discuss it here. Using this exact distri- bution, Beyer, Keiding and Simonsen (1976) gave the following ex- pressions for the expectation and variance of
Let p
=
and q=
1-e -~z . Then( 2. 1 2a) and
2 1
=
L[1+ -{4+(2~2z2-1o~z-7)p nq nq3+ (3~2z2+10~z+2)p2+p3 }] + O(n-3), ( 2. 1 2b) as n+m, for fixed ~Z·
It is seen that even for the fairly simple situation with censored data we are considering, exact distributional results are complicated. Therefore, as is typical in life history analysis, we have to be content with asymptotic distributional results, valid as n gets large.
Since (T. ,D.): ~ ~ i
=
1, . . . ,n: are independent and identically distributed, standard results for the ML-estimator give that(2.13) as n +m, where
~ 2 .lnA . ( ~ ) -1 a2
= {
-E 1 }a~2
Here A.(~) is the contribution to the total likelihood carre-
l.
sponding to a single individual, i.e.
Now
and
= -D.
I
~2 Il.
ED.
=
P(D.=l) = P(U.~z)l. l. 1
It follows that
=
1-e -~z = q.(2.14) cf. (2.12b) .. We may give an alternative expression for a2 • To this end we use integration by parts to get
ET. 1
If we introduce ~
=
ET., we have the following alternative1
expression
(2.15) The main point with (2.15) is that this form of a2 may be gene- ralized to other censoring schemes.
The two formulas ( 2. 1 4) and ( 2. 1 5) give rise to the following two consistent estimators for a2 :
(2.16) and
'"'
·,1\
/\, u
a~ • ~ (2.17)
T n
where T
= I
T./n.. 1 l. It should be realized that (2.17) is nothing J.=
2 (/\) -1
{- l a
lnA ~ } . As far as I know, it is unknown which ofn o~2
but
these two estimators is the best. An advantage with (2.17) is that this form of the estimator continues to hold also for other
censoring schemes. Also the asymptotic variance of 1\ ~' J..e. . a 2 /n, may be estimated by
(2. 18) Thus, we do not need t o know n, i t is enough to know and T.
For practical purposes it is important to know how ~ood the asymptotic distribution of 1\ ~ is as an approximation to its exact distribution for finite n. Schou and V~th (1980) have studied this problem by means of stochastic simulations. Let us briefly recapitulate their main results.
The normal approximation to the distribution of 1\ I! is good for nz11~100 in the sense that
(2.19)
has a distribution which is approximately equal to the standard normal distribution. For nzl!
=
100 this holds true except for fractiles less than the 1%-fractile or larger than the 99%- fractile. For nZI!=
10 the approximation is satisfactory for fractiles between 10% and 90%, but it is bad for the tails.If f is a real-valued function, which is differentiable in a neighbourhood of ~ and has continuous derivative at 1!1 with
f' (I!) :fO, then
as n+m, and hencefor th
1\ D
f(~)-f(e) + N(O, l)
If'<~>
IIf;;
as n+=.
Now the question is wtether one by choosing f in a clever way can get a statistic fo~ which the normal approximation works better than it did for (2 .. 9) . Schou and V<Eth (1980) shows that it pays to take f(x)
=
x 1/3 , i.e. to useFor this statistic the nor.nal approximation is good also for the tails when n~z>10. Thus, for moderate sized n and/or low values of ~. one should use (2.2J) instead of (2.19) for constructing tests and confidence inter7als.
In practice the value of n~z is unknown. But since ED =nED. = n(1 -e -~z ) ~ n~z
l.
for small values of ~z, ~ie may as a practical rule replace n~z
by D in the results stat~d above.
D. ML-estimation for the ceneral case.
We return to the setting cf Subsection B. By standard results, the ML-estimator 1\ ~, determinej by the equations (2.10), satisfies
1\ D
In(~-~) + N (O,E),
·- ~ p - - (2.21)
as n+"', where
under weak regularity conditions.
Here
T. l.
A. (9) =
l. -
- f
!l(v:9)dvDi 0
!l(T.:9) e
l.
is the contribution to the total likelihood corresponding to (T.,D.). We find
l. l.
T. l.
i..nA. ( 9)
=
D. in 11 ( T. :e ) -
j 11 ( v: 9 ) dv,l. - l. l. ... 0 -
o.R..nA.(e) T.
l. - 0 l. 0
=D. :;---
9 !l(T. :9)/!l(T. :9}- j -;----
9 !l(V:9)dv
l. u j l. - l. - 0 u j -
o9. J
and
o 21nA. (e)
(2.22)
l. - =D. {
o
2 0 009 09 [!l(T. :9} )!l(T. :9)- ~e !l(T. :9}~
9
11 (T. :9)}/!l(T. :9)2l. j k l.""' l.""' u j l . " " ' u k l. J . -
de£
=
A-B.Now
dt
- f
z e0
and by integration by parts
z t EB
= f
{f0 0 z +{f
0
t
-fl-l(V;S)dv
o2
o -
oejoekl-l(v;~)dv}l-l(t,~)e z
- fl-l(V;9)dv
o
2o -
oe .oe ~J.(v;~)dv}e J k
Combining this we get
- E e
dt
which inserted into (2.22) gives an expression for The matrix
f
may be estimated by inserting 1\ 9-
dt,
-
Lfor
(2.23)
e
in (2.22) and (2.23). This may be akward, however, since one then has-
to compute the integrals in (2.23). A more common way to estimate
f
is the following. By the law of large numbers, we haven
o2J.nll(~)
oejoek
=n n i=l
I
o
2J.nll. (9) P-~~l.--_ + - E
oejoek
o
2 J.nll. ( 9 )l. ....
as n+m, It can be shown that this continues to hold, under weak regularity conditions, when the true ....
e
is replaced by - , l.. ,1\.e • e that1\ ~ 2 .lnA. (e)
~2.lnA(9)
...
p l. -- -
n oejoek + - E ~ejoekas n+oo. Thus, as n+oo,
{- n
so that the asymptotic covariance matrix estimated simply as
Example 1 - Gompertz-Makeham.
Consider the situation with
e
= (a,~,c)' and~(t;~)
=
a+~c t ,cf. Example 2.1 .3. Since
(2.8) gives for the total likelihood
of
n
_1_ T.
n T.
o.
-[aT.+ o (c l.-1)]II {(a+~c l.) l.e l. A.nc }, A
=
II A.=
. 1 l.
J.= i=l
may be
where we have written A for A(a,~,c) for short. Thus
inA
=
nI
D . .tn(a+~c T. l.)- nI
[aT.+ ~(c ~ T. l.-1)]. 1 l. . 1 l. A.nc
J.= J.=
n T. n ~ n T.
I
Di.tn(a+~c l.)-aI
Ti - IncI
(c l._l ),i=1 i=1 i=1
=
(2.24)
(2.25)
and
a.tnA
=
a.tnA n
I
0~ =
i=l
~= oc n
I
i=1
D. ~
a+~c ~ T.
T.
n.c ~
~
T.
a+~c ~
-
nI
T. ;i=l ~
n T·
I
(c ~-1};.tnc i=1 T.-1
D.T.c ~
~ ~
T.
a+~c ~
~ n T.-1 n T.
{ L
Tic ~ .tnc -L
(c ~-1 }/c}.{.tnc} 2 i=1 i=1
1\ 1\ 1\ 1\
The ML-estimator ~ = (a,~,c}' is given as the solution to the set of equations
o.tnAI 1\ = O;
oe . e=e J - -
j = 1,2,3.
These equations have to be solved iteratively.
Using the general results of this subsection we have that
as n+m, where by (2.23}
z t-1
J
@tc y{t}dt0 a+~ct z t
J
c y{t}dt0 a+~ct
z 2t-l
J
@tc ty(t}dt0 a+~c
z 2t
J
c y(~}dt 0 a+~cl:- 1
=
z Q2t2 2t-2 {t}
J
~ c y dt0 a+~ct {Symetric}
and we for short have written y{t} for -[at+~ (ct-1 }/.tnc]
e . The
matrix
f
is most easily estimated by {2.24}. You may yourself o 2.tnAcompute the explicit expressions for - with oejoek
= {a,~,c}.
E. Piece~ise constant intensity.
For situations with large data sets, the determination of the ML- estimators, e.g. for the Gompertz-Makeham model, will require much computing. One then often use an "approximating model" for which the intensity is assumed to be piecewise constant. This approach may also be used because one does not have the detailed data
required to apply the method of Subsecti~ns B and D.
Therefore, in this subsection, we assume given
~(t;~)
=
K LI.L·I.(t), j=l J J where-- [ 01
I . ( t) J
for x.
1<t,;;x . ,·
J- J
otherwise:
and
e =
(1!1, ••• ,I.Lk) •.
The part of the likelihood corresponding to be written
where
and
T. 1
-f
I.L(v;9)dv ....A.= A.{S)
1 1 ....
D. 1
=
1.L(T.,9) 1 .... e 0D ..
l.J
T ..
1J
-
KI
T . . I.L.j=1 l.J J
e
= {o1
x. J-1<T.<x., D. 1 J l.=
1;otherwise
r
-x. 1 i f T. >x.;J J- .1 J
=
T. -x. 1 i f X · 1 <T. (X • ;l. J- J- 1 J
0 if T.,;; X . 1 .
l. J-
( T. , D. ) may now
1 1
We see that D. . is the number of times we observe a death for the
l.J
ith indiviqual in (x.
1, x.], while T. . is the total time the
J- J l.J
ith individual is observed to live in this interval.
where
Now, the total likelihood is
A
=
n n A.=
i=l l.
n
K
K D -
I
T .. ~.n i ' ·-1 l.J J IT {IT (~. J)e J-
i=l j=l J
=
K
K - Iv.~.
Mj j=l J J
IT(~. )e j=l J
M. J =
I
D.i=l l.j
is the observed number of deaths in (x.
1,x.], J- J n
and V. J =
I
Tl.j . is the total exposure in this interval. Thusand
i=l
K K
lnA
= I
M.ln~.-I
v .~.,j=l J J j=l J J
~lnA
~~
It follows that the ML-estimator for
i.e. an occurrence/exposure rate.
is
(2.26)
1\ 1\
To derive the asymptotic properties of ~l' ... ,~K' we can apply standard maximum likelihood theory to get
[
/\ ~1rn ~
~K
(2.27)
as n~CX>, where
(2.28) By straightforward computations
where 6 ..
JJ
is a Kronecker delta, i.e. 6jk = 0 NCM
k-1
- L (
x . -x .1 ) ~ . J. =1 J J- J
=
ek-1
- L (
x . -x . 1 ) ~ . ( . ) '=1 J J- J - xk-~-1 ~k= e J ( 1-e ) •
(2.29)
for j:fk and
Thus, by (2.28) and (2.29),
f-
1 is a diagonal matrix with diagonal elements=
~2 kk-1
- L (
x . -x . 1 ) ~ . ( ) j=1 J J- J - xk -xk-1 ~ke ( 1 -e ) .
1\ 1\
It follows that ~
1
, ••• ,~K are asymptotically independent with asymptotic expectations ~1
, ... ,~K' respectively, and1\
as.var ~k =
n
a~ = n (2.30)-
k-1L
(x.-x. 1)p.. ( )j=1 J J- J - xk-xk-1 ~k
e { 1-e }
By direct computations we find ET . k
=
ED . I~.
~ ~k k Thus, we may alternatively write
as.var 1\ ~k
=
where ~k = ETik'
1\
sistently by
v--,
n~kk
(2.31) cf. (2.15).
so that (2.31) may be estimated simply as
c£. (2.18).
These results may also be derived directly by using the general results of Subsection D. By (2.23) we immediately get
as above.
Ij(t)Ik(t)
K
I
~hrh<t>h=1
~
J
X.. ~k
K-1
2.3. Random censorship.
A. Model and ML-estimation.
t K
- f I
~hlh(v)dve 0 h=1 dt
Fixed censoring, as described in the preceding section, may be encountered in reliability testing situations. However, this form of censoring is much too simple to describe the type of data one usually has to deal with in actuarial, biomedical or demographical applications. One model which may be closer to reality, and is much discussed in the biostatistical litterature, is random censoring.
For random censoring we assume that the censoring time for the i-th individual is the outcome of a random variable Z., which is
l.
independent of the true survival time are
u ..
l. Thus, our observations
i f
u
0 ~z
01. 1.
i f U.>Z. ;
1. 1.
where u
1, ... ,un are i . i.d. with hazard rate function ~(u;~) as before, and z
1 , ... ,zn are i.i.d. with distribution function H(z) = P(Z.~ z), which we for simplicity assume has finite support
1.
[o,a].
Moreover, the Z, IS1. and the U. 1 s
1. are mutually indepen- dent, and H(z) is assumed to be functionally independent of rv
e .
The elementary probability function g(t,d;2) of easily derived. Reasoning as in Subsection 2.2.A we find
g(t,d;2) =
-f~~(v;~)dv
{
~(t;~)e e -ft0
~(v;~)dv dH(t) (1-H(t))for d=l
for d=O.
( T. , D. ) is
1. 1.
Thus the likelihood corresonding to (T.,D.)
1. 1. is proportional to
A.(e) 1. ~
=
~(T.;9) 1. rv D. 1. e-J~~(v;~)dv T.
and the total likelihood is proportional to T.
A(~)
=
n D. -f~~(v;~)dvII {~(T.;9) 1 e }.
i=l 1. rv
( 2. 32)
(2.33)
It follows that the ML-estimator
Q
is the solution to the set'· .,
of equations (2.10) also for the model with random censorship.
Even if the expressions for deriving rv 1\
e
coincides for fixed and random censorship, this is not (completely) true for the asymptotic distribution of rv 1\e.
Let us first consider the exponential case.B. Asymptotic prope~ties in the exponential case.
In this subsection we assume ~(t;2)=~· By the result of the pre- ceding subsection we still have the ML-estimator (cf. (2.11))
1\ ~
=
D/T,n n
where D =
I
D· and T =I
T .. Since (Ti,Di);i=l l i=l l
i .i.d. we may use standard theory for ML-estimation Ill(~-~) D + N(O,a 2 ),
as n+oo, where
As in Subsection 2.2.c we have
Furthermore, using the result just above (2.14),
ED . = EE ( D . l l
I z . )
lIt follows that a
a2 = ~2/J(l-e-~z)dH(z).
0
J
a (1-e-~z)dH(z).0
i = 1, . . . 1ni are to get
(2.34)
(2.35)
It is also here possible to derive an alternative expression for a2 • Using the result just below (2.14), we get
-~J.Z.
=
EE (T. ~I z. )
~=
E ( 1 -e ~ )= -
1 af (
1 -e - z IJ. ) dH ( z ) .IJ. IJ. 0 Introducing 't for ET., we may write
~
which is similar to (2.15).
(2.36)
It is not convenient to use (2.35) for estimation purposes, since this will also involve estimation of the censoring
distribution H. The asymptotic variance of 1\ IJ. is most easily estimated by -a l/\2
=
1\ 1-1/T, wheren
1\2 1\ -
a
=
1-1 /T, cf. ( 2. 1 7) .c.
Asymptotic properties for the general case.Also for the general set-up of Subsection A, we may use standard theory for ML-estimation. It follows, under weak regularity conditions, that
(2.37) as n+cn, where
(2.38)
and A. (a)
~ .... is given by (3.1).
a 2 ~nA. (a)
The expression for ~ ... for the present situation coin- cides with that
expression for
given in Subsection 2.2.0. Furthermore, the
o 2 .tnA. (a)
~ ....
E for Type I censoring, equals the aajoak
conditional expectation for the present -situation, given
z.
~=
z.Hence, by (2.23) we have
a 2 ..tnA. { 9) a 2..tnll. {9 >
- E
z.
~=
Ef0
a z
= J J
0 0
~ ....
=
-EE{ ~ ....lz)
a9ja9k a9ja9k
a o t
~~(t;~)o9k~{t;~) -J~(v:~)dv 0
~(t;~) e dt
0 0 t
~~(t;~)o9k~(t;~) -J~(v;~)dv
J e 0 dtdH(z)
~(t;9) ....
t
-J~(v;9)dv
e 0 - (1-H(t))dt, (2.39)
where the latter equality follows by changing the order of integra- tion.
Thus, the asymptotic distribution of ~ 1\ is given by (2.37)- (2.39). The asymptotic covariance matrix may s t i l l be estimated as given in (2.24)-{2.25).
To conclude the discussion on ML-estimation for random censor- ship, we note that 1\ ~ is found exactly as was the case for fixed censoring. The estimator is asymptotically normally distributed with the proper expectation, but with a covariance matrix that is
{slightly) different from the one we had for fixed censoring. The estimator of the asymptotic covariance matrix has the same form for the two types of censoring, however.
2.4. A general result . . A. The general case.
In the two preceding sections, we have seen that we get very
similar results for fixed censoring and random censoring. In this section we will present a general result which contains the results of the two preceding sections as special cases.
He observe n individuals, but we do not any longer need to assume that the observation starts at time
o.
Let individual no i be observed from time x. until death or until censoring atl
time x.+Z.. "Time" may here be an individuals age, the time
l l
elapsed since some initializing event (e.g. the start of a treatment) or some such quentity. If this individual dies at u.E(X. ,X.+Z.], we let T. = u. and D. = 1. Otherwise, we let
l l l l l l l
T.= X.+Z. and D.
=
0. The X. 's and thez.
's may be fixedl l l l l l
numbers or random variables. If they are random, we assume for the sake of simplicity that (X.,Z.)7 i=l, ... ,n are independent (but
l l
not necessarily identically distributed) and independent of the u. 's. We also assume that the distributions of the (X.,Z.) are
l l l
functionally independent of and that x.+Z.<a
l l for all i . Now, the triplets (X.,~. ,D.)7 i = 1,2 ... ,n7 are independent
l l l
and i t follows that the likelihood is proportional to
A(~)
=
n D.
II {!..!.(T.78) 1 e i=l l ~
-fx~~-L(v;£)dv T.
l
} .
The ML-estimator 1\
e
is determined by the set of equations cH.nA(~)oe .
J i.e. by
=
07 j = l , ... ,p7n
i=l
I
which is almost the same as (2.10).
l , . . . ,p~ (2.40)
It is possible to derive asymptotic distributional results for
~ for the quite general situation we are considering (Bergan, ....
1984) . To be able to state this result, we introduce n
Y(t)
= L
I{X.<t(T,}, . 1 ~ ~~=
(2.41)
where I{•} is the indicator function. We note that Y(t) is the number at risk just prior to time t. If some weak regularity
conditions (primarily on ~(v~e)) are fulfilled, and provided that there exists a function y(t) such that
P a
+
f
0 ~(t;~}
Y(t)dt n
def 'k y ( t) dt
=
0' J . ,as n+m, where t. '1"-1 -- {,..jk} ..., is positive definite, then
1\ D
In (
_e -_e )
+ N ( 0 , E ) , p - -as n+m. Moreover, we have
-
nas n+m.
(2.41)
(2.42)
(2.43)
It is straightforward to see that this result generalizes the ones we have derived above. E.g. for the model with random censor- ship we get using the law of large numbers as n+m:
a a
a ~~(t;~)~~~(t;~)
J
J ,., • Y(t~t0 ~(t;~) n
n T i
= I J
n i=l 0
a a
~~(t;~)~~(t;~)
~(t;e) dt
-
a a
a ~~(t;~>aek~(t;~)
= 6
J~(t;~)
y(t)dt.Here the latter equality follows by (2.39) with t
-j~(v;e)dv
y(t)
=
e 0 (1-H(t)).B. Piecewise constant intensity.
Assume, as in Subsection 2.2.E, that there are given x
0<x1< ••• <xK, and that
~(t;~)
=
K I~.I.(t),j=l J J
where I .(•) is an indicator for [x.
1,x.).
J J- J
i to is
We introduce T .. as the observed lifetime for individual no
~J
in Lx. 1'x.), and J- J
die in [x.
1,x.);
J- J proportional to
let
D ..
=
~J
D. .
=
1 if this individual is observed~J
0 otherwise. Then the total likelihood
n D.
n {l.l(T.;9) ~e . 1 ~ -
~=
T. ~
f
~(v;9)dv-
x.
~}
K
D -I~.T
..
n Di1 °i2 iK j=l J ~J
= .
n {
~ 1 ~ 2 • • • ~K e }~=1
K
K
- L
~ .V.j=1 J J
e
K with M.
= L
D .. andv = L
Tj j=1 i j . It follows that the ML- J j=1 ~ J
estimators are the occurrence/exposure rates
1\ M.
~· J
=
...J..v.
J
as before.
j
=
1, ••• ,k (2.44)The asymptotic properties of (2.44) follow by the general result of Subsection A: We find for the left-hand side of (2.41)
n
• Y(t)dt
=
n
=
6J.k - -L
T . . . 11j n i=1 ~JX. J
6 jk ~. n
f
Y(t)dtJ x. J-1
It follows by the general result above that, if
n P
L
T .. + 't . >0 n i=1 ~J Jas n+=: j
=
1, ... ,K; then(2.50)
as n+'"', with E a diag{--, ••• ,--}. ~1 ~K
- ~1 ~K
Thus, we have as before that 1\ 1-L 1 I • • • I 1\ j.l.K are asymptotically independent and that is asymptotically normally distributed with expectation and
be estimated by 1\ IJ.
.Jv ..
J J
as.var 1\ t-L.
J
1 j.l. •
= - ~.
n ~j
Also, as.var 1\ j..L.
J may
3. PARAMETRIC INFERENCE FOR LIFE HISTORY MODELS
This chapter shows how the results from the survival analysis set- up may be generalized to other life history models.
3.1. Markov chains.
As an introduction to general life history models, let us see how the alive/dead situation may be reformulated into a Markov chain set-up. We have a survival time U with absolutely continuous distribution function F(u) as in Section 2.1. We write W(u) = 0 if U>u, and W(u) = 1 if U~u. Then W(•) is the sample path of a Markov chain with state space I = {0,1} which may be illus-
trat~d as in Fig 1.
0 Alive
~(t)
)
Dead . \
Fig. 1. A Markov chain model for the survival analysis set-up.
For a general time inhomogeneous Markov chain with sample path W(•) and finite state space I, the transition probabilities are given by
Phj(s,t) = P(W(t) = jjW(s) =h); h,jEI.
In the survival analysis situation studied in Chapter 2, we have
=
e-f~(v)dv t s
-t J~(v)dv
P 0 1 ( s , t ) = 1 -e 8
P11 ( s , t ) - 1.
Moreover, by (2.3) the death intensity may be given as
~ ( s)
=
lim P01 ( s, t)
I (
t-s ) . t"-s( 3 • 1 )
For a general time inhomogeneous Markov chain model with finite state space I, the transition intensities are defined similar to
(3.1 ). Namely by
~hj ( s) =lim Phj(s,t)/(t-s).
t"-s
h*j, h,jEI. This immediately gives
~j(s)lls+o(lls) = Phj(s,s+lls), as lls"-0,
( 3 . 2 )
which shows that for small values of lls, ~hj(s)lls equals approx- imately the probability of a h+j transition in (s,s+lls] given W(s) =h.
We also define
~h(s)
=
lim[1-Phh(s,t)]/(t-s) t"-sfor hEI, and note that
( 3 • 3 )
( 3. 4)
3.2. The multiple decrement model.
A. Probabilistic properties.
In this section we will study a Markov chain model for cause specific mortality. The model may be illustrated as in Fig 2.
1
~l(t)
Dead of cause no. 1
0 2
Alive !Dead of
2 cause no .
• • of cause no. k
Fig. 2. ·The multiple decrement model.
For short, introduce ~h(t) for the transition intensity ~Oh(t), k
and let ~(t)
= L
~h(t) denote the total force of mortality.h=l
We will show how the transition probabilities may be derived from the transition intensities. By a standard argument we have
Dividing through by ~t, and letting i t tend to zero, we get the Kolmogorov forward differential equations
0 otpoo<s,t)
=
-~(t)Poo(s,t), ( 3 . 5 ) 0 otPOh(s,t)=
~h(t)Poo<s,t). ( 3 • 6 )The former equation yields
'
and the obvious relation t
- fl!(v)dv P s
00(s,t)
=
e ( 3 . 7 )results. Inserting this into (3.6) and integrating, we get for h = l, ..• ,k
u
-fl!(v)dv t
P0h(s,t)
= f
e s ~(u)du, swhich may also be given an immediate interpretation.
B. ML-estimation of constant intensities.
( 3. 8)
Let us in this subsection assume that l!h ( t) =1-lh, i.e. independent k
of t, for all h, and write I! =
I
l!h. Moreover, assume that h=ln individuals are observed over
[o,z]
for some z>O. For the i-th of these individuals we let T. be the observed lifetime in~
[o,z],
and D=
1hi if the individual is observed to die of cause h; Dhi = 0 otherwise.
Then the likelihood for individual no. i may be written as
A.
=
~
k Dh. -~J.T.
~ ~
II (l!h· )e h=l l.
The total likelihood is therefore
A
=
n
( 3. 9)
(3.10)
where D
=
h
I
Dh.i=l ~ is the total number of deaths from cause h in n
T
= L
T.i=l ~ It follows that the
[ 0, z], and is the total exposure.
ML-estimators are given by
1\ ~h = Dh /T I h = 1 I • • • I k (3.11)
i.e. they are occurrence/exposure rates.
Since the vectors (T.,D 1
.,o
2., ..• ,Dk. )~ i
=
l, ... ,n~ are~ ~ ~ ~
i.i.d., i t follows by standard ML-estimation theory that
/\
j.11 j..L1
• D
{n • • + Nk(Q,~)
/\
~ ~
as n+oo, where
By straightforward computations, using (3.9), we get
where is a Kronecker delta. Thus, by (3.8), we find
~j
POh(O,z)
=
~j1\ 1\
It follows that ~
1
, ... ,j.lk are asymptotically independent and that1\ ~ is asymptotically normally distributed with expectation j.lh and
as.var 1\ ~h
=
( 3. 12)By computations similar to those just above (2.15), we find
1j.1(1-e-j.lz), 1 ·
ET.
=
so that we may a so wr~te~
as.var llh 1\
.. --
n 1 llh 't, (3.13) with 't=
ET.. This result is analogous to that of the survivall.
analysis situation. Also, we have as before that
~ 1\ 1\
as.var llh
=
llh/T. (3.14)The results presented in this subsection are easily generalized to the situation with random censorship, i.e. where individual no. i is observed over
[o,z.],
and thez
's are i.i.d.l. i
c.
ML-estimation for general case.We assume that ~h(t)
=
~h(t;~) for some a = ( e 1 , ••• , e ) ' , and.... p
assume that our observations are as in Subsection B above. Then the likelihood for the i-th individual is
A. l. .... (e)
=
eTi k
- f I
llh ( v; ~ ) dv 0 h=1The total likelihood is therefore
Thus
!J..(~)
=
n II {II k u.. (T.;9) Dh. 1 ei=l h=l·n 1 ....
J.n/J.. ( ~)
=
-
kI
h=l T. l.
J
llh (v; e )dv 0 ....(3.15)
} .
(3.16)1\e 1\ 1\
and the ML-estimator = (e
1, ••• ,ap)' is found by solving the system of equations
n k
I L
Dh.i=l h=l l.
j = l , ... ,p.
!;:,.
u.. (T.;
e
>. n l. ...
k T.
n l.
o
~L L
0
!
oeJ.Ilh(v;~)dv = O;i=l h=l
(3.17)
By standard theory for the ML-estimator, we have under weak regularity conditions.
1\ D
f:n(9-9) ... ... + N (O,E), p ... ""
as n+a:~, where
Now we have using (3.15)
~ 2 .lnll... (e)
~
- =
oej~e.l
k ~2
hi 1 Dhi{oej~e.l[~h(Ti;~)]~h(Ti;e)
k T1
- I J
h=1 0
By computations parallel! to those giving (2.23) we find
k z
- E
= I . J
h=1 0
1\
~h ( t i ~) e t
-f~(v;g)dv
0 dt.
(3.18)
(3.19)
(3.20)
Thus ~ is asymptotically multinormally distributed with mean ~
and a covariance matrix given by (3.19) and (3.20). The asymptotic covariance matrix
lr
is as usual most conveniently estimated byn-
The results of this subsection is easily extended to the situation with random censorship, cf. Subsection 2.3.C.