
Discussion Papers No. 383, August 2004 Statistics Norway

Jan F. Bjørnstad and Elinor Ytterstad

Two-Stage Sampling from a Prediction Point of View

Abstract:

This paper considers the problem of estimating the population total in two-stage cluster sampling when cluster sizes are unknown, making use of a population model arising basically from a variance component model. The problem can be considered as one of predicting the unobserved part Z of the total, and the concept of predictive likelihood is studied. Prediction intervals and a predictor for the population total are derived for the normal case, based on predictive likelihood. The predictor obtained from the predictive likelihood is shown to be approximately uniformly optimal for large sample size and large number of clusters, in the sense of uniformly minimizing the mean square error in a partially linear class of model-unbiased predictors. Three prediction intervals for Z based on three similar predictive likelihoods are studied. For a small number n_0 of sampled clusters they differ significantly; however, for large n_0 the three intervals are practically identical. Model-based and design-based coverage properties of the prediction intervals are studied in a comprehensive simulation study. Roughly, the simulation study indicates that for large sample sizes the coverage measures achieve approximately the nominal level 1 − α, and are slightly less than 1 − α for moderately large sample sizes. For small sample sizes the coverage measures are about 95% of the nominal level.

Keywords: Survey sampling, population model, predictive likelihood, optimal predictor, prediction intervals, simulation

JEL classification: C42, C13, C15

Address: Jan F. Bjørnstad, Statistics Norway, Division for Statistical Methods and Standards.

E-mail: jab@ssb.no

Elinor Ytterstad, University of Tromsø, Department of Mathematics and Statistics, N-9037 Tromsø, E-mail: Elinor.Ytterstad@matnat.uit.no


Discussion Papers comprise research papers intended for international journals or books. A preprint of a Discussion Paper may be longer and more elaborate than a standard journal article, as it may include intermediate calculations and background material etc.

Abstracts with downloadable Discussion Papers in PDF are available on the Internet:

http://www.ssb.no

http://ideas.repec.org/s/ssb/dispap.html

For printed Discussion Papers contact:

Statistics Norway

Sales- and subscription service NO-2225 Kongsvinger

Telephone: +47 62 88 55 00 Telefax: +47 62 88 55 95

E-mail: Salg-abonnement@ssb.no


1. Introduction

Two-stage surveys are used in sampling from finite populations of, say, N primary units or clusters, where cluster i consists of m_i units. N is assumed known. As mentioned by Kelly and Cumberland (1990) and Valliant, Dorfman and Royall (2000, ch. 8.9), it often happens that the m_i's are unknown before sampling, and this is the case we consider in this paper. Let y_ij be the value of the variable of interest for unit j of the i'th cluster. The problem is to estimate the total

t = Σ_{i=1}^{N} Σ_{j=1}^{m_i} y_ij.

An example is considered in Thomsen, Tesfu and Binder (1986) and Thomsen and Tesfu (1988), with t being the size of a particular population. The clusters are certain administrative units, the units are households and yij is the number of persons in household j of the i'th administrative unit.

We assume that, before sampling, other measures of the sizes of the clusters are available to us. Let x_1, ..., x_N be these measures, with X = Σ_{i=1}^{N} x_i. Kelly and Cumberland (1990) consider a case where the clusters are blocks of dwelling units and x_i is the number of units in block i from a previous census.

The sampling plan is as follows: At stage 1 a sample s of size n_0 of the clusters (1, ..., N) is selected according to some sampling design. At stage 2 we select, for each i ∈ s, a sample s_i of size n_i of units, using possibly a different sampling design than at stage 1. The designs are assumed to be non-informative, i.e., they do not depend on the y_ij's and the m_i's. E.g., in Thomsen and Tesfu (1988) the two-stage sampling plan is to use pps-sampling at stage 1 (letting selection probabilities of clusters be proportional to the x_i's) and simple random sampling (srs) at stage 2. This is a common two-stage sampling plan, as also mentioned by Kelly and Cumberland (1990). Usually, the second-stage sample sizes are the same, leading to approximately equal selection probabilities for all units provided the ratios x_i/m_i are not too different. When the m_i's are known, one often-used sampling plan is to let the first-stage selection probabilities be proportional to m_i, and then use srs with the same sample sizes at stage 2, yielding equal selection probabilities for all units. As mentioned by Valliant et al. (2000, ch. 8.1), equal sample sizes at stage 2 have many advantages, and this is probably the most common allocation of sample units in practice.

The total sample size is n = Σ_{i∈s} n_i, and our data now consist of y(s) = (y_ij : j ∈ s_i, i ∈ s) and the vector m(s) = (m_i)_{i∈s}, where s = {s, s_i : i ∈ s}. Let y = (y(s), m(s)). For the pps-srs sampling plan mentioned above, a commonly used design-unbiased estimator of t is the Horvitz-Thompson estimator (see, e.g., Thomsen et al., 1986, Kelly and Cumberland, 1990, and Särndal, Swensson and Wretman, 1992)

t̂_HT = (X/n_0) Σ_{i∈s} m_iȳ_i/x_i   (1)

where ȳ_i = Σ_{j∈s_i} y_ij/n_i.
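As a small numerical illustration of (1), the sketch below computes t̂_HT = (X/n_0) Σ_{i∈s} m_iȳ_i/x_i for a hypothetical pps-srs sample; the function name, the size measures and all sample values are invented for illustration.

```python
# Minimal sketch of the Horvitz-Thompson estimator (1) for the pps-srs plan:
# t_hat = (X / n0) * sum over sampled clusters of m_i * ybar_i / x_i.
# All inputs below are hypothetical illustration data.

def horvitz_thompson(X, sample):
    """X: total of the size measures x_i over all N clusters.
    sample: list of (x_i, m_i, ys) where ys holds the stage-2 srs values y_ij."""
    n0 = len(sample)
    total = 0.0
    for x_i, m_i, ys in sample:
        ybar_i = sum(ys) / len(ys)        # within-cluster sample mean
        total += m_i * ybar_i / x_i       # estimated cluster total / size measure
    return X * total / n0

X = 1000.0                                # sum of x_i over all N clusters
sample = [
    (50.0, 48, [3.0, 4.0, 5.0]),          # (x_i, m_i, stage-2 sample)
    (80.0, 85, [2.0, 2.0, 4.0]),
]
print(horvitz_thompson(X, sample))
```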

In this paper a population model is adopted, regarding m_i, y_ij as realized values of random variables M_i, Y_ij, for j = 1, ..., M_i and i = 1, ..., N. The M_i's are assumed independent of all Y_ij, and furthermore:

E(M_i) = βx_i, V(M_i) = σ²v(x_i), Cov(M_i, M_j) = 0
E(Y_ij) = µ, V(Y_ij) = τ²
Cov(Y_ij, Y_ik) = ρτ², if j ≠ k   (2)
Cov(Y_ij, Y_lk) = 0, if i ≠ l.

Since the variance of a cluster total is nonnegative, we must have ρ ≥ −1/(max m_i − 1), as also noted by Kelly and Cumberland (1990). It is therefore a minor restriction to assume a nonnegative ρ. Also, usually v(x) = x^g with 0 ≤ g ≤ 2. In fact, it is typically assumed that v(x) = x (see, e.g., Royall, 1986, Kelly and Cumberland, 1990, and Valliant et al., 2000, ch. 8.9).

A more general model is to let ρ and τ vary with the clusters, having cluster parameters ρ_i, τ_i. However, we then have the problem of estimating these parameters. Without further assumptions we are only able to estimate (1−ρ_i)τ_i². As noted by Valliant et al. (2000, ch. 8.1), it is often sensible to adopt model (2), especially after suitable stratification, which also may allow µ to be different for different parts of the population.

The model (2) for the Yij's arises naturally from expressing Yij in the following way:

Y_ij = µ_i + ε_ij

where all µ_i, ε_ij are independent with

E(µ_i) = µ, V(µ_i) = τ_b² and E(ε_ij) = 0, V(ε_ij) = τ_w².   (3)

Here, V(µ_i) = τ_b² expresses the variability between the clusters, and V(ε_ij) = τ_w² expresses the variability within the clusters. Then τ² = τ_b² + τ_w², and the intraclass correlation is given by

ρ = τ_b²/τ²,

the proportion of the total variability due to the variability between the clusters.
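The representation (3) can be checked by simulation: drawing cluster effects with variance τ_b² and unit errors with variance τ_w², the empirical correlation between two units of the same cluster should approach ρ = τ_b²/(τ_b² + τ_w²). A minimal sketch, with illustrative parameter values:

```python
# Sketch of the variance-component representation (3): Y_ij = mu_i + eps_ij,
# with V(mu_i) = tau_b^2 (between clusters) and V(eps_ij) = tau_w^2 (within).
# The parameter values are illustrative.
import random

random.seed(1)
mu, tau_b, tau_w = 10.0, 2.0, 1.0
rho_true = tau_b**2 / (tau_b**2 + tau_w**2)       # = 0.8

n_clusters, n_per = 2000, 2
pairs = []
for _ in range(n_clusters):
    mu_i = random.gauss(mu, tau_b)                # cluster effect
    pairs.append([mu_i + random.gauss(0.0, tau_w) for _ in range(n_per)])

# Empirical correlation between the two units of each cluster
m1 = sum(p[0] for p in pairs) / n_clusters
m2 = sum(p[1] for p in pairs) / n_clusters
cov = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n_clusters
v1 = sum((p[0] - m1)**2 for p in pairs) / n_clusters
v2 = sum((p[1] - m2)**2 for p in pairs) / n_clusters
rho_hat = cov / (v1 * v2)**0.5
print(round(rho_true, 3), round(rho_hat, 2))
```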

The total t is now a realized value of a random variable T, where T can be expressed as

T = Σ_{i∈s} Σ_{j∈s_i} Y_ij + Z

with

Z = Σ_{i∈s} Σ_{j∉s_i} Y_ij + Σ_{i∉s} Σ_{j=1}^{M_i} Y_ij.   (4)

Expressing T in this form, we see that the problem can be formulated as one of predicting the unobserved value z of the random variable Z. It is often clarifying to write a predictor T̂ of T in the form

T̂ = Σ_{i∈s} Σ_{j∈s_i} Y_ij + Ẑ   (5)

where Ẑ then implicitly is a predictor of Z. Considering the modified Horvitz-Thompson estimate T̂_HT given by (1) in the form (5), we can use the following expression, with X_s = Σ_{i∈s} x_i:

T̂_HT = Σ_{i∈s} Σ_{j∈s_i} Y_ij + Σ_{i∈s} (X_s m_i/(n_0 n_i x_i) − 1) n_iȲ_i + ((X − X_s)/n_0) Σ_{i∈s} m_iȲ_i/x_i.

The last term predicts Σ_{i∉s} Σ_{j=1}^{M_i} Y_ij, while the second term predicts Σ_{i∈s} Σ_{j∉s_i} Y_ij. From this point of view, T̂_HT does not look like a reasonable predictor.

Modeling the population in survey sampling has been and still is somewhat controversial, although most statisticians seem to agree on using modeling in developing statistical methods, while evaluation is done with respect to the sampling design. An important aspect of this issue is that the likelihood principle in a sense makes it necessary to model the population. Without a model the only stochastic elements are the samples s = {s, s_i : i ∈ s}, and the likelihood function is then flat (see, e.g., Cassel, Särndal and Wretman, 1977). This means that from the likelihood principle point of view the data contain no information about the unobserved y_ij's and m_i's. To make inference we therefore need to relate the data to the unobserved values somehow, and the most natural way of doing so is to formulate a model (see also remarks by Berger and Wolpert, 1988, p. 114, and Bjørnstad, 1996).

The random variables observed are Y(s), M(s) and s, where s now is ancillary. The likelihood principle implies that inference should depend only on the actual s observed and not on the sampling design.

This is usually called the prediction approach to survey sampling and will be adopted in this paper.

Hence, theoretical considerations are conditional on given s. The prediction approach aims at choosing a predictor that is good for the actual s obtained, and it has made significant contributions to a better understanding of several problems in survey sampling, some of which are mentioned in Thomsen and Tesfu (1988) and Valliant et al. (2000). It also enables one to use more conventional statistical methods, although the problem is not to make inferences about the parameter θ but rather to predict Z. Hence, θ basically plays the role of a nuisance parameter.

To predict Z we shall use predictive likelihood based methods, a non-Bayesian likelihood approach to prediction problems in general. One can argue that in the context of a population model, survey sampling provides one of the more natural "prediction" problems in statistics. Predictive likelihood can therefore serve as a basis for essentially all problems of this kind in survey sampling. Some major references to the general theory of predictive likelihood are Hinkley (1979), Mathiasen (1979) and Butler (1986). A review of some of the suggested likelihoods is given in Bjørnstad (1990, 1998).

Predictive likelihood is discussed from the perspective of the likelihood principle for prediction in Bjørnstad (1996). Bolfarine and Zacks (1992) consider methods based on predictive likelihood in survey sampling.

Section 2 introduces the concept of predictive likelihood and shows how predictors and prediction intervals can be constructed from a predictive likelihood, and in Section 3 a predictive likelihood is derived for the normal model. Considering a predictive likelihood for Z directly does not work, mainly because Z is a sum of a stochastic number of random variables. Therefore, predictor and prediction interval will be obtained from a joint predictive likelihood for Z and the vector M(s̄) = (M_i)_{i∉s}. In Section 3.3, optimality theory for a class ℓ of predictors linear in the Y_ij's, but not simultaneously in both the Y_ij's and the M_i's, under the distribution-free model (2) is developed.

In Section 4, three prediction intervals for Z based on similar predictive likelihoods are studied, and a comprehensive simulation study for estimating confidence levels, both model-based and design-based, is undertaken. The prediction intervals are evaluated by four different measures: the model-based coverage C_m, the design-based coverage C_d, the unconditional coverage C (expected design-based coverage), and the conditional coverage given the data.

2. Predictive likelihood

We shall here give a brief general introduction to the concept of predictive likelihood. For a more complete exposition we refer to Bjørnstad (1990, 1998). Let Y = y be the data. The problem is to predict the unobserved or future value z of a random variable Z, usually by a predictor and a confidence interval for Z. It is assumed that (Y, Z) has a probability density or mass function (pdf) f_θ(y, z). In general we let f_θ(⋅) and f_θ(⋅|⋅) denote the pdf and conditional pdf of the enclosed variables. The likelihood basis in prediction is the generalized joint likelihood for the two unknown quantities, z and θ. In Bjørnstad (1996) it is shown that the joint likelihood function is given by l_y(z, θ) = f_θ(y, z).

With this likelihood, the corresponding likelihood principle is implied by the sufficiency principle for prediction and the conditionality principle, generalizing the fundamental result by Birnbaum (1962) for parametric likelihood. The aim is to develop a partial likelihood for z, L(z|y), by eliminating θ from l_y. Any such likelihood is called a predictive likelihood and gives rise to one particular prediction method.

Different ways of eliminating θ give rise to different L. The two main types of suggestions are the conditional predictive likelihood L_c, essentially suggested by Hinkley (1979), and the profile predictive likelihood L_p, first considered by Mathiasen (1979). Let R = r(Y, Z) denote a minimal sufficient statistic for (Y, Z). Then

L_c(z|y) = f_θ(y, z)/f_θ(r(y, z))   (6)

L_p(z|y) = max_θ f_θ(y, z) = f_θ̂_z(y, z).   (7)

Typically, Lc and Lp are quite similar when sufficiency provides a genuine reduction and the dimension of θ is small.

In linear models, Lp will ignore the number of parameters and can be misleadingly precise. A modification of Lp, Lmp, that adjusts for this was suggested by Butler (1986, rejoinder), see also Bjørnstad (1990). If Y,Z are independent, Y consisting of n independent observations and Z being an m-dimensional vector of independent variables, then Lmp is given by

L_mp(z|y) = L_p(z|y) ⋅ |H_z H_z'|^{1/2}/|I_z(θ̂_z)|^{1/2}.   (8)

Here, I_z(θ) = {I_ij^z(θ)} is the "observed" information matrix based on (y, z), i.e., I_ij^z(θ) = −∂² log f_θ(y, z)/∂θ_i∂θ_j. H_z = H_z(θ̂_z), and H_z(θ) is the k × (n+m) matrix of second-order partial derivatives of log f_θ(y, z) with respect to the k-dimensional θ and (y, z).

We shall assume that any L considered is normalized as a probability distribution in Z. The mean and variance of L are then called the predictive expectation and the predictive variance of Z, denoted by Ep(Z) and Vp(Z). Ep(Z) is then a natural predictor for z, called the mean predictor. L(z|y) also gives us an idea on how likely different z-values are in light of the data, and can be used to construct prediction intervals for z. An interval (ay, by) is a (1-α) predictive interval based on L(z|y) if

∫_{a_y}^{b_y} L(z|y) dz = 1 − α.

A simplified (1−α) predictive interval is of the form

E_p(Z) ± u_{α/2} √V_p(Z)   (9)

where u_{α/2} is the upper α/2-point in the actual (exact or approximate) conditional distribution, given y, of (Z − E_θ(Z|y))/√V_θ(Z|y).
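A minimal sketch of the simplified interval (9), taking u_{α/2} from the standard normal as the approximate conditional distribution for large samples; the predictive mean and variance used below are hypothetical inputs.

```python
# Sketch of the simplified (1 - alpha) predictive interval (9):
# E_p(Z) +/- u_{alpha/2} * sqrt(V_p(Z)), with u taken from the standard
# normal distribution. The Ep/Vp values are hypothetical.
from statistics import NormalDist
from math import sqrt

def predictive_interval(Ep, Vp, alpha=0.05):
    u = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point
    half = u * sqrt(Vp)
    return Ep - half, Ep + half

lo, hi = predictive_interval(Ep=5000.0, Vp=250_000.0)  # predictive sd = 500
print(round(lo, 1), round(hi, 1))
```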

3. Predictive likelihood and predictor in two-stage sampling

3.1 Predictive likelihood for mixtures

In two-stage sampling, Z is given by (4) and is the sum of two mixtures. Therefore, instead of considering a predictive likelihood for Z directly, we look at a joint predictive likelihood for Z and M(s̄). It has the following form:

L(z, m(s̄)|y) = L_m(s̄)(z|y) L(m(s̄)|y).   (10)

L_m(s̄)(z|y) is a predictive likelihood for z conditional on M(s̄) = m(s̄), i.e., based on f_θ(y, z|m(s̄)). Since f_θ(y, z|m(s̄)) = f_{µ,τ,ρ}(y(s), z|m(s), m(s̄)) f_{β,σ}(m(s)), L_m(s̄)(z|y) is, in fact, based on f_{µ,τ,ρ}(y(s), z|m(s), m(s̄)). L(m(s̄)|y) is a predictive likelihood for m(s̄) based on f_θ(y, m(s̄)). The predictive likelihood for Z is given by the marginal in (10); e.g., in the case of a continuous model for M_i,

L(z|y) = ∫ L(z, m(s̄)|y) dm(s̄).   (11)

Then E_p and V_p follow the usual rules for double expectation, i.e.,

E_p(Z) = E_p{E_p(Z|M(s̄))}   (12)
V_p(Z) = E_p{V_p(Z|M(s̄))} + V_p{E_p(Z|M(s̄))}.

In (12), E_p(Z|m(s̄)) and V_p(Z|m(s̄)) are the predictive mean and variance for Z from L_m(s̄)(z|y). In principle we can derive L(z|y) as the marginal likelihood in (11). The advantage of (12) is that we are able to obtain E_p(Z) and V_p(Z) without actually deriving L(z|y).

Under the model (2) we can factorize f_θ(y, z, m(s̄)) = f_{µ,τ,ρ}(y(s), z|m(s), m(s̄)) f_{β,σ}(m(s), m(s̄)), and it is readily seen that applying L_p, given by (7), to the terms on the right-hand side in (10) in fact gives us L_p(z, m(s̄)|y) = max_θ f_θ(y, z, m(s̄)), i.e.,

L_p(z, m(s̄)|y) = L_m(s̄),p(z|y) L_p(m(s̄)|y).   (13)

It follows that E_p(Z) and V_p(Z) based on L_p(z, m(s̄)|y) can be derived by (12). We note that L_c, given by (6), has the same property, i.e., L_c(z, m(s̄)|y) = L_m(s̄),c(z|y) L_c(m(s̄)|y).

3.2 Normal model

It is now assumed that model (2) holds with Y_ij and M_i normally distributed. We shall first consider the second likelihood in (10), L(m(s̄)|y), using the profile predictive likelihood L_p. Let t_ν^{(k)}(Σ) denote the k-dimensional multivariate t-distribution with ν degrees of freedom and variance-covariance matrix Σ, i.e., t_ν^{(k)}(Σ) is the distribution of √ν·U/W where U ~ N_k(0, Σ) and W² has a chi-square distribution with ν degrees of freedom. Let M(s̄) and X(s̄) be the vectors (M_i : i ∉ s) and (x_i : i ∉ s). Then L_p(m(s̄)|y) leads to a multivariate t-distribution, such that

[M(s̄) − β̂X(s̄)]/σ̂ ~ t_{n_0}^{(N−n_0)}(V),

where the maximum likelihood estimators (MLE) are, with W_s = Σ_{i∈s} x_i²/v(x_i),

β̂ = W_s^{−1} Σ_{i∈s} x_i m_i/v(x_i),

the best unbiased estimator uniformly minimizing the variance, and

σ̂² = (1/n_0) Σ_{i∈s} (m_i − β̂x_i)²/v(x_i).

V = (v_ij) with v_ii = v(x_i) + x_i²/W_s and v_ij = x_ix_j/W_s for i ≠ j. It follows that E_p(M_i) = β̂x_i,

V_p(M_i) = (n_0/(n_0−2)) σ̂² (v(x_i) + x_i²/W_s), and the predictive covariances are given by Cov_p(M_i, M_j) = (n_0/(n_0−2)) σ̂² x_ix_j/W_s for i ≠ j. This implies that

E_p(Σ_{i∉s} M_i) = β̂X_s̄   (14)

V_p(Σ_{i∉s} M_i) = (n_0/(n_0−2)) σ̂² (v_s̄ + X_s̄²/W_s)

where X_s̄ = Σ_{i∉s} x_i and v_s̄ = Σ_{i∉s} v(x_i).

L_c and L_mp (for M_i/√v(x_i), i ∉ s), given by (8), lead to moments similar to (14), with n_0 − 2 replaced by n_0 − 5 and n_0 − 4, respectively.
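The estimators β̂ and σ̂² and the predictive moments in (14) are simple to compute once v(⋅) is fixed. The sketch below assumes v(x) = x and uses invented (x_i, m_i) pairs for the sampled clusters and invented x_i for the non-sampled ones.

```python
# Sketch of the MLEs and the predictive moments (14) for the non-sampled
# cluster sizes, assuming v(x) = x. All data below are hypothetical.

def m_moments(sampled, x_out):
    """sampled: list of (x_i, m_i) for i in s; x_out: x_i for i not in s."""
    n0 = len(sampled)
    v = lambda x: x                              # assumed variance function
    Ws = sum(x * x / v(x) for x, _ in sampled)   # W_s = sum x_i^2 / v(x_i)
    beta = sum(x * m / v(x) for x, m in sampled) / Ws
    sigma2 = sum((m - beta * x)**2 / v(x) for x, m in sampled) / n0
    Xbar = sum(x_out)                            # X_sbar
    vbar = sum(v(x) for x in x_out)              # v_sbar
    Ep = beta * Xbar                             # E_p of the non-sampled total size
    Vp = n0 / (n0 - 2) * sigma2 * (vbar + Xbar**2 / Ws)
    return beta, sigma2, Ep, Vp

sampled = [(10.0, 21), (20.0, 39), (30.0, 62), (40.0, 83)]
x_out = [15.0, 25.0]
beta, sigma2, Ep, Vp = m_moments(sampled, x_out)
print(round(beta, 3), round(Ep, 1))
```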

Let us now consider the first term in (10), L_m(s̄)(z|y), based on f_{µ,τ,ρ}(y(s), z|m(s), m(s̄)). For this likelihood we will restrict attention to L_p, i.e., deriving L_m(s̄),p(z|y). The MLE µ̂, τ̂², ρ̂ can be expressed in the following way, with SSE = Σ_{i∈s} Σ_{j∈s_i} (y_ij − ȳ_i)²:

µ̂ = [Σ_{i∈s} n_iȳ_i/(1−ρ̂+n_iρ̂)] / [Σ_{i∈s} n_i/(1−ρ̂+n_iρ̂)]   (15)

τ̂² = (1/n)[SSE/(1−ρ̂) + Σ_{i∈s} n_i(ȳ_i − µ̂)²/(1−ρ̂+n_iρ̂)]

and ρ̂ is found numerically, maximizing

−(n/2) log τ̂² − (1/2) Σ_{i∈s} log(1+(n_i−1)ρ̂) − ((n−n_0)/2) log(1−ρ̂).

When n_i = c for all i ∈ s, then µ̂ = ȳ = Σ_{i∈s} ȳ_i/n_0, τ̂² = SS/n, and ρ̂ = max(0, 1 − (c/(c−1))·SSE/SS), where SS = Σ_{i∈s} Σ_{j∈s_i} (y_ij − ȳ)².
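Since ρ̂ has no closed form for unequal n_i, (15) is naturally evaluated by profiling: for each candidate ρ compute µ̂(ρ) and τ̂²(ρ) and maximize the profiled log-likelihood numerically. A sketch over a grid follows; the data are invented, and since the clusters have equal size c = 2 the result should agree with the closed form max(0, 1 − (c/(c−1))SSE/SS) up to grid resolution.

```python
# Sketch of the profile maximization for rho in (15): for each candidate rho,
# compute mu_hat(rho) and tau2_hat(rho), then evaluate the profiled
# log-likelihood term and keep the maximizing rho. Data are hypothetical.
from math import log

def profile_mle(clusters, grid_steps=2000):
    """clusters: list of stage-2 samples [y_i1, ..., y_in_i]."""
    n0 = len(clusters)
    n = sum(len(c) for c in clusters)
    ybars = [sum(c) / len(c) for c in clusters]
    SSE = sum((y - yb)**2 for c, yb in zip(clusters, ybars) for y in c)

    best = None
    for k in range(grid_steps):
        rho = k / grid_steps * 0.999              # grid over [0, 0.999)
        w = [len(c) / (1 - rho + len(c) * rho) for c in clusters]
        mu = sum(wi * yb for wi, yb in zip(w, ybars)) / sum(w)
        tau2 = (SSE / (1 - rho)
                + sum(wi * (yb - mu)**2 for wi, yb in zip(w, ybars))) / n
        # profiled log-likelihood (up to an additive constant)
        ll = (-(n / 2) * log(tau2)
              - 0.5 * sum(log(1 + (len(c) - 1) * rho) for c in clusters)
              - ((n - n0) / 2) * log(1 - rho))
        if best is None or ll > best[0]:
            best = (ll, rho, mu, tau2)
    return best[1], best[2], best[3]

clusters = [[1.0, 2.0], [5.0, 6.0], [9.0, 10.0]]
rho, mu, tau2 = profile_mle(clusters)
print(round(rho, 2), round(mu, 2))
```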

Consider first the case when ρ and τ are known. Then µ̂ is given by (15) with ρ replacing ρ̂. In this case L_m(s̄),p(z|y) is such that Z is normally distributed with predictive mean and predictive variance

E_p(Z|m(s̄)) = Σ_{i∈s} (m_i − n_i)[µ̂ + (n_iρ/(1−ρ+n_iρ))(ȳ_i − µ̂)] + µ̂ Σ_{i∉s} m_i   (16)

V_p(Z|m(s̄)) = V(Z|y, m(s̄)) + [τ²/Σ_{i∈s}(n_i/(1−ρ+n_iρ))]·[Σ_{i∈s} (m_i−n_i)(1−ρ)/(1−ρ+n_iρ) + Σ_{i∉s} m_i]².   (17)

Here, V(Z|⋅) denotes the usual variance in the conditional distribution of Z. When ρ, τ are unknown, L_m(s̄),p(z|y) will for large n_0 be approximately such that Z is normally distributed, with E_p(Z|m(s̄)) and V_p(Z|m(s̄)) given by (16) and (17) with ρ̂, τ̂² replacing ρ, τ². Recall that ȳ_i = Σ_{j∈s_i} y_ij/n_i and θ̂ = (µ̂, τ̂, ρ̂, β̂, σ̂), the MLE of θ = (µ, τ, ρ, β, σ). Then the conditional expected value of Z given the data, estimated at θ̂, is equal to

Ê_θ̂(Z|y) = Σ_{i∈s} (m_i − n_i)[µ̂ + (n_iρ̂/(1−ρ̂+n_iρ̂))(ȳ_i − µ̂)] + µ̂ Σ_{i∉s} β̂x_i.   (18)

Let V_θ̂(Z|y) denote the estimated conditional variance of Z given the data. It now follows from (12)-(14) and (16)-(18) that, approximately, L_p(z, m(s̄)|y) has E_p(Z) = Ê_θ̂(Z|y) and

V_p(Z) = V_θ̂(Z|y) + h(2).   (19)

Here,

V_θ(Z|y) = τ²(1−ρ) Σ_{i∈s} (m_i − n_i)[1 + (m_i − n_i)ρ/(1−ρ+n_iρ)] + Σ_{i∉s} {τ²[βx_i + ρσ²v(x_i) + ρβx_i(βx_i − 1)] + µ²σ²v(x_i)},

and h(k) collects the additional variance contributions arising from the predictive distribution of M(s̄) and the estimation of the parameters; it involves σ̂²(v_s̄ + X_s̄²/W_s), µ̂², ρ̂τ̂² and W_s, and depends on k only through factors of the form n_0/(n_0 − k).

The predictive likelihood

L_p,c(z, m(s̄)|y) = L_m(s̄),p(z|y) L_c(m(s̄)|y)

leads to the same E_p(Z), while V_p(Z) equals (19) with h(5) instead of h(2). With

L_p,mp(z, m(s̄)|y) = L_m(s̄),p(z|y) L_mp(m(s̄)|y)

we get the same E_p(Z), and V_p(Z) equal to (19) with h(4).

Let ŵ_i = n_iρ̂/(1−ρ̂+n_iρ̂). Writing the predictor Ẑ_0 = Ê_θ̂(Z|y), given by (18), as

Ẑ_0 = Σ_{i∈s} Σ_{j∉s_i} [ŵ_iȳ_i + (1−ŵ_i)µ̂] + µ̂ Σ_{i∉s} β̂x_i   (20)

we see from (4) that predicting Z by Ẑ_0 means that for i ∉ s each unobserved Y_ij is predicted by µ̂ and M_i is predicted by β̂x_i. For i ∈ s, j ∉ s_i, Y_ij is predicted by ŵ_iȳ_i + (1−ŵ_i)µ̂. This predictor shrinks the natural estimate ȳ_i towards µ̂. Using the representation (3) of the model, we note that

(1−ρ)/(1−ρ+n_iρ) = Var(Ȳ_i|µ_i)/(Var(Ȳ_i|µ_i) + Var(µ_i)).

Hence, for i ∈ s, the smaller Var(µ_i) is compared to Var(Ȳ_i|µ_i), the more weight we put on µ̂ to predict Y_ij for j ∉ s_i. In other words, the smaller the variability between the clusters is compared to the variability within the clusters, the more ȳ_i shrinks towards µ̂.
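A sketch of the predictor (20) follows; the cluster sizes, sample means and parameter estimates fed to it are hypothetical illustration values.

```python
# Sketch of the predictor Z_hat_0 in (20): within sampled clusters the
# unobserved units are predicted by the shrinkage value
# w_i*ybar_i + (1 - w_i)*mu_hat with w_i = n_i*rho/(1 - rho + n_i*rho);
# for non-sampled clusters M_i is predicted by beta_hat*x_i and each Y_ij
# by mu_hat. All inputs are hypothetical.

def z_hat0(sampled, x_out, mu, rho, beta):
    """sampled: list of (m_i, n_i, ybar_i) for i in s; x_out: x_i, i not in s."""
    z = 0.0
    for m_i, n_i, ybar in sampled:
        w = n_i * rho / (1 - rho + n_i * rho)         # shrinkage weight
        z += (m_i - n_i) * (w * ybar + (1 - w) * mu)  # unobserved units, i in s
    z += mu * beta * sum(x_out)                       # non-sampled clusters
    return z

sampled = [(50, 5, 4.0), (80, 5, 2.0)]
print(round(z_hat0(sampled, x_out=[30.0, 40.0], mu=3.0, rho=0.5, beta=1.2), 2))
```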

3.3 Some optimality considerations

All three predictive likelihoods for the model (2), with normally distributed Y_ij and M_i, give the same predictor for the population total T,

T̂_0 = Σ_{i∈s} Σ_{j∈s_i} y_ij + Ẑ_0

with Ẑ_0 given by (20).

The optimality considerations are conditional on s = {s, s_i : i ∈ s}, and E_θ(·) is used to denote E_θ(·|s). Let

ℓ = {T̂ : T̂ = Σ_{i∈s} Σ_{j∈s_i} a_ij Y_ij}

be a class of "partially" linear predictors, where each a_ij is a function of M(s). We shall restrict attention to the class of model-unbiased predictors in ℓ, i.e.,

ℓ_u = {T̂ ∈ ℓ : E_θ(T̂ − T) = 0, ∀θ}.

We shall now consider the distribution-free model (2). The parameter estimates of (β, σ²) are still valid, β̂ now being the best linear unbiased (BLU) estimator and (n_0/(n_0−1))σ̂² still unbiased. Regarding the MLE µ̂, given by (15), it is readily seen that, with ρ replacing ρ̂, µ̂ is the BLU estimator, as also noted by Kelly and Cumberland (1990). What remains is to derive alternative estimators for ρ and τ². Here one can use an ANOVA approach, as in Valliant et al. (2000, ch. 8.3) or Kelly and Cumberland (1990). When n_i = c for all i ∈ s, these two ANOVA approaches yield the same estimators ρ̂_av, τ̂_av² satisfying

τ̂_av² ρ̂_av = (1/c)[(SS − SSE)/(n_0 − 1) − SSE/(n − n_0)]

τ̂_av² (1 − ρ̂_av) = SSE/(n − n_0).

It follows that, approximately (for large n_0, with (n_0−1)/n_0 ≈ 1), τ̂_av² = SS/n and ρ̂_av = 1 − (c/(c−1))·SSE/SS; the same as the MLE in the normal model.
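A sketch of the equal-size ANOVA estimators, solving the two moment equations above for ρ̂_av and τ̂_av²; the data are invented, and with only n_0 = 3 clusters the estimates need not be close to the normal-model MLE.

```python
# Sketch of the equal-size ANOVA estimators: tau2*(1 - rho) is estimated by
# SSE/(n - n0), and tau2*rho by (1/c)*((SS - SSE)/(n0 - 1) - SSE/(n - n0)).
# Illustration data are hypothetical.

def anova_estimates(clusters):
    n0 = len(clusters)
    c = len(clusters[0])                        # equal stage-2 sizes assumed
    n = n0 * c
    ybar = sum(sum(cl) for cl in clusters) / n
    SSE = sum((y - sum(cl) / c)**2 for cl in clusters for y in cl)
    SS = sum((y - ybar)**2 for cl in clusters for y in cl)
    within = SSE / (n - n0)                     # estimates tau2*(1 - rho)
    between = ((SS - SSE) / (n0 - 1) - within) / c   # estimates tau2*rho
    tau2 = within + between
    rho = between / tau2
    return rho, tau2

clusters = [[1.0, 2.0], [5.0, 6.0], [9.0, 10.0]]
rho, tau2 = anova_estimates(clusters)
print(round(rho, 3), round(tau2, 2))
```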

With these new parameter estimates, T̂_0 is clearly a reasonable predictor also for this distribution-free model; e.g., Kelly and Cumberland (1990) suggest using this predictor (see also Valliant et al., 2000, ch. 8.9). The optimal predictor at θ, T̂_θ, in ℓ_u is defined to be the predictor in ℓ_u that minimizes E_θ(T̂ − T)² for T̂ ∈ ℓ_u. If T̂_θ does not depend on θ, it is uniformly optimal.

By using that E(T̂ − T) = E{E(T̂ − T|M)}, with M = (M_1, ..., M_N), we see that T̂ ∈ ℓ_u requires

E_β(Σ_{i∈s} Σ_{j∈s_i} a_ij) = βX, ∀β.

We note the following result.

Lemma 1. The optimal predictor T̂_θ must be a member of the class

ℓ_u^0 = {T̂ ∈ ℓ_u : T̂ = Σ_{i∈s} b_iȲ_i; b_i is a function of M(s), i ∈ s, and E_β(Σ_{i∈s} b_i) = βX, ∀β}.   (21)

Proof. Using the rule V(T̂ − T) = E{V(T̂ − T|M)} + V{E(T̂ − T|M)}, we see that, with ā_i = Σ_{j∈s_i} a_ij/n_i, for T̂ ∈ ℓ_u,

E_θ(T̂ − T)² = V_θ(T̂ − T)
= τ²(1−ρ) Σ_{i∈s} E_θ[Σ_{j∈s_i} a_ij²] + ρτ² Σ_{i∈s} E_θ[(n_iā_i)²] − 2ρτ² Σ_{i∈s} E_θ(M_i n_iā_i) + µ² V(Σ_{i∈s} n_iā_i − Σ_{i∈s} M_i) + ψ.   (22)

Here, ψ is a function of the parameters only. Since Σ_{j∈s_i} a_ij² ≥ n_iā_i², it follows that T̂_θ must have a_ij = ā_i for all j ∈ s_i, and b_i = n_iā_i. ♦

We restrict attention to the class L_u of model-unbiased predictors in ℓ_u where each a_ij is a linear function of M(s). We note that T̂_HT, given by (1), is a member of L_u. Then, from Lemma 1, it is sufficient to consider the class

L_u^0 = {T̂ ∈ ℓ_u : T̂ = Σ_{i∈s} b_iȲ_i and b_i = c_i + Σ_{j∈s} c_ij M_j}.

From (21),

T̂ ∈ L_u^0 ⇔ E_β(Σ_{i∈s} b_i) = βX, ∀β ⇔ Σ_{i∈s} c_i = 0 and Σ_{i∈s} Σ_{j∈s} c_ij x_j = X.   (23)

We note that T̂_0 can be expressed as Σ_{i∈s} b_i^0 Ȳ_i, where b_i^0 is linear in M(s). T̂_0 satisfies (23) with c_i = 0 and hence is model-unbiased when ρ is known (e.g., when ρ = 0), and approximately model-unbiased otherwise.

Lemma 2. The optimal predictor T̂_θ in L_u^0 minimizes, with respect to c = (c_i, i ∈ s; c_ij, i ∈ s, j ∈ s), subject to condition (23),

Q(c) = τ² Σ_{i∈s} φ_i^{−1}[V(b_i) + E(b_i)²] − 2ρτ² Σ_{i∈s} E(b_iM_i) + µ² V(Σ_{i∈s} (b_i − M_i))

where φ_i = n_i/(1−ρ+n_iρ).

Proof. For T̂ ∈ L_u^0, we see from (22), using (21),

E_θ(T̂ − T)² = τ²(1−ρ) Σ_{i∈s} E_θ(b_i²/n_i) + ρτ² Σ_{i∈s} [V_θ(b_i) + E_θ(b_i)²] − 2ρτ² Σ_{i∈s} E_θ(M_ib_i) + µ² V(Σ_{i∈s} (b_i − M_i)) + ψ
= τ² Σ_{i∈s} ((1−ρ)/n_i + ρ)[V_θ(b_i) + E_θ(b_i)²] − 2ρτ² Σ_{i∈s} E_θ(M_ib_i) + µ² V(Σ_{i∈s} (b_i − M_i)) + ψ.

The result follows since (1−ρ)/n_i + ρ = 1/φ_i. ♦

Let now φ_s = Σ_{i∈s} φ_i, α = τ²/(τ² + φ_sµ²) and m̂_i = (1−α)m_i + αβ̂x_i. Then the following result holds.

Theorem. The optimal predictor at θ in L_u is given by

T̂_θ = Σ_{i∈s} [m̂_i(1−w_i)µ̂_ρ + m_iw_iȳ_i] + µ̂_ρ Σ_{i∉s} β̂x_i   (24)

i.e., T̂_θ = Σ_{i∈s} Σ_{j∈s_i} y_ij + Ẑ_θ, where

Ẑ_θ = Σ_{i∈s} [m̂_i(1−w_i)µ̂_ρ + m_iw_iȳ_i − n_iȳ_i] + µ̂_ρ Σ_{i∉s} β̂x_i.

Here w_i = ρφ_i and µ̂_ρ = Σ_{i∈s} φ_iȳ_i/φ_s.

Remarks. (I) The optimal predictor at θ depends only on ρ and the coefficient of variation τ/µ, and is hence uniformly optimal in (µ, β, σ) if ρ and τ/µ are assumed known.

(II) The expression for T̂_θ means that for i ∈ s, Σ_{j=1}^{m_i} y_ij is estimated by m̂_i(1−w_i)µ̂_ρ + m_iw_iȳ_i, and for i ∉ s, Σ_{j=1}^{m_i} y_ij is estimated by µ̂_ρβ̂x_i, i.e., m_i is estimated by β̂x_i and each y_ij by µ̂_ρ.

(III) Let µ̂_i = (1−w_i)µ̂_ρ + w_iȳ_i. Then an alternative expression to (24) is:

T̂_θ = Σ_{i∈s} m̂_iµ̂_i + µ̂_ρ Σ_{i∉s} β̂x_i + R
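A sketch of the optimal predictor (24), treating ρ, τ²/µ² and β as known, as in Remark (I); the function name and all inputs are hypothetical illustration values.

```python
# Sketch of the optimal predictor (24): phi_i = n_i/(1 - rho + n_i*rho),
# mu_hat_rho is the phi-weighted mean of the ybar_i, alpha = tau2/(tau2 +
# phi_s*mu2), m_hat_i = (1 - alpha)*m_i + alpha*beta*x_i, and w_i = rho*phi_i.
# rho, tau2, mu2 and beta are treated as known; all inputs are hypothetical.

def t_theta(sampled, x_out, rho, tau2, mu2, beta):
    """sampled: list of (x_i, m_i, n_i, ybar_i) for i in s."""
    phi = [n / (1 - rho + n * rho) for _, _, n, _ in sampled]
    phi_s = sum(phi)
    mu_rho = sum(p * yb for p, (_, _, _, yb) in zip(phi, sampled)) / phi_s
    alpha = tau2 / (tau2 + phi_s * mu2)
    t = 0.0
    for p, (x, m, n, yb) in zip(phi, sampled):
        w = rho * p                                   # w_i = rho * phi_i
        m_hat = (1 - alpha) * m + alpha * beta * x
        t += m_hat * (1 - w) * mu_rho + m * w * yb    # predicted cluster total
    t += mu_rho * beta * sum(x_out)                   # non-sampled clusters
    return t

sampled = [(50.0, 48, 4, 3.0), (80.0, 85, 4, 2.0)]
print(round(t_theta(sampled, x_out=[70.0], rho=0.5, tau2=1.0, mu2=9.0, beta=1.0), 2))
```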
