
Theme Session M: Environmental and fisheries data management, access, and integration.

Measuring Uncertainty in Trawl Surveys:

Implementation in DATRAS

Lena Inger Larsen¹, Rainer Oeberst², David Maxwell³, Michael Pennington⁴, Henrik Sparholt¹, Hans Lassen¹

¹ICES Secretariat, ²Bundesforschungsanstalt für Fischerei, Rostock, Germany, ³CEFAS, ⁴Institute of Marine Research, Bergen, Norway

Abstract

The basic data from several ICES coordinated trawl and beam trawl surveys in the Baltic Sea, the North Sea, the area west of Scotland and France are stored in the DATRAS database in the ICES Secretariat; every year and for each survey, member countries report data to the database. The DATRAS system offers facilities for the calculation of fish stock abundance indices, and these indices are fed into the routine fish stock assessment work. The indices should be supplemented by an estimate of their uncertainty, and the present paper suggests a bootstrap algorithm for implementation in DATRAS to calculate such estimates of variation for each stock/area index in a given year. The procedure is a two-step bootstrap routine whereby the length compositions and age-length keys are bootstrapped independently, i.e. the uncertainty estimate is based on the variability between catches in individual hauls in that year. The goal is to supplement the routine indices with uncertainty estimates in the future.

Introduction

Basic data from several ICES coordinated trawl and beam trawl surveys in the Baltic Sea, the North Sea, the area west of Scotland and France are stored in ICES’ DATabase of TRAwl Surveys, DATRAS, in the ICES Secretariat (Figure 1). Every year and for each survey, member countries report data, and indices of fish stock abundance are calculated and fed into the routine fish stock assessment work.

Procedures to calculate the uncertainty of these indices have not yet been developed, although such estimates would be very useful for the population modelling used in the assessment work. Uncertainty estimates of the abundance indices would also be very useful for evaluating individual surveys, e.g. to determine whether they should be intensified to improve the precision of the fish stock assessment models.

Therefore, the EU Commission requested ICES to implement uncertainty estimation in DATRAS and supplement the routine abundance indices with uncertainty estimates.

The terms “variance estimation” and “uncertainty of the abundance estimator” are used loosely. In the context of DATRAS, these terms refer to the confidence interval, i.e. the interval with x% of the distribution falling below and x% falling above. There are therefore two decisions to be made:

• Decision on the percentage (x) corresponding to the tail of the distribution of the abundance index estimator;

• Decision on the procedure to calculate the distribution.

Obviously, the first decision is arbitrary, but it is required for consistency between years and between surveys. For convenience and to avoid excessive computer time it was decided to present the range for the abundance indices, i.e. the difference between the quartiles: Q75% - Q25%.

Therefore, herein, focus is on the second question: how to calculate the distribution. This paper analyses statistical methods and their implementation.


We considered the technical advantages and disadvantages of a range of methods, including their suitability for implementation in DATRAS, making use of the work reported in:

• ICES Workshop on the Analysis of Trawl Survey Data (ICES 1992/D:6), which reviewed survey design and index definition;

• Nordic Council of Ministers Workshop on evaluations of fish stocks (Lassen, H. 1999. Ed.);

• The EVARES project (2003, EVARES - FISH/2001/02 - Lot 1) on evaluation of research surveys in relation to management advice;

• ICES WKSCMFD (2004), which reviewed analytical variance estimators, bootstrapping and modelling approaches;

• ICES Workshop on Survey Design and Data Analysis (WKSAD) (ICES 2005/B:07), which considered the effect of spatial structure of the population and provides information on geostatistical models.

In addition, many individual scientists have provided input on estimating survey index variance, e.g. Pennington (1983) on the use of the Delta distribution, where zero values are treated separately and positive values are assumed to follow a lognormal distribution, and Petitgas (1993) on a geostatistical approach.

Bootstrapping has been widely used within fisheries in recent years (see e.g. O’Brien et al. 2001a, 2001b; Simmonds et al. 2001). It is relatively easy to explain and does not rely on many assumptions. The lack of assumptions about the spatial distribution also suggests it will be robust to changes in spatial distribution from year to year.

Implementing the bootstrap for survey catch numbers-at-age differs from standard examples in that two bootstrap samples, one for age and one for length, are generated rather than one.

For many surveys the sampling design operates with many strata and thus few observations within each stratum. This creates complications when trying to estimate variance, because the bootstrap sampling will underestimate the real variance (with only two samples in a stratum, by a factor of two on average; more than 20 samples are needed to avoid the problem). When there is only one sample in a stratum, analytical estimation is not possible and the bootstrap will of course give a variance of zero. Dealing with strata with few hauls is one of the main challenges in estimating the variance of survey indices.
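The factor-of-two underestimation with two samples can be checked directly. The sketch below is an illustration only, not part of DATRAS: it enumerates every equally likely naive bootstrap resample of a small sample and compares the resulting variance of the mean with the usual unbiased estimate s²/n. The haul values and function names are hypothetical.

```python
from itertools import product
from statistics import mean, pvariance, variance


def exact_bootstrap_variance_of_mean(sample):
    """Variance of the resampled mean over all n**n equally likely naive resamples."""
    n = len(sample)
    resample_means = [mean(resample) for resample in product(sample, repeat=n)]
    return pvariance(resample_means)


def underestimation_ratio(sample):
    """Bootstrap variance of the mean divided by the unbiased estimate s^2 / n."""
    n = len(sample)
    return exact_bootstrap_variance_of_mean(sample) / (variance(sample) / n)


# Hypothetical CPUE values for a stratum with two hauls and one with five hauls.
two_hauls = [3.0, 11.0]
five_hauls = [3.0, 11.0, 4.0, 8.0, 20.0]

ratio_two = underestimation_ratio(two_hauls)    # equals (n - 1) / n = 0.5
ratio_five = underestimation_ratio(five_hauls)  # equals (n - 1) / n = 0.8
```

The ratio is (n - 1)/n whatever the data, so it approaches one only slowly as the number of hauls per stratum grows.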


Figure 1. Surveys in ICES area for which data are stored in DATRAS

Estimating the accuracy of the abundance estimator

The terms “variance estimation” and “uncertainty of the abundance estimator” are used loosely above. In DATRAS, we are after the confidence interval, i.e. the interval for which there is, say, a 5% probability of falling below and a 5% probability of falling above.

In analysing which method would be best, we considered three approaches: the variance (second-order moment of the distribution), geostatistical approaches, and various forms of bootstrapping.

The variance

The calculation of the second-order moment (variance) is widely used and is well known beyond specialist statisticians. Methods for calculating this estimator have been developed for virtually all survey designs but differ among designs, i.e. there would be a separate algorithm for each survey; known estimators include those for fixed station surveys. Even where there is no analytical solution for the variance of the maximum likelihood estimator, it can be computed numerically in a routine fashion. It is easy to implement and can run as a routine procedure.

Calculation of confidence limits would be of the type (the factor 2 is chosen to simplify the formula)

$$[\,\mathrm{Index} - 2 \cdot \mathrm{std.dev};\ \mathrm{Index} + 2 \cdot \mathrm{std.dev}\,]$$

The approach is not robust to distributions that are far from symmetrical around the abundance estimator and therefore often a transformation is used, the classical example being the logarithmic transformation.
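The effect of the logarithmic transformation can be illustrated with a small sketch. It is an illustration only, under the assumption that the index is a simple mean of haul CPUE values and that the std.dev in the formula above is the standard error of that mean; the data and function names are hypothetical. Note that the back-transformed interval refers to the geometric mean rather than the arithmetic mean.

```python
import math
from statistics import mean, stdev


def symmetric_interval(values, k=2.0):
    """Index +/- k * standard error on the original scale."""
    centre = mean(values)
    std_err = stdev(values) / math.sqrt(len(values))
    return centre - k * std_err, centre + k * std_err


def log_scale_interval(values, k=2.0):
    """Same construction on log(values), back-transformed with exp.

    Requires strictly positive values; the interval is for the geometric mean.
    """
    logs = [math.log(v) for v in values]
    centre = mean(logs)
    std_err = stdev(logs) / math.sqrt(len(logs))
    return math.exp(centre - k * std_err), math.exp(centre + k * std_err)


# Hypothetical, strongly skewed haul CPUE values.
cpue = [1.0, 2.0, 2.0, 4.0, 60.0]
sym_low, sym_high = symmetric_interval(cpue)
log_low, log_high = log_scale_interval(cpue)
```

With skewed catches such as these, the symmetric interval extends below zero, which is impossible for an abundance index, while the log-scale interval stays positive and is asymmetric.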

The survey design could be reflected through a multiplicative model or similar,

$$\log(\mathrm{CPUE}) = \mu + \mu_{\mathrm{roundfish}} + V_{\mathrm{rect}} + \varepsilon$$

to be used as the basis for calculating the std.dev required above.

This could be implemented by linking DATRAS to R.

Geostatistical variance estimates

Geostatistics is an expansion of the model described above which accounts for the geographical variance structure. The functional form of the variogram (variance as a function of distance between the stations) shall reflect the distribution of the species surveyed and is unlikely to be the same for all species. The functional structure of the variance would need stability between years for a routine implementation as the approach is not robust to changes in distribution between years; hence this approach may be difficult to implement on a routine basis. The analysis may involve transformation.
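To make the notion of a variogram concrete, the following sketch computes an empirical (Matheron-type) semivariogram from station values. It is a minimal illustration, assuming planar station coordinates and equal-width distance bins; the function name, coordinates and values are hypothetical and the sketch does not fit a variogram model.

```python
import math
from collections import defaultdict


def empirical_variogram(stations, bin_width):
    """Return {distance_bin: semivariance} for stations given as (x, y, value).

    Semivariance in a bin is the mean of 0.5 * (z_i - z_j)**2 over all
    station pairs whose separation falls in that bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(stations)):
        xi, yi, zi = stations[i]
        for j in range(i + 1, len(stations)):
            xj, yj, zj = stations[j]
            distance = math.hypot(xi - xj, yi - yj)
            distance_bin = int(distance // bin_width)
            sums[distance_bin] += 0.5 * (zi - zj) ** 2
            counts[distance_bin] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Three hypothetical stations on a line, one distance unit apart.
stations = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
variogram = empirical_variogram(stations, bin_width=1.0)
```

In this toy case the semivariance grows with distance (1.25 at one unit, 4.5 at two units); in practice the shape of this curve is what needs scrutiny species by species and year by year.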

Geostatistics has many variants related to the underlying geographical structure of the population. A routine implementation would need to decide on one of these, and experience has shown that the variogram needs close scrutiny before a suitable model can be agreed. The technique is not well suited for robust routine application. Provided the geostatistical model chosen is not too complicated, the fundamentals are easily understood and can be explained to non-statisticians.

The calculation of confidence limits will be similar to that described above.

An implementation might be to link DATRAS to an appropriate statistical package (e.g. R or SURFER).

Bootstrapping the observations

Bootstrapping overcomes the problems with asymmetric distributions and allows a direct estimation of the confidence limits.

Bootstrapping is based on the assumption that the samples available are indicative of the population distribution (assumptions similar to those embedded under section 4.1). The bootstrap procedure shall reflect both the design and the assumed variance structure of the population on which the survey design is based.

Bootstrapping can be implemented as a routine, but must reflect the design, i.e. there would be a separate algorithm for each survey. Implementation would require special software to be written to reflect the survey design and assumption of stock distribution.

Appropriateness for DATRAS

We considered the technical advantages and disadvantages of a range of methods, including their suitability for implementation in DATRAS, making use of work reported in ICES WKSCMFD (2004), which reviewed analytical variance estimators, bootstrapping and modelling approaches, and ICES WKSAD (2005), which considered the effect of spatial structure of the population and provides information on geo-statistical models. The summary table from WKSCMFD is reproduced at the end of this discussion.

The design-based approaches, bootstrapping and analytical calculations, scored well on issues relating to communication. Both have been widely used within fisheries, are relatively easy to explain, and do not have many assumptions. The lack of distributional assumptions also suggests they will be robust to changes in catch distribution.

Bootstrapping has technical advantages over analytical calculations because asymmetric distributions do not cause problems when calculating a confidence interval and covariance between ages is part of the output. A further issue with the usual analytical calculations is that they are based on the assumption of random (or stratified random) sampling, so they are not strictly valid for fixed-station designs.


The major concern with bootstrapping is that strata with few observations can lead to poor estimates of variance, which creates a need to combine strata.

For age sampling, the survey protocols suggest there should be enough age samples per length group. We examined whether this is true in two ways: by viewing a selection of ALKs and, for all surveys, by calculating the proportion of length groups in each ALK, above a relevant minimum length, with only one age sample and with fewer than five age samples.
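The check on ALK sampling intensity can be sketched as follows. This is a minimal illustration, assuming the ALK is available as a count of aged fish per length class; the function name, the example counts and the minimum length are hypothetical.

```python
def sparse_length_class_summary(aged_per_length_class, min_length=0):
    """Summarise how many sampled length classes have 1 or fewer than 5 aged fish.

    aged_per_length_class maps length class (cm) to the number of aged fish.
    Only classes at or above min_length with at least one aged fish are counted.
    """
    counts = [n for length, n in aged_per_length_class.items()
              if length >= min_length and n > 0]
    total = len(counts)
    if total == 0:
        raise ValueError("no sampled length classes at or above min_length")
    one_sample = sum(1 for n in counts if n == 1)
    fewer_than_five = sum(1 for n in counts if n < 5)
    return {
        "total": total,
        "one_sample": one_sample,
        "pct_one_sample": 100.0 * one_sample / total,
        "fewer_than_five": fewer_than_five,
        "pct_fewer_than_five": 100.0 * fewer_than_five / total,
    }


# Hypothetical ALK for one area: length class (cm) -> number of aged fish.
example_alk = {18: 2, 20: 1, 21: 3, 22: 6, 23: 1, 24: 12}
summary = sparse_length_class_summary(example_alk, min_length=20)
```

Here two of the five classes at or above 20 cm have a single age sample (40%) and three have fewer than five (60%).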

The ALKs for cod in BITS and NS-IBTS in the first quarter of 2006 are taken as an example in this report. It can be seen that, at least in NS-IBTS, a large proportion of the length classes are represented by only one sample (Table 1) and only a few length classes are represented by five or more samples (Table 2).

In some Roundfish areas, 100 percent of the length classes have been sampled fewer than five times.

Table 1. Number of length classes with only one age sample, the total number of length classes with samples, and the length classes with one sample as a percentage of the total.

| Survey | Year | Quarter | Species | Area | No. with one sample | Total samples | Percentage of total samples |
|---|---|---|---|---|---|---|---|
| BITS | 2006 | 1 | Gadus morhua | 22 | 1 | 49 | 2 |
| | | | | 24 | 1 | 49 | 2 |
| | | | | 26 | 1 | 49 | 2 |
| | | | | 28 | 2 | 43 | 4 |
| NS-IBTS | 2006 | 1 | Gadus morhua | 1 | 8 | 27 | 29 |
| | | | | 2 | 9 | 12 | 75 |
| | | | | 3 | 8 | 23 | 34 |
| | | | | 4 | 9 | 19 | 47 |
| | | | | 6 | 11 | 14 | 78 |
| | | | | 7 | 6 | 26 | 23 |
| | | | | 9 | 1 | 49 | 2 |

Table 2. Number of length classes with fewer than five age samples, the total number of length classes with samples, and the length classes with fewer than five samples as a percentage of the total.

| Survey | Year | Quarter | Species | Area | No. with fewer than five samples | Total samples | Percentage of total samples |
|---|---|---|---|---|---|---|---|
| BITS | 2006 | 1 | Gadus morhua | 22 | 9 | 49 | 18 |
| | | | | 24 | 7 | 49 | 14 |
| | | | | 25 | 1 | 49 | 2 |
| | | | | 26 | 4 | 49 | 8 |
| | | | | 28 | 8 | 43 | 18 |
| NS-IBTS | 2006 | 1 | Gadus morhua | 1 | 21 | 27 | 77 |
| | | | | 2 | 12 | 12 | 100 |
| | | | | 3 | 15 | 23 | 65 |
| | | | | 4 | 13 | 19 | 68 |
| | | | | 6 | 14 | 14 | 100 |
| | | | | 7 | 18 | 26 | 69 |
| | | | | 9 | 17 | 49 | 34 |

Possible model-based approaches range from general linear models (GLMs) of survey cpue, to geostatistical models which incorporate spatial structure, to more complex likelihood-based or Bayesian hierarchical models. The GLM and geo-statistics approaches generally model indices which have already been calculated by age, while more complex models better represent the sampling process by including the variation due to sub-sampling age and estimating an age-length relationship. This ability to capture all the sources of uncertainty is particularly strong in Bayesian models.

Routine implementation was an issue for modelling approaches, particularly geo-statistical models. Geo-statistical models cannot be properly implemented on all surveys in DATRAS as several use randomised, not systematic, sampling schemes. Other modelling approaches would be more flexible but could require different model set-ups for different surveys and species. This could lead to debate over the appropriateness of a model for any particular survey, and could make explaining the approach to non-scientists more difficult. Expert knowledge would be needed to run modelling approaches as the model fit and assumptions need to be checked.

Conclusions

All the methods have advantages and disadvantages: the usefulness of geo-statistical methods has been demonstrated for individual analyses, and Bayesian hierarchical models are technically very strong and further development of them should be promoted. But overall, the bootstrap approach was considered most appropriate for routine analysis in DATRAS. It should be technically adequate, relatively straightforward to implement and easy to explain. It will represent the calculations currently used to produce survey indices and a consistent definition of bootstrap sampling should be possible across all the surveys in the system.

Table 3. Comparison of methods – Summary. Table 6.1 from WKSCMFD 2004

| | Design-based: Analytical | Design-based: Non-parametric bootstrap | Model-based: Frequentist | Model-based: Bayesian |
|---|---|---|---|---|
| Assumptions | Sample representative of the population, sampling scheme unbiased. Strata must be a partition of the space. | Sample representative of the population, sampling scheme unbiased. Resampling unit must be independent. | Distributions and relationships between variables. | Distributions and relationships between variables. |
| Advantages | Explicit, identify variance due to age and due to length, can derive statistics to analyse sampling design. | Non-parametric, can deal with complex processes, simple concept, estimates covariance. | Explicit, deal with complex situations, id var comps, estimations of uncertainty, parameters can have biological interest, can include expert knowledge. | Idem frequentist model, easier to deal with missing observations, include more complex expert knowledge and different sources of data. |
| Disadvantages | It becomes extremely complex to apply to more than 1 strata situation, no covariance between ages. | Sensitive to low number of samples in strata which can underestimate variance or produce biased estimates due to merging of strata. | Complex assumptions, requires model testing and fitting, different sampling schemes and stocks may require different models. | Idem frequentist model, more difficult to implement, MCMC convergence problems. |
| Implementation | Simple | Simple, uses simulations. | Complex. | More complex, uses simulations. |
| Example (ref) | WD1 | WD 4, 5, 6, 7 & 8 | | |

How to bootstrap in DATRAS

Abundance Index

For each survey, and within each survey for each species, the calculation covers a so-called index area. This is the area within which hauls are considered and over which the length compositions are averaged. In several cases the index area and the entire survey area are identical. The age-length keys are aggregated by a set of “otolith areas” without restricting the data to the index area.

The abundance index I is in principle calculated based on

$$I_a = \sum_{l \in \mathrm{length}} n_l \cdot \frac{m_{l,a}}{m_{l,\bullet}}$$

where $n_l$ is the length composition observed (#/hr) in the haul ($\mathrm{CPUE} = \sum_{l \in \mathrm{length}} n_l$), $m_{l,a}$ is the age-length key and $m_{l,\bullet}$ is its total over ages for length $l$.

The abundance estimator works in four steps: 1) first, the average length composition by sub-areas (e.g. rectangles) is calculated; 2) in parallel, ALKs are aggregated on a separate set of “otolith areas” (e.g. roundfish areas; the phrase “otolith area” is used by the North Sea beam trawl survey); 3) the average length distributions calculated per sub-area in step 1 are raised to age compositions using the ALKs found in step 2; and 4) these age compositions are averaged over an index area (e.g. the entire North Sea), i.e. ignoring length compositions that refer to sub-areas outside the index area. This last step means that some rectangles fished will not be considered in the index calculation because the rectangles are not part of the index area.

This suggests that the haul information in a survey shall be considered as two components: the length frequency distribution of the catch and the age-length keys.

These components are bootstrapped independently. The bootstrapping approach chosen is the naïve approach (Lehtonen, R. and Pahkinen, E. 2004). The haul bootstrap unit is the entire length distribution of a haul (not bootstrapping the individual length groups), thereby maintaining covariance between the length groups within a haul. The ALKs are bootstrapped as individual aged fish, length group by length group.

Bootstrapping the length frequency distribution of the catch by haul will consider how the hauls are distributed within an area. As discussed below, a certain amount of pooling among sub-areas is desirable and there are therefore two steps in this bootstrap procedure: 1) selecting a haul and 2) allocating this haul to a sub-area.

Also, for the age-length keys, although aggregated over a larger area, there is a need for pooling over length classes, but the sampling intensity, i.e. the number of fish per length class that is aged, is maintained in the bootstrap. This is illustrated in the text table below.

| Length class | Age 0 | Age 1 | Age 2 | Total |
|---|---|---|---|---|
| 30 | 1 | 7 | 3 | 11 |
| 31 | 0 | 9 | 3 | 12 |
| 32 | 0 | 8 | 2 | 10 |
| 30-32 | 1 | 24 | 8 | 33 |

The sample from which the bootstrap is sampled is the 30-32 cm group, while the bootstrapped ALK will maintain the individual length classes, i.e. 30, 31 and 32 cm, with their sampling intensities of 11, 12 and 10 fish.

Figure 2 below shows the bootstrapping data flow; the flow is described in the following section.
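The pooling illustrated in the text table can be sketched in a few lines. This is an illustrative sketch only, not the DATRAS implementation: aged fish from the pooled 30-32 cm group form the resampling pool, and each individual length class is refilled with as many fish as were originally aged in it. The function name and the dictionary layout are assumptions.

```python
import random


def bootstrap_pooled_alk(alk, pooled_groups, rng):
    """Resample an ALK from pooled length classes, keeping sampling intensity.

    alk maps length class -> {age: number of aged fish};
    pooled_groups lists the length classes that share one resampling pool.
    """
    bootstrapped = {}
    for group in pooled_groups:
        ages = sorted({age for length in group for age in alk[length]})
        # One entry per aged fish in the pooled group.
        pool = [age
                for length in group
                for age, count in alk[length].items()
                for _ in range(count)]
        for length in group:
            n_aged = sum(alk[length].values())  # original sampling intensity
            counts = dict.fromkeys(ages, 0)
            for age in rng.choices(pool, k=n_aged):
                counts[age] += 1
            bootstrapped[length] = counts
    return bootstrapped


# ALK from the text table: length class -> {age: count}.
alk = {
    30: {0: 1, 1: 7, 2: 3},
    31: {0: 0, 1: 9, 2: 3},
    32: {0: 0, 1: 8, 2: 2},
}
resampled = bootstrap_pooled_alk(alk, [[30, 31, 32]], random.Random(2006))
intensities = {length: sum(counts.values()) for length, counts in resampled.items()}
```

Whatever ages are drawn, the resampled key keeps 11, 12 and 10 aged fish in the 30, 31 and 32 cm classes. An age-0 fish can now appear in the 31 or 32 cm class, which is the intended effect of pooling.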


[Figure 2: flow chart of the bootstrap procedure.
Hauls (NS-IBTS, EVHOE, SC, BTS): pool hauls over areas; bootstrap hauls with respect to CPUE by length; re-allocate hauls to sub-areas.
Hauls (BITS): bootstrap hauls with respect to CPUE by length.
ALK: combine ALK length classes into e.g. 5 cm length classes; bootstrap ALK within an area; re-allocate ALK to length classes.
Both branches: calculate CPUE per age and haul; calculate indices (repeated 1000 times); calculate the confidence interval of all the indices, based on 1000 indices.
Areas: Roundfish area, otolith area, Scottish sampling areas, EVHOE areas. Sub-areas: depth stratum, statistical rectangles. SC: Scottish groundfish survey.]

Figure 2. Bootstrapping data flow
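The step “calculate CPUE per age and haul” in Figure 2 raises a length composition to ages with the ALK. A minimal sketch follows, assuming the ALK holds counts of aged fish per length class and age, so that each length class is split in proportion to those counts; the function name and the haul CPUE values are hypothetical, while the ALK counts are those of the 30-32 cm text table.

```python
def cpue_at_age(length_cpue, alk):
    """Split CPUE by length class into CPUE by age using ALK proportions.

    length_cpue maps length class -> CPUE (#/hr);
    alk maps length class -> {age: number of aged fish}.
    """
    index = {}
    for length, n_l in length_cpue.items():
        ages = alk.get(length, {})
        total_aged = sum(ages.values())
        if total_aged == 0:
            raise ValueError(f"no aged fish for length class {length}")
        for age, m_la in ages.items():
            index[age] = index.get(age, 0.0) + n_l * m_la / total_aged
    return index


alk = {
    30: {0: 1, 1: 7, 2: 3},
    31: {0: 0, 1: 9, 2: 3},
    32: {0: 0, 1: 8, 2: 2},
}
length_cpue = {30: 22.0, 31: 12.0, 32: 5.0}  # hypothetical haul, #/hr
index_by_age = cpue_at_age(length_cpue, alk)
```

The age-specific values sum to the total CPUE of the haul, since the ALK proportions within each length class sum to one.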

Length distribution bootstrapping

Only hauls within the index area for a given species are used in the bootstrapping.

The number of hauls within each sub-area will generally be kept when bootstrapping. For those surveys where the index calculation is done by large strata with many hauls per stratum (generally 5 to 20 hauls), we draw the bootstrap from the hauls in that stratum and the approach is straightforward. For surveys like the North Sea IBTS and BTS, which operate with rectangles as strata (sub-areas) for index calculation, too few hauls are available within a stratum (rectangle) and the bootstrap sample will be drawn from a larger pool of hauls, i.e. by RoundFish Area (RF).

The abundance index calculation requires that each haul is assigned to a sub-area (rectangle) and the bootstrap therefore needs a second step to allocate the selected haul to a sub-area. Because we may not want to bootstrap exactly the number of hauls that were originally taken, we have interpreted the allocation of hauls by sub-areas that are in the sample as the probability distribution for allocation of hauls to sub-areas (rectangles).
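The two-step haul bootstrap can be sketched as below. This is an illustration under stated assumptions, not the DATRAS code: hauls in one pooling area are given as (sub-area, length composition) pairs, step 1 draws whole hauls with replacement, and step 2 allocates each drawn haul to a sub-area with probability proportional to the number of hauls observed there. The rectangle codes and catches are hypothetical.

```python
import random


def bootstrap_hauls(hauls, n_draws, rng):
    """Draw n_draws hauls from a pooling area and re-allocate them to sub-areas.

    hauls is a list of (sub_area, length_cpue) pairs for one pooling area.
    """
    compositions = [length_cpue for _, length_cpue in hauls]
    observed_sub_areas = [sub_area for sub_area, _ in hauls]
    # Step 1: select whole hauls (entire length compositions) with replacement.
    selected = rng.choices(compositions, k=n_draws)
    # Step 2: allocate each selected haul to a sub-area; sampling from the list
    # of observed sub-areas weights each sub-area by its observed haul count.
    allocated = rng.choices(observed_sub_areas, k=n_draws)
    return list(zip(allocated, selected))


# Hypothetical pooling area with four hauls in three rectangles.
hauls = [
    ("37F1", {30: 4.0, 31: 2.0}),
    ("37F1", {30: 1.0}),
    ("38F2", {31: 6.0}),
    ("38F3", {29: 3.0, 30: 5.0}),
]
replicate = bootstrap_hauls(hauls, len(hauls), random.Random(7))
```

Because the length composition of a haul is drawn as a unit, the covariance between length groups within a haul is preserved, while the sub-area a haul lands in is decoupled from where it was taken.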

Table 4. Overview of sampling strata and pooling strata

| Survey | Species | Sampling strata | Pooling strata |
|---|---|---|---|
| NS-IBTS | All | Statistical rectangle | Roundfish area combined with index area |
| BITS | All | SubDiv + depth strata | SubDiv + depth strata |
| EVHOE | All | EVHOE area + depth strata | Combined depth strata of same depth within Celtic Sea or Bay of Biscay |
| BTS | All | Statistical rectangle | Otolith areas |
| Scottish groundfish survey | Cod, whiting, haddock, monkfish | Scottish demersal sampling area | |

Bootstrap age-length keys by length class

The Age-Length-Keys for all the surveys are an aggregation of individual samples from a haul combined over a larger area (Table 5).

It is assumed for most surveys that the individuals for the ALK are randomly taken from all parts of the area. Furthermore, this sampling regime for ALK data assumes that the age distribution of a length class does not differ significantly between the different parts of the sampling area.

Table 5. Overview of ALK sampling areas

| Survey | Area of ALK |
|---|---|
| BITS | ICES subdivision |
| IBTS | Roundfish area |
| BTS | Otolith areas |
| Scottish groundfish survey | Demersal sampling area |
| EVHOE | EVHOE areas |

The sampled numbers of fish per length class are used as the basis for bootstrapping the ALKs. Analyses of the data which are available in DATRAS have shown that in many cases the number of aged fish per length class is significantly lower than the required number for bootstrapping (see Tables 1 and 2).

Therefore, pooling to construct the sample from which to bootstrap is necessary. We pool data in length classes of 2 or 5 cm; the data are originally sampled by 1 cm classes. This pooling does not solve the problem of large individuals, where aged fish are not available in all length intervals, and a length plus group is included. All individuals which are larger than a defined length are summarized in this plus group. This length plus group will be used for the length frequency and the ALK.

The length where the plus group begins can be different from species to species, from survey to survey, and from year to year due to different developments of the stock size and the progress of rebuilding of the stocks.

One possible way to define the length class, l+, where the plus group begins is given below. The length plus group starts when the sum of total aged fish, beginning with the smallest length interval, is larger than a defined fraction B of the total number of aged fish. The length l+ is defined by

$$\sum_{l=1}^{l^{+}} X_{l.} > B \cdot X_{..}$$

where $X_{l.}$ denotes the total number of aged fish in length class $l$, and $X_{..}$ represents the total number of aged fish over all length classes.

The length plus groups will be defined based on data analyses of the different surveys and after discussions with the working group which coordinates the survey.
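The plus-group rule can be sketched as a cumulative sum over length classes. This is a minimal illustration, assuming l+ is the smallest length class at which the cumulative number of aged fish exceeds the fraction B of the total; the function name, the counts and the value of B are hypothetical.

```python
def plus_group_start(aged_per_length_class, fraction_b):
    """Return the smallest length class where the cumulative aged fish exceed B * total.

    aged_per_length_class maps length class -> number of aged fish.
    """
    if not 0.0 < fraction_b < 1.0:
        raise ValueError("fraction_b must lie strictly between 0 and 1")
    total = sum(aged_per_length_class.values())
    cumulative = 0
    for length in sorted(aged_per_length_class):
        cumulative += aged_per_length_class[length]
        if cumulative > fraction_b * total:
            return length
    raise ValueError("no aged fish in the key")


# Hypothetical numbers of aged fish per length class (cm), 100 fish in total.
aged = {30: 10, 31: 20, 32: 40, 33: 20, 34: 8, 35: 2}
l_plus_95 = plus_group_start(aged, 0.95)
l_plus_50 = plus_group_start(aged, 0.50)
```

With B = 0.95 the cumulative count first exceeds 95 fish at 34 cm, so the plus group would start there; a smaller B moves the plus group down the length range.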

Number of bootstraps

The length CPUE and the ALK data will be bootstrapped 1024 times as recommended by Lehtonen and Pahkinen (2004).

Performance of the bootstrap procedure

Presentation of results

For each index value, the entire distribution of the bootstrap indices is given in 20 bins, obtained by splitting the range between the minimum and maximum of the bootstrapped indices into 20 equal intervals. From this, user-specified quantities such as the CV and confidence intervals can easily be derived. There are a variety of methods available to calculate confidence intervals (see for example Efron and Tibshirani, 1993). We feel that this choice will not greatly change the relative results for different surveys or years and therefore propose that the percentile method is implemented in DATRAS. This is the simplest method available. For example, for a 90% confidence interval based on 1000 values, it involves reporting the 50th and 950th values in ascending order; with 1024 values the corresponding ranks are approximately the 51st and 973rd.
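The percentile method amounts to sorting the bootstrap indices and reading off two ranks. The sketch below is an illustration only; rounding the rank to the nearest integer is an assumption, as the text does not fix a rank convention, and the function name is hypothetical.

```python
def percentile_interval(bootstrap_indices, coverage=0.90):
    """Percentile confidence interval from a list of bootstrapped index values."""
    ordered = sorted(bootstrap_indices)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0
    # Ranks are 1-based; rounding to the nearest rank is an assumed convention.
    lower_rank = min(n, max(1, round(tail * n)))
    upper_rank = min(n, max(1, round((1.0 - tail) * n)))
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


# Using the rank itself as the value makes the selected ranks visible.
interval_1000 = percentile_interval(range(1, 1001), coverage=0.90)
interval_1024 = percentile_interval(range(1, 1025), coverage=0.90)
quartiles_1024 = percentile_interval(range(1, 1025), coverage=0.50)
```

With coverage set to 0.50 the same function returns the quartiles, from which the range Q75% - Q25% follows by subtraction.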

Simulation Framework for Investigating Bootstrap Performance

To illustrate the effects, and as a background for discussing the choices made in implementing the uncertainty calculation in DATRAS, we investigated a system with 16 rectangles arranged in a 4*4 grid.

The system is described by

• The number of hauls per rectangle, $n_R$; the total number of hauls is $\sum_R n_R$;

• The mean CPUE within each rectangle, $\mathrm{CPUE}_R$;

• The within-rectangle variance, $\sigma^2_{within}$; and

• The between-rectangle variance, $\sigma^2_{Between}$, where $R$ is the rectangle.

$n_R$ (the number of hauls in rectangle $R$) is drawn from a rectangular distribution with a mean of two hauls per rectangle and a range of 0 to 4 hauls, i.e. for $x \in N$ ($x$ is a numeral)

$$\mathrm{Prob}\{n_R = x\} = \begin{cases} 0 & \text{if } x > 2 \cdot 2 \text{ or } x < 0 \\ \dfrac{dx}{2 \cdot 2} & \text{for } x \in [0;\ 2 \cdot 2] \end{cases}$$

$\mathrm{CPUE}_R$ is the mean CPUE in rectangle $R$ and $\overline{\mathrm{CPUE}}$ is the overall mean CPUE from the area.

The realized $\mathrm{CPUE}_R$ is drawn from a lognormal distribution:

$$\mathrm{Prob}\{\mathrm{CPUE}_R = x\} = \frac{1}{\sqrt{2\pi}\,\sigma_{Between}} \exp\!\left(-0.5 \cdot \left(\frac{\log x - \log \overline{\mathrm{CPUE}}}{\sigma_{Between}}\right)^{2}\right) d\log x$$
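Drawing one realization of the simulated system can be sketched as follows. This is an illustration under stated assumptions: the rectangular distribution of the number of hauls is read as a discrete uniform distribution on 0 to 4, and the lognormal distribution is centred on the logarithm of the overall CPUE, so the overall CPUE is the median of the rectangle means. The function name and parameter values are hypothetical.

```python
import math
import random


def simulate_rectangles(n_rectangles, overall_cpue, sigma_between, rng):
    """Draw (number of hauls, mean CPUE) for each rectangle of the simulated area."""
    rectangles = []
    for _ in range(n_rectangles):
        # Discrete uniform on 0..4 hauls, i.e. a mean of two hauls per rectangle.
        n_hauls = rng.randint(0, 4)
        # Lognormal rectangle mean with log-scale location log(overall_cpue).
        mean_cpue = rng.lognormvariate(math.log(overall_cpue), sigma_between)
        rectangles.append((n_hauls, mean_cpue))
    return rectangles


# 16 rectangles (4*4), hypothetical overall CPUE of 100 and log-scale SD of 0.5.
area = simulate_rectangles(16, 100.0, 0.5, random.Random(0))
# A between-rectangle SD of zero gives a homogeneous area.
homogeneous = simulate_rectangles(16, 100.0, 0.0, random.Random(0))
```

Setting the between-rectangle standard deviation to zero reproduces the homogeneous case, in which every rectangle has the same mean CPUE.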

The between rectangle variance is

$$\sigma^2_{Between} = \frac{1}{\sum n_R} \cdot \sum \left(\mathrm{CPUE}_R - \overline{\mathrm{CPUE}}\right)^{2}$$

The observed $\mathrm{CPUE}_{i,R}$ of haul $i$ in rectangle $R$ is drawn from a log-normal distribution with a cut-off at twice the standard deviation.

Length distributions and Age-Length keys (ALK)

The length distribution is drawn from the same population as is used for the age-length key, as described below.

The ALK is generated as follows:

• There is only one ALK corresponding to the entire area

• The theoretical ALK is calculated based on

o Growth rates are von Bertalanffy (L-inf = 45; k = 0.2; t0 = -0.5)

o The age group strengths are predefined

o The length distribution for a given age is log-normal with mean defined by the von Bertalanffy growth and with a variance ($\sigma^2_{Growth} = 0.15$)

• This theoretical distribution is then rearranged to provide the probability of each age for a given length $l$

• The actual ALK is drawn from this theoretical distribution for each length group, with ($n_L = 12$) and ($n_L = 16$) fish per length group

Simulation Results

In order to evaluate the performance of the bootstrap procedure, the results were compared with the result of a simulation of the survey as described above. All simulations were done with 1024 simulated realizations of the survey to generate the theoretical solution, while all bootstraps were done with 1024 replicates. The ALKs were generated with 16 fish per length class; there were age data for all length classes.
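The construction of the theoretical ALK used in the simulation can be sketched as follows. The growth parameters and the variance are those given in the text; everything else is an assumption made for illustration: the variance 0.15 is taken to be on the log scale, the von Bertalanffy length is used as the log-scale centre of the log-normal distribution, and the age group strengths are hypothetical equal values.

```python
import math
import random

L_INF, K, T0 = 45.0, 0.2, -0.5   # von Bertalanffy parameters from the text
SIGMA2_GROWTH = 0.15             # assumed to be the variance of log(length)


def von_bertalanffy_length(age):
    return L_INF * (1.0 - math.exp(-K * (age - T0)))


def length_density(length, age):
    """Log-normal density of length at a given age."""
    sigma = math.sqrt(SIGMA2_GROWTH)
    z = (math.log(length) - math.log(von_bertalanffy_length(age))) / sigma
    return math.exp(-0.5 * z * z) / (length * sigma * math.sqrt(2.0 * math.pi))


def theoretical_alk_row(length, age_strengths):
    """Probability of each age given length (Bayes rule over the age groups)."""
    weights = {age: strength * length_density(length, age)
               for age, strength in age_strengths.items()}
    total = sum(weights.values())
    return {age: weight / total for age, weight in weights.items()}


def draw_alk_row(length, age_strengths, n_fish, rng):
    """Draw the ages of n_fish aged fish in one length class."""
    row = theoretical_alk_row(length, age_strengths)
    ages = sorted(row)
    return rng.choices(ages, weights=[row[a] for a in ages], k=n_fish)


age_strengths = {age: 1.0 for age in range(9)}   # hypothetical, ages 0 to 8
row_5cm = theoretical_alk_row(5.0, age_strengths)
sample_20cm = draw_alk_row(20.0, age_strengths, 16, random.Random(1))
```

Rearranging the length-at-age distributions in this way is what turns a growth model into an age-length key; the actual ALK is then a multinomial draw of 16 fish per length class from each row.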

The results are reported as a “correction factor”, which is the ratio

$$\text{Correction factor} = \frac{\text{Bootstrap}}{\text{Replicate simulation}}$$

Theoretical results indicate that the size of the bootstrap sample and the number of hauls/aged fish in a length class should preferably be different; for the results reported below we have used

• Hauls bootstrapped = no. of hauls observed – 1

• For each length class in the ALK, no. of aged fish bootstrapped = no. of fish aged

Homogeneous area, i.e. the mean CPUE is the same for all rectangles

The mean $\mathrm{CPUE}_R = \overline{\mathrm{CPUE}}$ and all rectangles are fished with 2 hauls.

[Figure 3: plot titled “Correction Factor for Median Age Distribution”; x-axis: Age (0 to 8); y-axis: ratio between theoretical and bootstrapped; series: Pooled Sampling, Rectangle Sampling.]

Figure 3. Correction factor (= bootstrapped range / simulated (replicate) range)

This graph suggests that the bootstrap procedure is perhaps slightly biased towards values that are too low. Whether we sample the individual rectangles or pool them makes little difference.

Investigating the range (= [Q25%; Q75%]) for the length distribution suggests that bootstrap based on rectangle sampling would more seriously underestimate the range than the pooled sampling. However, also for the pooled sampling there is a distinct underestimation of the range for the abundance estimate by length group. In this case pooling is completely valid as there is no difference between the rectangles. However, when adding the ALK variation, the pooled sampling procedure provides range estimates that are close to the theoretical values in spite of the underestimate of the range for the underlying length compositions, i.e. the variance within the ALKs is dominating. It is clear that the rectangle sampling underestimates the theoretical range.

[Figure 4: plot titled “Correction factor for Range Age distribution”; x-axis: Age (0 to 8); y-axis: ratio between bootstrapped and theoretical range; series: Pooled Sampling, Rectangle Sampling.]

Figure 4. Correction factor for homogeneous area. Comparison between pooled bootstrap sample and bootstrapping individual rectangles.

The above results are in accordance with general theoretical results and we concluded that the pooled approach is preferable and we did not investigate the “rectangle” bootstrapping further.

Inhomogeneous area

The following only considers the pooled bootstrap procedure and investigates the performance of this procedure when the area is inhomogeneous, i.e. when the abundance density varies between rectangles within the area.

The inhomogeneity is measured as the ratio between the standard deviation of the log-values between rectangles and that within rectangles. Figure 5 shows these standard deviations for age 1 herring and whiting from the IBTS survey for 1991-1994.

[Figure 5a: chart titled “IBTS Herring Age 1 for 1991-1994, Standard deviation log-values”; x-axis: roundfish area (1 to 9) within year (1991 to 1994); series: Within Rect, Between Rect.]

Figure 5a. Cpue data for herring from IBTS 1991-1994. Standard deviation of log(cpue) for age 1, ignoring hauls without herring. The data are given by roundfish area (1 to 9) and by year.

[Figure 5b: chart titled “IBTS Whiting Age 1 for 1991-1994, Standard deviation log-value”; x-axis: roundfish area (1 to 9) within year (1991 to 1994); series: Within Rect, Between Rect.]

Figure 5b. Cpue data for whiting from IBTS 1991-1994. Standard deviation of log(cpue) for age 1, ignoring hauls without whiting. The data are given by roundfish area (1 to 9) and by year.

Taking an overall average from Figure 5 suggests that the ratio standard deviation (between rectangles) / standard deviation (within rectangles) is about 1.1 for whiting and 1.4 for herring.

The inhomogeneity of an area is measured by the variation between sub-areas of the mean CPUE within a sub-area (rectangle). This is simulated as a log-normal distribution of the mean CPUE in each rectangle, where all rectangles have the same mean value but the standard deviation in the log-normal distribution (a proxy for the CV) varies. Figure 6 below illustrates the results of a bootstrapped range calculation for a simulated survey with a standard deviation (log-values) of 0, 0.5, 1.0, 1.5, and 2.0.

The pooled procedure does not show any apparent bias up to a log-standard deviation of 2.

[Figure 6: Correction factor by area inhomogeneity. x-axis: age (0 to 8); y-axis: ratio between bootstrapped and theoretical range (0 to 1.6); series: log-standard deviation 0, 0.5, 1, 1.5, 2.]

Figure 6. Correction factor comparing the bootstrapped range (pooled sample over the area for bootstrapping) with the range calculated by simulation (replicates) of the survey.
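The comparison behind Figure 6 can be sketched as follows. This is a simplified illustration, not the simulation framework used in the paper: it works on total CPUE rather than by age, redraws the rectangle means in every replicate survey, and uses the width of the central 90% interval as the "range". All function names, parameter values, and the default number of rectangles and hauls are assumptions made for the example.

```python
import math
import random

def simulate_survey(rng, n_rect, hauls_per_rect, sd_between, sd_within):
    """One simulated survey, returned as a flat list of haul CPUE values."""
    cpue = []
    for _ in range(n_rect):
        # Rectangle mean CPUE: log-normal with log-scale SD sd_between.
        rect_mean = math.exp(rng.gauss(0.0, sd_between))
        for _ in range(hauls_per_rect):
            cpue.append(rect_mean * math.exp(rng.gauss(0.0, sd_within)))
    return cpue

def central_range(values, lower=0.05, upper=0.95):
    """Width of the interval between two empirical quantiles."""
    s = sorted(values)
    lo = s[int(lower * (len(s) - 1))]
    hi = s[int(upper * (len(s) - 1))]
    return hi - lo

def bootstrap_range(rng, cpue, n_boot):
    """Range of the mean CPUE when hauls are resampled from the pooled area."""
    n = len(cpue)
    means = [sum(rng.choices(cpue, k=n)) / n for _ in range(n_boot)]
    return central_range(means)

def correction_factor(sd_between, n_rect=20, hauls_per_rect=2, sd_within=1.0,
                      n_surveys=100, n_boot=100, seed=1):
    """Average bootstrapped range divided by the range over replicate surveys."""
    rng = random.Random(seed)
    surveys = [simulate_survey(rng, n_rect, hauls_per_rect, sd_between, sd_within)
               for _ in range(n_surveys)]
    simulated = central_range([sum(s) / len(s) for s in surveys])
    boot = [bootstrap_range(rng, s, n_boot) for s in surveys]
    return (sum(boot) / len(boot)) / simulated

if __name__ == "__main__":
    for sd in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(sd, round(correction_factor(sd), 2))
```

In this sketch a factor near 1 would indicate that the pooled bootstrap reproduces the spread seen across replicate surveys. The value obtained depends on the assumed design (for example, how many hauls share a rectangle), so it should not be read as a reproduction of Figure 6.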

References

Efron, B. and R.J. Tibshirani (1993). An Introduction to the Bootstrap. CRC Press, Boca Raton.

ICES (2004). Report of the Workshop on Sampling and Calculation Methodology for Fisheries Data (WKSCMFD), 26–30 January 2004, Nantes, France. ICES CM 2004/ACFM:12.

ICES (2005). Report of the Workshop on Survey Design and Data Analysis (WKSAD), 9–13 May 2005, Sète, France. ICES CM 2005/B:07.

Lehtonen, R. and E. Pahkinen. 2004. Practical Methods for Design and Analysis of Complex Surveys. John Wiley & Sons Ltd. ISBN 0-470-84769-7.

ISDBITS. 2001. Improvement of Stock Assessment and Data Collection by Continuation, Standardisation and Design Improvement of the Baltic International Bottom Trawl Survey for Fishery Resource Assessment, Final Report. EU Project No. 98/099

Pennington M.R. and M.D. Grosslein, 1978. Accuracy of abundance indices based on stratified random trawl surveys. ICNAF Res. Doc. 78/IV/77 : 42 p.

O'Brien C.M., C.D. Darby, D.L. Maxwell, B.D. Rackham, H. Degel, S. Flatman, M.A. Pastoors, E.J. Simmonds and M. Vinther (2001a). The precision of international market sampling for North Sea plaice (Pleuronectes platessa L.) and its influence on stock assessment. ICES CM 2001/P:13.

O'Brien C.M., C.D. Darby, B.D. Rackham, D.L. Maxwell, H. Degel, S. Flatman, M. Mathewson, M.A. Pastoors, E.J. Simmonds and M. Vinther (2001b). The precision of international market sampling for North Sea cod (Gadus morhua L.) and its influence on stock assessment. ICES CM 2001/P:14.

Pennington, M. 1983. Efficient estimators of abundance, for fish and plankton surveys. Biometrics, 39: 281–286.

Petitgas, P. 1993. Geostatistics for fish stock assessments: a review and an acoustic application. ICES Journal of Marine Science, 50: 285–298.

Simmonds E.J., C.L. Needle, H. Degel, S. Flatman, C.M. O'Brien, M.A. Pastoors, A.P. Robb and M. Vinther (2001). The precision of international market sampling for North Sea herring and its influence on assessment. ICES CM 2001/P:21.

ICES 1992. The report of the ICES Workshop on the Analysis of Trawl Survey Data. ICES CM 1992/D:6.

Lassen, H. and K. Nygaard (eds) 1999. Metoder til vurdering af fiskebestande. Nordisk Ministerråd. DIVS 1999:813. [Partly in Nordic languages and partly in English].

Beare, D., Castro, J., Cotter, J., van Keeken, O., Kell, L., Laurec, A., Mahé, J-C, Moura, O., Munch-Petersen, S., Nielsen, J. R., Piet, G., Simmonds, J., Skagen, D., and Sparre, P. J. (2003). Evaluation of research surveys in relation to management advice (EVARES - FISH/2001/02 - Lot 1). Final Report to European Commission Director-General Fisheries.
