Theme Session M: Environmental and fisheries data management, access, and integration.
Measuring Uncertainty in Trawl Surveys:
Implementation in DATRAS
Lena Inger Larsen1, Rainer Oeberst2, David Maxwell3, Michael Pennington4, Henrik Sparholt1, Hans Lassen1
1ICES Secretariat, 2Bundesforschungsanstalt für Fischerei, Rostock, Germany, 3CEFAS, 4Institute of Marine Research, Bergen, Norway
Abstract
The basic data from several ICES-coordinated trawl and beam trawl surveys in the Baltic Sea, the North Sea, the area west of Scotland and France are stored in the DATRAS database at the ICES Secretariat; every year, for each survey, member countries report data to the database. The DATRAS system offers facilities for calculating fish stock abundance indices, and these indices are fed into the routine fish stock assessment work. The indices should be supplemented by an estimate of their uncertainty, and the present paper suggests a bootstrap algorithm for implementation in DATRAS to calculate such estimates of variation for each stock/area index in a given year. The procedure is a two-step bootstrap routine whereby the length compositions and the age-length key are bootstrapped independently, i.e. the uncertainty estimate is based on the variability between catches in individual hauls in that year. The goal is to supplement the routine indices with uncertainty estimates in the future.
Introduction
Basic data from several ICES-coordinated trawl and beam trawl surveys in the Baltic Sea, the North Sea, the area west of Scotland and France are stored in ICES’ DATabase of TRAwl Surveys, DATRAS, at the ICES Secretariat (Figure 1). Every year and for each survey, member countries report data, and indices of fish stock abundance are calculated and fed into the routine fish stock assessment work.
Procedures to calculate the uncertainty of these indices have not been developed, although such estimates would be very useful for the population modelling used in the assessment work. Uncertainty estimates of the abundance indices would also be very useful for evaluating individual surveys, e.g. to determine whether they should be intensified to improve the precision of the fish stock assessment models.
Therefore, the EU Commission requested ICES to implement uncertainty estimation in DATRAS and supplement the routine abundance indices with uncertainty estimates.
The terms “variance estimation” and “uncertainty of the abundance estimator” are used loosely. In the context of DATRAS, these terms refer to the confidence interval, i.e. the interval that leaves x% of the distribution of the abundance index estimator below it and x% above it. There are therefore two decisions to be made:
• Decision on the percentage (x) corresponding to the tail of the distribution of the abundance index estimator;
• Decision on the procedure to calculate the distribution
The first decision is arbitrary, but it is required for consistency between years and between surveys. For convenience and to avoid excessive computing time, it was decided to present the range of the abundance indices as the difference between the quartiles, Q75% - Q25%.
The focus here is therefore on the second decision: how to calculate the distribution. This paper analyses statistical methods and their implementation.
We considered the technical advantages and disadvantages of a range of methods, including their suitability for implementation in DATRAS, making use of the work reported in:
• ICES Workshop on the Analysis of Trawl Survey Data (ICES 1992/D:6), which reviewed survey design and index definition;
• Nordic Council of Ministers Workshop on evaluations of fish stocks (Lassen and Nygaard, 1999);
• The EVARES project (2003, EVARES - FISH/2001/02 - Lot 1) on evaluation of research surveys in relation to management advice;
• ICES WKSCMFD (2004), which reviewed analytical variance estimators, bootstrapping and modelling approaches;
• ICES Workshop on Survey Design and Data Analysis (WKSAD) (ICES 2005/B:07), which considered the effect of the spatial structure of the population and provided information on geostatistical models.
In addition, many individual scientists have contributed to the issue of estimating survey index variance, e.g. Pennington (1983) on the use of the delta distribution, where zero values are treated separately and positive values are assumed to follow a lognormal distribution, and Petitgas (1993) on a geostatistical approach.
Bootstrapping has been widely used within fisheries in recent years (see e.g. O’Brien et al., 2001a, 2001b; Simmonds et al., 2001). It is relatively easy to explain and does not require many assumptions. The lack of assumptions about the spatial distribution also suggests that it will be robust to changes in spatial distribution from year to year.
Implementing the bootstrap for survey catch numbers-at-age differs from standard examples, because two bootstrap samples, one for length and one for age, are generated rather than one.
For many surveys the sampling design operates with many strata and thus few observations within each stratum. This complicates variance estimation, because the naive bootstrap underestimates the real variance: the bootstrap variance of a stratum mean is (n - 1)/n times the classical estimate, so with only two samples the variance is underestimated by a factor of two, and more than about 20 samples are needed before the bias becomes negligible. When there is only one sample in a stratum, analytical estimation is not possible and the bootstrap will of course give a variance of zero. Dealing with strata with few hauls is one of the main challenges in estimating the variance of survey indices.
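The factor-of-two figure follows from a standard property of the naive bootstrap: the exact (ideal) bootstrap variance of a stratum mean equals (n - 1)/n times the classical estimate s²/n. The minimal Python sketch below checks this by enumerating all nⁿ equally likely resamples of a few small, made-up strata; the function names are illustrative and not part of DATRAS.

```python
import itertools
import statistics


def exact_bootstrap_variance_of_mean(sample):
    """Variance of the resampled mean over all n**n equally likely
    with-replacement resamples (exact enumeration, so keep n small)."""
    n = len(sample)
    means = [sum(r) / n for r in itertools.product(sample, repeat=n)]
    return statistics.pvariance(means)


def unbiased_variance_of_mean(sample):
    """Classical estimate s**2 / n (needs at least two observations)."""
    return statistics.variance(sample) / len(sample)


if __name__ == "__main__":
    for hauls in ([1.0, 3.0], [1.0, 2.0, 6.0], [2.0, 5.0, 1.0, 8.0, 4.0]):
        n = len(hauls)
        ratio = exact_bootstrap_variance_of_mean(hauls) / unbiased_variance_of_mean(hauls)
        print(f"n={n}: bootstrap/classical = {ratio:.3f}, (n-1)/n = {(n - 1) / n:.3f}")
    # A single haul: every resample is identical, so the bootstrap variance is zero.
    print(exact_bootstrap_variance_of_mean([4.0]))
```

With n = 20 the factor is 19/20 = 0.95, i.e. a 5% underestimate.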
Figure 1. Surveys in ICES area for which data are stored in DATRAS
Estimating the accuracy of the abundance estimator
As noted above, the terms “variance estimation” and “uncertainty of the abundance estimator” are used loosely. In DATRAS, we are after the confidence interval, i.e. the interval that leaves, say, 5% of the distribution of the abundance estimator below it and 5% above it.
To decide which method would be best, we considered three approaches: the variance (2nd-order moment of the distribution), geostatistical approaches and various forms of bootstrapping.
The variance
The calculation of the 2nd-order moment (variance) is widely used and is well known beyond specialist statisticians. Methods for calculating this estimator have been developed for virtually all survey designs, but they differ among designs, i.e. there would be a separate algorithm for each survey; known estimators include ones for fixed-station surveys. Even where there is no analytical solution for the variance of a maximum likelihood estimator, it can be computed numerically in a routine fashion. The approach is easy to implement and can run as a routine procedure.
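As an illustration of a design-specific analytical estimator, the sketch below implements the textbook stratified random estimator of mean CPUE and its variance, without finite population correction. It assumes stratified random sampling, hauls given as hypothetical (stratum, CPUE) pairs and known area weights per stratum; it is not DATRAS code and would not strictly apply to fixed-station designs.

```python
from collections import defaultdict


def stratified_mean_and_variance(catches, weights):
    """Stratified random estimator of mean CPUE and of its variance.

    catches: iterable of (stratum, cpue) pairs, one per haul.
    weights: dict stratum -> W_h, the stratum's share of the survey area (sum to 1).
    The finite population correction is ignored.
    """
    by_stratum = defaultdict(list)
    for stratum, cpue in catches:
        by_stratum[stratum].append(cpue)

    mean, variance = 0.0, 0.0
    for stratum, w in weights.items():
        y = by_stratum[stratum]
        n = len(y)
        if n < 2:
            # One haul: no within-stratum variance can be estimated analytically.
            raise ValueError(f"stratum {stratum!r} has {n} haul(s); need at least 2")
        ybar = sum(y) / n
        s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
        mean += w * ybar
        variance += w * w * s2 / n
    return mean, variance


if __name__ == "__main__":
    hauls = [("A", 1.0), ("A", 3.0), ("B", 2.0), ("B", 4.0), ("B", 6.0)]
    print(stratified_mean_and_variance(hauls, {"A": 0.5, "B": 0.5}))
```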
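Calculation of confidence limits would be of the type below (the factor 2 is used for simplicity, as a rounded value of the normal quantile 1.96 for a 95% interval):
$$[\,\text{Index} - 2 \cdot \text{std.dev}\,;\ \text{Index} + 2 \cdot \text{std.dev}\,]$$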
The approach is not robust to distributions that are far from symmetrical around the abundance estimator and therefore often a transformation is used, the classical example being the logarithmic transformation.
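The sketch below contrasts the symmetric limits above with limits built on the log scale. It uses the first-order (delta-method) approximation sd(log I) ≈ sd(I)/I, which is one common choice rather than a method prescribed in this paper; the index value and standard deviation in the example are made up.

```python
import math


def symmetric_interval(index, std_dev, k=2.0):
    """[Index - k*sd ; Index + k*sd] on the original scale."""
    return index - k * std_dev, index + k * std_dev


def log_scale_interval(index, std_dev, k=2.0):
    """Interval built on log(Index) and back-transformed.

    Uses the delta-method approximation sd(log I) ~ sd(I) / I, so the
    interval is multiplicative: [I / f ; I * f] with f = exp(k * sd / I).
    """
    factor = math.exp(k * std_dev / index)
    return index / factor, index * factor


if __name__ == "__main__":
    index, sd = 10.0, 6.0  # hypothetical index with a CV of 60%
    print(symmetric_interval(index, sd))  # lower limit is negative
    print(log_scale_interval(index, sd))  # always positive, skewed to the right
```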
The survey design could be reflected through a multiplicative model or similar, with roundfish-area and rectangle effects,
$$\log(CPUE) = \mu + \mu_{roundfish} + V_{rect} + \varepsilon$$
to be used as the basis for calculating the std.dev required above.
This could be implemented by linking DATRAS to R.
Geostatistical variance estimates
Geostatistics is an expansion of the model described above which accounts for the geographical variance structure. The functional form of the variogram (variance as a function of distance between the stations) should reflect the distribution of the species surveyed and is unlikely to be the same for all species. The functional structure of the variance would need to be stable between years for a routine implementation, as the approach is not robust to changes in distribution between years; hence this approach may be difficult to implement on a routine basis. The analysis may involve transformation.
Geostatistics has many variants related to the underlying geographical structure of the population. A routine implementation would need to decide on one of these, and experience has shown that the variogram needs close scrutiny before a suitable model can be agreed. The technique is therefore not well suited for robust routine application. Provided the geostatistical model chosen is not too complicated, the fundamentals are easily understood and can be explained to non-statisticians.
The calculation of confidence limits will be similar to that described above.
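To make the variogram concrete, this is a minimal sketch of the classical empirical (Matheron) semivariogram, binning station pairs by distance. Station coordinates, the transformed catch values and the bin width are hypothetical inputs; fitting a variogram model to these points, which is where close scrutiny is needed, is not shown.

```python
import math
from itertools import combinations


def empirical_semivariogram(stations, bin_width, n_bins):
    """Classical (Matheron) semivariogram estimator.

    stations: list of (x, y, value), e.g. haul positions and a transformed CPUE.
    Returns a list of (bin_centre, gamma, n_pairs); gamma is None for empty bins.
    gamma(h) = 1 / (2 N(h)) * sum over the N(h) pairs at distance ~h of (z_i - z_j)**2
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (x1, y1, z1), (x2, y2, z2) in combinations(stations, 2):
        b = int(math.hypot(x1 - x2, y1 - y2) // bin_width)
        if b < n_bins:
            sums[b] += (z1 - z2) ** 2
            counts[b] += 1
    return [
        ((b + 0.5) * bin_width, sums[b] / (2 * counts[b]) if counts[b] else None, counts[b])
        for b in range(n_bins)
    ]


if __name__ == "__main__":
    stations = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
    for centre, gamma, n_pairs in empirical_semivariogram(stations, bin_width=1.0, n_bins=3):
        print(centre, gamma, n_pairs)
```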
An implementation might be to link DATRAS to an appropriate statistical package (e.g. R or SURFER).
Bootstrapping the observations
Bootstrapping overcomes the problems with asymmetric distributions and allows a direct estimation of the confidence limits.
Bootstrapping is based on the assumption that the samples available are representative of the population distribution (similar to the assumptions embedded in section 4.1). The bootstrap procedure should reflect both the design and the assumed population variance structure on which the survey design is based.
Bootstrapping can be implemented as a routine, but must reflect the design, i.e. there would be a separate algorithm for each survey. Implementation would require special software to be written to reflect the survey design and assumption of stock distribution.
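A minimal sketch of a design-respecting bootstrap for a stratified random survey: whole hauls are resampled with replacement within each stratum, keeping the observed number of hauls per stratum, and the stratified mean is recomputed for each replicate. Strata, weights and the function name are illustrative assumptions; other designs, such as fixed stations, would need a different resampling scheme.

```python
import random
from collections import defaultdict


def stratified_bootstrap(catches, weights, n_boot=1024, seed=None):
    """Bootstrap replicates of the stratified mean CPUE.

    catches: iterable of (stratum, cpue); weights: dict stratum -> area weight.
    Hauls are resampled with replacement within each stratum, keeping the
    observed number of hauls per stratum.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for stratum, cpue in catches:
        by_stratum[stratum].append(cpue)

    replicates = []
    for _ in range(n_boot):
        estimate = 0.0
        for stratum, w in weights.items():
            y = by_stratum.get(stratum)
            if not y:
                raise ValueError(f"no hauls in stratum {stratum!r}")
            resample = rng.choices(y, k=len(y))
            estimate += w * sum(resample) / len(resample)
        replicates.append(estimate)
    return replicates


if __name__ == "__main__":
    hauls = [("A", 1.0), ("A", 3.0), ("B", 2.0), ("B", 4.0), ("B", 6.0)]
    reps = sorted(stratified_bootstrap(hauls, {"A": 0.5, "B": 0.5}, seed=1))
    print(reps[0], reps[len(reps) // 2], reps[-1])
```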
Appropriateness for DATRAS
We considered the technical advantages and disadvantages of a range of methods, including their suitability for implementation in DATRAS, making use of work reported in ICES WKSCMFD (2004), which reviewed analytical variance estimators, bootstrapping and modelling approaches, and ICES WKSAD (2005), which considered the effect of the spatial structure of the population and provided information on geostatistical models. The summary table from WKSCMFD is reproduced at the end of this discussion (Table 3).
The design-based approaches, bootstrapping and analytical calculations, scored well on issues relating to communication. Both have been widely used within fisheries, are relatively easy to explain, and do not have many assumptions. The lack of distributional assumptions also suggests they will be robust to changes in catch distribution.
Bootstrapping has technical advantages over analytical calculations because asymmetric distributions do not cause problems when calculating a confidence interval, and the covariance between ages is part of the output. A further issue with the usual analytical calculations is that they are based on the assumption of random (or stratified random) sampling, so they are not strictly valid for fixed-station designs.
The major concern with bootstrapping is that strata with few observations can lead to poor estimates of variance, and hence to a need to combine strata.
For age sampling, the survey protocols suggest there should be enough age samples per length group. We studied whether this is true in two ways: by viewing a selection of ALKs and, for all surveys, by calculating the proportion of length groups above a relevant minimum length in each ALK with only one age sample and with fewer than five age samples.
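The check described above can be sketched as follows for a single ALK. The list of aged-fish lengths and the minimum length are hypothetical inputs; the function counts aged fish per length class at or above the minimum and reports how many classes have one, or fewer than five, age samples.

```python
from collections import Counter


def alk_sampling_summary(aged_lengths, min_length):
    """Summarise age sampling per length class for one ALK.

    aged_lengths: length class (e.g. cm) of every aged fish in the ALK.
    min_length: cut-off below which length classes are ignored.
    """
    per_class = Counter(l for l in aged_lengths if l >= min_length)
    total = len(per_class)
    one = sum(1 for c in per_class.values() if c == 1)
    under_five = sum(1 for c in per_class.values() if c < 5)
    return {
        "length_classes": total,
        "one_sample": one,
        "pct_one_sample": 100.0 * one / total if total else 0.0,
        "fewer_than_five": under_five,
        "pct_fewer_than_five": 100.0 * under_five / total if total else 0.0,
    }


if __name__ == "__main__":
    lengths = [20, 30, 30, 31, 32, 32, 32, 32, 32]  # made-up aged fish
    print(alk_sampling_summary(lengths, min_length=25))
```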
The ALKs for cod in BITS and NS-IBTS in the first quarter of 2006 are taken as an example in this report. It can be seen that, at least in NS-IBTS, a large proportion of the length classes are represented by only one sample (Table 1) and only few length classes are represented by five or more samples (Table 2).
In some roundfish areas, 100 percent of the length classes have been sampled fewer than five times.
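Table 1. Number of length classes with only one age sample, total number of length classes with age samples, and percentage of length classes with only one age sample.
Survey | Year | Quarter | Species | Area | No. with one sample | Total | Percentage of total
BITS | 2006 | 1 | Gadus morhua | 22 | 1 | 49 | 2
BITS | 2006 | 1 | Gadus morhua | 24 | 1 | 49 | 2
BITS | 2006 | 1 | Gadus morhua | 26 | 1 | 49 | 2
BITS | 2006 | 1 | Gadus morhua | 28 | 2 | 43 | 4
NS-IBTS | 2006 | 1 | Gadus morhua | 1 | 8 | 27 | 29
NS-IBTS | 2006 | 1 | Gadus morhua | 2 | 9 | 12 | 75
NS-IBTS | 2006 | 1 | Gadus morhua | 3 | 8 | 23 | 34
NS-IBTS | 2006 | 1 | Gadus morhua | 4 | 9 | 19 | 47
NS-IBTS | 2006 | 1 | Gadus morhua | 6 | 11 | 14 | 78
NS-IBTS | 2006 | 1 | Gadus morhua | 7 | 6 | 26 | 23
NS-IBTS | 2006 | 1 | Gadus morhua | 9 | 1 | 49 | 2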
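Table 2. Number of length classes with fewer than five age samples, total number of length classes with age samples, and percentage of length classes with fewer than five age samples.
Survey | Year | Quarter | Species | Area | No. with fewer than five samples | Total | Percentage of total
BITS | 2006 | 1 | Gadus morhua | 22 | 9 | 49 | 18
BITS | 2006 | 1 | Gadus morhua | 24 | 7 | 49 | 14
BITS | 2006 | 1 | Gadus morhua | 25 | 1 | 49 | 2
BITS | 2006 | 1 | Gadus morhua | 26 | 4 | 49 | 8
BITS | 2006 | 1 | Gadus morhua | 28 | 8 | 43 | 18
NS-IBTS | 2006 | 1 | Gadus morhua | 1 | 21 | 27 | 77
NS-IBTS | 2006 | 1 | Gadus morhua | 2 | 12 | 12 | 100
NS-IBTS | 2006 | 1 | Gadus morhua | 3 | 15 | 23 | 65
NS-IBTS | 2006 | 1 | Gadus morhua | 4 | 13 | 19 | 68
NS-IBTS | 2006 | 1 | Gadus morhua | 6 | 14 | 14 | 100
NS-IBTS | 2006 | 1 | Gadus morhua | 7 | 18 | 26 | 69
NS-IBTS | 2006 | 1 | Gadus morhua | 9 | 17 | 49 | 34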
Possible model-based approaches range from generalised linear models (GLMs) of survey CPUE, to geostatistical models which incorporate spatial structure, to more complex likelihood-based or Bayesian hierarchical models. The GLM and geostatistics approaches generally model indices which have already been calculated by age, while more complex models better represent the sampling process by including the variation due to sub-sampling for age and estimating an age-length relationship. This ability to capture all the sources of uncertainty is particularly strong in Bayesian models.
Routine implementation was an issue for modelling approaches, particularly geostatistical models. Geostatistical models cannot be properly implemented on all surveys in DATRAS, as several use randomised, not systematic, sampling schemes. Other modelling approaches would be more flexible but could require different model set-ups for different surveys and species. This could lead to debate over the appropriateness of a model for any particular survey, and could make explaining the approach to non-scientists more difficult. Expert knowledge would be needed to run modelling approaches, as the model fit and assumptions need to be checked.
Table 3. Comparison of methods – Summary. Table 6.1 from WKSCMFD 2004
Design-based methods, common assumptions: sample representative of the population; sampling scheme unbiased.
• Analytical. Assumptions: strata must be a partition of the space. Advantages: explicit; identifies variance due to age and due to length; statistics can be derived to analyse the sampling design. Disadvantages: becomes extremely complex to apply to situations with more than one stratum; no covariance between ages. Implementation: simple.
• Non-parametric bootstrap. Assumptions: resampling units must be independent. Advantages: non-parametric; can deal with complex processes; simple concept; estimates covariance. Disadvantages: sensitive to a low number of samples in strata, which can underestimate variance or produce biased estimates due to merging of strata. Implementation: simple, uses simulations.
Model-based methods:
• Frequentist. Assumptions: distributions and relationships between variables. Advantages: explicit; deals with complex situations; identifies variance components; estimates uncertainty; parameters can have biological interest; can include expert knowledge. Disadvantages: complex assumptions; requires model testing and fitting; different sampling schemes and stocks may require different models. Implementation: complex.
• Bayesian. Advantages: as for the frequentist model, plus easier to deal with missing observations and can include more complex expert knowledge and different sources of data. Disadvantages: as for the frequentist model, plus more difficult to implement and MCMC convergence problems. Implementation: more complex, uses simulations.
Example (ref): WD1; WD 4, 5, 6, 7 & 8.
Conclusions
All the methods have advantages and disadvantages: the usefulness of geostatistical methods has been demonstrated for individual analyses, and Bayesian hierarchical models are technically very strong and further development of them should be promoted. But overall, the bootstrap approach was considered most appropriate for routine analysis in DATRAS. It should be technically adequate, relatively straightforward to implement and easy to explain. It will represent the calculations currently used to produce survey indices, and a consistent definition of bootstrap sampling should be possible across all the surveys in the system.
Sample representative of the population, sampling scheme unbiased
How to bootstrap in DATRAS
Abundance Index
For each survey, and within each survey for each species, the calculation covers a so-called index area. This is the area within which hauls are considered and over which the length compositions are averaged. In several cases the index area and the entire survey area are identical. The age-length keys are aggregated by a set of “otolith areas” without restricting the data to the index area.
The abundance index I is in principle calculated as
$$I_a = \sum_{l \in \text{length}} n_l \, \frac{m_{l,a}}{m_{l,\bullet}}, \qquad m_{l,\bullet} = \sum_a m_{l,a}$$
where $n_l$ is the length composition observed (#/hr) in the haul (CPUE) and $m_{l,a}$ is the age-length key.
The abundance estimator works in four steps: 1) first, the average length composition by sub-areas (e.g. rectangles) is calculated; 2) in parallel, ALKs are aggregated based on a separate set of “otolith areas” (e.g. roundfish areas; the phrase “otolith area” is used by the North Sea beam trawl survey); 3) the average length distributions calculated per sub-area in step 1 are raised to age compositions using the ALKs found in step 2; and 4) these age compositions are averaged over an index area (e.g. the entire North Sea), i.e. ignoring length compositions that refer to sub-areas outside the index area. This latter step means that some rectangles fished will not be considered in the index calculation, because those rectangles are not part of the index area.
This suggests that the haul information in a survey should be considered as two components: the length frequency distribution of the catch and the age-length keys.
These components are bootstrapped independently. The bootstrapping approach chosen is the naïve approach (Lehtonen and Pahkinen, 2004). The haul bootstrap unit is the entire length distribution of a haul (not the individual length groups), thereby maintaining the covariance between the length groups within a haul. The ALKs are bootstrapped as individual aged fish, length group by length group.
Bootstrapping the length frequency distribution of the catch by haul will consider how the hauls are distributed within an area. As discussed below, a certain amount of pooling among sub-areas is desirable, and there are therefore two steps in this bootstrap procedure: 1) selecting a haul and 2) allocating this haul to a sub-area.
Also for the age-length keys, although these are aggregated over a larger area, there is a need for pooling over length classes, but the sampling intensity, i.e. the number of fish per length class that is aged, is maintained in the bootstrap. This is illustrated in the text table below.
Length class (cm) | Age 0 | Age 1 | Age 2 | Total
30 | 1 | 7 | 3 | 11
31 | 0 | 9 | 3 | 12
32 | 0 | 8 | 2 | 10
30-32 | 1 | 24 | 8 | 33
The sample from which the bootstrap is drawn is the 30-32 cm group, while the bootstrapped ALK maintains the individual length classes 30, 31 and 32 cm with their sampling intensities of 11, 12 and 10 fish.
Figure 2 below shows the bootstrapping data flow; the flow is described in the following section.
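A minimal Python sketch of the two ingredients above, under simplifying assumptions: `index_by_age` applies I_a = Σ_l n_l · m_la / m_l• to a single mean length composition (the sub-area raising and index-area averaging of steps 1 to 4 are left out), and `bootstrap_pooled_alk` resamples aged fish from a pooled length group while keeping the number aged in each 1 cm class, using the 30-32 cm example. Function names and data layout are hypothetical.

```python
import random
from collections import Counter, defaultdict


def index_by_age(length_comp, alk):
    """I_a = sum over l of n_l * m_{l,a} / m_{l,.}

    length_comp: {length: n_l}, mean CPUE (#/hr) by length class.
    alk: {length: {age: m_la}}, numbers of aged fish by length and age.
    Length classes without aged fish are skipped (a simplification).
    """
    index = defaultdict(float)
    for length, n_l in length_comp.items():
        ages = alk.get(length)
        if not ages:
            continue
        m_total = sum(ages.values())
        for age, m_la in ages.items():
            index[age] += n_l * m_la / m_total
    return dict(index)


def bootstrap_pooled_alk(alk, pooled_classes, rng):
    """Resample aged fish from a pooled group of length classes (e.g. 30-32 cm)
    while keeping the number aged in each original length class."""
    pool = [age for l in pooled_classes for age, m in alk[l].items() for _ in range(m)]
    return {
        l: dict(Counter(rng.choices(pool, k=sum(alk[l].values()))))
        for l in pooled_classes
    }


if __name__ == "__main__":
    alk = {30: {0: 1, 1: 7, 2: 3}, 31: {1: 9, 2: 3}, 32: {1: 8, 2: 2}}
    boot = bootstrap_pooled_alk(alk, [30, 31, 32], random.Random(1))
    print({l: sum(ages.values()) for l, ages in boot.items()})  # {30: 11, 31: 12, 32: 10}
    print(index_by_age({30: 5.0, 31: 3.0, 32: 2.0}, boot))
```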
[Figure 2 diagram. Length compositions: for NS-IBTS, EVHOE, SC and BTS, pool hauls over areas, bootstrap hauls with respect to CPUE by length and re-allocate hauls to sub-areas; for BITS, bootstrap hauls with respect to CPUE by length. Age-length keys: combine ALK length classes into wider classes (e.g. 5 cm), bootstrap the ALK within an area and re-allocate it to length classes. Then calculate CPUE per age and haul and calculate the indices; repeat 1000 times and calculate the confidence interval based on the 1000 indices. Areas: Roundfish area, otolith area, Scottish sampling areas, EVHOE areas. Sub-areas: depth stratum, statistical rectangles. SC: Scottish groundfish survey.]
Figure 2. Bootstrapping data flow
Length distribution bootstrapping
Only hauls within the index area for a given species are used in the bootstrapping.
The number of hauls within each sub-area will generally be kept when bootstrapping. For those surveys with many hauls per stratum (generally 5 to 20), where the index calculation is done by large strata, we draw the bootstrap sample from the hauls in that stratum and the approach is straightforward. For surveys like the North Sea IBTS and BTS, which operate with rectangles as strata (sub-areas) for index calculation, too few hauls are available within a stratum (rectangle) and the bootstrap sample will be drawn from a larger pool of hauls, i.e. by Roundfish Area (RF).
The abundance index calculation requires that each haul is assigned to a sub-area (rectangle), and the bootstrap therefore needs a second step to allocate the selected haul to a sub-area. Because we may not want to bootstrap exactly the number of hauls that were originally taken, we have interpreted the allocation of hauls by sub-areas in the sample as the probability distribution for allocation of hauls to sub-areas (rectangles).
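The two-step haul bootstrap described above (select a haul from the pooled area, then allocate it to a rectangle with probabilities given by the observed allocation of hauls) might look like the sketch below. Whole length distributions are drawn so that the covariance between length groups within a haul is kept. Rectangle labels, function names and the simple averaging over the rectangles hit in a replicate are illustrative assumptions, not the DATRAS implementation; `n_draw` allows drawing a different number of hauls than observed.

```python
import random
from collections import Counter


def bootstrap_hauls_pooled(hauls, rng, n_draw=None):
    """One bootstrap replicate of the hauls in a pooling area (e.g. a roundfish area).

    hauls: list of (rectangle, length_comp); length_comp is the whole length
           distribution of the haul, kept intact to preserve covariance.
    Step 1: draw whole hauls with replacement from the pooled set.
    Step 2: allocate each drawn haul to a rectangle with probability proportional
            to the number of hauls observed in that rectangle.
    """
    n = len(hauls) if n_draw is None else n_draw
    per_rect = Counter(rect for rect, _ in hauls)
    rects = list(per_rect)
    weights = [per_rect[r] for r in rects]
    replicate = []
    for _ in range(n):
        _, comp = rng.choice(hauls)
        rect = rng.choices(rects, weights=weights, k=1)[0]
        replicate.append((rect, comp))
    return replicate


def mean_comp_over_rectangles(replicate):
    """Mean length composition per rectangle, then averaged over the rectangles hit."""
    by_rect = {}
    for rect, comp in replicate:
        by_rect.setdefault(rect, []).append(comp)
    lengths = {l for comps in by_rect.values() for c in comps for l in c}
    rect_means = [
        {l: sum(c.get(l, 0.0) for c in comps) / len(comps) for l in lengths}
        for comps in by_rect.values()
    ]
    return {l: sum(m[l] for m in rect_means) / len(rect_means) for l in lengths}


if __name__ == "__main__":
    hauls = [("41F1", {30: 4.0, 31: 2.0}), ("41F1", {30: 1.0}), ("42F1", {31: 6.0})]
    rep = bootstrap_hauls_pooled(hauls, random.Random(7))
    print(mean_comp_over_rectangles(rep))
```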
Table 4. Overview of sampling strata and pooling strata
Survey | Species | Sampling strata | Pooling strata
NS-IBTS | All | Statistical rectangle | Roundfish area combined with index area
BITS | All | SubDiv + depth strata | SubDiv + depth strata
EVHOE | All | EVHOE area + depth strata | Combined depth strata of same depth within Celtic Sea or Bay of Biscay
BTS | All | Statistical rectangle | Otolith areas
Scottish groundfish survey | Cod, whiting, haddock, monkfish | Scottish demersal sampling area | Scottish demersal sampling area
Bootstrap age-length keys by length class
The Age-Length Keys for all the surveys are an aggregation of individual samples from a haul, combined over a larger area (Table 5). It is assumed for most surveys that the individuals for the ALK are randomly taken from all parts of the area. Furthermore, this sampling regime for ALK data assumes that the age distribution of a length class does not differ significantly between the different parts of the sampling area.
Table 5. Overview of ALK sampling areas
Survey | Area of ALK
BITS | ICES subdivision
IBTS | Roundfish area
BTS | Otolith areas
Scottish groundfish survey | Demersal sampling area
EVHOE | EVHOE areas
The sampled numbers of fish are used as the basis for bootstrapping the ALKs. Analyses of the data available in DATRAS have shown that in many cases the number of aged fish per length class is significantly lower than the number required for bootstrapping (see Tables 1 and 2). Therefore, pooling to construct the sample from which to bootstrap is necessary. We pool data in length classes of 2 or 5 cm, whereas the data are originally recorded by 1 cm classes. This pooling does not solve the problem of large individuals, for which aged fish are not available in all length intervals, and a length plus group is therefore included. All individuals which are larger than a defined length are summarized in this plus group. This length plus group will be used for both the length frequency and the ALK.
The length where the plus group begins can differ from species to species, from survey to survey, and from year to year, owing to different developments of the stock size and the progress of rebuilding of the stocks.
One possible way to define the length class $l^{+}$ where the plus group begins is given below. The length plus group starts when the sum of aged fish, beginning with the smallest length interval, is larger than a defined fraction B of the total number of aged fish. The length $l^{+}$ is defined as the smallest length class for which
$$\frac{\sum_{l=1}^{l^{+}} X_{l\cdot}}{X_{\cdot\cdot}} > B$$
where $X_{l\cdot}$ denotes the total number of aged fish in length class l, and $X_{\cdot\cdot}$ represents the total number of aged fish over all length classes. The length plus groups will be defined based on data analyses of the different surveys and after discussions with the working group which coordinates the survey.
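The plus-group rule above, and the collapsing of larger length classes into the plus group, can be sketched as follows. The aged-fish counts and the fraction B in the example are made up; the value of B for a given survey is left to the analyses and working-group discussions mentioned above.

```python
def plus_group_start(aged_per_length, B):
    """Smallest length class l+ for which the cumulative share of aged fish,
    counted from the smallest length class, exceeds B (0 <= B < 1)."""
    total = sum(aged_per_length.values())
    cumulative = 0
    for length in sorted(aged_per_length):
        cumulative += aged_per_length[length]
        if cumulative / total > B:
            return length
    raise ValueError("B must be smaller than 1")


def apply_plus_group(counts_per_length, l_plus):
    """Collapse all length classes >= l_plus into one plus group labelled l_plus."""
    out = {}
    for length, count in counts_per_length.items():
        key = l_plus if length >= l_plus else length
        out[key] = out.get(key, 0) + count
    return out


if __name__ == "__main__":
    aged = {30: 5, 31: 3, 32: 1, 33: 1}
    l_plus = plus_group_start(aged, B=0.85)
    print(l_plus, apply_plus_group(aged, l_plus))  # 32 {30: 5, 31: 3, 32: 2}
```

The same `l_plus` would then be applied to both the length frequencies and the ALK.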
Number of bootstraps
The length CPUE and the ALK data will be bootstrapped 1024 times, as recommended by Lehtonen and Pahkinen (2004).
Performance of the bootstrap procedure
Presentation of results.
For each index value, the entire distribution of the bootstrap indices is given as a histogram with 20 bins, obtained by splitting the range between the maximum and minimum values of the bootstrap indices into 20 equal intervals. From this, user-specified statistics such as the CV and confidence intervals can easily be derived. There are a variety of methods available to calculate confidence intervals (see for example Efron and Tibshirani, 1993). We feel that this choice will not greatly change the relative results for different surveys or years, and we therefore propose that the percentile method, the simplest method available, is implemented in DATRAS. For example, a 90% confidence interval based on 1024 values is given by the 5th and 95th percentiles of the sorted values, i.e. the values at ranks of about 51 and 973 (the 50th and 950th values for 1000 values).
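A minimal sketch of the presentation steps: a 20-bin histogram over the min-max range of the bootstrap indices, a percentile interval, and the Q75% - Q25% range. Percentiles are taken with the nearest-rank convention (the ceil(q·n)-th smallest value), which is one of several conventions; with 1000 values it gives the 50th and 950th values for a 90% interval. Function names are illustrative.

```python
import math


def histogram(values, n_bins=20):
    """Counts in n_bins equal-width bins spanning [min, max] of the bootstrap indices."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        b = 0 if width == 0 else min(int((v - lo) / width), n_bins - 1)
        counts[b] += 1
    return lo, width, counts


def percentile(sorted_values, q):
    """Nearest-rank percentile: the ceil(q * n)-th smallest value."""
    n = len(sorted_values)
    rank = max(1, math.ceil(round(q * n, 9)))  # round() guards against float noise
    return sorted_values[rank - 1]


def percentile_interval(values, level=0.90):
    s = sorted(values)
    tail = (1.0 - level) / 2.0
    return percentile(s, tail), percentile(s, 1.0 - tail)


def interquartile_range(values):
    s = sorted(values)
    return percentile(s, 0.75) - percentile(s, 0.25)


if __name__ == "__main__":
    indices = [float(v) for v in range(1, 1001)]  # stand-in for bootstrap indices
    print(percentile_interval(indices), interquartile_range(indices))
```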
Simulation Framework for Investigating Bootstrap Performance
To illustrate the effects, and as a background for discussing the choices made in the implementation of the uncertainty calculation in DATRAS, we investigated a system with 16 rectangles arranged in a 4×4 quadrant.
The system is described by:
• The number of hauls per rectangle, $n_R$, with the total number of hauls $N = \sum_R n_R$
• The mean CPUE within each rectangle, $CPUE_R$
• The between-rectangle variance, $\sigma^2_{Between}$; and
• The within-rectangle variance, $\sigma^2_{Within}$.
$n_R$ (the number of hauls in rectangle R) is drawn from a rectangular distribution with a mean of two hauls per rectangle and a range of 0 to 4 hauls, i.e. for $x \in \mathbb{N}$ (x is a numeral):
$$\Pr\{x \le n_R < x + dx\} = \begin{cases} \dfrac{dx}{2 \cdot 2}, & x \in [0;\ 2 \cdot 2] \\ 0, & x > 2 \cdot 2 \text{ or } x < 0 \end{cases}$$
$CPUE_R$ is the mean CPUE in rectangle R and $\overline{CPUE}$ is the overall mean CPUE in the area. The realized $CPUE_R$ is drawn from a lognormal distribution:
$$\Pr\{x \le CPUE_R < x + dx\} = \frac{1}{x\,\sigma_{Between}\sqrt{2\pi}} \exp\left(-0.5\left(\frac{\log x - \log \overline{CPUE}}{\sigma_{Between}}\right)^2\right) dx$$
The between-rectangle variance is
$$\sigma^2_{Between} = \frac{1}{\sum_R n_R} \sum_R \left(CPUE_R - \overline{CPUE}\right)^2$$
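The rectangle-level part of this test system, together with the haul-level draw described in the next paragraph, can be simulated as below. Two readings are assumptions of this sketch: the number of hauls is drawn as an integer uniformly from 0 to 4 (mean 2), and the haul-level cut-off is implemented by redrawing standard normal deviates outside ±2. All standard deviations are on the log scale; parameter values in the example are made up.

```python
import math
import random


def simulate_survey(cpue_bar, sd_between, sd_within, n_rect=16, max_hauls=4, seed=None):
    """One realization of the 4x4-rectangle test system.

    n_R ~ uniform integer on 0..max_hauls (mean 2 hauls per rectangle for max_hauls=4);
    log(CPUE_R) ~ Normal(log(cpue_bar), sd_between);
    log(CPUE_iR) = log(CPUE_R) + sd_within * z, z standard normal truncated at +/-2.
    Returns {rectangle: [haul CPUEs]}.
    """
    rng = random.Random(seed)
    survey = {}
    for rect in range(n_rect):
        n_r = rng.randint(0, max_hauls)
        cpue_r = math.exp(rng.gauss(math.log(cpue_bar), sd_between))
        hauls = []
        for _ in range(n_r):
            z = rng.gauss(0.0, 1.0)
            while abs(z) > 2.0:  # cut-off at twice the standard deviation
                z = rng.gauss(0.0, 1.0)
            hauls.append(cpue_r * math.exp(sd_within * z))
        survey[rect] = hauls
    return survey


if __name__ == "__main__":
    survey = simulate_survey(cpue_bar=100.0, sd_between=1.0, sd_within=0.5, seed=42)
    print(sum(len(h) for h in survey.values()), "hauls in", len(survey), "rectangles")
```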
The observed $CPUE_{i,R}$ of haul i in rectangle R is drawn from a log-normal distribution with a cut-off at twice the standard deviation.
Length distributions and Age-Length Keys (ALK)
The length distribution is drawn from the same population as is used for the age-length key, as described below. The ALK is generated as follows:
• There is only one ALK, corresponding to the entire area.
• The theoretical ALK is calculated based on:
o Growth rates are von Bertalanffy ($L_\infty$ = 45; k = 0.2; $t_0$ = -0.5);
o The age group strengths are predefined;
o The length distribution for a given age is log-normal, with mean defined by the von Bertalanffy growth and with variance $\sigma^2_{Growth}$ = 0.15.
• This theoretical distribution is then rearranged to provide $\{prob_{a,l}\}$ for a given length l.
• The actual ALK is drawn from this theoretical distribution for each length group, with $n_L$ = 12 or $n_L$ = 16 fish per length group.
Simulation Results
In order to assess the performance of the bootstrap procedure, the results were compared with the result of a simulation of the survey as described above. All simulations were done with 1024 simulated realizations of the survey to generate the theoretical solution, while all bootstraps were done with 1024 replicates. The ALKs were generated with 16 fish per length class, and there were age data for all length classes.
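The theoretical ALK described under "Length distributions and Age-Length Keys (ALK)" can be generated roughly as follows. The von Bertalanffy parameters are those given in the text; treating σ²_Growth = 0.15 as the variance of log(length) around the von Bertalanffy length, using 1 cm length classes, and the exponentially declining age strengths in the example are assumptions of this sketch.

```python
import math
import random
from collections import Counter

L_INF, K, T0 = 45.0, 0.2, -0.5  # von Bertalanffy parameters from the text
SIGMA2_GROWTH = 0.15            # assumed here to be the variance of log(length)


def vb_length(age):
    """von Bertalanffy mean length at age."""
    return L_INF * (1.0 - math.exp(-K * (age - T0)))


def lognormal_cdf(x, median, sigma):
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - math.log(median)) / (sigma * math.sqrt(2.0))))


def theoretical_alk(strength, lengths):
    """prob_{a,l}: P(age = a | length class [l, l+1)) from age strengths and growth."""
    sigma = math.sqrt(SIGMA2_GROWTH)
    alk = {}
    for l in lengths:
        w = {a: n_a * (lognormal_cdf(l + 1, vb_length(a), sigma)
                       - lognormal_cdf(l, vb_length(a), sigma))
             for a, n_a in strength.items()}
        total = sum(w.values())
        if total > 0:
            alk[l] = {a: v / total for a, v in w.items()}
    return alk


def draw_alk(theoretical, n_per_length, rng):
    """Sample n_per_length aged fish per length class from prob_{a,l}."""
    drawn = {}
    for l, probs in theoretical.items():
        ages = list(probs)
        sample = rng.choices(ages, weights=[probs[a] for a in ages], k=n_per_length)
        drawn[l] = dict(Counter(sample))
    return drawn


if __name__ == "__main__":
    strength = {a: 1000.0 * math.exp(-0.5 * a) for a in range(9)}  # hypothetical strengths
    theo = theoretical_alk(strength, lengths=range(5, 45))
    alk = draw_alk(theo, n_per_length=16, rng=random.Random(3))
    print({l: alk[l] for l in (10, 20, 30)})
```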
The results are reported as a “correction factor”, which is the ratio
$$\text{Correction factor} = \frac{\text{Bootstrap}}{\text{Theoretical (simulation replicate)}}$$
Theoretical results indicate that the bootstrap sample size may preferably differ from the number of hauls/aged fish in a length class; for the results reported below we have used:
• Hauls bootstrapped = no. of hauls observed - 1
• For each length class in the ALK, no. of aged fish bootstrapped = no. of fish aged
Homogeneous area
The mean CPUE is the same for all rectangles, $CPUE_R = \overline{CPUE}$, and all rectangles are fished with 2 hauls.
[Figure 3: Correction factor for median, age distribution. x-axis: age (0-8); y-axis: ratio between theoretical and bootstrapped; series: pooled sampling, rectangle sampling.]
Figure 3. Correction factor (= bootstrapped range / simulated (replicate) range)
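For reference, the correction factor can be computed from two sets of index values as the ratio of their Q75% - Q25% ranges. This sketch uses Python's `statistics.quantiles` (Python 3.8+), whose default quartile convention is an assumption here; the index values in the example are synthetic.

```python
import statistics


def quartile_range(values):
    """Q75% - Q25% using statistics.quantiles (default 'exclusive' method)."""
    q25, _, q75 = statistics.quantiles(values, n=4)
    return q75 - q25


def correction_factor(bootstrap_indices, replicate_indices):
    """Bootstrapped range divided by the range from simulated survey replicates.
    Values below 1 mean the bootstrap underestimates the uncertainty."""
    return quartile_range(bootstrap_indices) / quartile_range(replicate_indices)


if __name__ == "__main__":
    replicates = [float(x) for x in range(1, 101)]
    bootstrap = [0.8 * x for x in replicates]  # synthetic, deliberately too narrow
    print(round(correction_factor(bootstrap, replicates), 6))
```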
This graph suggests that the bootstrap procedure may be slightly biased towards low values.
Whether we sample the individual rectangles or pool them makes little difference.
Investigating the range (= [Q25%; Q75%]) for the length distribution suggests that bootstrapping based on rectangle sampling would more seriously underestimate the range than the pooled sampling. However, also for the pooled sampling there is a distinct underestimation of the range for the abundance estimate by length group. In this case pooling is completely valid, as there is no difference between the rectangles. However, when adding the ALK variation, the pooled sampling procedure provides range estimates that are close to the theoretical values in spite of the underestimate of the range for the underlying length compositions, i.e. the variance within the ALKs is dominating. It is clear that the rectangle sampling is underestimating the theoretical range.
[Figure 4: Correction factor for range, age distribution. x-axis: age (0-8); y-axis: ratio between bootstrapped and theoretical range; series: pooled sampling, rectangle sampling.]
Figure 4. Correction factor for homogeneous area. Comparison between pooled bootstrap sample and bootstrapping individual rectangles.
The above results agree with general theoretical results; we therefore concluded that the pooled approach is preferable and did not investigate the “rectangle” bootstrapping further.
Inhomogeneous area
The following considers only the pooled bootstrap procedure and investigates its performance when the area is inhomogeneous, i.e. when the abundance density varies between rectangles within the area.
The inhomogeneity is measured as the ratio between the standard deviations of the log-values between and within rectangles. Figure 5 shows these standard deviations for age 1 herring and whiting from the IBTS survey for 1991-1994.
[Figure 5a: IBTS herring age 1, 1991-1994. Standard deviation of log-values within and between rectangles, by roundfish area (1-9) and year.]
Figure 5a. CPUE data for herring from IBTS 1991-1994. Standard deviation of log(CPUE) for age 1, ignoring hauls without herring. The data are given by roundfish area (1 to 9) and by year.
[Figure 5b: IBTS whiting age 1, 1991-1994. Standard deviation of log-values within and between rectangles, by roundfish area (1-9) and year.]
Figure 5b. CPUE data for whiting from IBTS 1991-1994. Standard deviation of log(CPUE) for age 1, ignoring hauls without whiting. The data are given by roundfish area (1 to 9) and by year.
Taking an overall average from Figure 5 suggests that the ratio standard deviation (between rectangles)/standard deviation (within rectangles) is about 1.1 for whiting and 1.4 for herring.
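One way to compute the two standard deviations for a roundfish area and year is sketched below: a pooled within-rectangle SD of log(CPUE) around each rectangle's mean, and the SD of the rectangle means. The exact definitions used for Figure 5 are not stated in the text, so these are assumptions; zero catches must be removed beforehand, as in the figure captions.

```python
import math
import statistics


def log_sd_within_between(hauls_by_rect):
    """SD of log(CPUE) within and between rectangles for one roundfish area and year.

    hauls_by_rect: {rectangle: [cpue, ...]} with zero catches already removed.
    within:  pooled SD of log(cpue) around each rectangle's mean (rectangles with >= 2 hauls);
    between: SD of the rectangle means of log(cpue) (rectangles with >= 1 haul).
    """
    logs = {r: [math.log(c) for c in cs] for r, cs in hauls_by_rect.items() if cs}
    ss, df = 0.0, 0
    for ys in logs.values():
        if len(ys) >= 2:
            m = statistics.fmean(ys)
            ss += sum((y - m) ** 2 for y in ys)
            df += len(ys) - 1
    within = math.sqrt(ss / df) if df else float("nan")
    means = [statistics.fmean(ys) for ys in logs.values()]
    between = statistics.stdev(means) if len(means) >= 2 else float("nan")
    return within, between


if __name__ == "__main__":
    data = {"A": [12.0, 30.0, 8.0], "B": [150.0, 60.0], "C": [3.0, 9.0]}  # made-up CPUEs
    within, between = log_sd_within_between(data)
    print(within, between, between / within)
```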
The inhomogeneity of an area is measured by the variation between sub-areas of the mean CPUE within a sub-area (rectangle). This is simulated as a log-normal distribution of the mean CPUE in each rectangle, where all rectangles have the same mean value but the standard deviation of the log-normal distribution (a proxy for the CV) varies. Figure 6 below illustrates the results of a bootstrapped range calculation for a simulated survey with a standard deviation (log-values) of 0, 0.5, 1.0, 1.5 and 2.0.
The pooled procedure does not show any apparent bias up to a log-standard deviation of 2.
[Figure 6: Correction factor by area inhomogeneity. x-axis: age (0-8); y-axis: ratio between bootstrapped and theoretical range; series: log-standard deviation 0, 0.5, 1, 1.5 and 2.]
Figure 6. Correction factor comparing the bootstrapped range (pooled sample over the area for bootstrapping) with the range calculated by simulation (replicates) of the survey.
References
Efron, B., and Tibshirani, R. J. 1993. An Introduction to the Bootstrap. CRC Press, Boca Raton.
ICES. 2004. Report of the Workshop on Sampling and Calculation Methodology for Fisheries Data (WKSCMFD), 26–30 January 2004, Nantes, France. ICES CM 2004/ACFM:12.
ICES. 2005. Report of the Workshop on Survey Design and Data Analysis (WKSAD), 9–13 May 2005, Sète, France. ICES CM 2005/B:07.
Lehtonen, R., and Pahkinen, E. 2004. Practical Methods for Design and Analysis of Complex Surveys. John Wiley & Sons Ltd. ISBN 0-470-84769-7.
ISDBITS. 2001. Improvement of Stock Assessment and Data Collection by Continuation, Standardisation and Design Improvement of the Baltic International Bottom Trawl Survey for Fishery Resource Assessment. Final Report. EU Project No. 98/099.
Pennington, M. R., and Grosslein, M. D. 1978. Accuracy of abundance indices based on stratified random trawl surveys. ICNAF Res. Doc. 78/IV/77: 42 pp.
O'Brien, C. M., Darby, C. D., Maxwell, D. L., Rackham, B. D., Degel, H., Flatman, S., Pastoors, M. A., Simmonds, E. J., and Vinther, M. 2001a. The precision of international market sampling for North Sea plaice (Pleuronectes platessa L.) and its influence on stock assessment. ICES CM 2001/P:13.
O'Brien, C. M., Darby, C. D., Rackham, B. D., Maxwell, D. L., Degel, H., Flatman, S., Mathewson, M., Pastoors, M. A., Simmonds, E. J., and Vinther, M. 2001b. The precision of international market sampling for North Sea cod (Gadus morhua L.) and its influence on stock assessment. ICES CM 2001/P:14.
Pennington, M. 1983. Efficient estimators of abundance, for fish and plankton surveys. Biometrics, 39: 281–286.
Petitgas, P. 1993. Geostatistics for fish stock assessments: a review and an acoustic application. ICES Journal of Marine Science, 50: 285–298.
Simmonds, E. J., Needle, C. L., Degel, H., Flatman, S., O'Brien, C. M., Pastoors, M. A., Robb, A. P., and Vinther, M. 2001. The precision of international market sampling for North Sea herring and its influence on assessment. ICES CM 2001/P:21.
ICES. 1992. The report of the ICES Workshop on the Analysis of Trawl Survey Data. ICES CM 1992/D:6.
Lassen, H., and Nygaard, K. (Eds). 1999. Metoder til vurdering af fiskebestande. Nordisk Ministerråd. DIVS 1999:813. [Partly in Nordic languages and partly in English.]
Beare, D., Castro, J., Cotter, J., van Keeken, O., Kell, L., Laurec, A., Mahé, J-C., Moura, O., Munch-Petersen, S., Nielsen, J. R., Piet, G., Simmonds, J., Skagen, D., and Sparre, P. J. 2003. Evaluation of research surveys in relation to management advice (EVARES - FISH/2001/02 - Lot 1). Final Report to European Commission Director-General Fisheries.