Loading [MathJax]/jax/output/HTML-CSS/jax.js

Assessment methodology for imposex

Overview

Ideally, imposex data are submitted as individual measurements; for example, as the Vas Deferens Sequence (VDS) of each female snail. This provides information about variation between individuals, and allows efficient statistical models to be fitted to assess trends and status. However, sometimes the data are submitted as an annual index; for example, as the Vas Deferens Sequence Index (VDSI), the arithmetic mean of the individual VDS measurements. A more ad-hoc modelling approach is then all that is possible. This help file describes the methodology for assessing time series of individual VDS measurements. Other help files describe the approach when VDS data are submitted as annual indices, or as a mixture of individual measurements and annual indices.

For some species, imposex stage or intersex stage are reported rather than VDS, again either as individuals or annual indices1. These data are assessed using the same methods as for VDS.


Proportional odds model

The individual VDS measurements are modelled with a proportional odds model (McCullagh & Nelder, 1989). Let yij be the VDS measurement of the jth female snail in year ti,i=1...N, with yij{0,...,K} where K is the highest possible VDS class2. It is assumed that

logit(Prob(yijk))=f(ti)+θk

for 0kK1, with Prob(yijK)=1. Here, f(t) is a function that describes how imposex levels change over time (year). Various forms of f(t) are considered and these are discussed in the next section. The θk are cut points that measure the odds of being in a particular VDS class or below. Since the classes are ordered, the cut points are subject to the constraints θ0<θ1<...<θK13.

The model is fitted by maximum likelihood. However, there are rarely sufficient data in a single time series to estimate the cut points precisely, so the cut points are first estimated from a saturated model fitted to multiple time series (described later) and are then assumed fixed and known4. The only parameters estimated when fitting the proportional odds model to a single time series are thus the parameters of f(t). Parameter standard errors are estimated from the Hessian matrix5.

McCullagh P & Nelder JA, 1989. Generalized Linear Models (second edition). Chapman & Hall, London.


Form of f(t)

Several different candidate forms of f(t) are considered, depending on the length of the time series. As well as linear logistic and smooth trends, change-point models are considered. These are motivated by the patterns seen in many time series, where there are steep declines in VDS levels starting in the mid 2000s. These changes coincide with the introduction of EC Regulation 782/2003, which implemented the provisions of the International Maritime Organisation’s Antifouling Systems Convention (IMO, 2001) prohibiting application of TBT surface coatings to all vessels by 2003, and the global ban on TBT which came into force in September 2008. The steep declines usually cannot be described adequately by linear logisitic models, or even smoothers. However, change-point models provide a way of capturing the steep decline with relatively few parameters. The years 2004, 2005, 2006, 2007 and 2008 are regarded as potential change-years, since the environmental response to the TBT measures is likely to have started in this period.

Intuitively, the complexity of the candidate forms of f(t) should be based on the number of years of data, N. For example, with 8 years of data, one might consider a linear model f(t)=μ+βt and a smoother f(t)=s(t) on 2 degrees of freedom (df) (analogous to the models fitted to contaminant time series). However, this runs into numerical difficulties when a time series starts with a series of years in which all VDS measurements equal the maximum value K, or ends with a series of years in which all VDS measurements equal 0, as the amount of information in the data for estimating f(t) is then reduced6. Instead, the candidate forms of f(t) are based on Nmid, an approximate measure of the number of years of data that contain information about changes in VDS levels. Loosely, Nmid is the number of years in the ‘middle’ of the timeseries, where intermediate VDS levels are observed. Formally, Nmid is defined as follows. Let Ii = 1 if all the VDS measurements in year ti equal K, -1 if all the VDS measurements in year ti equal 0, and 0 otherwise. Let

i1={1,if I1<1N,if Ii=1imin{i:Ii+1<1},otherwise

Similarly, let

i2={N,if IN>11,if Ii=1imax{i:Ii1>1},otherwise

Then {ti,i=i1,...,i2} are the ‘middle’ years of the time series and Nmid=i2i1+1.

The linear and smooth candidate forms of f(t) are then

N2
no model is fitted
N3 and Nmid=1
mean model f(t)=μ
The VDS measurements in the entire time series either all equal K or all equal 0, so there is no trend.
N3 and Nmid=2,3 or 4
linear model f(t)=μ+βt
Nmid5
linear model f(t)=μ+βt and smooth model f(t)=s(t)
Smoothers on 2 degrees of freedom (df) are considered when 5Nmid7, on 2 and 3 df when 8Nmid10 and on 2, 3, and 4 df when Nmid11.

Change-point models are also considered provided that the time series starts before 2008 and that N3 and Nmid>1. Each change point model is of the form

f(t)={μ,if t<tchangeμ+g(t),if ttchange

where tchange is the change year and g(tchange)=0 to ensure f(t) is continuous. Let Nmid be the number of ‘middle’ years from tchange onwards; i.e. |{ti:titchange and i≥=i1}|. Then, similar to above, the form of g(t) depends on Nmid.

Nmid=2,3 or 4
linear change-point model g(t)=β(ttchange)
Nmid5
linear change-point model g(t)=β(ttchange) and smooth change-point model g(t)=s(t), with s(tchange) = 0
Smoothers on 2 degrees of freedom (df) are considered when 5Nmid7, on 2 and 3 df when 8Nmid10 and on 2, 3, and 4 df when Nmid11.

The change-point models are fitted for each change-year tchange = 2004, 2005, 2006, 2007, 2008 in turn, provided that t1<tchange; i.e. the time series started before the change-year.

All the candidate models are fitted by maximum likelihood, with the final model chosen using AICc7. For some time series, there are many candidate models and there is a danger of over-fitting the data. However, this is mitigated somewhat by the fact that the models have been tailored to patterns of change seen in so many time series. It is also preferable to overfit rather than underfit for the purposes of assessing environmental status. Linear or smooth models often overpredict VDS levels in the final monitoring year, so if these are the only models considered, status will appear to be poorer than it should be.


Estimating the cut-points

The cut-points θk,k=0,...,K1 determine the probability of being in each VDS class given the underlying level of TBT contamination (represented by f(t) above). The cut-points can be thought of as measuring the eco-toxicological response of a species to TBT contamination and might reasonably be assumed to be constant over a wide area. The cut-points can therefore be estimated with good precision by fitting a ‘full’ model to the data from many time series collected over a wide area.

Suppose that, for a particular species and area, there are VDS time series at M stations. With some abuse of notation, let ymij be the VDS measurement of the jth female snail in year tmi from station m. Then the full model is

logit(Prob(ymijk))=μmi+θk

where μmi represents the underlying level of TBT contamination in year tmi at station m. The species area combinations used for estimating the cut-points are here.


Estimating the mean VDS class

The mean VDS class in year t, denoted v(t), is

v(t)=Kk=0k Prob(Yt=k)

where Yt is a random variable describing the VDS class of individual snails in year t. The probabilities are expressed in terms of f(t) and the cut-points through the relationships

 Prob(Yt=k)= Prob(Ytk) Prob(Ytk1)=exp(f(t)+θk)1+exp(f(t)+θk)exp(f(t)+θk1)1+exp(f(t)+θk1)

with Prob(YtK)=1 as before. The mean VDS class v(t) is then estimated by plugging the estimates of f(t) and the cut-points into these formulae.

Approximate confidence intervals on v(t) are obtained by simulating the distribution of the estimates of f(t) and the cut-points and hence the distribution of ^v(t)8. In particular, an upper one-sided 95% confidence limit on v(t) is the 95% ordered value of the simulated distribution of ^v(t).