Statistical Approaches for Hidden Variables in Ecology. Nathalie Peyrard

ISBN 9781119902782




      where [equation not reproduced] and d is the total number of free parameters in the model in question. Under this definition, the model that minimizes the ICL is selected.

      The data used in this section were collected by Sophie Bertrand (IRD), Guilherme Tavares (UFRGS), Christophe Barbraud and Karine Delord (CNRS). The authors wish to thank the IRD Tabasco JEAI (Jeune Equipe Associée Internationale) for permission to use these data.

A photograph of a masked booby.

      1.3.1. Data

      This case study concerns the behavior of the masked booby (Sula dactylatra).

      1.3.2. Projection

      The recorded data for the boobies were provided as latitude and longitude measurements, that is, as angles with respect to an origin point on the Earth's surface. The methods presented earlier rely on a notion of distance (such as step length). While it is possible to calculate distances traveled over the Earth's surface directly from latitude and longitude coordinates, this requires specific formulas for movement on a sphere. Instead, data are often projected onto a plane, which enables the use of Euclidean distance. Because the globe is spherical, no single projection suits every region, and the projection used depends on the zone of interest. In this case, the data are projected to UTM coordinates for zone 25 South. In R, the sf package may be used to facilitate geographical data processing (and, notably, projection).
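
      As an illustration of the "specific formulas for movement on a sphere" mentioned above, the following minimal sketch computes great-circle step lengths directly from latitude and longitude using the haversine formula. It is not the projection workflow used in this case study (which relies on UTM coordinates); the spherical Earth radius and the function names are assumptions made for this example only.

```python
import math

# Assumption: the Earth is treated as a sphere with its mean radius (meters).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def step_lengths_m(track):
    """Step lengths along a track given as a list of (lat, lon) pairs."""
    return [haversine_m(*p, *q) for p, q in zip(track, track[1:])]
```

      Once the data have been projected onto a plane, the same step lengths can be obtained with the ordinary Euclidean distance, which is what the metrics used later in this section assume.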

      1.3.3. Data smoothing

      In this case, the frequency of data acquisition was high (one point every 10 s). Although the data (obtained using GPS) contained few errors, the observations are so close together in time that even small positional errors are large relative to the distance traveled between two fixes, which may result in a somewhat erratic-looking trajectory. This erratic effect is even more pronounced in the movement metrics used to detect different activities.

      To correct these errors, we use a Gaussian linear hidden Markov model, as described in section 1.2.1. In the equations of model [1.1], matrices A and B are taken as known and equal to the identity, while vectors μ and ν are known and equal to 0. Matrices Σm and Σo are presumed to be diagonal, but unknown. The hidden variables in this model represent the actual positions. The unknown parameters are estimated using the EM algorithm provided by the MARSS package. The estimated parameters are then used to reconstruct the real trajectory by means of Kalman smoothing.
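
      The smoothing step can be sketched as follows. With A and B equal to the identity, μ and ν equal to 0, and diagonal covariance matrices, each coordinate can be treated separately as a random walk observed with noise. The sketch below runs a Kalman filter followed by a Rauch-Tung-Striebel backward pass on one coordinate. It assumes that the two variances (here called q and r, names chosen for this example) are already known, whereas in the case study they are estimated by EM; it is an illustration of the principle, not the MARSS implementation.

```python
def kalman_smooth_1d(y, q, r, m0=None, p0=1e6):
    """Smooth one coordinate under x_t = x_{t-1} + e_t, y_t = x_t + n_t.

    q is the variance of the movement noise e_t, r the variance of the
    observation noise n_t (both assumed known here). m0 and p0 are the
    prior mean and variance of the first position; by default the prior
    is centered on the first observation and very vague.
    """
    n = len(y)
    m = y[0] if m0 is None else m0
    p = p0
    m_pred, p_pred, m_filt, p_filt = [], [], [], []
    # Forward pass: Kalman filter.
    for t in range(n):
        if t > 0:
            p = p + q  # prediction: the mean is unchanged because A = I
        m_pred.append(m)
        p_pred.append(p)
        gain = p / (p + r)
        m = m + gain * (y[t] - m)
        p = (1.0 - gain) * p
        m_filt.append(m)
        p_filt.append(p)
    # Backward pass: Rauch-Tung-Striebel smoother.
    m_smooth, p_smooth = m_filt[:], p_filt[:]
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + g * (m_smooth[t + 1] - m_pred[t + 1])
        p_smooth[t] = p_filt[t] + g * g * (p_smooth[t + 1] - p_pred[t + 1])
    return m_smooth, p_smooth
```

      A two-dimensional trajectory is smoothed by applying the function to the projected x and y coordinates in turn. A small ratio q/r produces a strongly smoothed path, while a large ratio leaves the observations almost unchanged.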

Graph depicting the result of Kalman smoothing on part of the booby trajectories.

      1.3.4. Identification of different activities through movement

      We shall begin by using a three-state model. The choice of the number of states in this case will be discussed later.

      1.3.4.1. Definition of metrics

      In this example, we have chosen to fit two models, which differ in the way they treat step length and turning angle (and thus in the associated emission distributions). The two pairs of metrics considered here are as follows:

       – Step length and turning angle: a classic choice, as presented by Morales et al. (2004). The emission distributions in this case are a gamma distribution for step length and a circular (von Mises) distribution for turning angle. This model will be labeled length/angle in our figures.

       – Bivariate velocity change metric (Gurarie et al. 2009). The emission distributions in this case are two independent normal distributions. This model will be labeled bivariate speed in our figures.
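
      Both pairs of metrics can be derived from the projected positions. The following minimal sketch assumes planar coordinates in meters and a regular time step (10 s by default, matching the acquisition frequency given above). For the bivariate metric, it uses one common formulation in which the speed is decomposed into a component along the previous heading and a component perpendicular to it; the function and variable names are assumptions made for this example.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def movement_metrics(xy, dt=10.0):
    """Compute movement metrics from a list of projected (x, y) positions.

    Returns four lists: step lengths, turning angles, and the two
    components of the bivariate velocity metric (along and across the
    previous heading). The last three lists are one element shorter than
    the first, since a turning angle requires two consecutive steps.
    """
    steps, headings = [], []
    for (x0, y0), (x1, y1) in zip(xy, xy[1:]):
        dx, dy = x1 - x0, y1 - y0
        steps.append(math.hypot(dx, dy))
        headings.append(math.atan2(dy, dx))
    turns = [wrap_angle(h1 - h0) for h0, h1 in zip(headings, headings[1:])]
    speeds = [s / dt for s in steps[1:]]
    v_persist = [v * math.cos(a) for v, a in zip(speeds, turns)]
    v_turn = [v * math.sin(a) for v, a in zip(speeds, turns)]
    return steps, turns, v_persist, v_turn
```

      Working with the two velocity components avoids the circular nature of the turning angle, which is why normal emission distributions become a reasonable choice for the second model.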

      1.3.4.2. Defining the starting point of the algorithm

      These models do not include any covariates, and the initial distribution will not be estimated. Each model is made up of 18 parameters (12 emission distribution parameters and six transition matrix parameters). Iterative optimization over a space of this type (as performed by the EM algorithm, for example) may be affected by the chosen starting point. In both cases, the choice of a suitable starting point for the algorithm is therefore crucial. One relatively generic approach is to apply k-means clustering to the selected metrics. This rapid classification can be used to identify plausible parameters for the different regimes; nevertheless, it is still important to check that the result obtained from the algorithm has not been affected by the choice of starting point.
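
      The k-means initialization can be sketched as follows. This is a plain Lloyd-type k-means written for illustration, with a deterministic initialization chosen for reproducibility (an assumption of this example, not the procedure used in the case study). The per-cluster means and standard deviations it returns are only meant as plausible starting values for the emission parameters. The last function reproduces the parameter count given above, assuming four emission parameters per state.

```python
import math


def kmeans(points, k, n_iter=100):
    """Cluster a list of equal-length tuples into k groups (Lloyd's algorithm)."""
    ordered = sorted(points)
    n = len(ordered)
    # Deterministic start: k points evenly spaced in the sorted data.
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def starting_values(values, labels, k):
    """Mean and standard deviation of a metric within each cluster.

    Assumes that every cluster contains at least one observation.
    """
    out = []
    for j in range(k):
        vals = [v for v, lab in zip(values, labels) if lab == j]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        out.append((mean, sd))
    return out


def n_free_parameters(k, emission_params_per_state=4):
    """Emission parameters plus k * (k - 1) free transition probabilities."""
    return k * emission_params_per_state + k * (k - 1)
```

      Running the fit from several perturbed versions of these starting values and comparing the resulting likelihoods is one simple way to check that the solution does not depend on the starting point.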

      1.3.5. Results

      1.3.5.1. Characterization of hidden states

      In the two packages used here, the parameters of the HMM are estimated by maximum likelihood, and the most probable sequence of hidden states is recovered using the Viterbi algorithm.
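
      For readers unfamiliar with the Viterbi algorithm, the following minimal sketch shows the recursion in log space. It takes the log initial distribution, the log transition matrix and, for each time step, the log emission density of the observation under each state. These inputs are assumed to come from a fitted model; the function is an illustration and not the code of either package.

```python
import math


def viterbi(log_init, log_trans, log_emis):
    """Most probable hidden state sequence of an HMM.

    log_init[s]     : log probability of starting in state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_emis[t][s]  : log density of observation t under state s
    """
    n_states = len(log_init)
    delta = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_emis)):
        new_delta, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            best = max(range(n_states), key=lambda r: delta[r] + log_trans[r][s])
            pointers.append(best)
            new_delta.append(delta[best] + log_trans[best][s] + log_emis[t][s])
        delta = new_delta
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def to_log(matrix):
    """Element-wise logarithm of a list of rows of strictly positive numbers."""
    return [[math.log(p) for p in row] for row in matrix]
```

      Because the recursion maximizes over whole sequences, an isolated observation that slightly favors another state does not necessarily cause a switch when the transition matrix strongly favors remaining in the current state.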