David Lawrence Miller edited this page Jul 13, 2012 · 3 revisions

# A quick intro to DSM

## Intro

This guide is a quick introduction to the dsm package, which can be used to create maps of animal distribution in space when the data were collected using distance sampling techniques. dsm is designed to be used in conjunction with Jeff Laake's mrds package, which is used to fit the detection function.

The basic principle is to fit a generalized additive model (GAM) containing a smooth of spatial location (along with any other environmental covariates you like), into which information from a distance sampling analysis of the data is incorporated, either via an offset or by setting the response to be an estimate of the abundance in each segment. The possible models are described in more detail below.

It's assumed you've read Hedley and Buckland (2004) and you're relatively comfortable with the material in it.

Throughout this document the spatial covariates are assumed to be called x and y. R-like syntax is used throughout: for example, s() denotes a smooth function, so s(x,y) is a bivariate smooth of spatial location.

## Segments

To model the data, the line transects must be "cut" up into segments. The count (or estimated abundance) in each segment is the response for the GAM. The usual guidance is for each segment to be approximately twice the truncation distance long, so that the area covered by a segment (which extends one truncation distance either side of the line) is roughly square. A better way to think about segment size is to make the segments small enough that the spatially referenced covariates do not vary much within any one segment.
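As a rough illustration of the guidance above, the sketch below splits a single transect into equal-length segments whose length is as close as possible to twice the truncation distance. The helper `cut_transect()` is hypothetical and not part of dsm; real segmenting is usually done in a GIS, where covariate variation should also guide the choice.

```r
# Hypothetical helper (not part of dsm): split a transect of total length
# `transect_length` into equal segments roughly 2 * `width` long.
cut_transect <- function(transect_length, width) {
  target <- 2 * width                                # usual guidance: ~2x truncation distance
  n_seg  <- max(1, round(transect_length / target))  # at least one segment
  seg_length <- transect_length / n_seg              # equal lengths summing to the total
  data.frame(
    segment = seq_len(n_seg),
    start   = (seq_len(n_seg) - 1) * seg_length,     # distance along the line
    Effort  = rep(seg_length, n_seg)                 # segment length, used as effort
  )
}

segs <- cut_transect(transect_length = 10, width = 0.8)
print(segs)
```

Using equal lengths keeps the total effort exactly equal to the transect length; the segments come out slightly longer or shorter than the target when the transect length is not a multiple of it.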

## Possible analyses

This section lists the possible models, giving their formulae (in R style). Which model can be fitted depends on the data collected. The response can be either individuals or groups; provided group size can be estimated, a group analysis can also be used to estimate the abundance of individuals.

### Count data

Simply modelling the counts in each segment, we have:

n ~ offset + s(x,y) + ...

where n is the count in a segment, s(x,y) is as described above, and the offset is the "effective area" of the segment, 2*width*l*p, where width is the truncation distance, l is the length of the segment (taken here to be the same for every segment, although that need not be the case) and p is the probability of detection estimated from the distance sampling analysis (see e.g. Buckland et al. (2001)). Because the GAM uses a log link, the offset enters the model on the log scale, i.e. as log(2*width*l*p). The ... indicates that further environmental covariates could be added to the model (for example bathymetry, weather conditions or other data from a GIS).
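To make the role of the offset concrete, here is a minimal sketch of the count model fitted directly with `mgcv::gam()` on synthetic data. It is not how dsm is implemented internally; the values of `width` and `p`, the column name `log_area` and the simulated density surface are all assumptions for illustration. In a real analysis p would come from the fitted ddf model.

```r
library(mgcv)   # provides gam() and s(); a recommended package shipped with R

set.seed(42)
n_seg <- 200

# Synthetic segment data, for illustration only
segdata <- data.frame(
  x      = runif(n_seg),
  y      = runif(n_seg),
  Effort = 1.6                 # segment length l (equal here, but need not be)
)
width <- 0.8                   # truncation distance (assumed value)
p     <- 0.45                  # detection probability (assumed value)

# Effective area 2 * width * l * p, entered on the log scale (log link)
segdata$log_area <- log(2 * width * segdata$Effort * p)

# Simulate counts: smooth density surface times effective area
dens <- exp(1 + sin(2 * pi * segdata$x) * cos(2 * pi * segdata$y))
segdata$n <- rpois(n_seg, dens * exp(segdata$log_area))

# Count model: n ~ offset + s(x,y)
fit <- gam(n ~ offset(log_area) + s(x, y),
           family = quasipoisson(), data = segdata, method = "REML")
summary(fit)
```

The fitted surface s(x,y) then describes density per unit area, since the offset has already accounted for the area searched and the proportion of animals detected in it.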

If response="group" then n is the number of groups detected in the segment; if response="indiv" then n is the number of individuals (in a groups analysis this is obtained by multiplying up each detected group by its size).
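The difference between the two responses can be sketched in base R: for groups the per-segment count is the number of detections, for individuals it is the sum of group sizes. The observation table below is hypothetical; note that segments with no detections still need a count of zero.

```r
# All segments surveyed (including one with no detections)
seg_labels <- c("1-1", "1-2", "1-3", "2-1")

# Hypothetical observation data: one row per detected group
obs <- data.frame(
  object       = 1:5,
  Sample.Label = factor(c("1-1", "1-1", "1-2", "2-1", "2-1"), levels = seg_labels),
  size         = c(3, 1, 2, 4, 1)
)

# response = "group": number of detected groups in each segment
n_group <- tapply(obs$size, obs$Sample.Label, length, default = 0)

# response = "indiv": groups multiplied up by their size
n_indiv <- tapply(obs$size, obs$Sample.Label, sum, default = 0)

print(rbind(n_group, n_indiv))
```

Making Sample.Label a factor over all segment labels is what lets the empty segment "1-3" appear with a zero.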

### Estimated abundance

If the detection function depends on covariates (for example, if the size or sex of the animal or the observer ID was recorded), the detection probability differs between observations, so we must first use the distance sampling analysis to estimate the abundance in each segment. This is done using a Horvitz-Thompson estimator, which sums the reciprocals of the estimated detection probabilities of the observations in the segment (each multiplied by group size when estimating individuals). The model is now:

Nhat ~ offset + s(x,y) + ...

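where Nhat is the estimated abundance in the segment and the offset is now the covered area of the segment, 2*width*l, in the same notation as above (detectability is already accounted for in Nhat, so p no longer appears in the offset).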
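A minimal base R sketch of the per-segment Horvitz-Thompson estimate and the corresponding offset follows. The detection probabilities in column `p` are hypothetical stand-ins for the per-observation values a covariate detection function would give; in practice they come from the fitted ddf model.

```r
seg_labels <- c("1-1", "1-2", "1-3", "2-1")

# Hypothetical detections with per-observation detection probabilities
obs <- data.frame(
  Sample.Label = factor(c("1-1", "1-1", "1-2", "2-1"), levels = seg_labels),
  size = c(3, 1, 2, 4),
  p    = c(0.5, 0.25, 0.8, 0.4)
)

# Horvitz-Thompson estimates per segment
Nhat_group <- tapply(1 / obs$p,        obs$Sample.Label, sum, default = 0)  # groups
Nhat_indiv <- tapply(obs$size / obs$p, obs$Sample.Label, sum, default = 0)  # individuals

# Offset for this model: log of the covered area 2 * width * l (no p)
width    <- 0.8
Effort   <- rep(1.6, length(seg_labels))
log_area <- log(2 * width * Effort)

print(rbind(Nhat_group, Nhat_indiv))
```

Each detection contributes more than one animal (or group) to its segment's estimate, scaled up by how unlikely it was to be seen.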

### Density

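Documentation coming soon.

For the density responses (response="group.den" and response="indiv.den") no offset is used; the detection probabilities are taken from the fitted ddf object (phat).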

## Data structure

dsm.fit() requires two data frames. The contents of each depend on the type of analysis being done (see above).

  • obsdata - the observation data frame, which must have the following columns:
    • object - object identifier
    • Sample.Label - the identifier of the segment that the observation occurred in (matching Sample.Label in segdata)
    • size - the size of each observed group (i.e. 1 for individuals)
    • distance - perpendicular/radial distance to the observation
  • segdata - the segment data frame, which must have the following columns:
    • x - coordinate of the segment centre across the transect (i.e. the position of the transect centreline)
    • y - coordinate of the segment centre in the direction of the transect
    • Effort - the effort, i.e. the length of the segment
    • Transect.Label - identifier of the transect this segment is in
    • Sample.Label - identifier of the segment (must be unique!)
    • any other environmental covariates that are required
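The two data frames can be sketched as follows, with a few basic consistency checks. All values, the `depth` covariate and the helper `check_dsm_data()` are hypothetical; the helper is not part of dsm and simply tests the requirements listed above.

```r
# Hypothetical segment data: three segments on transect 1, one on transect 2
segdata <- data.frame(
  x              = c(0, 0, 0, 5),
  y              = c(0.8, 2.4, 4.0, 0.8),
  Effort         = c(1.6, 1.6, 1.6, 1.6),
  Transect.Label = c(1, 1, 1, 2),
  Sample.Label   = c("1-1", "1-2", "1-3", "2-1"),
  depth          = c(120, 135, 150, 90)   # example environmental covariate
)

# Hypothetical observation data: one row per detection
obsdata <- data.frame(
  object       = 1:3,
  Sample.Label = c("1-1", "1-2", "2-1"),
  size         = c(3, 1, 2),
  distance     = c(0.12, 0.55, 0.30)
)

# Hypothetical helper (not part of dsm): check required columns and labels
check_dsm_data <- function(obsdata, segdata) {
  stopifnot(
    all(c("object", "Sample.Label", "size", "distance") %in% names(obsdata)),
    all(c("x", "y", "Effort", "Transect.Label", "Sample.Label") %in% names(segdata)),
    !anyDuplicated(segdata$Sample.Label),                 # segment labels unique
    all(obsdata$Sample.Label %in% segdata$Sample.Label)   # every detection has a segment
  )
  invisible(TRUE)
}

check_dsm_data(obsdata, segdata)
```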

## Fitting the models

Fitting a simple model with dsm.fit() is straightforward. The following arguments must be supplied:

  • ddfobject - a fitted ddf model from mrds.
  • response - the response type (see above).
  • formula - formula for the GAM, for example ~s(x,y) for a smooth of location.
  • obsdata - observation level data, as detailed above.
  • segdata - segment level data, as detailed above.

An example of an analysis can be found on this wiki at mexico-analysis.

## References

Hedley, S.L. & Buckland, S.T. (2004) Spatial models for line transect sampling. Journal of Agricultural, Biological, and Environmental Statistics, 9, 181–199.

Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L. & Thomas, L. (2001) Introduction to Distance Sampling. Oxford University Press.

Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L. & Thomas, L. (2004) Advanced Distance Sampling. Oxford University Press.