
Introduction: From Data Management to Data Analysis

In previous chapters, you explored the wide set of tools that PostgreSQL and PostGIS offer to process and analyse tracking data. Nevertheless, a database is not specifically designed to perform advanced statistical analysis or to implement complex analytical algorithms, which are key elements to extract scientific knowledge from the data for both fundamental and applied research. In fact, these functionalities must be part of an information system that aims at a proper handling of wildlife tracking data. The possibility of a tighter integration of analytical functions with the database is particularly interesting because the availability of large amounts of information from the new generation sensors blurs the boundary between data analysis and data management. Tasks like outlier filtering, real-time detection of specific events (e.g. virtual fencing), or meta-analysis (analysis of results of a first analytical step, e.g. variation in home range size in the different months of a year) are clearly in the overlapping area between data analysis and management.

Background for the Analysis of Animal Space Use Data

Two questions need to be answered to move from an ecological question on animal space use to the analysis of those data: first, which is the relevant space (geographic versus environmental space), and second, which is the relevant spatiotemporal scale? Animals occupy a position in space at a given time t, which is called geographic space, and many ecological questions are related to this space: e.g. ‘How large an area is used by an animal during a year?’ or ‘How fast can an animal travel?’. Notably, the question on the area traversed by the animal has received much research interest, and this area is often called a ‘home range’. The home range has been defined by Burt (1943) as ‘the area traversed by the individual in its normal activities of food gathering, mating, and caring for young’. Many statistical approaches have been developed to use sets of location ‘points’ to estimate a home range area, from convex polygons to kernel density estimators (and many variants; for a review, see Kie et al. 2010).

On the other hand, by being in a certain geographic location, the animal encounters a set of environmental conditions, which are called environmental space, and questions related to the animal’s ecological relationships are to be answered in this space: e.g. ‘Which environmental characteristics does the animal prefer?’ or ‘How does human land use affect the animal's space use?’. These questions are the main topic of habitat selection studies. In general, in these studies, one compares the environmental conditions used by the animal to those available to the animal. An important challenge is to decide ‘What environmental conditions were available to the animal?’ This issue is tightly linked to the next question to be answered, about scale.

The second important question for animal space use studies is about the relevant scale for the analysis. Small-scale studies can focus on the spatial behaviour of animals within a day or even an hour, whereas large-scale studies can look at space use over a year or even an animal’s lifetime. The required scale of the study also places certain demands on the data: a small-scale study requires high-resolution collection of precise locations, whereas a large-scale study requires a sufficient tracking duration to allow inferences over this long period. It seems obvious that any description of the area traversed by an individual must specify the time period over which the traversing occurred. Only for stable home ranges does the size of the range stop increasing after a certain amount of time, so that the area is no longer time-dependent. Although home range stability is often assumed, it should be tested, because it frequently does not hold. The distance travelled is also very sensitive to the scale of the study, see Fig. 10.1.

Fig. 10.1

This simulated trajectory clearly illustrates that reducing the number of acquired fixes within a time period can substantially alter the properties of the trajectory. The dashed line is the original trajectory; sampling with a ten times coarser resolution leads to a shorter trajectory in black, and a further reduction results in an even shorter trajectory in gray. Even though the total length of the trajectory decreases with a reduction in resolution, the distance between consecutive points increases. It is thus very important for the researcher to be aware of such effects of sampling scale on trajectory characteristics

For habitat selection studies—i.e. studies in environmental space—scale not only affects the sampling of the animal’s space use, but, even more importantly, also the sampling of the environmental conditions available to the animal to choose from. Areas available to an animal in the course of several days may not be reachable within the time span of a few hours. This behavioural limitation has led several studies to consider availability specific to each used location (e.g. step-selection functions, Fortin et al. (2005)), instead of considering the same choice set to be available for all locations of a single individual (or even population). Thus, how the researcher defines the available choice set should be informed by the scale of the study.

The Tools: R and Adehabitat

R is an open source programming language and environment for statistical computing and graphics (http://www.r-project.org/). It is a popular choice for data analysis in academics, with its popularity for ecological research increasing rapidly. This popularity is not only the result of R being available for free, but also due to its flexibility and extendibility. The large user base of R has developed an extensive suite of libraries to extend the basic functionalities provided in the R-base package. Before going into some of these really fantastic features, it is good to point out that R’s flexibility comes at a cost: a rather steep learning curve. R is not very tolerant of small human mistakes and demands some initial investment from the first-time user to understand its sometimes cryptic error messages. Fortunately, there are many resources out there to help the novice on her way (look on the R website for more options): online tutorials (e.g. ‘R for Beginners’ or ‘Introduction to R’), books (a popular choice is Crawley’s (2005) ‘Statistics: An Introduction using R’), extensive searchable online help (e.g. http://www.rseek.org/), and the many statistics courses that include an introduction to R (search the Internet for a course nearby; some may even be offered online).

One of the main strengths of R is the availability of a large range of packages or libraries for specific applications; by 2013, more than 4,000 libraries had been published on CRAN. There are libraries for advanced statistical analysis (e.g. ‘lme4’ for mixed-effects models or ‘mgcv’ for generalised additive models), advanced graphics (e.g. ‘ggplot2’), spatial analysis (e.g. ‘sp’), or database connections (e.g. ‘RPostgreSQL’). Many packages have been developed to allow the use of R as a general interface to interact with databases or GIS. Other packages have gone even further and increase the performance of R in fields for which it was not originally developed, such as the handling of very large data sets or GIS. Hence, many different types of R users have emerged: at one extreme are users who rely on specific software for specific tasks, use R exclusively for their statistical analysis, and push data through their workflow with different files; at the other extreme are users who control their entire workflow from R, in which R sometimes calls on external software to get the job done, but more often gets the job done itself with the help of designated libraries. In contrast, the approach advocated in this book is not centred around software, but places the data at the centre of the workflow. The data are stored in a spatial database, and all software used interacts with this database. You have seen several examples in this book using different software (e.g. pgAdmin or QGIS); in this chapter, you will use R as another option to perform some additional specific tasks with the data stored in the database.

For the analysis of animal tracking data, we often use functions from adehabitat (Calenge 2006), which today consists of ‘adehabitatLT’ (for the analysis of trajectories), ‘adehabitatHR’ (for home range estimation), ‘adehabitatHS’ (for habitat-selection analysis), and ‘adehabitatMA’ (for the management of raster maps). For the general management of spatial data, we rely on the ‘sp’ and ‘rgdal’ libraries; for the advanced management of raster data, the ‘raster’ library is becoming the standard. As a general introduction to the use of spatial data in R, we recommend the book by Bivand et al. (2008). To install a package in R, you use the install.packages command, as illustrated here for the ‘adehabitat’ libraries:
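A minimal example (an Internet connection is required; R may prompt for a CRAN mirror):

## install the four adehabitat packages and their dependencies from CRAN
install.packages(c("adehabitatLT", "adehabitatHR",
                   "adehabitatHS", "adehabitatMA"),
                 dependencies = TRUE)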

The different adehabitat packages come with extensive tutorials, which are accessible in R through
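For example, for the trajectory package (the vignette name matches the package name):

vignette(package = "adehabitatLT")   # list the available tutorials
vignette("adehabitatLT")             # open the main tutorial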

Before using the database to analyse your animal tracking data with R, you can use adehabitat to replicate Fig. 10.1:
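A minimal sketch of such a simulation (the seed and parameters are arbitrary choices, not necessarily the values used for the published figure):

library(adehabitatLT)
set.seed(1)                                   # arbitrary seed
sim <- simm.crw(date = 1:500, h = 1, r = 0)   # uncorrelated random walk
df  <- sim[[1]]                               # coordinates of the trajectory
plot(df$x, df$y, type = "l", lty = 2, asp = 1,
     xlab = "x", ylab = "y")                  # dashed: original resolution
i10  <- seq(1, nrow(df), by = 10)             # ten times coarser sampling
i100 <- seq(1, nrow(df), by = 100)            # hundred times coarser sampling
lines(df$x[i10],  df$y[i10],  col = "black")
lines(df$x[i100], df$y[i100], col = "gray")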

The simm.crw function simulates a random walk. A random walk is a movement where the direction and the distance of each consecutive location are randomised; it is therefore also referred to as a ‘drunkard’s walk’. It is beyond the scope of this chapter to give an introduction to random walks (see e.g. Turchin 1998 for more on random walks to model movement of organisms).

You can find more information on a function with a ‘?’ in front of the function; this will access its associated help pages with explanation and working examples of the function’s uses:
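For example:

?simm.crw   # opens the help page of simm.crw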

You can see in these help pages that there are several parameters that can be altered to make the random walk behave differently.

Connecting R to the Database

To use R for the analysis of the data stored and managed within the database, there are two approaches: first, connect from R to the database, and second, connect from the database to R with the Pl/R-interface. We will start by demonstrating the first approach and using it for some exercises. In the next chapter, you will see how this approach can be extended to connect from within the database to R with the PostgreSQL procedural language Pl/R (www.joeconway.com/plr/doc/).

First, to connect from R to the database, we make use of the ‘RPostgreSQL’ library. With ‘rgdal’, it is possible to read spatial features from PostGIS directly into the spatial classes of ‘sp’ in R. However, ‘rgdal’ does not allow you to perform SQL operations (such as SELECT) on these data while reading them. One solution could be to create a temporary table in the database with the selected spatial features and then use ‘rgdal’ to read the spatial features into R. However, the performance of ‘rgdal’ is considerably lower—it can be 100 times slower for certain operations—than that of database libraries such as ‘RPostgreSQL’. Unfortunately, to date, there is no straightforward way for Windows users to read spatial data into R from a PostgreSQL database using SQL statements. Thus, when you want to include an SQL statement, you will have to retrieve the data in non-spatial classes and subsequently convert them back to spatial features in R. In the next chapter, we will discuss the pros and cons of the use of R within the database through Pl/R.

To connect to a PostgreSQL database, we use the ‘RPostgreSQL’ library. The driver is ‘PostgreSQL’ for a PostgreSQL database like yours. The connection requires information on the driver, database name, host, port, user, and password. Except for the driver, all other parameters may have to be adjusted for your own specific case. If you have the database on your own machine, then the host and port will likely be ‘localhost’ and 5432, respectively, as shown here. You can see the tables in the database with the dbListTables command:
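A sketch of the connection; the database name, user, and password are assumptions to be adapted to your own installation:

library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "gps_tracking_db",   # assumed database name
                 host = "localhost", port = 5432,
                 user = "postgres", password = "your_password")
dbListTables(con)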

The following code retrieves the first five lines from the gps_data_animals table:
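For example (assuming the table sits in the main schema, as in the previous chapters):

dbGetQuery(con, "SELECT * FROM main.gps_data_animals LIMIT 5;")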

You can see that R did not understand the geom column correctly.

Now, you want to retrieve all the necessary information for your analyses of roe deer space use. You first send a query to the database:
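A sketch of such a query; the column names (animals_id, acquisition_time, gps_validity_code, and the environmental attributes) follow the database developed in the previous chapters and may need adjusting to your schema:

rs <- dbSendQuery(con, "
  SELECT animals_id,
         ST_X(ST_Transform(geom, 32632)) AS x,
         ST_Y(ST_Transform(geom, 32632)) AS y,
         acquisition_time,
         altitude_srtm, corine_land_cover_code, roads_dist
  FROM main.gps_data_animals
  WHERE gps_validity_code = 1;")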

You then need to fetch those data (with the -1, you indicate that you want all data), and then ‘clear’ the result set. Virtually all spatial operations (such as projection) could also be done in R; however, it is faster and easier to have the database project the data to UTM32 (which has the SRID code 32632).
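For example:

locs <- fetch(rs, -1)   # -1: fetch all rows of the result set
dbClearResult(rs)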

The head function allows us to inspect the first lines (by default six) of a dataframe. You see that you have successfully imported your data into R.
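For example:

head(locs)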

For dates, you should always carefully inspect their time zone. Due to the different time zones in the world, it is easy to get errors and make mistakes in the treatment of dates:
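For example (the printed time zone depends on your local settings):

class(locs$acquisition_time)   # POSIXct
head(locs$acquisition_time)    # note the time zone that is printed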

The time zone is shown as CEST or CET (i.e. Central European Summer Time and Central European Time, respectively); note that the time zone will depend upon the local settings of your computer. However, we know that the actual time zone of these data is UTC (i.e. Universal Time, or Greenwich Mean Time).

Let us then inspect whether the issue is an automatic transformation of the time zone, or whether the time zone was not correctly imported. With the library ‘lubridate’, you can easily access the hour or month of a POSIXct-object:
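A sketch of such an inspection; a shift in the hours between winter and summer months betrays a daylight saving conversion:

library(lubridate)
table(month(locs$acquisition_time), hour(locs$acquisition_time))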

The table shows us that there is a clear change in the frequency of the daily hours between March–April and October–November, which indicates the presence of daylight saving time. You can therefore safely assume that the UTC time in the database was converted to CEST/CET time.

To prevent mistakes due to daylight saving time, it is much easier to work with UTC time (UTC does not have daylight saving). Thus, you have to convert the dates back to UTC time. With the aforementioned ‘lubridate’ library, you can do this easily: the function with_tz allows you to convert the local time back to the UTC zone:
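For example:

locs$UTC_time <- with_tz(locs$acquisition_time, tzone = "UTC")
head(locs$UTC_time)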

Indeed, the UTC_time-column now contains the time zone: ‘UTC’. You can run the command ‘table(month(locs$UTC_time), hour(locs$UTC_time))’ to verify that no obvious shift in sampling occurred in the data. From personal experience, we know that many mistakes happen with time zones and daylight saving time, and we therefore recommend that you use UTC and carefully inspect your dates and ensure that they were correctly imported into R.

Data Inspection and Exploration

Before you dive into the analysis to answer your ecological question, it is crucial to perform a preliminary inspection of the data to verify data properties and ensure the quality of your data. Several of the functionalities demonstrated below in R can also easily (and often more quickly) be implemented in the database itself. The main strength of R, however, lies in its visualisation capabilities. The visualisation of different aspects of the data is one of the major tasks during an exploratory analysis.

The basic trajectory format in adehabitat is ltraj, which is a list used to store trajectories from different animals. For more details on the ltraj-format, refer to the vignettes (remember: vignette("adehabitatLT")) and the help pages for the ‘adehabitatLT’-library. An ltraj-object requires projected coordinates, a date for each location, and an animal identifier:
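A sketch of the conversion (column names follow the query above):

library(adehabitatLT)
ltrj <- as.ltraj(xy = locs[, c("x", "y")],
                 date = locs$UTC_time,
                 id = locs$animals_id)
class(ltrj)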

The class-function shows us that ltrj is an object of the classes ltraj and list. Each element of an ltraj-object is a data.frame with the trajectory information for one burst of one animal. A burst is a period of more or less intense monitoring of the animal, separated from other bursts by gaps in the data. For instance, animals that are only tracked during the day and not during the night will have a burst of data for each day period. The automatic schedule used for the GPS tracking of the roe deer in your database did not contain any intentional gaps; we therefore consider all data from an animal as belonging to a single burst.
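For example, for the first animal:

head(ltrj[[1]])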

The head function shows us that the data.frames within an ltraj object have ten columns. The first three columns define the location of the animal: the x and y coordinate and its date. The following columns describe the step (or change in location) toward the next location: the change in the x and y coordinates, the distance, the time interval between both locations, the direction of the movement, and the change in movement direction. The R2n is the squared displacement from the start point (or the net squared displacement [NSD]); we will discuss this metric in more detail later (see Calenge et al. 2009, and Fig. 10.2 for more explanation on these movement metrics).

Fig. 10.2

The common trajectory characteristics stored in an ltraj. Panel a shows the properties of one step from t to t + 1, and panel b the NSD of a series of locations (t = 1–5). The relative (rA) and absolute (aA) angles are also called the turning angle and direction of a step, respectively

Note that the animal’s identifier is not in the table. As all locations belong to the same animal, there is no need to provide this information here. To obtain the identifiers of all the data.frames in an ltraj, you use the id function:
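For example:

id(ltrj)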

Or, to obtain the id of only one animal:
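id(ltrj[1])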

The summary function gives some basic information on the ltraj-object:
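summary(ltrj)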

Note that it marks 0 for missing values (i.e. NAs). However, this is not correct; you will see below how to tell R that there are missing observations in the data.

You see also that the total tracking duration is highly variable among individuals; notably Animal 6 was tracked for a much shorter time than the other animals. To ensure homogeneous data for the following analyses, you will only keep animals that have a complete year of data (i.e. number of days ≥365):
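A sketch of such a filter (the threshold follows the text; which() keeps the ltraj class intact when subsetting):

## tracking duration (in days) per animal; keep only full-year animals
dur <- sapply(ltrj, function(x)
  as.numeric(difftime(max(x$date), min(x$date), units = "days")))
ltrj <- ltrj[which(dur >= 365)]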

Moreover, for animals that were tracked for longer than one year, you remove the locations in excess. Of course, if your ecological question addresses space use during another period (e.g. spring), then you would want to keep all animals that provide this information; Animal 6 might then be retained for analysis, while all locations that are not required for this analysis are removed.
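A sketch of the trimming, converting to a data.frame and back (the ld and dl functions recompute the step descriptors):

df <- ld(ltrj)                      # ltraj -> data.frame
df <- do.call(rbind, lapply(split(df, df$id), function(x)
  x[x$date < min(x$date) + 365 * 24 * 3600, ]))   # first 365 days only
ltrj <- dl(df)                      # data.frame -> ltraj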

Now, all animals are tracked for a whole year.

With the plot function, you can show the different trajectories (see the result in Fig. 10.3):
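For example, for the first two animals:

plot(ltrj[1:2])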

Fig. 10.3

Plot of the trajectories for the first two animals. You could have plotted all animals by simply running plot(ltrj)

The plotltr function allows us to show characteristics of a trajectory other than its spatial representation. For instance, it is very useful to get an overview of the sampling interval of the data. We discussed before how the sampling interval has a large effect on the patterns that you can observe in the data.
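For example, the time interval (in hours) between the fixes of the first animal:

plotltr(ltrj[1], which = "dt/3600")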

Figure 10.4 shows that the time gap (dt) is not always constant between consecutive locations. Most often there is a gap of 4 h; however, there are several times that data are missing, and the gap is larger. Surprisingly, there are also locations where the gap is smaller than 4 h. We wrote a function to remove locations that are not part of a predefined sampling regime:
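A hedged reconstruction of such a filter (the argument names follow the usage described below; the authors' original implementation may differ). Fixes whose offset from the dt_hours grid anchored at date.ref exceeds tol_mins are dropped:

removeOutside <- function(ltr, date.ref, dt_hours = 4, tol_mins = 3) {
  df <- ld(ltr)                                   # ltraj -> data.frame
  ## offset (in minutes) of each fix from the reference schedule
  off <- as.numeric(difftime(df$date, date.ref, units = "mins")) %%
    (dt_hours * 60)
  keep <- off <= tol_mins | off >= dt_hours * 60 - tol_mins
  dl(df[keep, ])                                  # data.frame -> ltraj
}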

Fig. 10.4

The time interval between locations for animal 1

You now use this function on the ltraj; you specify a reference date at midnight, the expected time lag in hours (dt_hours), which is 4, and a tolerance of ±3 min (tol_mins):
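For example (the reference date is a hypothetical value; any midnight works as an anchor for the schedule):

refda <- as.POSIXct("2005-01-01 00:00:00", tz = "UTC")
ltrj <- removeOutside(ltrj, date.ref = refda, dt_hours = 4, tol_mins = 3)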

You can now inspect the time lag for each animal again:
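plotltr(ltrj[1], which = "dt/3600")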

You see in Fig. 10.5 that there are no longer observations that deviate from the 4-hour schedule we programmed in our GPS sensors, i.e. all time lags between consecutive locations are a multiple of 4 h. You still see gaps in the data, i.e. some gaps are larger than 4 h. Thus, there are missing values. The summary did not show the presence of missing data in the trajectory; you therefore have to specify the occurrence of missing locations.

Fig. 10.5

The time interval between locations for animal 1 after the removal of locations outside the base sampling interval

The setNA function allows us to place the missing values into the trajectory at places where the GPS was expected to obtain a fix but failed to do so. You have to indicate a GPS schedule, which is in your case 4 h:
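For example (reusing the reference date from above):

ltrj <- setNA(ltrj, date.ref = refda, dt = 4, units = "hour")
summary(ltrj)   # the NAs column of the summary is now filled in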

Indeed, now you see that the trajectories do contain a fair number of missing locations.

If locations are missing at random, this will not bias the results of an analysis. However, when missing values occur in runs, this may affect your results. In adehabitat, there are two functions to inspect patterns in the missing data. The function plotNAltraj shows for a trajectory where the missing values occur and can be very instructive to reveal important gaps in the data:
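For example (the xlim dates are hypothetical; adapt them to the period of interest):

plotNAltraj(ltrj[1])
plotNAltraj(ltrj[1],
            xlim = as.POSIXct(c("2005-11-01", "2006-01-01"), tz = "UTC"))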

Figure 10.6 reveals that the missing values in November and December are not likely to occur independently of each other (you can verify this yourself for other periods by changing the limits of the x-axis with the xlim argument). You can test with the runsNAltraj function whether there is statistically significant clustering of the missing values:
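For example:

runsNAltraj(ltrj[1])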

Fig. 10.6

The occurrence of missing locations over time for the first 500 locations: missing values are 1 and successful fixes are 0

Indeed, Fig. 10.7 shows that for this trajectory, there is significant clustering of the missing fixes (you can test yourself whether this is also the case for the other animals). Thus, when you have one missing location, it is more likely that the next location will be missing too. Such temporal dependence in the probability to obtain fixes is not surprising, because the conditions affecting this probability are likely temporally autocorrelated. For instance, it is known that it is more difficult for a GPS receiver to contact the satellites within dense forests; when an animal is in such a forest at time t, it is more likely to still be in this forest at time t + 1 than at time t + 2, thus causing temporal dependence in the fix-acquisition probability. Unfortunately, as noted above, this temporal dependence in the ‘missingness’ of locations holds the risk of introducing biases in the results of your analysis.

Fig. 10.7

Testing the temporal independence among missing locations. The histogram represents the expected distribution if missing values occurred independently. The pin on the left shows the observed value, which indicates that missing values did not occur independently from each other

Visual inspection of figures like Fig. 10.6 can help the assessment of whether the temporal dependence in missing locations will have large effects on the analysis. Other figures can also help this assessment. For instance, you can plot the number of missing locations for each hour of the day, or for periods of the year (e.g. each week or month):
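A sketch of such plots (the binning choices are assumptions):

d1  <- ld(ltrj[1])
mis <- is.na(d1$x)
par(mfrow = c(1, 2))
barplot(tapply(mis, hour(d1$date), mean),
        xlab = "Hour of the day", ylab = "Proportion missing")
barplot(tapply(mis, floor(yday(d1$date) / 10), mean),
        xlab = "10-day period", ylab = "Proportion missing")
par(mfrow = c(1, 1))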

Figure 10.8 shows that there is no strong bias in the time of day; you can therefore be fairly confident that subsequent results will not be biased with regard to the diurnal cycle. However, there are four consecutive blocks of 10 days with a low fix-acquisition rate, which could be an issue. Fortunately, the other periods during the winter provide us with enough data. You can therefore expect the bias in your results to be minimal. In cases where there are longer periods with missing data, it can be necessary to balance the data. It is obviously not straightforward to create new data; however, you can remove random locations in periods where you have ‘too many’ observations. In our demonstration, we will proceed without further balancing the data.

Fig. 10.8

Plots of the proportion of missing locations for each hour of the day (panel a) and for each period of 10 days (panel b)

The NSD is a commonly used metric for animal space use; it is the squared straight-line distance between each point and the first point of the trajectory (see Fig. 10.2b). It is a very useful metric to assess, for instance, the occurrence of seasonal migration patterns. An animal that migrates between summer and winter ranges will often show a characteristic hump shape in the NSD, as exemplified clearly in Fig. 10.9:
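For example, using the R2n column computed by adehabitatLT:

plotltr(ltrj, which = "R2n")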

Fig. 10.9

The NSD for each individual

The NSD profiles in Fig. 10.9 strongly suggest the occurrence of seasonal migration in all five animals. During the summer (from May till November), the animals seem to be in one area and during the winter (from December till April) in another. The NSD is a one-dimensional representation of the animal’s space use, which facilitates inspecting it against the second dimension, time. On the other hand, it removes information present in the two-dimensional locations provided by the GPS. Relying exclusively on the NSD can in certain situations give rise to wrong inferences; we therefore highly recommend also inspecting the locations in two dimensions. One of the disadvantages of the NSD is that the starting point is often somewhat arbitrary. It can help to use a biological criterion, such as the fawning period, to start the year.

As an alternative to (or in addition to) the NSD, you can plot both spatial dimensions against time, as in Fig. 10.10:
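A sketch for the first individual:

d1 <- ld(ltrj[1])
par(mfrow = c(2, 1))
plot(d1$date, d1$x, type = "l", xlab = "Time", ylab = "x")
plot(d1$date, d1$y, type = "l", xlab = "Time", ylab = "y")
par(mfrow = c(1, 1))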

Fig. 10.10

The x and y coordinates against time in panels a and b, respectively, for the first individual

To avoid the intrinsic reduction of information by collapsing two dimensions into one single dimension, you also plot both spatial dimensions and use colour to depict the temporal dimension:
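A sketch using ‘lattice’, with one colour per month (the plotting details are assumptions):

library(lattice)
dfa <- ld(ltrj)
dfa <- dfa[!is.na(dfa$x), ]
xyplot(y ~ x | id, data = dfa, groups = month(dfa$date),
       col = rainbow(12), pch = 16, cex = 0.4, aspect = "iso")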

Figure 10.11 confirms our interpretation of Fig. 10.9. All five animals have at least two seasonal centres of activity: one in winter and one in summer. The movement between these centres occurs around November and around April. The comparison of Figs. 10.9 and 10.11 easily reveals the respective strengths of both figures. It is easier to read the timing of events from Fig. 10.9, but it is easier to read the geographic position of these events from Fig. 10.11. This demonstrates the importance of making several figures to explore the data.

Fig. 10.11

Coloured trajectories for all individuals, locations from each month are coloured differently

Now that you have familiarised yourselves with the structure of the data and have ensured that your data are appropriate for the analysis, you can proceed with answering your ecological questions in the following sections.

Home Range Estimation

A home range is the area in which an animal lives. In addition to Burt’s (1943) aforementioned definition of the home range as ‘the area an animal utilizes in its normal activities’, Cooper (1978) pointed out that a central characteristic of the home range is that it is temporally stable. Our previous exploration of the data has shown that the space use of our roe deer is not stable. Instead, it seems to consist of a migration between two seasonal home ranges. Figures 10.9 and 10.10 suggest that space use within these seasonal ranges is fairly stable. It is thus clear that the concept of a home range is inherently tied to a time frame over which space use was fairly stable, in our case two seasons.

An animal’s home range has been quantified by the concept of the ‘utilization distribution (UD)’. Van Winkle (1975) used the term UD to refer to ‘the relative frequency distribution for the points of location of an animal over a period of time’. The most common estimator for the UD is the kernel density estimator. In Fig. 10.12, we remind the reader of the general principle underlying such analysis. Several methods for home range computation are implemented in the ‘adehabitatHR’ library; the kernelUD function calculates the kernel utilization density from 2D locations:
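A sketch of the estimation (the individual steps are explained below):

library(adehabitatHR)   # also loads 'sp'
trj <- ld(ltrj)                    # ltraj -> data.frame
trj <- trj[!is.na(trj$x), ]        # SpatialPoints cannot hold NAs
sp1 <- SpatialPointsDataFrame(trj[, c("x", "y")],
                              data = data.frame(id = trj$id))
ud <- kernelUD(sp1[, 1])           # one UD per animal
image(ud[[1]])                     # cf. Fig. 10.13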

Fig. 10.12

A one-dimensional example of a kernel density estimator for three points. For each of the observations (i.e. 3, 4 and 7), a distribution (e.g. a Gaussian curve) is placed over it (the dashed lines); these distributions are aggregated to obtain the cumulative curve (the full black line). This cumulative curve is the kernel density estimate of these points. The width of the initial distributions, the smoothing factor, is a parameter the researcher has to select. Kernel home range estimation works in a similar way in 2D, with, for instance, a bivariate normal distribution to smooth the locations

The function kernelUD requires a SpatialPoints object or a SpatialPointsDataFrame, and it returns a SpatialPixels object; adehabitat relies on the spatial classes from the ‘sp’ library. A familiarity with the spatial classes from ‘sp’ will therefore be helpful for your analysis of animal space use data in R (see Bivand et al. 2008, or the vignettes available for ‘sp’). You create the SpatialPointsDataFrame from a dataframe, which you obtained from the ltrj using the ld function (i.e. list to dataframe). Note: ‘SpatialPoints’ cannot contain missing coordinates; therefore, you keep only those rows where the x coordinate is not missing (i.e. !is.na(trj$x); the ‘!’ means ‘not’ in R). The resulting pixel map in Fig. 10.13 shows the areas that are most intensely used by this individual.

Fig. 10.13

Kernel UD of the first individual; the intensely used areas are in yellow

During our data exploration, we found that our roe deer occupy separate summer and winter ranges: are both ranges of similar size? To answer this, you have to compute the home range separately for summer and winter. In Fig. 10.9, you can see that the summer range is occupied at least from day 150 (beginning of June) to day 300 (end of October) and that the winter range is occupied from day 350 (mid-December) till day 100 (end of March). You can use these dates to split the data and compute the kernel for summer and winter separately; the function kernel.area computes the area within a percentage contour. We demonstrate the computation for the 50, 75 and 95 % contours:
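A sketch of the seasonal computation (the day-of-year cutoffs follow the text):

trj$yd <- yday(trj$date)
trj$season <- ifelse(trj$yd >= 150 & trj$yd <= 300, "summer",
                     ifelse(trj$yd >= 350 | trj$yd <= 100, "winter", NA))
seas <- trj[!is.na(trj$season), ]
sps  <- SpatialPointsDataFrame(seas[, c("x", "y")],
          data = data.frame(idseason = paste(seas$id, seas$season, sep = "_")))
udseas <- kernelUD(sps[, 1])
(ha <- kernel.area(udseas, percent = c(50, 75, 95),
                   unin = "m", unout = "ha"))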

You can visualise these results clearly with boxplots:
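A sketch with one panel per contour, as in Fig. 10.14 (the labelling relies on the season being part of the animal-season names built above):

season <- ifelse(grepl("summer", colnames(ha)), "S", "W")
par(mfrow = c(1, 3))
for (p in c("50", "75", "95"))
  boxplot(unlist(ha[p, ]) ~ season, ylab = "Area (ha)",
          main = paste0(p, " %"))
par(mfrow = c(1, 1))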

Figure 10.14 shows that, rather than a change in mean range size, there seems to be a marked change between seasons in the variation among individuals. During the winter season, there seems to be much larger individual variation in range size than in summer; however, more data would be required to further investigate this seasonal variation in range sizes.

Fig. 10.14

Boxplots of the areas (in ha) of the seasonal ranges; from left to right: the 50, 75, and 95 % kernel contours. Each panel depicts summer (S) on the left and winter (W) on the right

Habitat Use and Habitat Selection Analysis

In the previous exercise, you saw that roe deer change the location of their activities from summer to winter. Such seasonal migrations are often triggered by changes in the environment. You can then wonder which environmental conditions change when the animal moves between its seasonal ranges. For instance, snowfall is a driver for many migratory ungulates in northern and alpine environments, and winter ranges are often characterised by less snow cover during winter than the summer ranges (e.g. Ball et al. 2001). Thus, you would expect roe deer to move down in altitude during the winter to escape from the snow that accumulates at higher altitudes.

If roe deer move to lower altitudes during winter, then they probably also move closer to roads, which are usually found in valley bottoms. You would not necessarily expect roe deer to show a seasonal response directly toward roads, but you do expect this as a side effect from the shift in altitude. Such closer proximity to roads can have substantial effects on road safety, as animals close to roads are at a higher risk of road crossings and thus traffic accidents. Ungulate vehicle collisions are an important concern for road safety and animal welfare. From an applied perspective, it is thus an interesting question to see whether there is a seasonal movement closer to roads, which could partly explain seasonal patterns often observed in ungulate vehicle collisions.

You first add the environmental data to your inspected trajectory with the merge function:
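A sketch of the merge; it assumes locs holds the environmental columns (altitude_srtm, corine_land_cover_code, roads_dist) retrieved with the query shown earlier:

trj <- ld(ltrj)
trj$id <- as.character(trj$id)
env <- locs[, c("animals_id", "UTC_time", "altitude_srtm",
                "corine_land_cover_code", "roads_dist")]
env$animals_id <- as.character(env$animals_id)
trj <- merge(trj, env, by.x = c("id", "date"),
             by.y = c("animals_id", "UTC_time"))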

You then inspect whether there is a relationship between the distance to roads and the altitude:
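For example, with ‘lattice’ (column names as assumed above):

xyplot(roads_dist ~ altitude_srtm | factor(id), data = trj,
       xlab = "Altitude (m)", ylab = "Distance to roads (m)")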

Figure 10.15 shows an interesting relationship between altitude and distance to roads. Each individual shows two clusters, which are possibly corresponding with the two seasonal ranges you detected before. Within each cluster, there is a positive relationship between altitude and distance to roads; i.e. at higher altitudes, the distance to roads is greater. However, when you compare both clusters, it seems that the cluster at higher altitude is often also closer to roads (except for Animal 1). Overall, it seems that in your data, there is no obvious positive relationship between altitude and distance to roads.

Fig. 10.15

The changing distance to roads as a function of the altitude for each individual

Let us now investigate the hypothesis that there is a seasonal change in roe deer altitude:
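For example:

xyplot(altitude_srtm ~ yday(date) | factor(id), data = trj,
       xlab = "Day of the year", ylab = "Altitude (m)")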

Figure 10.16 shows that there are marked seasonal changes in the altitude of the roe deer positions. As you expected, roe deer are at lower altitudes during the winter than they are during the summer. This pattern explains the occurrence of the two clusters of points for each individual in Fig. 10.15.

Fig. 10.16

The altitude of roe deer locations as a function of the day of the year. You see marked seasonal changes

You can now proceed by testing the statistical significance of these results. You use the same seasonal cutoff points as before:
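A sketch, reusing the day-of-year cutoffs from the home range section:

trj$yd <- yday(trj$date)
trj$season <- ifelse(trj$yd >= 150 & trj$yd <= 300, "summer",
                     ifelse(trj$yd >= 350 | trj$yd <= 100, "winter", NA))
mod <- lm(altitude_srtm ~ as.factor(season), data = trj)
summary(mod)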

The function lm is used to fit a linear model to the data (note: for this simple case a Student’s t test would have been sufficient).

These results show that, as expected, the roe deer move to lower altitudes during the winter:

                          Estimate     Std. error   t value    Pr(>|t|)
(Intercept)               1,581.6609   2.6212       603.41     0.0000
as.factor(season)winter   −572.3900    4.2519       −134.62    0.0000

In winter, the roe deer are on average 572 m lower than during the summer, which is a decrease of roughly 36 %.

A treatment on model validation falls outside the scope of this book. We refer the reader to introductory books in statistics; several such books are available using examples in R (e.g. Crawley 2005; Zuur et al. 2007).

In this example, you have focused on the habitat use of roe deer. Often researchers are interested not only in the habitat characteristics used by the animals, but also in the comparison between use and availability—i.e. habitat selection. In habitat-selection studies, the used habitat characteristics are compared against the characteristics the animal could have used, i.e. the available habitat. Thus, to perform a habitat-selection analysis, you have to sample from the available locations and obtain the habitat characteristics for these points.

The available locations are commonly sampled randomly at two scales: within the study area to study home range placement, or within the individual home range to study home range use (respectively, called second- and third-order habitat selection following Johnson 1980). We will demonstrate third-order habitat selection and use a minimum convex polygon (MCP) to characterise the area available for each roe deer, from which we sample 2,000 random locations. The mcp-function in R requires the use of a SpatialPointsDataFrame when using multiple animals (for a single animal a SpatialPoints object suffices):
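For example (sp1 is the SpatialPointsDataFrame created in the kernel example):

mcps <- mcp(sp1[, 1], percent = 100)   # one 100 % MCP per animal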

Then, you sample for each individual randomly from its available range, and you place the coordinates from these sampled locations together with the animal’s id in a data.frame. Operations like this, in which something needs to be repeated for a number of individuals, are easily performed using the list format, which is also the reason that ‘adehabitatLT’ uses a list to store trajectories. However, a database works with data.frames; therefore, you will have to bind the data.frames in the list together in one large data.frame.
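A sketch of the sampling and binding (the seed is arbitrary; spsample returns approximately n points):

set.seed(2)
rndpts <- do.call(rbind, lapply(seq_len(nrow(mcps)), function(i) {
  pts <- spsample(mcps[i, ], n = 2000, type = "random")
  data.frame(coordinates(pts), id = mcps$id[i])
}))
names(rndpts)[1:2] <- c("x", "y")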

Figure 10.17 shows the areas you considered available for each individual roe deer, from which you sampled the random locations:
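A sketch of the figure:

plot(mcps)   # the available areas (cf. Fig. 10.17)
points(rndpts[rndpts$id == mcps$id[1], c("x", "y")][1:100, ],
       pch = 16, cex = 0.5)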

Fig. 10.17

The available areas for each roe deer estimated by an MCP. The first 100 locations sampled randomly from the area of individual 1 are represented by black dots

The easiest way to obtain the environmental data for these random points is to simply upload them into the database and extract the required information from there. To facilitate the ordering of your random locations, you add a column nb, numbered from 1 to the number of random points. The function dbWriteTable writes a table to the database; you can specify a schema (analysis) in addition to the table name (rndpts_tmp):
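For example (assuming the analysis schema exists, as created earlier in the book):

rndpts$nb <- 1:nrow(rndpts)
dbWriteTable(con, c("analysis", "rndpts_tmp"), rndpts)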

Next, you use the database to couple the locations in this table to the environmental data stored in the database. The easiest way to do this is by first adding a geometry column for the random locations:
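A sketch of the two SQL statements (the random points are in UTM 32N, SRID 32632):

dbSendQuery(con, "ALTER TABLE analysis.rndpts_tmp
                  ADD COLUMN geom geometry(Point, 32632);")
dbSendQuery(con, "UPDATE analysis.rndpts_tmp
                  SET geom = ST_SetSRID(ST_MakePoint(x, y), 32632);")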

You extract the altitude and land cover for the random locations from the rasters stored in the database with the following queries (the details of these SQL queries were discussed earlier in this book):
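A sketch of the two queries; the raster names and SRIDs (env_data.srtm_dem in 4326, env_data.corine_land_cover in 3035) follow the database of the earlier chapters and may need adjusting to your schema:

altitude <- dbGetQuery(con, "
  SELECT ST_Value(srtm_dem.rast, ST_Transform(geom, 4326)) AS altitude
  FROM analysis.rndpts_tmp, env_data.srtm_dem
  WHERE ST_Intersects(ST_Transform(geom, 4326), srtm_dem.rast)
  ORDER BY nb;")
landcover <- dbGetQuery(con, "
  SELECT ST_Value(corine_land_cover.rast, ST_Transform(geom, 3035)) AS lc
  FROM analysis.rndpts_tmp, env_data.corine_land_cover
  WHERE ST_Intersects(ST_Transform(geom, 3035), corine_land_cover.rast)
  ORDER BY nb;")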

You extract the distance to the closest road for the random locations from the roads stored in the database with the following query (this query can require a few minutes):
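A sketch using a correlated subquery over the roads layer (assumed to be env_data.roads, as in the earlier chapters):

dist_roads <- dbGetQuery(con, "
  SELECT (SELECT min(ST_Distance(p.geom, ST_Transform(r.geom, 32632)))
          FROM env_data.roads AS r) AS dist_roads
  FROM analysis.rndpts_tmp AS p
  ORDER BY nb;")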

You add these environmental data to the randomly sampled available locations:
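For example (the rows are aligned through ORDER BY nb):

rndpts$altitude_srtm <- altitude$altitude
rndpts$corine_land_cover_code <- landcover$lc
rndpts$roads_dist <- dist_roads$dist_roads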

Now that you no longer need these locations in the database, you can remove the table:
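For example:

dbSendQuery(con, "DROP TABLE analysis.rndpts_tmp;")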

A discussion of habitat selection falls outside the scope of this chapter. More information on exploratory habitat selection analysis using R can be found in vignette("adehabitatHS"); for a general discussion of the use of resource selection functions in habitat-selection studies, we refer the reader to the book by Manly et al. (2002). We will merely visualise the difference in the habitat types between the used and the available locations.

To ensure proper comparison of used and available habitats, you provide the same levels to both data sets. Moreover, to allow our seasonal comparison, you also need to allocate the random locations to the summer and winter season:
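A sketch; allocating the random points evenly to the two seasons is a simplifying assumption:

lev <- sort(unique(c(trj$corine_land_cover_code,
                     rndpts$corine_land_cover_code)))
trj$lc    <- factor(trj$corine_land_cover_code, levels = lev)
rndpts$lc <- factor(rndpts$corine_land_cover_code, levels = lev)
rndpts$season <- rep(c("summer", "winter"), length.out = nrow(rndpts))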

Now, you will compute for each individual the number of locations inside each habitat type:
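A sketch, shown here for the summer season (repeat with "winter" for the other season):

used  <- table(trj$id[trj$season == "summer"],
               trj$lc[trj$season == "summer"])
avail <- table(rndpts$id[rndpts$season == "summer"],
               rndpts$lc[rndpts$season == "summer"])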

The function widesIII in the ‘adehabitatHS’ library computes the selection ratios for individually tracked animals; it also provides a number of statistical tests.
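For example, for the summer counts computed above:

library(adehabitatHS)
sr_summer <- widesIII(u = as.data.frame.matrix(used),
                      a = as.data.frame.matrix(avail))
sr_summer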

For a more extensive discussion of selection ratios, we refer you to the aforementioned references. Here, you limit yourselves to visualising the selection ratios for both seasons. Figure 10.18 shows that the roe deer seem to select pastures (class 18) more during summer, whereas for most roe deer, the use of broad-leaved forests (class 23) seems to be higher during the winter months. Great care should be taken not to over-interpret the unreliable results from classes that are hardly present in the study area (e.g. classes 26–31). These types of figures and tables are highly suitable for inspecting categorical data such as land cover. Continuous data such as altitude are better represented using histograms.

Fig. 10.18

The ratio of the proportion used to the proportion available (known as the selection ratio). Values above 1 indicate that a habitat type is used more than its availability, and vice versa. Blue points are for winter and red for summer. You can find the corresponding habitat types in the corine_land_cover_legend table in your database

In the previous demonstrations, you have been using R to visualise and analyse data stored in the database, and you also used R as an interface to interact with the database. An alternative approach is to use R from within the database, which we will demonstrate in the next chapter.