Introduction

The arctic and boreal biomes of the circumpolar North are undergoing dramatic changes in climate, geographic distribution, ecosystem function, and food web structure (ACIA 2005; Lovejoy and Hannah 2005; IPCC 2007; Lawler et al. 2009). Mapping the current extent of spatial overlap among sympatric species will be an important conservation priority as we monitor future changes in the distributions of small mammal species (Prost et al. 2013). In Alaska, small mammals are managed as non-game species under the Wildlife Action Plan (Fritts et al. 2006). This management plan recently called for increased study of non-game and underrepresented species, especially birds and small mammals. Specific requests included efforts aimed at mapping species distributions, establishing spatial baselines for ecological systems, documenting biological diversity, and identifying lands vital for the conservation of wildlife in the face of increasing human impacts in Alaska (Fritts et al. 2006).

In terrestrial communities, small mammals make up a large portion of the primary consumer trophic level and serve as an interface with fine-scale changes on the ground, including those related to water, soils, toxins, and microclimate conditions (Hallett et al. 2003). Rodents are essential prey for a variety of carnivores and raptors, and also play invaluable roles in seed dispersal, nutrient cycling, plant growth, and herbivory (Newton 1979; Gough et al. 2007; Gilg et al. 2009; Olofsson et al. 2012). Insectivorous shrews, although less important as prey, help control invertebrate populations (Buckner 1964). Yet, despite the ecological importance of small mammals, high-resolution studies across the extent of Alaska are conspicuously lacking.

Most descriptions of small mammal distributions in Alaska have been coarse, non-quantitative, or incomplete, and spatially explicit, GIS-based quantifications using modern statistical methods to analyze community composition and species richness patterns have not been conducted for the state (MacDonald and Cook 2009; Gotthardt et al. 2013; Hope et al. 2013, www.natureserve.org, www.iucnredlist.org). Using a novel niche modeling technique, we provide such a detailed, quantitative, spatial analysis that addresses many of the regional management goals for small mammals. These products should prove beneficial for land managers as they act to promote ecological stability through species diversity (Lawler et al. 2003; Hooper et al. 2005).

The ecological niche, which encompasses the environmental constraints of a species, is a well-suited framework for predicting the uncertain ecological outcomes of species interactions. As conceptualized by Hutchinson (1957), the ecological niche is the space bounded by an n-dimensional hypervolume such that no two species can occupy exactly the same space (Cushman 2010). Its dimensions include a potentially unlimited set of abiotic and biotic variables, such as optimal temperatures, precipitation regimes, land cover, elevation, soil chemistry, and proximity to resources. Only by quantifying the current dimensions of niche space and interspecific overlap will it be possible to correctly predict how species may respond in a community context to a combination of altered food availability and a shifting geographic arrangement of species (Wang et al. 2004; Williams and Jackson 2007; Hope et al. 2010; Murphy et al. 2010; A. P. Baltensperger and F. Huettmann unpublished).

Spatial modeling adds the multiple dimensions of landscape space to the quantification of ecological niche breadth (Kerr et al. 2011). Beyond general linear models, machine-learning algorithms such as RandomForests, TreeNet, MARS, CART, and MaxEnt are especially adept at estimating species distributions by incorporating the environmental conditions at species’ detection locations into spatial predictions (Wiersma et al. 2011). Unlike resource selection functions, which typically include only a limited set of variables (e.g. Johnson et al. 2004), machine learning can include hundreds of variables and their interactions to identify dominant signals in the data (Breiman 2001a, b; Cutler et al. 2007). RandomForests, in particular, is therefore capable of incorporating many dimensions of the ecological niche simultaneously (Cutler et al. 2007; Booms et al. 2010; Evans et al. 2011). As such, machine-learning techniques are among the newest and most comprehensive methods for deciphering complex, confounding, and non-linear relationships among variables that drive ecological processes (Breiman 2001b; Cutler et al. 2007; Kelling et al. 2009; Huettmann and Gottschalk 2010; Li et al. 2011).

To outline the potential for interspecific competition in small mammal assemblages, we focus on the construction of detailed niche-based distribution maps for 17 species (Table 1), using them to identify the current arrangement of small mammal communities and to create a species richness map for small mammals in Alaska. This research, in concert with subsequent analyses of dietary niche overlap using stable isotopes (A. P. Baltensperger et al. unpublished) and future projections of species distributions (A. P. Baltensperger and F. Huettmann unpublished), will quantify the multi-metric ecological niche spaces occupied by small mammals and provide projections of how the roles of organisms are likely to shift in a future dominated by changes in climate and land use (Wang et al. 2004).

Table 1 Integrated Taxonomic Information System (ITIS)-derived scientific names, common names, and Taxonomic Serial Numbers (TSN) for modeled species

Methods

Study area

Alaska covers an area of 1.7 million km² and extends from 71.4°N latitude at Pt. Barrow to 51.2°N at Amatignak Island, and from 130.0°W longitude at Portland Canal in the Alexander Archipelago to 172.4°E on Attu Island in the Aleutian Archipelago. The state contains a diversity of geographic features, including several mountain ranges (notably the Alaska, Brooks, Coastal, Aleutian, and Chugach Ranges; Fig. 1a) and elevations up to 6,036 m. Alaska’s vast land area contains hundreds of glaciers and thousands of lakes that are drained by several large river systems (Molina 2001). Extreme variations in climate and geography have resulted in diverse ecosystems, including arctic sedge tundra, boreal forest, deciduous hardwoods, peat wetlands, temperate rainforest, coastal grasslands, alpine tundra, shrublands, and others (Viereck et al. 1992).

Fig. 1

Map of study areas depicting (a) Alaskan ecoregions and physical features, and (b) small mammal sampling locations between 2010 and 2013. Locations are organized along latitudinal and longitudinal mega-transects across the state

Data collation

We compiled records of small mammals from digital georeferenced collections totaling over 112,000 occurrence records in Alaska. A subset of these was used as training data to create distribution models for 17 species of rodents and shrews in mainland Alaska (Table 1). Data were collated from archived occurrence datasets, primarily from the Global Biodiversity Information Facility (GBIF; www.gbif.org), but also from a variety of natural history museum collections that do not necessarily serve their data to GBIF. This compiled set of presence-only records was filtered to remove duplicates, coincident detections of species at the same location, and records whose coordinates were not reported to at least five decimal places (approximately 1-m precision). Because archived presence-only datasets lack a geographically stratified design, we aimed to minimize the effects of sampling bias by using only one record per species within a 2-km radius of any given location. After manually removing these inaccurate or duplicate records, a total of 4,408 unique georeferenced small mammal records collected between 1900 and 2012 remained and comprised the final model training dataset (Appendix 1—Supplementary material 1).
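
The precision filter and 2-km spatial thinning described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow (which involved manual removal): it assumes each record is a (species, latitude, longitude) tuple with coordinates kept as text so that trailing zeros survive, uses a spherical-Earth haversine distance, and thins greedily in input order. The function names and example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def n_decimals(coord):
    """Digits after the decimal point in a coordinate stored as text."""
    return len(coord.split(".", 1)[1]) if "." in coord else 0


def filter_records(records, min_decimals=5, radius_km=2.0):
    """Drop imprecise records, then keep one record per species per radius_km.

    records: iterable of (species, lat_text, lon_text). Thinning is greedy in
    input order: a record is kept only if no already-kept record of the same
    species lies within radius_km, which also removes exact duplicates.
    """
    kept = []
    for species, lat_txt, lon_txt in records:
        if n_decimals(lat_txt) < min_decimals or n_decimals(lon_txt) < min_decimals:
            continue  # coordinate reported with too few decimal places
        lat, lon = float(lat_txt), float(lon_txt)
        too_close = any(
            s == species and haversine_km(lat, lon, k_lat, k_lon) < radius_km
            for s, k_lat, k_lon in kept
        )
        if not too_close:
            kept.append((species, lat, lon))
    return kept


# Hypothetical records near Fairbanks, for illustration only.
records = [
    ("Sorex cinereus", "64.85000", "-147.80000"),
    ("Sorex cinereus", "64.85100", "-147.80100"),      # ~0.1 km away: thinned
    ("Sorex cinereus", "64.9", "-147.8"),              # too few decimals: dropped
    ("Microtus oeconomus", "64.85000", "-147.80000"),  # other species: kept
    ("Sorex cinereus", "65.00000", "-147.80000"),      # ~17 km away: kept
]
kept = filter_records(records)
```

Greedy thinning keeps whichever record appears first, so the retained set can depend on input order; a randomized order is one common way to avoid systematic bias.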

Field collection

As part of a larger effort to expand wildlife occurrence databases in Alaska, and to sample small mammal tissues for stable isotope analyses, we conducted 20 small mammal inventories along two mega-transects (Assogbadjo et al. 2005) across the state between 2010 and 2013 (Fig. 1b). During 2011, we sampled small mammal diversity along a 1,500-km latitudinal transect between the Arctic Ocean and the Gulf of Alaska. In 2012, we completed small mammal biodiversity sampling at seven locations along a longitudinal transect of the Yukon River during a 1,250-km canoe expedition from the Dalton Highway to Mountain Village. Additional sampling was conducted at the mouth of the Canning River on the Arctic coast during 2010 and near the mouth of the Chandalar River in 2013.

At each location, we attempted to detect rodents and shrews using 200–300 traps (Sherman live traps, Museum Special snap traps, and pitfall traps) set at 10-m intervals along two or three trap-loops throughout available habitats within 1 km of the plot center. Traps remained open for five nights at each site (10 nights at Canning River) so that all sites were sampled with at least 1,500 trap-nights (number of traps × number of nights). Different trap types have different detection rates, but the diversity of traps allowed us to sample a variety of taxa: Sherman live traps primarily captured rodents, Museum Special traps captured rodents and some shrews, and pitfall traps captured only shrews. We received Institutional Animal Care and Use Committee (IACUC) approvals (172650-2, 172650-16) and Alaska Department of Fish and Game (ADF&G) Collection Permits (10–135, 11–114, 12–106, 13–162) for all field protocols, and specimens were archived at the University of Alaska Museum of the North.

We recorded the species detected at sampling locations for each day and plotted the accumulated species richness against the cumulative number of trap-nights. Linked with predictive modeling, this mobile, low-impact sampling scheme was designed as an efficient and cost-effective means of independently sampling biological diversity across a large geographic extent. Detections of small mammals in these surveys were later used to independently validate the accuracy of species distribution models created from the small mammal training dataset.
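
The species accumulation curve plotted against cumulative trap-nights can be illustrated with a short sketch. It assumes, hypothetically, that detections are logged as one set of species names per night and that the same number of traps is open every night; the site data below are invented for illustration.

```python
def accumulation_curve(nightly_detections, traps_per_night):
    """Cumulative species richness against cumulative trap-nights for one site.

    nightly_detections: one set of species names per trapping night, in order.
    traps_per_night: traps open each night, so each night adds that many trap-nights.
    """
    seen = set()
    curve = []
    for night, detected in enumerate(nightly_detections, start=1):
        seen |= detected
        curve.append((night * traps_per_night, len(seen)))
    return curve


def mean_curve(curves):
    """Average richness across sites that share the same trap-night steps."""
    return [
        (steps[0][0], sum(richness for _, richness in steps) / len(steps))
        for steps in zip(*curves)
    ]


# Hypothetical five-night sites with 300 traps each (5 x 300 = 1,500 trap-nights).
site_a = accumulation_curve(
    [{"C. rutilus"}, {"C. rutilus", "S. cinereus"}, {"C. rutilus"},
     {"M. oeconomus"}, {"S. cinereus"}],
    traps_per_night=300,
)
site_b = accumulation_curve(
    [{"C. rutilus"}, {"C. rutilus"}, {"S. tundrensis"},
     {"C. rutilus"}, {"M. miurus"}],
    traps_per_night=300,
)
average = mean_curve([site_a, site_b])
```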

Model development

We used RandomForests (Salford Systems, Inc., San Diego, CA, USA; www.salford-systems.com) to create spatial distribution models for each of the 17 species of mainland small mammals in Alaska. RandomForests is machine-learning software that uses an ensemble of binary recursive decision trees to partition data points into terminal categories that minimize within-group variance (Cutler et al. 2007; Elith et al. 2008; Supplemental Material 2). Machine-learning methods are non-parametric and are especially adept at incorporating multivariate interactions to analyze large datasets collected without consistent sampling protocols (Prasad et al. 2006; Cutler et al. 2007; Elith et al. 2008; Evans et al. 2011). As such, they are an effective means to describe and predict the complexity of ecological systems (De’ath and Fabricius 2000; Prasad et al. 2006; Evans et al. 2011; Baltensperger et al. 2013). Results are data-driven and not fit to a priori assumptions, as would be the case using frequentist, Bayesian, or maximum entropy (MaxEnt) methods (Breiman 2001a; Cutler et al. 2007; Elith et al. 2008; Phillips et al. 2006).

Presence points as well as ‘pseudo-absence’ points for each species were attributed with 33 environmental predictor layers (Table 2) using the intersect (isectpntrst) command in the free software Geospatial Modeling Environment 7.2 (GME; H. Beyer; www.spatialecology.com/gme). Environmental predictor variables included continuous raster (60-m resolution) and categorical polygon layers, all of which had the potential to affect the biogeography of small mammals. These effects may occur directly at ecosystem or landscape scales (e.g. habitat, proximity to resources, topography), or indirectly at landscape or regional scales (e.g. climate, ecoregion; Table 2).
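
Attributing a point with a raster value (the role played here by GME's isectpntrst command) amounts to finding the cell that contains the point. The sketch below is a hypothetical stand-in, not GME's implementation: it assumes north-up rasters held as nested lists with a known upper-left corner and a cell size in the same projected units as the points. Categorical polygon layers would instead need a point-in-polygon test, which is not shown.

```python
import math


def sample_raster(grid, origin_x, origin_y, cell_size, x, y, nodata=None):
    """Value of the raster cell containing the projected point (x, y).

    grid: list of rows; row 0 is the northern (top) edge.
    origin_x, origin_y: map coordinates of the grid's upper-left corner.
    cell_size: cell width and height in map units (e.g. 60 for 60-m cells).
    Points outside the grid return nodata.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return nodata


def attribute_points(points, layers):
    """Attach one value per named raster layer to each (x, y) point."""
    return [
        {
            name: sample_raster(layer["grid"], layer["origin_x"], layer["origin_y"],
                                layer["cell_size"], x, y)
            for name, layer in layers.items()
        }
        for x, y in points
    ]


# Hypothetical 2 x 3 elevation raster with 60-m cells; upper-left corner at (0, 120).
elevation = {"grid": [[120, 135, 150], [110, 125, 140]],
             "origin_x": 0.0, "origin_y": 120.0, "cell_size": 60.0}
rows = attribute_points([(70.0, 100.0), (10.0, 10.0), (500.0, 0.0)],
                        {"elevation_m": elevation})
```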

Table 2 List of predictor variables used in models, type of data (raster or polygon), and their online sources

Because this was a presence-only dataset lacking recorded absences, it was necessary to generate a set of pseudo-absences to represent areas where target species were not likely to be found. Randomly generated pseudo-absences resulted in inaccurate models, so pseudo-absences were instead derived from the presence locations of all other non-target species (Elith and Leathwick 2007; VanDerWal et al. 2009).

We assumed that a presence of any non-target species, without the coincident occurrence of the target species within a 1-km radius, represented a pseudo-absence for the target species (Elith and Leathwick 2007). Although not ideal, given potential differences in sampling among collection efforts, this was the best available option given the limitations of presence-only datasets, and it has been shown to perform as well as or better than other pseudo-absence scenarios (Breiman 2001a; Elith and Leathwick 2007; VanDerWal et al. 2009).
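
The target-group pseudo-absence rule can be expressed directly in code. This sketch assumes projected coordinates in metres (so Euclidean distance is adequate at the 1-km scale) and uses hypothetical function names; it labels target-species presences 1 and qualifying non-target locations 0.

```python
import math


def pseudo_absences(records, target, radius_m=1000.0):
    """Target-group pseudo-absences for one species.

    records: list of (species, x, y) in a projected coordinate system (metres).
    A non-target record becomes a pseudo-absence only if no record of the
    target species lies within radius_m of it.
    """
    target_xy = [(x, y) for s, x, y in records if s == target]
    return [
        (x, y)
        for s, x, y in records
        if s != target
        and all(math.hypot(x - tx, y - ty) > radius_m for tx, ty in target_xy)
    ]


def training_rows(records, target, radius_m=1000.0):
    """Presence rows labelled 1 plus pseudo-absence rows labelled 0."""
    presences = [(x, y, 1) for s, x, y in records if s == target]
    absences = [(x, y, 0) for x, y in pseudo_absences(records, target, radius_m)]
    return presences + absences


# Hypothetical records in projected metres.
records = [
    ("Sorex cinereus", 0.0, 0.0),
    ("Clethrionomys rutilus", 500.0, 0.0),   # within 1 km of a target record
    ("Clethrionomys rutilus", 5000.0, 0.0),  # far from target records
    ("Microtus oeconomus", 0.0, 2000.0),     # far from target records
]
rows = training_rows(records, "Sorex cinereus")
```

Repeating `training_rows` once per species yields one presence/pseudo-absence table for each model.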

The combined presence/pseudo-absence datasets for each species were then modeled in RandomForests (Appendix 2—Supplementary material 2). We grew each model to 1,000 trees and used all other software default settings. RandomForests then created a coded model called a ‘grove,’ containing the algorithm quantifying patterns in the training dataset. Aspatial performance was assessed using ‘out-of-bag’ (OOB) training points, that is, points left out of the bootstrap sample used to grow a given tree (Breiman 1996). Using these OOB points, the predictive performance of each model was calculated as the area under the receiver-operating characteristic (ROC) curve (AUC), which summarizes the trade-off between correctly predicted presences (sensitivity) and correctly predicted absences (specificity) across all classification thresholds (Zweig and Campbell 1993; Fielding and Bell 1997; Huettmann and Gottschalk 2010). RandomForests was also used to rank the relative importance of environmental variables in models (Supplemental Material 2).
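
Two ideas in this paragraph, out-of-bag points and AUC, can be made concrete with a short sketch. Each tree in a random forest is grown on a bootstrap sample of n points drawn with replacement, so on average about 1/e (roughly 36.8 %) of the points are left out of any one tree and serve as its out-of-bag test set; a point's OOB prediction aggregates only the trees that did not see it. AUC can then be computed as the probability that a randomly chosen presence receives a higher score than a randomly chosen absence. The code is illustrative only and is not how the RandomForests software computes these quantities internally.

```python
import random


def bootstrap_split(n, rng):
    """One tree's bootstrap sample (drawn with replacement) and its out-of-bag indices."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def roc_auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen presence (label 1) receives a
    higher score than a randomly chosen absence (label 0); ties count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


in_bag, oob = bootstrap_split(1000, random.Random(42))
# Hypothetical OOB scores: 5 of the 6 presence/absence pairs are ranked correctly.
auc = roc_auc([0.9, 0.8, 0.4, 0.3, 0.7], [1, 1, 1, 0, 0])
```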

The grove files generated by RandomForests, containing the predictive algorithm, were then applied to a regular lattice of points (also attributed with the environmental variables) spaced at 5-km intervals across Alaska. Model outputs generated relative indices of occurrence (RIO; a ranking of pixels from 0 to 1 representing the likelihood of belonging to the ‘presence’ class) for each point in the regular lattice based on its underlying environmental variables. To produce a continuous surface for visualization, RIO values were interpolated between neighboring lattice points across the study area using the Inverse Distance Weighting tool at 300-m resolution in ArcGIS 10.0 (ESRI, Inc., Redlands, CA, USA) and clipped to the state coastline, yielding a spatially continuous predictive distribution raster map of each small mammal species for Alaska. All GIS models and predictor layers were archived and are freely available on the online data repository dSpace (www.dspace.org) at the University of Alaska Fairbanks Elmer E. Rasmuson Library.
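
The interpolation from the 5-km lattice to a continuous surface can be sketched with a basic inverse distance weighting function. This is an assumption-laden stand-in for the ArcGIS tool: it uses planar coordinates in metres, a distance exponent of 2, and optionally only the k nearest lattice points; the ArcGIS tool exposes its own search-neighborhood options that are not reproduced here. The function names and lattice values are hypothetical.

```python
import math


def idw(known, x, y, power=2.0, k=None):
    """Inverse-distance-weighted estimate at (x, y).

    known: list of (xi, yi, value) lattice points, e.g. RIO values.
    power: distance exponent (2 is a common default).
    k: if given, use only the k nearest known points.
    """
    nearest = sorted((math.hypot(x - xi, y - yi), v) for xi, yi, v in known)
    if k is not None:
        nearest = nearest[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]  # exact hit: return the known value unchanged
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


def idw_grid(known, xs, ys, **kwargs):
    """Interpolate onto a regular output grid (rows follow ys, columns follow xs)."""
    return [[idw(known, x, y, **kwargs) for x in xs] for y in ys]


lattice = [(0.0, 0.0, 0.2), (5000.0, 0.0, 0.8)]  # two lattice points 5 km apart
surface = idw_grid(lattice, xs=[0.0, 300.0, 2500.0, 5000.0], ys=[0.0])
```

Cells close to a lattice point take values near that point's RIO, and the midpoint between two points gets their average.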

Model validation

One advantage of our predictions is that they carry known accuracy estimates, because they come from a consistent, testable, and transparent prediction process. We used independent field data sampled at 20 locations across Alaska to validate the spatial predictive accuracy of all maps. Observed presences and absences of species in the field were compared with model-predicted values at field locations for each species. We used a symmetric threshold of RIO = 0.5 to differentiate between model-predicted presences and absences, and calculated the percentage of field presences correctly predicted as presences and the percentage of field absences correctly predicted as absences by each model. From these agreements between predicted and observed values, we calculated Cohen’s kappa (a statistical measure of agreement between modeled and observed values that accounts for agreement expected by chance) for each species (Cohen 1960; e.g. Baltensperger et al. 2013).
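
Threshold-based validation and Cohen's kappa can be sketched as follows, assuming field observations are coded 1 (detected) and 0 (not detected) and that RIO ≥ 0.5 counts as a predicted presence. Kappa compares the observed agreement p_o with the agreement p_e expected by chance from the row and column totals of the confusion matrix. The function names are hypothetical.

```python
def confusion(rio, observed, threshold=0.5):
    """(tp, fp, fn, tn) counts, treating RIO >= threshold as a predicted presence.

    observed: 1 for a field detection, 0 for a field non-detection.
    """
    tp = fp = fn = tn = 0
    for score, obs in zip(rio, observed):
        predicted = score >= threshold
        if predicted and obs:
            tp += 1
        elif predicted:
            fp += 1
        elif obs:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(tp, fp, fn, tn):
    """Share of observed presences and of observed absences predicted correctly."""
    return tp / (tp + fn), tn / (tn + fp)


def cohens_kappa(tp, fp, fn, tn):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


counts = confusion([0.9, 0.6, 0.4, 0.2], [1, 0, 1, 0])  # hypothetical field points
kappa_example = cohens_kappa(8, 2, 2, 8)  # 80 % agreement, 50 % expected by chance
```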

Community composition analysis

To identify the degree of spatial niche overlap between species, we created a set of 50,000 random points across Alaska and attributed each point with the RIO values from the 17 species models. We used the chart.correlation command from the Hmisc package (F. Harrell; https://github.com/harrelfe/Hmisc) in R 2.12.1 (R Core Team 2013) to create a correlation matrix between species. This function yielded Pearson correlation coefficients (r) for all interspecific relationships. Species pairings with correlation coefficients ≥0.5 were considered positively correlated and likely to co-occur in space, whereas pairings with a coefficient <−0.5 were considered negatively correlated and unlikely to co-occur. Coefficients between −0.5 and 0.5 were regarded as uncorrelated. Clusters of correlated species were visualized in tree form using the varclus command in Hmisc to identify groups of sympatric species. Using the binary reclassified distribution models, we also produced maps depicting the regions of Alaska where these communities are predicted to occur.
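
The pairwise correlation screening can be sketched in Python (the authors used R). This minimal version assumes each species' RIO values are stored as equal-length lists sampled at the same random points, and applies the ≥0.5 and <−0.5 cut-offs described above; it does not reproduce varclus clustering. Species names and values in the example are placeholders.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def classify_pairs(rio_by_species, cutoff=0.5):
    """Label every species pair by the correlation of their RIO values at shared points.

    r >= cutoff -> "co-occurring"; r < -cutoff -> "segregated"; else "uncorrelated".
    """
    names = sorted(rio_by_species)
    result = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(rio_by_species[first], rio_by_species[second])
            if r >= cutoff:
                label = "co-occurring"
            elif r < -cutoff:
                label = "segregated"
            else:
                label = "uncorrelated"
            result[(first, second)] = (r, label)
    return result


rio = {  # hypothetical RIO values at four random points
    "A": [0.1, 0.2, 0.3, 0.4],
    "B": [0.2, 0.4, 0.6, 0.8],
    "C": [0.4, 0.3, 0.2, 0.1],
    "D": [0.3, 0.1, 0.1, 0.3],
}
pairs = classify_pairs(rio)
```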

Biodiversity hotspot analysis

A composite biodiversity map was created for small mammals in Alaska by summing individual species models of known accuracy to create a predicted species richness map. Continuous species models were reclassified in a binary format so that cells with RIO < 0.5 (indicating the predicted absence of a species) were assigned an absolute absence value of 0, whereas cells with RIO ≥0.5 were assigned an absolute presence value of 1. The reclassified binary species models were summed in ArcGIS Raster Calculator to yield a raster whose cells indicated the total number of species predicted to occur there. We also calculated Pearson’s correlation coefficient (Zar 2010) to assess the agreement between species richness values predicted by the composite biodiversity model and the number of species observed in the field.
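
The binary reclassification and cell-wise summation performed in Raster Calculator can be illustrated on small in-memory grids. The sketch assumes every species map is aligned to the same grid and uses hypothetical RIO values and function names.

```python
def binary(grid, threshold=0.5):
    """Reclassify a RIO grid: 1 where RIO >= threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in grid]


def richness_map(rio_maps, threshold=0.5):
    """Cell-wise sum of binary species maps that share one grid."""
    binaries = [binary(g, threshold) for g in rio_maps.values()]
    rows, cols = len(binaries[0]), len(binaries[0][0])
    return [[sum(b[r][c] for b in binaries) for c in range(cols)] for r in range(rows)]


maps = {  # hypothetical 2 x 3 RIO grids for three species
    "sp1": [[0.7, 0.5, 0.1], [0.9, 0.2, 0.6]],
    "sp2": [[0.6, 0.2, 0.49], [0.8, 0.1, 0.5]],
    "sp3": [[0.55, 0.51, 0.0], [0.3, 0.4, 0.95]],
}
richness = richness_map(maps)
```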

We highlighted regions where ≥11 species (≥85 % of maximum predicted species richness) were predicted to occur and arbitrarily designated these as biodiversity hotspots. The resulting biodiversity map was intersected with a land ownership map of Alaska to determine which government agencies and Native corporations are responsible for managing lands on which the highest levels of small-mammal species richness occur. Ownership of biodiversity hotspots was further parsed into individual management units for each managing entity, and land areas were calculated for each level of species richness.
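
Tallying hotspot area by land manager reduces to counting qualifying cells. This sketch assumes the richness raster and a rasterized land-ownership layer share the same 300-m grid (so each cell covers 9 ha); the owner labels, grid values, and function names are hypothetical, and the authors' actual intersection was done in a GIS.

```python
from collections import defaultdict

CELL_AREA_HA = 300 * 300 / 10_000  # a 300 m x 300 m cell covers 9 ha


def hotspot_areas(richness, owners, min_species=11, cell_area_ha=CELL_AREA_HA):
    """Hotspot area (ha) tallied by (land manager, predicted species richness).

    richness and owners are grids of the same shape; owners holds a land-manager
    label for each cell. Cells below min_species are ignored.
    """
    totals = defaultdict(float)
    for rich_row, owner_row in zip(richness, owners):
        for rich, owner in zip(rich_row, owner_row):
            if rich >= min_species:
                totals[(owner, rich)] += cell_area_ha
    return dict(totals)


def area_by_owner(areas):
    """Collapse (owner, richness) areas to total hotspot area per owner."""
    totals = defaultdict(float)
    for (owner, _), ha in areas.items():
        totals[owner] += ha
    return dict(totals)


richness_grid = [[11, 13, 10], [12, 9, 11]]  # hypothetical species counts
owner_grid = [["State", "BLM", "State"], ["State", "NPS", "BLM"]]
areas = hotspot_areas(richness_grid, owner_grid)
```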

Results

Field sampling

Over the course of 30,700 trap-nights (Fig. 2), we captured 624 small mammals belonging to 18 species at 20 locations along two geographic mega-transects spanning Alaska (Fig. 1b). Only one species (American water shrew; Sorex palustris) of mainland Alaskan small mammals was not detected at any location (MacDonald and Cook 2009; Fig. 2). We documented several species in regions of the state where they had not previously been identified, representing range extensions for some. These new records included the capture of the rare and understudied Alaska tiny shrew (S. yukonicus; but see Hope et al. 2010 for taxonomy) in the Yukon-Tanana Uplands for the first time, as well as the documentation of the westernmost occurrences of yellow-cheeked voles (Microtus xanthognathus) near the village of Russian Mission, and long-tailed voles (M. longicaudus) in the White Mountains north of Fairbanks (MacDonald and Cook 2009).

Fig. 2

Composite histogram of all species detected at 20 sampling locations between 2010 and 2013. Each location was sampled with 1,500 trap-nights

Northern red-backed voles (Clethrionomys rutilus; but see Carleton et al. 2014 for taxonomy) were the dominant species at all but five locations, which lay near the geographic limits of their distribution and where root voles (Microtus oeconomus), singing voles (M. miurus), and northern collared lemmings (Dicrostonyx groenlandicus) were caught in greater abundance (Fig. 2). The dominant shrew species at most sites was the cinereus shrew (Sorex cinereus), except at Mountain Village, where only tundra shrews (S. tundrensis) were captured.

Species richness curves averaged across all sites showed a roughly logarithmic increase in the number of species detected over the standard sampling period (Fig. 3). After just 300 trap-nights, a mean of 1.9 species was detected, whereas an additional 1,200 trap-nights yielded only about 1.4 additional species, for a mean total of 3.3 species per plot. However, no asymptote in species detection was reached after 1,500 trap-nights, indicating that total species richness had not been fully sampled.

Fig. 3

Mean number of species detected at sampling locations after cumulative number of trap-nights. Error bars denote 95 % confidence intervals

Model accuracy

Distribution maps created from each of the 17 species models (Appendix 3—Supplementary material 3) demonstrated high accuracy when evaluated aspatially within each model using out-of-bag (OOB) cross-validation in RandomForests (Table 3), as well as spatially using the independent field-derived validation dataset (Table 4). Areas under the ROC curves (AUCs) were ≥0.90 for all species except water shrews and cinereus shrews (Table 3). All but two models (northern collared lemmings and northern bog lemmings; Synaptomys borealis) had overall aspatial accuracies greater than 50 %. The percentage of training presence points correctly identified as presences (sensitivity) exceeded 90 % for 14 of the 17 species, whereas the percentage of absences correctly identified (specificity) was lower but nevertheless exceeded 50 % for all but two species (Table 3).

Table 3 Model training dataset sample sizes and aspatial (internally cross-validated) model performance metrics for 17 species of small mammals in Alaska
Table 4 Sample sizes and validation statistics for the independent field validation dataset compared to model predictions for 17 species of small mammals in Alaska

Model validation

Field validations indicated that most predictive models performed accurately in space. Sensitivities and specificities were ≥50 % for all models except that for cinereus shrews, for which just 11.1 % of field absences were correctly identified as such (Table 4). In general, sensitivities exceeded specificities, but sample sizes of presences for several species were small, making meaningful interpretation of validations difficult. A more conservative performance measure, Cohen’s kappa, was between 0.6 and 0.8 for long-tailed voles and singing voles, indicating ‘substantial’ agreement between models and field observations (Table 4; Landis and Koch 1977), whereas kappas for northern collared lemmings, root voles, montane shrews (Sorex monticolus), and barren-ground shrews (S. ugyunak) were between 0.4 and 0.6, indicating ‘moderate’ agreement. Validations for six other species yielded kappas between 0.2 and 0.4 (‘fair’ agreement), and kappas for an additional four species were less than 0.2 (‘slight’ agreement; Table 4). Nevertheless, all models performed better than random.

Species distributions and community compositions

Predicted distribution models of small-mammal species (Appendix 3—Supplementary material 3) were grouped by varclus analysis into five communities of similarly distributed species (Fig. 4). The first community group, hereafter the ‘cold-climate community’, was composed of species found at high latitudes and high elevations, mainly across the North Slope and throughout the Brooks Range (Figs. 4, 5a). None of the four species in this cluster was predicted to occur with any certainty in the center of the state, in the central portions of the Yukon and Kuskokwim River valleys, where members of the interior and southern communities were concentrated. The second cluster, or ‘northern community’, was composed of species that occurred across much of the region north of the Alaska Range (Figs. 4, 5b). These species were distributed patchily, in a metapopulation arrangement, across a variety of regions. Members of the third group, or ‘continental community’, included species occurring primarily near the Canadian border, apparently near the northern limits of ranges that lie mostly farther south (Figs. 4, 5c). The fourth, or ‘interior community’, included two species that were both primarily restricted to a narrow swath of dry boreal forest between the Brooks and Alaska Ranges (Figs. 4, 5d). Northern red-backed voles were assigned to this community, even though their predicted range was much more expansive. The fifth species cluster, or ‘southern community’, was composed of species predicted to occur mainly south of the Brooks Range (Figs. 4, 5e). Top variables were largely consistent among models and, on average, ranked in the order Soil Type, Ecoregion, Landfire Landcover, December Sea Ice, and June Sea Ice (Appendix 4—Supplementary material 4).

Fig. 4

Results of varclus analysis depicting small mammal community clusters. Species pairs with root node correlation coefficients >0.25 are considered to be part of the same community and have the same color. Species pairs with root node correlation coefficients <0 are negatively correlated

Fig. 5

Biodiversity hotspot maps depicting model-predicted small mammal species richness values for five geographic community clusters, (a) cold-climate, (b) northern, (c) continental, (d) interior, and (e) southern, and (f) a composite species richness map for Alaska. Maps are summations of individual species maps (Appendix 3—Supplementary material 3) converted to binary maps using relative index of occurrence (RIO) = 0.5 as a threshold to differentiate between the presence or absence of each species at each pixel

Regional biodiversity hotspots

A composite biodiversity map derived from the summation of 17 binary species models identified four main small-mammal species richness hotspots in Alaska (Fig. 5f). Model predictive accuracy, assessed using Pearson’s correlation coefficient, indicated a moderate positive correlation (r = 0.6) between modeled and observed species richness values for Alaska. Statewide, the majority of lands coinciding with biodiversity hotspots (>10 species) are managed by the State of Alaska (20,199 ha), and the Bureau of Land Management (BLM) and Regional Native Corporations manage an additional 7,271 ha and 5,587 ha, respectively (Table 5). The largest and most diverse of these hotspots occurred across the Yukon-Tanana Uplands near the Canadian border. Most of this area is managed by the State of Alaska, including the largest area predicted to contain the highest statewide level of small mammal diversity (13 species) in Game Management Unit 25 (Table 5). We detected six species in 1,500 trap-nights nearby at the Upper Tanana site in 2011 (Fig. 2). A significant portion of the Yukon-Tanana Uplands hotspot also occurs on land managed by the BLM, including in the Steese National Conservation Area, where we detected seven species in 1,500 trap-nights at the White Mountains site in 2011 (Fig. 2). Doyon Regional Native Corporation, the National Park Service, and the U.S. Fish and Wildlife Service also maintain thousands of hectares containing high small mammal diversity in this region (Table 5).

Table 5 Land ownership status and area (ha) for biodiversity hotspots in Alaska containing at least 11 species of small mammal

The second small mammal hotspot occurred in the mountainous region between the headwaters of the Koyukuk, Kobuk, and Noatak Rivers in the central Brooks Range. Most of this land is managed by the National Park Service and the State of Alaska (Fig. 5f). A third hotspot cluster was located east of Kotzebue Sound in the Selawik National Wildlife Refuge, and on BLM and State of Alaska lands. Other diversity hotspots included several areas to the northwest of the Alaska Range in Denali National Park and on nearby BLM and State of Alaska lands (Fig. 5f). Regions predicted to contain low small mammal diversity included the North Slope, lower Yukon River, Yukon-Kuskokwim Delta, and Bristol Bay. Independent field results largely support these predictions. For example, we detected only northern collared lemmings at the Canning River site on the North Slope, just two species in the Nulato Hills along the lower Yukon, and three species at Mountain Village on the Yukon-Kuskokwim Delta (Fig. 2).

Discussion

The goals of this research were to compile species occurrence records, to predict species distribution and richness patterns, and to delineate the geographic community structure of small mammals in Alaska, while providing a modeling framework for other multi-species systems. We found that the distributions of the mainland small mammal species of Alaska can be objectively structured into five main community groups (Fig. 4), each with a unique set of geographic patterns (Fig. 5) but similar ecological predictors (Appendix 4—Supplementary material 4) that reflect the influence of climate, soils, and vegetation on the arrangement of species across the state. We have created fine-resolution, statewide distribution maps for 17 mainland small mammal species in Alaska that represent the most accurate continuous depictions of their occurrences to date (Appendix 3—Supplementary material 3). We also created species richness curves for sampling locations (Fig. 3), objective delineations of small mammal community structure (Figs. 4, 5), and a species richness map that is the first of its kind for small mammals in Alaska (Fig. 5f). The moderate to high accuracy of these models attests to the effectiveness of machine-learning techniques when applied to archived datasets not collected using consistent methods.

Species richness sampling

The rapid-assessment or ‘bio-blitz’ (Wilson 2006) style of sampling employed here allowed small, mobile trapping teams to efficiently sample a geographically significant portion of Alaska in just two main field seasons. The detection of all but one of the mainland small mammal species in the region is a testament to the efficacy of this design. Sampling efforts also added a large number of records to the statewide species occurrence dataset, expanding known species ranges and filling gaps in the training datasets.

Trapping efforts detected roughly half of the model-predicted number of species in most regions. This under-sampling of total species diversity at the plot level was perhaps the trade-off of a geographic mega-transect strategy designed to maximize diversity detection at the statewide scale. Despite a variety of trap styles aimed at detecting a diversity of species, and a higher-than-average number of trap-nights, some species may have been especially trap-shy and remained undetected. Because the number of species detected continued to increase with additional trap-nights, studies aiming to measure species richness at the site scale would be well served by trapping in excess of 1,500 trap-nights.

Model progress and accuracy

All models performed well at correctly identifying species presences. The models created here represent improvements in detail and accuracy over other maps for small mammals in Alaska, including NatureServe and International Union for Conservation of Nature (IUCN; www.iucnredlist.org) range maps, deductive and inductive distribution models by the Alaska Gap Analysis Project (AKGAP; http://gapanalysis.usgs.gov/species/data; Gotthardt et al. 2013), and other recent species niche models (Hope et al. 2013). Commonly used range maps are coarse in scale and reflect only basic minimum convex polygon outlines of the extents of species occurrences, without accounting for the influence of environmental variables in defining niche space.

The AKGAP deductive models were derived solely from habitat suitability associations and tended to over-predict wildlife distributions (Gotthardt et al. 2013). The inductive AKGAP models incorporated 20 environmental variables, 13 fewer than the 33 used here, and tended to under-estimate distributions. Nearly all of our species models had higher overall accuracies than the AKGAP models for the same species (Gotthardt et al. 2013). Our models had AUC values similar to those of AKGAP and higher than those for the five species modeled by Hope et al. (2013).
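
For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, overall accuracy, and a rank-based AUC are commonly computed from observed presence/absence (or pseudo-absence) labels and model scores. The toy data and the 0.5 classification threshold are assumptions for illustration, not values from this study.

```python
from typing import Dict, Sequence


def confusion_metrics(observed: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and overall accuracy for binary (1/0) presence data."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # share of presences predicted as present
        "specificity": tn / (tn + fp),  # share of (pseudo-)absences predicted as absent
        "overall_accuracy": (tp + tn) / len(pairs),
    }


def auc(observed: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a random presence outscores a random absence.

    Ties count as one half (the Mann-Whitney U formulation).
    """
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


observed = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(confusion_metrics(observed, predicted))
print(auc(observed, scores))  # 8/9, about 0.889
```

AUC is threshold-independent, whereas sensitivity, specificity, and overall accuracy all depend on the chosen cutoff, which is one reason the two kinds of metric can tell different stories about the same model.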

Nevertheless, all of these ecological niche-modeling approaches provide valuable species distribution predictions that likely fall on a spectrum between depictions of the fundamental and the realized niche. Models that over-predict distributions approximate the fundamental niche more closely, whereas our models likely depict a more restricted realized niche. Because ecological niche models often do not include parameters accounting for physiology, movement, and adaptation (Bush 2002), true distributions probably lie somewhere between these predictions. Future distribution mapping efforts should focus on combining several modeling approaches in a single ensemble framework that uses the best components of each to produce the most accurate spatial models (Elith and Leathwick 2007; Hardy et al. 2011).
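
In its simplest form, an ensemble averages the per-cell occurrence probabilities produced by several models. The sketch below assumes each model has already been run on the same grid cells; the model names and weights are hypothetical, and more elaborate ensemble schemes exist.

```python
from typing import Dict, List, Optional, Sequence


def ensemble_mean(
    predictions: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted mean of per-cell occurrence probabilities from several models."""
    names = list(predictions)
    n_cells = len(predictions[names[0]])
    if any(len(predictions[m]) != n_cells for m in names):
        raise ValueError("all models must predict the same grid cells")
    if weights is None:
        weights = {m: 1.0 for m in names}  # unweighted average by default
    total = sum(weights[m] for m in names)
    return [
        sum(weights[m] * predictions[m][i] for m in names) / total
        for i in range(n_cells)
    ]


# Hypothetical probabilities for three grid cells from two model types.
preds = {"random_forest": [0.9, 0.2, 0.6], "glm": [0.7, 0.4, 0.2]}
print(ensemble_mean(preds))  # approximately [0.8, 0.3, 0.4]
# Weighting each model by, e.g., a validation score is one common variant.
print(ensemble_mean(preds, weights={"random_forest": 0.9, "glm": 0.7}))
```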

We attribute the improved accuracy of our models to more accurate presence records and more representative pseudo-absence datasets, drawn from locations where non-target species were detected but the target species was not. This practice improves on the common alternative of using randomly generated pseudo-absences or Maxent-generated background points, and produced more accurate models that generalized well without fitting too tightly to the training data (Elith and Leathwick 2007; Gotthardt et al. 2013). Our emphasis on correctly predicting presences may have come at the cost of poorer absence prediction: many specificity values, and consequently some overall accuracies, were comparatively low. This may be an unavoidable cost of using pseudo-absences in lieu of ‘true absences’ recorded in the field. Nevertheless, despite the complexity of archived datasets, our models provide accurate predictions of species occurrence. We recommend using non-target surveys to generate appropriate pseudo-absence scenarios when building other multi-species, presence-only distribution models.
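
The pseudo-absence rule described above (a site counts as an absence for the target species when other species were detected there but the target was not) can be sketched as follows. The (site, species) record format and site identifiers are hypothetical; a real workflow would also carry coordinates and environmental covariates.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def presences_and_pseudo_absences(
    records: Iterable[Tuple[str, str]], target: str
) -> Tuple[Set[str], Set[str]]:
    """Split surveyed sites into target presences and non-target pseudo-absences.

    `records` are hypothetical (site_id, species) detection records. A site is a
    pseudo-absence for `target` if at least one other species was detected there
    but `target` was not.
    """
    species_at_site: Dict[str, Set[str]] = defaultdict(set)
    for site, species in records:
        species_at_site[site].add(species)
    presences = {s for s, spp in species_at_site.items() if target in spp}
    pseudo_absences = {s for s, spp in species_at_site.items() if target not in spp}
    return presences, pseudo_absences


records = [
    ("site_A", "northern red-backed vole"),
    ("site_A", "cinereus shrew"),
    ("site_B", "cinereus shrew"),
    ("site_C", "root vole"),
    ("site_D", "northern red-backed vole"),
]
pres, pseudo = presences_and_pseudo_absences(records, "northern red-backed vole")
print(sorted(pres))    # ['site_A', 'site_D']
print(sorted(pseudo))  # ['site_B', 'site_C']
```

Because pseudo-absences come only from sites that were actually surveyed, they share the sampling bias of the presences, unlike points drawn at random across the study area.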

Environmental predictors

The top three predictors, Soil Type, Ecoregion, and Landcover, were consistent across species and, along with some climate-related layers, were the most important predictors used by the model algorithms (Appendix 4: Supplementary material 4). Their prevalence demonstrates a consistent bottom-up effect of climate and soils interacting to produce habitats that drive biodiversity and community assemblage patterns. These results suggest that shifting habitat conditions resulting from changes in climate will likely strongly influence the distributions of wildlife and interspecific relationships at northern latitudes.

Although these were the most important predictors at a geographic scale, their importance may be overstated at finer scales in the field. Note also that the top three predictors are all categorical variables. Because RandomForests can easily exploit stark differences between categories to partition data points, this categorical structure may inflate the importance of these variables in the models. Nevertheless, these results provide data-mining-based foundations for more detailed, hypothesis-driven analyses aimed at identifying the mechanisms that drive patterns of wildlife distribution.
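
One commonly cited reason tree-based methods can favor multi-level categorical predictors is that such a predictor offers many ways to split a node. The sketch below simply counts candidate binary splits; the level counts are hypothetical, and actual implementations may search categorical splits differently (for example, with shortcuts or limits on the number of levels).

```python
def categorical_split_count(n_levels: int) -> int:
    """Distinct two-way partitions of an unordered factor with n_levels categories."""
    if n_levels < 2:
        return 0
    return 2 ** (n_levels - 1) - 1


def continuous_split_count(n_distinct: int) -> int:
    """Candidate thresholds for a numeric variable with n_distinct values at a node."""
    return max(n_distinct - 1, 0)


for k in (3, 10, 20):
    print(k, "levels:", categorical_split_count(k), "candidate splits")
print("30 distinct numeric values:", continuous_split_count(30), "candidate splits")
```

With 20 hypothetical soil classes, a node can be split 524,287 ways, so a categorical layer has many chances to find a partition that looks informative, which is consistent with the caution above about inflated importance.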

Community structure

Small mammal species in Alaska can be organized into five main community groups that reflect their current distributions and potential for interactions with other species. Varclus provides a repeatable method for outlining large-scale spatial relationships among wildlife species and for documenting changes in community arrangement over time. For example, the spatial pattern for the cold-climate community is an approximate inverse of the interior community’s predicted distribution, and members of these two communities do not often co-occur. As community membership and the spatial arrangement of communities change with the warming climate, a consistent analysis such as varclus will be useful for documenting specific changes in community composition and in the overlapping distributions of wildlife species around the world.
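
The general idea of clustering species by the similarity of their predicted distributions can be sketched with plain hierarchical clustering. This is not varclus itself, whose implementations differ (squared correlations are another common similarity choice). The sketch assumes a signed distance of 1 − r between species' prediction maps and complete linkage, so that inversely distributed species, like the cold-climate and interior groups, fall into different clusters. Species names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cluster_species(preds: Dict[str, Sequence[float]], n_groups: int) -> List[List[str]]:
    """Complete-linkage clustering of species on 1 - r between prediction maps."""
    names = list(preds)
    dist = {
        frozenset((a, b)): 1.0 - pearson(preds[a], preds[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_groups:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: clusters are as far apart as their most distant members.
            d = max(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical occurrence probabilities for four species across four grid cells.
preds = {
    "sp_A": [1.0, 0.9, 0.1, 0.0],
    "sp_B": [0.9, 1.0, 0.2, 0.1],
    "sp_C": [0.0, 0.1, 0.9, 1.0],
    "sp_D": [0.1, 0.0, 1.0, 0.8],
}
print(cluster_species(preds, 2))  # [['sp_A', 'sp_B'], ['sp_C', 'sp_D']]
```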

Although varclus community clusters indicate the most common arrangement of species at a geographic scale, they do not reflect the full extent of species assemblages that may occur across different habitats. The statewide species richness map depicts overlapping distributions of more than six species over a large portion of the state, demonstrating frequent overlap between community clusters on the landscape (Fig. 5f). Correlations between species within each cluster were high, but some inter-cluster correlations were also large. The tightest geographic relationships occurred in the southern community, and indeed all of its species were frequently detected together in the field (Fig. 2).
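
A richness map of this kind is typically produced by converting each species' predicted probability to presence or absence and then summing across species in every grid cell. The sketch assumes hypothetical probabilities and a uniform 0.5 cutoff as a placeholder for species-specific thresholds.

```python
from typing import Dict, List, Sequence


def species_richness(
    probabilities: Dict[str, Sequence[float]], thresholds: Dict[str, float]
) -> List[int]:
    """Count, for each grid cell, the species whose probability meets its threshold."""
    names = list(probabilities)
    n_cells = len(probabilities[names[0]])
    return [
        sum(1 for sp in names if probabilities[sp][i] >= thresholds[sp])
        for i in range(n_cells)
    ]


probs = {
    "northern red-backed vole": [0.8, 0.6, 0.1],
    "cinereus shrew": [0.7, 0.2, 0.4],
    "root vole": [0.3, 0.9, 0.6],
}
cutoffs = {sp: 0.5 for sp in probs}  # placeholder cutoffs
richness = species_richness(probs, cutoffs)
print(richness)  # [2, 2, 1]
```

Cells where species from different clusters both pass their thresholds are exactly where the richness map reveals the cross-cluster overlap described above.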

Trapping records included several instances in which species belonging to different community groups co-occurred at a single location. For example, northern red-backed voles, root voles, and cinereus shrews (members of three different communities) occurred together at three sampling locations (Fig. 2), indicating wider geographic niche breadths and more generalist distribution patterns. Dominant species such as northern red-backed voles and cinereus shrews are increasingly being found beyond their historical distributions, increasing the likelihood of novel species contacts and newly emerging interspecific relationships (Hope et al. 2013). Recent stable isotope analyses have shown that in areas where species distributions overlap, dietary plasticity and niche partitioning may allow dominant and secondary species to coexist without significant competition (A.P. Baltensperger et al. unpublished). Monitoring how changes in the geographic overlap between species alter community membership can help identify the landscape-level effects of environmental change on wildlife persistence (Hope et al. 2013).

Regional biodiversity patterns

The region with the highest level of small mammal diversity was the Yukon-Tanana Uplands, where a maximum of 13 species were predicted to co-occur. This region likely emerged as a major biodiversity hotspot for several reasons. First, it is closest to the North American interior both geographically and ecologically. It is an extension of the interior Canadian boreal ecoregion and represents the farthest reach of many species that may be slowly expanding their ranges northward from the contiguous United States and Canada (Parmesan and Yohe 2003; Root et al. 2005). These include members of the continental community, as well as members of the interior and southern communities. Many of these species are also rarely found outside the interior and did not occur in Alaska prior to the last glacial maximum (MacDonald and Cook 2009; A.G. Hope personal communication).

Second, this region contains a wide range of elevations and habitats, providing a variety of available niches. Given this diversity of habitats, common species such as northern red-backed voles, cinereus shrews, root voles, and other members of the northern community such as brown lemmings and tundra shrews are likely to occur there. Because of the region's high elevations, the models also predict that singing voles should occur there. The only mainland species not predicted to occur in this region are the three cold-climate species, whose distributions are far removed from this area. A similar geographic ecotone containing a variety of habitat types may also account for the hotspot in the Central Brooks Range between the headwaters of the Koyukuk, Kobuk, and Noatak Rivers, as well as the hotspot cluster on the lee side of the Alaska Range. Such small mammal biodiversity hotspots at the ecological crossroads along biome boundaries support the notion that these areas are important biodiversity reservoirs worthy of conservation in a changing climate (Neilson 1991).

Management implications

The conservation of biodiversity is important for a number of reasons. Although many of these species occur together across the state and appear to fill similar ecological roles, our understanding of the mechanistic functions and niche overlap of animals in ecosystems is limited (Churchfield et al. 1999; Hooper et al. 2005; Fritts et al. 2006; Prost et al. 2013). Nevertheless, apparent ecological redundancy has the benefit of insuring against the uncertainty of climate change. Maintaining a range of species that provide different ecological services and that may respond differently to environmental disturbances can stabilize food webs and ecosystems as they evolve (Aarssen 1997; Hooper et al. 2005; Duffy et al. 2007). Maximizing diversity also increases the likelihood that species with disproportionately large effects on ecosystem function will persist (Aarssen 1997; Hooper et al. 2005). Furthermore, active conservation of a diversity of prey species occupying a variety of niches is essential to conserving predator diversity and, ultimately, to maintaining ecosystem-wide trophic structure and function (Noss 1990; Lawler et al. 2009).

Because two-thirds of the land in Alaska is public, the vast majority of small-mammal hotspots occur on federal and state lands, creating an opportunity to pursue biodiversity conservation on a large scale. Analyses of this type should provide land managers with the spatially explicit tools and knowledge needed to prioritize species richness as a conservation management goal. Documenting current distributions and baseline community patterns of primary consumers at a geographic scale is the first step towards identifying the effects of impending environmental changes on the bottom-up flow of nutrients into wildlife communities (Noss 1990). Of course, species responses will not be uniform but will depend on each species' capacity to tolerate, adapt to, or disperse in response to rapid, large-scale ecological change (Parmesan and Yohe 2003; Williams and Jackson 2007; Hope et al. 2013). Monitoring shifts in individual species distributions over time will provide tangible accounts of how species are responding across space, and will be vital for assessing the temporal stability and adaptive capacity of natural systems (Hooper et al. 2005; Hope et al. 2013). Based on other predictive modeling efforts (Magness et al. 2008), we advocate establishing a permanent network of small mammal survey sites distributed across the state, especially in the areas of highest diversity (e.g., the Yukon-Tanana Uplands), and resurveyed at annual or decadal intervals to serve as the foundation for such long-term monitoring (Noss 1990; Hope et al. 2013). Such a network could monitor species richness, and a consistent trapping grid protocol would also allow species densities to be calculated. Densities could in turn be modeled across space to create detailed maps of population status for multiple species. Both would be sound applications of best professional research practices to wildlife management across a continually changing landscape.

Although species distributions and community compositions are likely to shift with the climate over time, providing wildlife with the opportunity to disperse to new areas within their niche envelope will be paramount for their persistence into the future (Bush 2002; Williams and Jackson 2007). Even as the climate, soils, and habitat conditions change, if land managers can promote the continued connectivity of important refugia along latitudinal and elevational corridors, then species incapable of coping with new environmental conditions can disperse to unexploited areas of their fundamental geographic niche (Bush 2002; Hope et al. 2013). Predicting where and how the environment will change, determining how species are likely to respond, and conserving these areas for the future are the biggest challenges currently facing species diversity conservation worldwide.