Introduction

Northern Canada is among the most remote and isolated areas on Earth, making biological surveys costly and difficult. For insects, further complications include short sampling windows (e.g., short summers) that reduce the chance of obtaining specimens in the appropriate life-history stage (especially adults) for species-level identification. Consequently, important components of northern biodiversity remain poorly studied, particularly in highly diverse groups such as aquatic insects.

Mayflies (Ephemeroptera), stoneflies (Plecoptera), and caddisflies (Trichoptera)—the EPTs—along with Diptera, are the most prominent insect orders in northern aquatic ecosystems (Merritt et al. 2008). Collectively, these insects are a key component of energy flow in aquatic systems, as they constitute an important link between the algal/detrital food base and higher trophic levels, including the fishes that feed upon them (Benke et al. 1984). EPTs are widely distributed in the northern Nearctic Region, with representatives occurring on most islands of the Canadian Arctic Archipelago, which makes them a suitable group for large-scale biodiversity assessments of that region.

DNA barcoding has received much attention for its potential for rapid and relatively unbiased prospecting of biodiversity (Hebert and Gregory 2005). In addition, the approach has proven particularly useful for species-level identification in insects, regardless of gender (Ekrem et al. 2010) or developmental stage (Gullan and Cranston 2010; Zhou et al. 2009, 2010). The mitochondrial gene cytochrome oxidase subunit I (COI) is the genetic marker of choice for animals because of its working “universal” primers, length, ease of alignment, and degree of intra- and interspecific polymorphism (Hebert et al. 2003). In addition to its utility for distinguishing known species, the COI region has also been found useful for revealing cryptic diversity (Hebert et al. 2004; Smith et al. 2008; Ståhls and Savolainen 2008; Monaghan et al. 2009; Velonà et al. 2015), as well as biogeographic and phylogeographic patterns (Pons et al. 2006; Hajibabaei et al. 2007; Craft et al. 2010). Thus, a barcoding approach promises to be a useful tool both for conducting large-scale biodiversity assessments and for exploring recent population divergence patterns (e.g., during Pleistocene glacial cycles).

Recent studies using the COI barcode approach have increased knowledge of biodiversity in various orders of aquatic insects in northern North America (Ball et al. 2005; Webb et al. 2012) and northern Europe (Boumans and Brittain 2012; Kjaerstad et al. 2012). Particularly notable are COI barcoding studies of EPTs from the vicinity of Churchill, Manitoba, conducted over a 6-year period, which established a baseline for biodiversity assessments in other parts of northern Canada (Zhou et al. 2009, 2010; Ruiter et al. 2013). While it is clear that the northern EPT fauna is much richer than previously supposed, these efforts were concentrated in a relatively small geographic area, leaving the vast majority of northern Canada poorly surveyed.

This study aims to redress this deficiency through broad-scale sampling of EPTs across northern Canada. The specific objectives are to: (1) assess EPT diversity across multiple, widely separated locations using a COI barcoding approach, (2) compare species-specific cases of recent population-level divergence using a molecular clock approach, and (3) determine whether such genetic divergences are related to past climatic events. This is the first broad-scale study of EPTs in northern Canada that uses a DNA barcoding approach to assess species richness and compare patterns of intraspecific divergence.

Materials and methods

Fieldwork

Specimens of three orders of aquatic insects (EPT) were collected from 12 widely distributed sites in northern Canada during June and July of 2010 and 2011. Four sites were located in each of three northern ecoclimatic zones. Boreal: Goose Bay (NL), Moosonee (ON), Yellowknife (NT), Norman Wells (NT); Subarctic: Schefferville (QC), Churchill (MB), Kugluktuk (NU), Ogilvie Mountains (YT); Arctic: Iqaluit (NU), Lake Hazen (NU), Cambridge Bay (NU), and Banks Island (NT) (Fig. 1). Sampling took place for approximately two weeks at each site, with boreal sites sampled earliest in the season (followed, respectively, by subarctic and arctic sites) to ensure that insects were sampled at more-or-less the same stage of development in each ecoclimatic zone. Easternmost sites were sampled in 2010 and westernmost sites in 2011.

Fig. 1

Sampled sites across the three main ecoclimatic regions (arctic, subarctic and boreal) in northern Canada. Dark circles: eastern sites, Goose Bay (GB), Iqaluit (IQ), Lake Hazen (LH), Moosonee (MO) and Schefferville (SV). White circles: western sites, Banks Island (BI), Churchill (CH), Cambridge Bay (CB), Kugluktuk (KT), Norman Wells (NW), Ogilvie Mountains (OM) and Yellowknife (YK)

Standardized sampling protocols were implemented on an equal number of lentic (standing water) and lotic (running water) habitats at each site. Each habitat was sampled using a D-framed aquatic dip net for 20 min to collect the immature (i.e., aquatic) stages of EPTs. Net contents were emptied into shallow pans of water, with larvae of target taxa removed using fine forceps. A further 15 min was spent sweeping the riparian vegetation surrounding each habitat to collect adult EPTs. Aquatic and terrestrial specimens were fixed mainly in 95% ethanol to facilitate both morphological identification and DNA barcoding. However, in instances when ethanol supplies became exhausted toward the end of a sampling period at some locations, material was fixed in 80% isopropyl alcohol or propylene glycol, and later transferred to 95% ethanol. Vouchers are deposited in the Entomological collection at the Royal Ontario Museum, Toronto, Canada.

Laboratory procedures

Approximately 10,000 EPT specimens were collected and sorted under magnification with a Nikon stereomicroscope. Specimens were initially identified to the lowest possible taxonomic level following taxonomic keys in Merritt et al. (2008), Wiggins (1996) and Ruiter (1995). Selected adults and mature larvae were sent to taxonomic specialists for confirmation of species-level identifications: Ephemeroptera (S. Burian, Southern Connecticut State University), Plecoptera (R. Baumann, Brigham Young University and B. Kondratieff, Colorado State University), and Trichoptera (D. Ruiter, Grants Pass, Oregon). Finally, 800 specimens—700 larvae and 100 adults—representing all the morphospecies identified by us, were selected for DNA extraction and COI barcode sequencing. Specimens from Churchill, Manitoba, were not barcoded because all morphospecies collected by us corresponded with those already identified and barcoded from that site by Zhou et al. (2009, 2010). Accordingly, sequence data from those studies were used as a proxy for the EPTs collected in the present study.

Genomic DNA was extracted from the hind leg of selected specimens using the DNeasy Blood & Tissue Kit (Qiagen, Inc.). Extracted DNA was eluted in 150 μl of elution buffer AE (10 mM Tris–HCl, 0.5 mM EDTA, pH 9.0). We amplified the cytochrome oxidase I (COI) gene for all 800 specimens using the primer pair LCO1490 (5′-ATTCAACCAATCATAAAGATATTGG-3′) and HCO2198 (5′-TAAACTTCAGGGTGACCAAAAAATCA-3′) following the methods described by de Waard et al. (2008) and Hajibabaei et al. (2005). PCR products were visualized on and excised from a 1% agarose gel, followed by spin column centrifugation for PCR product clean-up and Sanger sequencing using the BigDye Terminator kit v3.1. All sequencing reactions were performed using a Hitachi Applied Biosystems 3730 DNA Analyzer.

Genetic analyses

Obtained sequences were individually compared against the NCBI GenBank and BOLD Systems v3 databases using BLASTn to assign species names using a 98% similarity cutoff. A sequence divergence threshold of 2% was chosen to delimit haplotype clusters, allowing comparison with previous studies of northern EPTs by Zhou et al. (2009, 2010). For sequences with greater than 2% divergence, voucher specimens were sent to taxonomic specialists of each group to confirm identification. To improve geographic sampling, we included COI sequences for 107 species produced by Zhou et al. (2009, 2010) from Churchill (Fig. 1; Online Resource 1). Sequences produced in this study were deposited in GenBank (NCBI) under accession numbers KJ674822–KJ675402, and information on each sequence is provided in Online Resource 2.
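To make the 2% threshold concrete, the following Python sketch computes an uncorrected pairwise distance between two aligned barcodes and checks it against the cutoff. It is an illustration only: the function names are hypothetical, and the study itself relied on BLASTn similarity rather than on this calculation. For a 650 bp barcode, 2% corresponds to 13 differing sites.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def within_threshold(seq_a: str, seq_b: str, threshold: float = 0.02) -> bool:
    """True when two barcodes differ by no more than the threshold (2% here)."""
    return p_distance(seq_a, seq_b) <= threshold
```

Whether a pair sitting exactly at 2% is treated as the same cluster is a choice made here for illustration (it is counted as within the threshold); the text does not specify it.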

Sequences were grouped according to order and then aligned and edited using Geneious® 6.1.6 (Biomatters Inc.) and ClustalX (Higgins and Sharp 1988) with default parameters. The best substitution model for each alignment was selected using jModelTest v2 (Darriba et al. 2012) under the Akaike Information Criterion (AIC). Phylogenetic reconstructions for each insect order were obtained using BEAST v.1.7.5 (Drummond et al. 2012), with a constant-size coalescent tree prior and a strict clock model with the rate fixed to 1.0, in order to estimate relative divergence times. Twenty million MCMC generations were run, sampling every 2000 states, to produce log and tree files with 10,000 states each. Convergence and chain mixing were assessed by visually inspecting each parameter trace from the log files using Tracer v.1.5 (Drummond and Rambaut 2007), considering effective sample sizes above 200 as a good indicator. After specifying a burnin of 10%, a maximum clade credibility (MCC) tree with mean heights was produced using TreeAnnotator v.1.7.5. MCC trees were visualized in FigTree v.1.4 to identify widespread species (Suppl. Figs. S1–3). Once species were identified, individual alignments were re-analyzed using the same procedure as above, but with an MCMC chain of 5 million generations, sampling every 500 states. MCC trees were inspected and edited with the package APE (Paradis et al. 2004) for R v3.0.1 (R Core Team 2015). The time to the most recent common ancestor (TMRCA) was rescaled post-analysis based on a 3.54% divergence rate per Myr (0.0177 substitutions per site per Myr) proposed for insect COI sequences (Papadopoulou et al. 2010).
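The post-analysis rescaling can be sketched in a few lines of Python. With the clock rate fixed to 1.0, node heights are expressed in substitutions per site, so dividing by the per-lineage rate converts them to millions of years. The function and constant names below are hypothetical, and the sketch assumes the rescaling is a simple division, which is the usual reading of the description above.

```python
# Per-lineage rate quoted in the text (substitutions per site per Myr).
SUBS_PER_SITE_PER_MYR = 0.0177

# Two lineages diverge from each other at twice the per-lineage rate,
# which gives the 3.54% pairwise divergence per Myr quoted in the text.
PAIRWISE_DIVERGENCE_PER_MYR = 2 * SUBS_PER_SITE_PER_MYR


def rescale_height(height_subs_per_site: float,
                   rate: float = SUBS_PER_SITE_PER_MYR) -> float:
    """Convert a node height from substitutions per site to Myr."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return height_subs_per_site / rate


def sampled_states(chain_length: int, sample_every: int) -> int:
    """Number of sampling intervals in a chain (initial state not counted)."""
    return chain_length // sample_every


# Both chain settings described in the text give the same number of samples.
order_level = sampled_states(20_000_000, 2000)
species_level = sampled_states(5_000_000, 500)
```

Under these assumptions a root height of 0.0177 substitutions per site corresponds to 1 Myr, and both the order-level and species-level runs yield 10,000 sampled states.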

Statistical analyses

To compare the TMRCA for eastern and western populations of selected widespread species, the treeModel.rootHeight parameter was extracted from each species' BEAST log file in R, discarding the initial 10% of the MCMC samples as burnin. We estimated the amount of overlapping time between the TMRCA of each species by measuring the distance between means D with the formula:

$$D = \frac{{\mu_{1} - \mu_{2} }}{{\sigma_{1} }}$$

where \(\mu_{1}\) is the mean TMRCA of the “reference” species, \(\mu_{2}\) is the mean TMRCA of the “compared” species, and \(\sigma_{1}\) is the standard deviation of the “reference” species. With D, the proportion of overlap p can be calculated using the normal cumulative distribution function P (pnorm() function in R):

$$p = 2P\left[ \text{X} \le -\frac{\left| D \right|}{2} \right]$$

where X is a random variable from a normal distribution. Since the distribution of the MCMC TMRCA estimates was not symmetrical, we estimated p on each side of the distribution for each pair of species.
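The overlap calculation can be illustrated with a short Python sketch. It assumes that the tail probability is taken at \(-|D|/2\), so that p lies between 0 and 1 (the standard overlap coefficient for two equal-variance normal curves), and that the sample standard deviation is used. The function names are hypothetical, and the traces are assumed to be plain lists of rootHeight samples already read from a log file.

```python
import math
import statistics


def drop_burnin(samples, fraction=0.10):
    """Discard the first `fraction` of an MCMC trace (10% in the text)."""
    start = int(len(samples) * fraction)
    return list(samples[start:])


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function (pnorm in R)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def overlap_proportion(reference, compared) -> float:
    """Overlap p between two TMRCA traces, scaled by the reference SD."""
    mu_1 = statistics.mean(reference)
    mu_2 = statistics.mean(compared)
    sigma_1 = statistics.stdev(reference)  # sample SD of the reference
    d = (mu_1 - mu_2) / sigma_1
    # Assumption: lower tail at -|D|/2, which keeps p within [0, 1].
    return 2.0 * normal_cdf(-abs(d) / 2.0)
```

Because D is scaled by the standard deviation of the reference species only, `overlap_proportion(a, b)` and `overlap_proportion(b, a)` generally differ, which matches the practice of estimating p on each side for every species pair.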

To account for the effect of randomness in the observed distribution of subpopulations, we performed a permutation test with 10,000 iterations. The test consisted of randomizing the assignment of all terminals to “East” and “West” and recording, for each iteration, whether the observed distribution of subpopulations \(h_{o}\) was recovered. To obtain a p value, we used the following formula:

$$p = \frac{\sum_{i = 1}^{n} r_{i}}{n} \quad \text{if}\; r_{i} = h_{o} = 1$$

where \(r_{i}\) represents the number of times \(h_{o}\) was found in iteration \(i\) and \(n\) is the number of iterations. Permutation analyses and the statistical figures were made in R (R Core Team 2015).
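A minimal Python sketch of such a permutation test is given below. How a match with the observed distribution was scored is not spelled out in the text, so the sketch makes a simplifying assumption: an iteration counts as a hit only when the shuffled “East”/“West” labels reproduce the observed labels tip by tip. The function name, the seed, and the toy labels are hypothetical.

```python
import random


def permutation_p_value(observed_labels, n_iter=10_000, seed=0):
    """Share of random label shuffles that reproduce the observed labelling.

    observed_labels: region label ("East" or "West") of each terminal,
    listed in tree order.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = list(observed_labels)
    shuffled = observed[:]
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # random reassignment of regions to tips
        if shuffled == observed:  # r_i = 1 when h_o is recovered
            hits += 1
    return hits / n_iter


# Toy example: two clades of four tips, each confined to one region.
toy_labels = ["East"] * 4 + ["West"] * 4
toy_p = permutation_p_value(toy_labels)
```

With four eastern and four western tips there are 70 equally likely arrangements, so the toy value should fall near 1/70; a less strict matching rule (for example, counting any arrangement in which every clade is confined to one region) would give a larger p.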

Results

COI barcode sequences ranging between 650 and 660 bp were recovered for 587 individuals (73%) from 10 of the 12 sites. Barcode data for Churchill, Manitoba, EPTs were derived from Zhou et al. (2009, 2010). Only one EPT morphospecies, an immature trichopteran of the family Limnephilidae, was collected from Lake Hazen on Ellesmere Island; however, we were unable to obtain sequences for that species.

Sequences were obtained from 87% of Ephemeroptera, 83% of Plecoptera and 71% of Trichoptera specimens sampled. Of these, a total of 155 species were identified based on BLAST using a 2% divergence cutoff. The number of species by order was: Ephemeroptera (58 spp.), Plecoptera (41 spp.), and Trichoptera (56 spp.) (Fig. 2).

Fig. 2

Barplot comparing EPT diversity among three studies in northern Canada: Danks (1981), Zhou et al. (2009), and the present study

The best substitution model obtained for each order-level alignment was the generalized time reversible (GTR, Tavaré 1986). Nineteen species from MCC trees (Online Resources 3–5) had populations distributed both east and west of Hudson Bay. Of these, only seven exhibited both geographic and genetic structure (Fig. 3): Ephemeroptera (Acentrella lapponica, Ameletus inopinatus, Baetis tricaudatus and Ephemerella aurivillii), Plecoptera (Skwala compacta), and Trichoptera (Ceratopsyche alternans and Onocosmoecus unicolor). Nine other species exhibited marked genetic divergence, but without clear geographic structure: Ephemeroptera (Heptagenia pulla, Epeorus vitreus, Leptophlebia cupida, Baetis hudsonicus, Baetis brunneicolor) (ESM 3), Plecoptera (Nemoura arctica, Isoperla transmarina) (ESM 4), and Trichoptera (Limnephilus picturatus) (ESM 5). Finally, three other widely distributed species, all trichopterans, exhibited no clear genetic or geographic structure (Asynarchus lapponica, Anabolia bimaculata, Grensia praeterita) (ESM 5).

Fig. 3

Bayesian maximum clade credibility trees of seven EPT species with divergent eastern and western populations in northern Canada. Numbers above branches indicate the posterior probability (clade support). Numbers below branches indicate the genetic distance (in substitutions per site) between populations. Numbers within triangles indicate the number of individuals. White triangles represent collapsed lineages of eastern populations, and gray triangles those of western populations

The best substitution models for each of the species-level alignments were: the GTR model for A. inopinatus (–logL 1071.1 SE 0.102), C. alternans (–logL 869.4 SE 0.106), E. aurivillii (–logL 975.3 SE 0.084), and O. unicolor (–logL 1141.1 SE 0.1639); and the HKY (Hasegawa et al. 1985) model for B. tricaudatus (–logL 1119.5 SE 0.081), A. lapponica (–logL 887.5 SE 0.091) and S. compacta (–logL 1004.7 SE 0.077). Posterior probabilities (clade support) for each subpopulation (i.e., east versus west) ranged from 0.94 to 1.0 except for two species: Ameletus inopinatus and Ephemerella aurivillii. In these cases, western subpopulations had values of 0.62 and 0.64, respectively, for each species (Fig. 3). The interpopulation divergence percentage, measured from branch edges (with length in substitutions per site) connecting both subpopulations, was 0.4% for E. aurivillii, 0.6% for Ceratopsyche alternans, 1.18% for A. lapponica, 1.38% for A. inopinatus, 2.65% for B. tricaudatus, 2.82% for S. compacta, and 8.66% for O. unicolor.

The posterior mean of the TMRCAs of eastern and western populations ranged between 0.23 and 2.41 Ma (Fig. 4). The maximum percentage of TMRCA overlap was between C. alternans and E. aurivillii, at almost 76% (Table 1). Most species pairs overlapped by more than 10%, except for C. alternans and B. tricaudatus, which overlapped by less than 9%. Onocosmoecus unicolor had the least TMRCA overlap, with no more than 5% overlap with any other species (Table 1). Permutation tests for each order-level tree indicate that distributions are significantly different from random assignments (Ephemeroptera P = 0.0369, Plecoptera P = 0.0001, and Trichoptera P = 0.0425).

Fig. 4

Time to the most recent common ancestor (TMRCA) of eastern and western populations of seven EPT species widely distributed in northern Canada. Species populations show divergence times during the Pleistocene within a close range except for Onocosmoecus unicolor. A total of 10,000 iterations were performed to account for the effect of randomness in the observed distribution of subpopulations of all terminals into “East” and “West.” Box plot and black circles represent 95 and 5% of the iteration results, respectively

Table 1 Comparison of time to the most recent common ancestor (TMRCA) for species pairs

Discussion

The vastness of northern Canada, in combination with the difficulties of conducting fieldwork in remote locations, has presented a great challenge for revealing patterns of aquatic insect biodiversity and community structure in that region. Moreover, the taxonomy of most aquatic insects is based on adults (especially males), which are typically short lived and difficult to collect at northern latitudes. In contrast, the immature stages of aquatic insects are relatively easy to collect, but are difficult or in most cases impossible to identify to species-level using available keys.

The biodiversity of northern aquatic insects was relatively poorly known before the advent of molecular taxonomic techniques. For example, Danks (1981) reported a total of just 38 species of EPTs distributed across the entirety of North America’s Arctic zone from Alaska to Greenland. This represented the sum total of knowledge up to that time, and relatively few additional records were published through the end of the century. In contrast, Zhou et al. (2009, 2010), based on collections made during a 6-year period, reported 112 species from just a single site in the subarctic ecoclimatic zone of northern Manitoba. Although subarctic environments are expected to support higher diversity than arctic ones, it is nonetheless surprising to find a nearly threefold increase in species richness from just a single subarctic site, albeit one that borders the northernmost margin of the Boreal ecoclimatic zone. In our broader-scale study conducted over a much shorter timeframe (i.e., only 4 months of sampling over two years), we recovered a total of 88 EPT species from four arctic sites and four subarctic sites (Fig. 2), exceptional diversity given the limited sampling effort. Indeed, this figure represents well over twice the historical benchmark for arctic EPTs developed over the previous century, reflecting, in part, the higher ratio of cryptic diversity revealed by DNA barcodes relative to traditional taxonomy (Hebert et al. 2004; Witt et al. 2006). When comparing our data with the longer-term DNA barcoding studies of Zhou et al. (2009, 2010), we find about 35% more species of Ephemeroptera and about four times more species of Plecoptera in our study. The differences in plecopteran diversity might reflect order-specific habitat preferences, as most plecopteran species were found in more northerly locations (data not shown, Cordero et al. in prep.). In addition, our study shows higher diversity evenness across orders compared to Zhou et al. (2009, 2010). This may reflect ecoregional differences at a broader spatial scale compared to essentially a single location (Churchill, Manitoba).

Special comment is warranted about the relatively low success rate of obtaining sequences from particular specimens. While we obtained sequences for all the adult specimens sampled, we had a considerably lower success rate for larvae. This difference is perhaps related to the fact that immature EPTs were collected en masse, sometimes with hundreds of specimens preserved during a particular collecting event. Although an effort was made to frequently change or refresh alcohol during the initial fixation period, we suspect these efforts failed when too many specimens were included in a single collecting jar. The problem was especially acute for the Trichoptera, where sequences were obtained from less than half the specimens sampled from Moosonee and Norman Wells. Large sample sizes from those locations no doubt contributed to the problem; however, the situation was compounded in case-dwelling larvae—perhaps because they were more resistant to preservation when they remained within their retreats during fixation. Our limited success of obtaining sequences from larvae highlights the importance of balancing the conflicting objectives of collecting the maximum number of specimens while conserving ethanol, which is difficult to obtain and transport in remote northern locations. Nonetheless, even with the relatively low success rate, we are confident that—with up to 10 specimens sampled for each morphospecies—the great majority of EPT species present at each site were successfully identified.

One of the promises of DNA barcoding is that it integrates multiple biological disciplines, such as taxonomy, biological databasing, ecology, systematics, phylogenetics, and phylogeography (Monaghan et al. 2009). Studies that have succeeded in such endeavors have not only characterized diversity in exceptional detail for some taxa (e.g., Smith et al. 2008) but have also complemented important aspects about their natural and evolutionary histories. DNA barcoding often yields high levels of cryptic diversity (Hebert et al. 2004; Smith et al. 2008; Velonà et al. 2015). Cryptic diversity is often explained by imperceptible morphological, ecological, or behavioral differences among closely related lineages (Bickford et al. 2007). We find, based on the order-level phylogenetic trees (ESM 3–5), 22 species of EPTs with COI divergence >2.5% that are not represented in GenBank or BOLD databases. Hence, these results suggest that as much as 14% of the diversity we recovered represents hitherto unrecognized diversity, yet to be examined by taxonomy experts. A striking example is seen in the trichopteran Onocosmoecus unicolor, where a COI difference of >8% was found between eastern and western populations (Fig. 3) and a TMRCA > 2 Myr. This particular caddisfly has had a long and checkered taxonomic history, with at least six synonyms currently recognized (Wiggins 1996; Rasmussen and Morse 2014). Given the amount of morphological variation within O. unicolor as currently defined (Wiggins and Richardson 1986), it seems likely that at least two valid species are currently included under that name.

The Pleistocene glacial period is a well-known speciation driver, which can lead to cryptic diversity through the generation of incipient species after recurrent periods of isolation (Hewitt 2000, 2004; Knowles 2001; Hawlitschek et al. 2012; Sánchez-Ramírez et al. 2015). A possible example of such cryptic diversity can be seen in Isoperla transmarina (see ESM 4), wherein divergence data suggest the presence of at least two separate species under that name. During the last glacial maximum (LGM), present-day arctic and subarctic regions in Canada were covered by massive continental ice sheets (Pielou 1991; Clark and Mix 2000). This implies that these communities likely persisted in refugia, later recolonizing northern territories as climatic conditions became favorable (Bernatchez and Wilson 1998; Keppel et al. 2012). Our coalescent-based population-level exploration revealed seven species with strong east–west genetic structure (Fig. 3). Furthermore, molecular clock analyses indicate that these geographic populations diverged during the Pleistocene Epoch (Fig. 4), probably as a result of isolation in eastern (possibly the Appalachian Mountains and the Atlantic coastal plains) and western (most likely Beringia) refugia (Hopkins 1967). Beyond the molecular clock TMRCA estimates, we find a substantial degree of temporal overlap (Table 1) among the divergence times of these populations. The synchronicity in divergence times might be viewed as an additional indicator of the effect of an extrinsic factor on population divergence, such as Pleistocene glacial dynamics. Furthermore, we detected 12 other species displaying strong genetic divergence (e.g., high clade posterior probabilities, see ESM 3–5), but with no east–west geographic structure. This pattern could be the result of post-glacial dispersal.

In this study, we show that a rapidly conducted biological survey using a DNA barcoding approach has great utility for assessing biodiversity in remote and difficult-to-access territories such as northern Canada. DNA barcoding studies executed over a broad geographic area not only aid biodiversity discovery, but also help reveal how EPT biodiversity originated and is maintained in northern Canada. Further, as many aquatic insects are difficult to identify, particularly in the aquatic stage, a combined barcoding-phylogeographic approach can serve as a powerful tool to gain preliminary evolutionary and ecological insights that can be tested using additional data (e.g., multiple genes, more localities) and emerging analytical tools.