Introduction

The petroleum industry in China has grown rapidly with the increasing population and the growing demand for crude oil as an energy source. The resulting exploration, production, transportation, and storage of crude oil have introduced extensive oil contamination into terrestrial ecosystems throughout China. There is therefore an urgent need to manage and remediate these oil-contaminated ecosystems. However, traditional mitigation technologies such as chemical and physical treatments are difficult to apply because of the inaccessibility and large volumes of the released oil. A greener and more sustainable approach is to exploit indigenous microorganisms capable of degrading oil. Oil-degrading microorganisms are ubiquitous in the environment, and successful bioremediation of terrestrial and marine oil spills has been reported in a number of studies (Atlas 1995; Brandt et al. 2002; Hazen et al. 2010; Kostka et al. 2011; Pritchard and Costa 1991; Song et al. 1990; Sun and Cupples 2012; Sun et al. 2010; Swannell et al. 1996). Although oil bioremediation appears promising, a comprehensive analysis of the in situ microbial degradation of petroleum components is still needed. Culture-dependent and culture-independent methods have been extensively applied to assess the microbial degradation of petroleum components (Grassia et al. 1996; Gray et al. 2010, 2011; Liang et al. 2010; Orphan et al. 2000; Ren et al. 2011; Viñas et al. 2005). Nevertheless, characterizing indigenous microbial communities remains challenging because of complex in situ physicochemical conditions. In addition, traditional molecular techniques are often restricted to the analysis of a small number of clones and thus underestimate the diversity of indigenous microbial communities. The emergence of advanced molecular techniques, especially high-throughput techniques, provides an opportunity to better characterize microbial diversity in oil-contaminated environments.
For example, GeoChip, a high-throughput functional gene array, was used to evaluate the microbial functional genes involved in oil contaminant degradation (Liang et al. 2010). Pyrosequencing is another high-throughput method; it yields large numbers of DNA reads (Margulies et al. 2005), making it possible to detect low-abundance microbes in the environment. Pyrosequencing has been used to characterize in situ microbial communities in oilfields (Bell et al. 2011; Dos Santos et al. 2011). For instance, pyrosequencing of production and injection waters from Algerian oilfields revealed many unclassified bacterial and archaeal sequences (Lenchi et al. 2013). Bacterial communities in permafrost soils along the China-Russia crude oil pipeline have also been investigated by pyrosequencing, revealing much higher microbial diversity than anticipated (Yang et al. 2012). These pyrosequencing-based studies demonstrate that high-throughput sequencing provides a more complete picture of indigenous microbial communities than traditional molecular techniques.

Drilling and production activities associated with the petroleum industry can contaminate nearby surface water and groundwater (Kharaka et al. 2005; O'Rourke and Connolly 2003; Osborn et al. 2011). This contamination can include various petroleum hydrocarbons (e.g., benzene, toluene, ethylbenzene, and xylene (BTEX) and polycyclic aromatic hydrocarbons (PAHs)), metals, salts, and other contaminants. The accumulation of substantial amounts of these materials in soil and groundwater can have severe environmental, economic, and health consequences (Olsgard and Gray 1995). A comprehensive understanding of the microbial communities involved, and of how their activities can be enhanced, is important for optimizing conditions for biodegradation and for identifying amendments that can stimulate in situ oil bioremediation. Although diverse bacteria capable of degrading petroleum components have been isolated and characterized, the vast majority of oil-degrading bacteria remain undiscovered because of the limitations of both culture-dependent and culture-independent techniques. In this study, we collected oil-contaminated soils from six large oilfields in different geoclimatic regions of China and used high-throughput sequencing of 16S ribosomal RNA (rRNA) genes to profile the indigenous microbial community in each soil sample. This study aims to provide novel data sets that can be used together with contamination profiles and geochemical parameters to better understand microbial processes at contaminated sites.

Materials and methods

Site information and sampling strategy

Soil samples were obtained from oil-contaminated sites at six oilfields located in different geographical regions of China: the Daqing (DQ) and Jilin (JL) oilfields in northeast China, the Changqing (CQ) and Xinjiang (XJ) oilfields in northwest China, the Shengli oilfield (DY, located in Dongying) in the Yellow River area of north China, and the Jiangsu (JS) oilfield in east China (Fig. 1). These regions differ in climate. DQ (46° 44′ N, 124° 55′ E), JL (41° 21′ N, 124° 47′ E), and CQ (36° 32′ N, 107° 20′ E) have a temperate continental monsoon climate; XJ (41° 10′ N, 83° 35′ E) has a temperate continental arid climate; JS (32° 57′ N, 119° 02′ E) has a subtropical humid monsoon climate; and DY (37° 29′ N, 118° 15′ E) has a warm temperate continental semi-humid monsoon climate. Contaminated soils were collected 2–10 cm below the surface, adjacent to crude oil pumping wells where contamination had persisted for decades. Uncontaminated control samples were collected from undisturbed pristine soils. Sampling sites within each oilfield were located within an area of 0.8 km² (a circle with a radius of 300 m). Sample site information and sample names are summarized in Table S1.
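The size of a circular sampling zone follows A = πr². The short Python sketch below (the helper names are hypothetical) converts between radius and area; it shows that a 300 m radius encloses roughly 0.28 km², whereas an area of 0.8 km² corresponds to a radius of about 505 m.

```python
import math

def circle_area_km2(radius_m: float) -> float:
    """Area in km² of a circle whose radius is given in metres (A = pi * r^2)."""
    return math.pi * radius_m ** 2 / 1e6  # 1 km² = 1e6 m²

def radius_for_area_m(area_km2: float) -> float:
    """Radius in metres of a circle enclosing area_km2 (r = sqrt(A / pi))."""
    return math.sqrt(area_km2 * 1e6 / math.pi)

print(f"{circle_area_km2(300):.2f} km2")  # 0.28 km2
print(f"{radius_for_area_m(0.8):.0f} m")  # 505 m
```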

Fig. 1

Map of six large oilfields in China: Daqing (DQ) and Jilin (JL) in the northeast, Xinjiang (XJ) and Changqing (CQ) in the northwest, Dongying (DY) in the north, and Jiangsu (JS) in the east

Chemical analysis

Soil samples from each site were homogenized by thorough mixing and stored at 4 °C until further processing. The samples were then air-dried for 48 h and passed through a 2-mm sieve to remove leaves, plant roots, and gravel. Next, 5 g of dry soil was placed in a 150 mL Erlenmeyer flask and mixed with 25 mL of distilled water (1:5 soil/water ratio). The mixture was shaken for 5 min and left to equilibrate for 20 min, and the pH was measured with a calibrated HACH HQ30d pH meter (HACH, Loveland, USA). The supernatant was then filtered through a 0.45 μm membrane filter. Soil cation (K⁺, Ca²⁺, Na⁺, NH₄⁺, and Mg²⁺) and anion (F⁻, SO₄²⁻, NO₃⁻, and Cl⁻) concentrations were determined by ion chromatography (DIONEX ICS-1500, Sunnyvale, USA). Soil total organic carbon (TOC) and total nitrogen (TN) were determined with an elemental analyzer (Vario EL/micro cube, Hanau, Germany). Soil moisture was determined by drying a 5 g soil sample at 105 °C for 24 h to constant weight.
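Gravimetric soil moisture follows directly from the mass lost on oven drying. The sketch below assumes moisture is expressed on an oven-dry mass basis (the text does not state the basis; some studies use wet mass instead), uses a hypothetical 1 mg tolerance for "constant weight", and relies on invented weighings for illustration.

```python
def gravimetric_moisture_pct(wet_mass_g: float, dry_mass_g: float) -> float:
    """Water content (%) on an oven-dry mass basis (assumed convention)."""
    if dry_mass_g <= 0 or wet_mass_g < dry_mass_g:
        raise ValueError("expected wet_mass_g >= dry_mass_g > 0")
    return (wet_mass_g - dry_mass_g) / dry_mass_g * 100.0

def is_constant_weight(masses_g, tol_g=0.001):
    """True if the last two successive weighings differ by at most tol_g.

    The 1 mg tolerance is a hypothetical choice, not taken from the study.
    """
    return len(masses_g) >= 2 and abs(masses_g[-1] - masses_g[-2]) <= tol_g

# Hypothetical weighings (g) of a 5 g moist subsample during drying at 105 °C
weighings = [5.000, 4.420, 4.312, 4.3115]
if is_constant_weight(weighings):
    moisture = gravimetric_moisture_pct(weighings[0], weighings[-1])
    print(f"moisture = {moisture:.1f} % (dry basis)")
```

Expressing moisture per unit dry mass keeps the denominator independent of the water being measured, which is why it is a common convention for soils.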

DNA extraction, PCR amplification, and Illumina sequencing

Total genomic DNA was extracted directly from soil samples using a FastDNA® spin kit (MP Bio, Santa Ana, USA) following the manufacturer’s protocol. DNA concentration was determined with a Nanodrop ND-2000 UV-Vis spectrophotometer (Thermo Scientific, Wilmington, USA), and DNA was stored at −80 °C until further analysis. The primer set F515 (5′-GTGCCAGCMGCCGCGGTAA-3′) and R806 (5′-GGACTACVSGGGTATCTAAT-3′) was used to amplify the V4 hypervariable region of the 16S rRNA gene from the total genomic DNA. Sequencing libraries were prepared from 200 ng of the resulting amplicon for each sample using the Illumina TruSeq™ DNA Sample Preparation Kit (Illumina, San Diego, USA) following the manufacturer’s recommendations. DNA fragments were size-selected by agarose gel electrophoresis (120 V, 40 min, 1.5 % agarose gel). After purification through a spin column (QIAGEN, Dusseldorf, Germany), DNA fragments carrying adapter molecules on both ends were selectively enriched by PCR (10 cycles) using the Illumina PCR Primer Cocktail. 16S rRNA tag-encoded high-throughput sequencing was carried out on the Illumina MiSeq platform by Novogene (Beijing, China). The reads were deposited in the NCBI Sequence Read Archive (SRA) under accession number SRR1562515. Paired reads from the original DNA fragments were merged as described previously (Magoč and Salzberg 2011), and reads were assigned to samples according to the unique barcode attached to each sample during library preparation. Sequences were analyzed with the Quantitative Insights Into Microbial Ecology (QIIME) software package (Caporaso et al. 2010) and the UPARSE pipeline. Reads were first filtered with the QIIME quality filters using the default settings for Illumina data, and the UPARSE pipeline was then used to cluster sequences into operational taxonomic units (OTUs) at a 97 % similarity threshold.
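To illustrate what clustering at a 97 % similarity threshold means, the following is a simplified Python sketch of abundance-sorted greedy centroid clustering. It is not the UPARSE implementation, which includes additional steps (e.g., chimera filtering) and computes identity from alignments; this toy assumes reads already trimmed to a common length and scores identity position by position. All sequences are hypothetical.

```python
from collections import Counter

def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes reads trimmed to a common length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads, threshold=0.97):
    """Abundance-sorted greedy centroid clustering.

    Unique sequences are visited from most to least abundant. Each joins the
    first existing centroid it matches at >= threshold identity; otherwise it
    founds a new OTU with itself as centroid. Returns {centroid: read count}.
    """
    otus = {}
    for seq, count in Counter(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:  # no centroid within the threshold: start a new OTU
            otus[seq] = count
    return otus

base = "ACGT" * 25               # hypothetical 100-bp amplicon
near = "T" + base[1:-1] + "A"    # 2 mismatches vs base -> 98 % identity
far = "CATGC" + base[5:]         # 5 mismatches vs base -> 95 % identity
reads = [base] * 10 + [near] * 3 + [far] * 2
print(sorted(greedy_otus(reads).values(), reverse=True))  # [13, 2]
```

Because unique sequences are visited from most to least abundant, the most abundant sequence in each cluster becomes its centroid, which can then serve as the OTU's representative sequence.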
For each OTU, a representative sequence was selected and assigned to a taxon using the Ribosomal Database Project (RDP) classifier (Wang et al. 2007). The Chao 1, Shannon, and Simpson indexes were calculated for 11 libraries to estimate species richness and diversity as described previously (Schloss et al. 2009).
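The three indexes can be computed from per-OTU read counts as sketched below. The exact estimator variants used in the study are not stated, so this Python sketch assumes the bias-corrected Chao1, the natural-log Shannon index, and Simpson's D computed as Σnᵢ(nᵢ − 1)/(N(N − 1)); the OTU counts are hypothetical.

```python
import math

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    counts = [c for c in counts if c > 0]
    f1 = sum(1 for c in counts if c == 1)  # singleton OTUs
    f2 = sum(1 for c in counts if c == 2)  # doubleton OTUs
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with p_i > 0."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def simpson(counts):
    """Simpson's D = sum n_i(n_i - 1) / (N(N - 1)); requires N >= 2.

    Lower D means higher diversity; some tools report 1 - D or 1 / D instead.
    """
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))

otu_counts = [10, 5, 2, 1, 1]  # hypothetical reads per OTU in one library
print(chao1(otu_counts))  # 5.5
print(round(shannon(otu_counts), 3), round(simpson(otu_counts), 3))  # 1.236 0.327
```

Chao1 adds an estimate of unseen OTUs based on how many OTUs were observed only once or twice, so libraries rich in singletons receive larger corrections.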

Statistical analysis

Canonical correspondence analysis (CCA) was performed to identify the chemical properties with the most significant influence on microbial communities. The significance of each physicochemical parameter was tested by Monte Carlo permutation. The triplot was generated with CANOCO 4.5 (Biometris, Wageningen, The Netherlands), and the figures were drawn with CanoDraw 4.0 (Biometris, Wageningen, The Netherlands). Geochemical parameters and the relative abundances of dominant genera and phyla were used for cluster analysis, which was performed with the “clustsig” package in R. The unweighted pair group method with arithmetic mean (UPGMA) was used to compare the physicochemical parameters of different samples, and Bray-Curtis dissimilarity was used to compare the dominant genera of different samples. The open-source software Cytoscape 2.816 was used to visualize the 38 most abundant genera based on their relative abundances.
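The combination of Bray-Curtis dissimilarity and UPGMA (average-linkage) clustering can be sketched in plain Python as follows. This is an illustration of the technique, not the clustsig implementation, and it omits the similarity-profile (SIMPROF) permutation test that clustsig applies to decide which groups differ significantly. The sample names and abundance profiles are hypothetical.

```python
from itertools import combinations

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def upgma(profiles):
    """UPGMA (average-linkage) clustering of {sample: abundance vector}.

    Returns the merges in order as (members_a, members_b, height) tuples.
    """
    names = list(profiles)
    pair = {frozenset((a, b)): bray_curtis(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}

    def linkage(x, y):
        # Unweighted mean of all member-to-member dissimilarities
        return sum(pair[frozenset((a, b))] for a in x for b in y) / (len(x) * len(y))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda xy: linkage(*xy))
        merges.append((x, y, linkage(x, y)))
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
    return merges

# Hypothetical relative abundances (%) of three genera in four samples
profiles = {
    "S1": [60, 30, 10],
    "S2": [55, 35, 10],
    "S3": [5, 15, 80],
    "S4": [10, 20, 70],
}
for a, b, h in upgma(profiles):
    print(a, b, round(h, 3))
```

The merge heights form the dendrogram: samples with similar genus profiles (S1 and S2, S3 and S4) join at low dissimilarity before the two groups are joined.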

Results

Physicochemical characterization of the samples

The geological characteristics of the sampling sites are reported in Table S1, and the physicochemical properties of the soils are reported in Table 1. Cation and anion concentrations, pH, moisture, TOC, and TN were measured in all soil samples. Most samples were alkaline (pH > 7), except one sample from the JS oilfield (JS4, pH = 6.01). All samples from the XJ oilfield contained high concentrations of Cl⁻, SO₄²⁻, and Na⁺, and two of the four samples from the DY oilfield contained high concentrations of Cl⁻ and Na⁺. TOC varied across the oil-contaminated samples: it was high in all contaminated samples from the CQ, JS, and DY oilfields but moderate in those from the DQ, XJ, and JL oilfields. This difference may be attributable to the presence of volatile contaminants in these soils. Notably, TOC in the unpolluted control samples from the CQ, JS, and DY oilfields was much lower than in their contaminated counterparts, suggesting that TOC could serve as an indicator of oil pollutant levels. The soil geochemical profiles were grouped by UPGMA cluster analysis using TOC, as an indicator of the oil pollutant level, together with other geochemical parameters that could shape microbial community composition (e.g., soil moisture, sulfate, and nitrate). Geographic location appeared to be one factor in the grouping (Fig. 2). In addition, the unpolluted control samples clustered apart from their contaminated counterparts, indicating that the oil pollutant level may be another factor.

Table 1 Physicochemical properties of the oil-contaminated soils from six Chinese oilfields
Fig. 2

Cluster analyses showing the comparison between the geochemical parameters of the sampling sites

Diversity of microbial communities in oil-contaminated sites

The Illumina MiSeq platform generated 340,042 quality-filtered sequences from the 24 soil samples. These sequences were clustered into OTUs at a 3 % nucleotide divergence cutoff (97 % similarity). The number of OTUs and the Chao 1, Shannon, and Simpson indexes are summarized in Table 2. Based on OTU number, one soil sample from the DY oilfield (DY2) was the richest, followed by three samples from the DQ oilfield. Samples from the CQ oilfield were relatively less rich, with three of them having the lowest OTU numbers. OTU numbers ranged from 949 (CQ2) to 2516 (DY2). The Chao 1, Shannon, and Simpson indexes were also determined to evaluate phylotype richness and diversity in these samples (Table 2), and their patterns closely mirrored the OTU numbers. The wide variation in these indexes indicates that microbial communities differed between oilfields and even between samples from the same oilfield.

Table 2 OTU number and Chao 1, Shannon, and Simpson indexes in all samples

Taxonomic profiles

The sequences in each sample were assigned to taxonomic levels from phylum to genus with the RDP classifier. Figure 3 summarizes the relative abundances at the phylum level for each sample. A total of 28 archaeal and bacterial phyla were identified across all samples. No single phylum dominated all samples; instead, different phyla dominated in different samples. Actinobacteria (15.7 % of total reads) and Acidobacteria (10.8 % of total reads) were the predominant phyla in eight samples across different oilfields, while Proteobacteria (19.5 % of total reads) was the predominant phylum in five samples from four oilfields. Other dominant phyla included Firmicutes, Verrucomicrobia, Planctomycetes, and Bacteroidetes (Table S2). As reported previously, the 515f/806r primer pair is nearly universal for archaea and bacteria (Walters et al. 2011); consistent with this, sequences assigned to archaeal phyla were also detected in the current study. Among the archaeal phyla, Euryarchaeota was the dominant phylum in three samples (JS1, JS2, and CQ4) and was highly enriched in JS3 (11 %), but occurred at low levels (<1 %) in the other samples. Crenarchaeota occurred at high levels (relative abundance >1 %) in only a few samples. The dominance of archaeal phyla in some samples suggests that archaea may play an important ecological role in some oilfields. Notably, 21 % of the analyzed sequences were not assigned to any known phylum, suggesting that the surface soils harbor many not-yet-described bacterial and archaeal phyla.

Fig. 3

Taxonomic classification of the bacterial and archaeal reads retrieved from 24 samples at the phylum level from 16S rRNA Illumina sequencing

Core genera

The dominant phyla were further compared at the genus level to uncover differences among these microbial communities. A total of 587 genera were identified in the 24 samples. Different genera showed relatively high abundances (relative abundance >1 %) in different soil samples, and several of these enriched genera were dominant in some samples, with relative abundances above 10 %. Arthrobacter accounted for 22.5 % of reads at site CQ1 and 8.7 % at site CQ5, but only 0.7 and 1.1 % in two other samples from the same oilfield (CQ3 and CQ4, respectively). Dietzia was also highly abundant in the CQ oilfield, accounting for 10.7 % at site CQ1 and 53 % at site CQ5. Archaeal phylotypes were likewise highly abundant in some soil samples. For instance, Halalkalicoccus showed high relative abundances at sites JS1 (15.7 %) and JS2 (3.4 %). Interestingly, Halalkalicoccus was also abundant at site CQ4 (9.5 %) but not in any other sample from the CQ oilfield. Marinobacter was dominant in two samples from different oilfields: JS3 (20.5 %) and XJ12 (20.6 %). Other abundant genera, such as Rhodococcus, Pseudomonas, Mycobacterium, Opitutus, and Sphingomonas, were occasionally dominant in one or more soil samples. The relative abundances of all genera in all samples are summarized in Table S3. Highly abundant genera were selected from each sample (38 genera in total for the 24 samples) and grouped by Bray-Curtis cluster analysis of their abundances (Fig. 4). At the genus level, grouping patterns were not related to geographical location, indicating that different microbial communities may have developed even at adjacent sampling sites. In addition, the microbial communities of the uncontaminated control samples were only distantly related to their contaminated counterparts. Overall, the cluster analysis resolved five groups.
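The selection of highly abundant genera can be sketched as converting each sample's genus counts to relative abundances and taking the union of genera that exceed a threshold in at least one sample. The text does not state the selection criterion, so the 10 % threshold, the exclusion of unclassified reads, and the read counts below are all hypothetical.

```python
def relative_abundance(counts):
    """Convert {genus: read count} for one sample into {genus: % of reads}."""
    total = sum(counts.values())
    return {g: 100.0 * c / total for g, c in counts.items()}

def abundant_genera(samples, min_pct=10.0, exclude=("unclassified",)):
    """Union of genera above min_pct in at least one sample.

    min_pct and the excluded labels are hypothetical choices.
    """
    selected = set()
    for counts in samples.values():
        rel = relative_abundance(counts)
        selected |= {g for g, pct in rel.items() if pct > min_pct and g not in exclude}
    return sorted(selected)

samples = {  # hypothetical genus-level read counts for two samples
    "A": {"Arthrobacter": 400, "Dietzia": 250, "Pseudomonas": 30, "unclassified": 1320},
    "B": {"Arthrobacter": 10, "Dietzia": 60, "Marinobacter": 420, "unclassified": 1510},
}
print(abundant_genera(samples))  # ['Arthrobacter', 'Dietzia', 'Marinobacter']
```

Taking the union across samples keeps a genus that dominates only one sample (here Marinobacter in sample B), which matters when, as described above, different genera dominate different samples.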

Fig. 4

Cluster analyses of the dominant genera in the different samples based on Bray-Curtis analysis. Different colors indicate groups that are significantly different (p < 0.05)

We also applied a profile clustering network analysis (Fig. 5) to gain deeper insight into genus-level differences among the microbial communities of the oil-contaminated soils. Representative microbial communities were selected from each group identified by Bray-Curtis cluster analysis (Fig. 4); together, these five groups included oil-contaminated soils from all six oilfields. The Cytoscape network displayed the 38 most abundant genera and highlighted their relative distributions and abundances. Arthrobacter was ubiquitous and was the most abundant genus in several groups, including group 1 (CQ1 and DQ3), group 2 (DQ1, JL1, and DY3), and group 3 (JS4 and JL3); it was also abundant in group 4 (DY1 and XJ7) and group 5 (JS1, CQ4, and XJ12). Dietzia was the most abundant genus in group 1 and was also abundant in groups 3 and 5. Several genera, including Halalkalicoccus, Natronomonas, Marinobacter, Halosarcina, and Alcanivorax, were more abundant in group 5 than in the other groups, suggesting that group 5 may harbor a distinct microbial community.

Fig. 5

Profile clustering Cytoscape network visualizing the 38 most abundant genera among the five groups. Standard (reference) nodes (blue) show the node size corresponding to a given relative abundance of a genus within a group: standard node 1 represents 1 % and standard node 15 represents 15 % of the corresponding genus in a group

Relationship between the microbial community and the environment

CCA (Fig. 6) was used to relate environmental variables to microbial community composition. The ordination indicated that specific geochemical conditions shaped variation in community composition. Soil moisture was strongly positively correlated with CCA axis 1, whereas sulfate was negatively correlated with it. The relatively short TOC, TN, nitrate, and pH vectors indicate that these parameters were less strongly correlated with community composition than were moisture and sulfate. Several genera, notably Pseudomonas, Rhodococcus, Arthrobacter, Acinetobacter, Alkanindiges, and Opitutus, were positively correlated with TOC. Many of these genera contain species that are able to degrade petroleum components (see “Discussion”), suggesting that higher TOC may favor the growth of petroleum-degrading bacteria. Many archaeal genera, including Halalkalicoccus, Natronomonas, Halosarcina, and Natronococcus, were positively correlated with soil moisture, suggesting that these archaea may favor moister soils.

Fig. 6

Ordination diagrams from canonical correspondence analysis (CCA) of dominant genus abundances and geochemical parameters, with the overall microbial community (a) and specific phylotypes (b) as response variables. Red arrows indicate the direction and magnitude of the geochemical parameters associated with community structure. Each sample is represented by a circle colored by oilfield. Environmental abbreviations: TN total nitrogen, TOC total organic carbon. Genus abbreviations (bacteria and archaea): Polaromo Polaromonas, Rhodo Rhodococcus, Natronoc Natronococcus, Acineto Acinetobacter, Alkan Alkanindiges, Sphin Sphingomonas, Arthro Arthrobacter, Haloterr Haloterrigena, Alcan Alcanivorax, Pseudo Pseudomonas, Natronom Natronomonas, Halakal Halalkalicoccus, Halomona Halomonas, Halosarc Halosarcina, Marinoba Marinobacter, Halobac Halobacter

Discussion

Because of oxygen penetration and the presence of sulfate and nitrate, topsoil and subsoil may harbor dynamic redox conditions. This environment may therefore be a good model for investigating the response of microbial communities to oil contamination under different terminal electron-accepting conditions. The microbial communities varied significantly between samples, indicating that they may evolve in response to oil contamination and other in situ geochemical conditions. Consistent with this, the physicochemical conditions varied significantly across sites, with a wide range of ionic concentrations and TOC contents. Such variations likely shaped the indigenous microbial communities, enriching the microorganisms best adapted to the local geochemical conditions.

Microbial community composition

Our current knowledge of oil-degrading bacteria is still limited because the vast majority of them remain uncultured. We therefore applied high-throughput sequencing as a broad-coverage approach to characterize these uncultured microbial phylotypes. A relatively high proportion of the sequences in our samples, ranging from 5.1 to 47 % (average 20.6 %), could not be classified into any known phylum. A recent pyrosequencing study of diesel-contaminated soils likewise identified a similar proportion (average 21 %) of sequences as unclassified bacteria and archaea (Sutton et al. 2013). This high proportion of unclassified sequences suggests that many microorganisms in these environments belong to unrecognized or novel bacterial and archaeal lineages. The available evidence is not yet sufficient to determine whether their presence reflects biogeographical distribution or functional selection by oil contamination.

At the phylum level, a broad variety of phyla were dominant across the 24 samples. Actinobacteria and Acidobacteria were the dominant phyla in 16 samples, whereas Proteobacteria was the dominant phylum in only five. This observation contrasts with many previous investigations in which Proteobacteria was frequently the most abundant phylum in oil-contaminated environments. For instance, Proteobacteria was the dominant phylum in Algerian oilfield injection waters (Lenchi et al. 2013), an oil-contaminated mangrove (Dos Santos et al. 2011), and permafrost soils along the China-Russia crude oil pipeline (Yang et al. 2012). In this study, Proteobacteria had a relatively low relative abundance in some samples, including sites CQ5, JS1, JS2, JS4, XJ6, and XJ7, where it accounted for less than 12 % of the total reads. Actinobacteria and Acidobacteria have been detected in high abundance in long-term diesel-contaminated soil (Sutton et al. 2013) and other soils contaminated with aliphatic or aromatic compounds (Allen et al. 2007; Kasai et al. 2005; MacNaughton et al. 1999), suggesting that these phyla contain members important for in situ oil bioremediation.

Occurrence of known oil-degrading bacteria

Genus-level classification of the sequences allows potential oil-degrading microorganisms in a community to be identified more specifically and their functional roles to be inferred. Most of the dominant genera identified here comprise aerobes, suggesting that oxygen prevailed in these open systems and facilitated the proliferation of aerobic microorganisms. Although no microorganisms were isolated in this study, their metabolic capabilities may be predicted from closely related cultivated isolates or uncultured clones.

A wide spectrum of genera was found at relatively high abundances in these oil-contaminated samples. It is reasonable to hypothesize that this dominance is linked to oil biodegradation: long-term oil contamination may impose a selective pressure on indigenous microbial communities, favoring the growth of certain oil-degrading microorganisms. Consistent with this hypothesis, many of the highly abundant genera contain known oil-degrading species. Among these enriched genera, Arthrobacter was detected at high levels (>1 % relative abundance) in eight samples and, as shown in the Cytoscape network (Fig. 5), occurred in all groups. Arthrobacter was highly enriched (>5 % relative abundance) in three samples from three different oilfields (Table S3). The ubiquity of Arthrobacter across all oilfields suggests that these bacteria may be important for oil degradation in contaminated soils. Arthrobacter spp. have been reported to degrade various hydrocarbons, including aromatic hydrocarbons (Stevenson 1967) and PAHs (Puškárová et al. 2013). Arthrobacter spp. are also known to produce emulsifying agents (Rosenberg et al. 1979) and biosurfactants (Morikawa et al. 1993), which can aid oil removal. For example, Arthrobacter RAG-1 produced an extracellular nondialyzable emulsifying agent when grown on hexadecane, ethanol, or acetate (Rosenberg et al. 1979), and Arthrobacter sp. strain MIS38 produces a biosurfactant termed arthrofactin (Morikawa et al. 1993). Two other genera, Pseudomonas and Rhodococcus, showed high relative abundances in many samples; both also contain a number of species capable of producing biosurfactants (Bicca et al. 1999; Iqbal et al. 1995; Ivshina et al. 1998; Patel and Desai 1997).
The presence of emulsifiers has been shown to enhance hydrocarbon degradation in the environment (Atlas 1993; Atlas and Bartha 1992). These emulsifier-producing bacteria therefore hold great potential for oil remediation. During petroleum bioremediation, microorganisms use long- and short-chain hydrocarbons and numerous aromatic compounds as energy and carbon sources. However, most of these compounds have low solubility in water, which makes them less accessible to microorganisms. Emulsifying agents produced by these bacteria disperse crude oil and increase its apparent solubility, making the substrates more accessible and enhancing the biodegradation potential of the crude oil.

Dietzia occurred at relatively high abundance in many soil samples, including two samples from the CQ oilfield, one from the JS oilfield, and all samples from the XJ oilfield. Members of Dietzia isolated from other oil-contaminated areas have been shown to degrade hydrocarbons (Alonso-Gutiérrez et al. 2011; Bødtker et al. 2009; Plakunov et al. 2008; von der Weid et al. 2007; Wang et al. 2011a). Interestingly, Dietzia psychralcaliphila was isolated from cold water (6 °C) and identified as a psychrophilic hydrocarbon-degrading bacterium (Yumoto et al. 2002), whereas Dietzia cinnamea was isolated from tropical soil contaminated with crude oil (von der Weid et al. 2007), suggesting that Dietzia can survive across a wide range of temperatures. This could explain why Dietzia spp. were dominant across different geoclimatic regions in this study. Sequences related to Marinobacter accounted for more than 20 % of reads in two soil samples (JS3 and XJ12) but less than 1 % in all other samples. Marinobacter is a widely distributed genus that is frequently isolated from coastal marine sediments and seawater (Bowman et al. 1997; Pinhassi et al. 1997). Marinobacter species are important hydrocarbon-degrading bacteria with versatile metabolic capabilities; they can use a wide range of carbon sources, including petroleum components, especially in marine environments (Al-Awadhi et al. 2007; Harwati et al. 2007; Yakimov et al. 2007). The presence of Marinobacter in the JS oilfield might be attributed to the sampling site's proximity to the coast. However, its enrichment in the XJ oilfield in inland China suggests that Marinobacter is not limited to marine environments but may also inhabit terrestrial environments.
Opitutus showed high abundances in five samples from three different oilfields. This enrichment is noteworthy because Opitutus spp. were the only obligately anaerobic bacteria widely enriched in this study. Unlike the oil-degrading bacteria mentioned above, Opitutus has seldom been associated with crude oil bioremediation. The only potential link between Opitutus and biodegradation comes from a study in which Opitutus was identified in a microbial system contaminated by an artificial release of ethanol, benzene, and toluene (Ma et al. 2013); in that study, Opitutus exhibited a relatively high abundance after 10 months of exposure to the spill. The reasons for the presence and dominance of Opitutus in the current study remain unclear and require further investigation.

Enrichment of halophilic archaea

An important observation from this study is that a number of archaeal phylotypes were abundant in some contaminated soils, especially those from the JS oilfield. For instance, 67.9 and 29.3 % of the total reads from JS1 and JS2, respectively, were affiliated with the phylum Euryarchaeota, compared with only 0.4 % in the uncontaminated sample from this oilfield (JS4), suggesting that oil contamination may stimulate the enrichment of Euryarchaeota-related phylotypes. Euryarchaeota have also been detected in other oil-contaminated environments: they were identified as dominant culturable members of a low-temperature biodegraded oil reservoir (Grabowski et al. 2005) and as the predominant phylum in oil sands tailings ponds (An et al. 2013). Together, these observations suggest a potential link between members of Euryarchaeota and oil biodegradation. In this study, Euryarchaeota-related sequences were affiliated with the genera Halalkalicoccus, Natronomonas, Haloterrigena, and Natrinema. These halophilic archaea occurred at high levels in all contaminated samples from the JS oilfield (JS1, JS2, and JS3) and at site CQ4. Methanogenic archaea have frequently been detected in oil-contaminated aquifers and soils (Nilsen and Torsvik 1996; Ren et al. 2011); in this study, however, methanogens were found at only very low abundance in every sample. The most abundant archaeal genus was Halalkalicoccus, an alkaliphilic halophilic archaeon, which accounted for 15.7 and 3.4 % of reads at sites JS1 and JS2, respectively, and 9.5 % at site CQ4.
The dominance of Halalkalicoccus coincided with high abundances of other halophilic archaea, such as Natrinema, Natronomonas, and Haloterrigena, suggesting that certain in situ geochemical conditions, probably the alkaline soils of the JS oilfield, may favor the enrichment of such halophiles. Halophilic archaea have been linked to hydrocarbon degradation in many studies (Al-Mailem et al. 2010; Bertrand et al. 1990; Tapilatu et al. 2010). For instance, five extremely halophilic archaeal strains from an uncontaminated hypersaline pond, related to Haloarcula and Haloferax, were reported to degrade hydrocarbons (Tapilatu et al. 2010). In another study, the halophilic genera Haloferax and Natronomonas were reported to play a role in the natural attenuation of a petroleum-contaminated saline-alkali soil (Wang et al. 2011b). The high abundance of halophilic archaea that we observed in some of the oilfields provides further support for the hypothesis that halophilic archaea may play an important role in the natural attenuation of oxygen-rich, petroleum-contaminated soils. Further investigations of their metabolic capabilities, including isolation of these halophilic archaea and stable isotope probing, are underway.
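The phylum-level percentages discussed above (for example, Euryarchaeota in JS1 versus JS4) are relative abundances: the reads assigned to a taxon divided by the total reads in that sample. The following is a minimal Python sketch of that calculation and of a simple fold-enrichment ratio between a contaminated and a reference sample. The read counts are hypothetical placeholders chosen only to illustrate the arithmetic, not data from this study, and the helper names are illustrative.

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts for one sample into percentages of total reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: 100.0 * n / total for taxon, n in read_counts.items()}


def fold_enrichment(pct_contaminated: float, pct_reference: float) -> float:
    """Ratio of a taxon's relative abundance in a contaminated vs. a reference sample."""
    if pct_reference == 0:
        return float("inf")  # taxon absent from the reference sample
    return pct_contaminated / pct_reference


# Hypothetical phylum-level read counts (illustrative only, not from this study)
contaminated = {"Euryarchaeota": 6790, "Other": 3210}
reference = {"Euryarchaeota": 40, "Other": 9960}

pct_c = relative_abundance(contaminated)
pct_r = relative_abundance(reference)
print(round(pct_c["Euryarchaeota"], 1))  # 67.9
print(round(pct_r["Euryarchaeota"], 1))  # 0.4
print(round(fold_enrichment(pct_c["Euryarchaeota"], pct_r["Euryarchaeota"])))  # 170
```

Because relative abundances are compositional, a large fold enrichment can reflect either growth of the taxon or a decline of other taxa; read percentages alone cannot distinguish the two.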

Correlation between the environmental parameters and bacterial community

Phylogenetic analyses suggest that microbial community composition is largely affected by environmental parameters, but they provide little information about which physicochemical conditions might be responsible. Therefore, CCA was used to explore the relationship between microbial communities and environmental factors. Soil moisture appeared to be one of the most important environmental parameters and varied significantly among samples from different oilfields. This variation may be attributed to the different geoclimatic zones in which the oilfields are located. For example, soil samples from the XJ and CQ oilfields had lower soil moisture than those from other sites because these two oilfields are located in continental arid and monsoon areas, which experience low rainfall and high evaporation rates. Higher soil moisture may increase the bioavailability of certain oil components and other nutrients, which may in turn favor the enrichment of oil-degrading microorganisms. Notably, a number of archaeal phylotypes were positively correlated with soil moisture, suggesting that these phylotypes also prefer moister environments. Pearson correlation analysis confirmed the relationship between soil moisture and microbial diversity (Fig. S1): alpha diversity, measured as the number of phylotypes (OTUs at 97 % similarity), was positively correlated with soil moisture (r = 0.6094, P < 0.001). One environmental implication of these results is that increasing soil moisture at oil-contaminated sites in arid regions could potentially facilitate the bioremediation of oil contaminants.
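The diversity–moisture relationship above combines two steps: counting the OTUs observed in each sample (observed richness) and computing a Pearson coefficient between richness and soil moisture. The sketch below shows both in plain Python under stated assumptions: the OTU tables and moisture values are made-up placeholders, and the helper names are hypothetical. Significance is usually assessed with the statistic t = r·sqrt((n − 2)/(1 − r²)) on n − 2 degrees of freedom; the P value itself would come from a t-distribution, for example via SciPy's `scipy.stats.pearsonr`.

```python
import math
from typing import Sequence


def otu_richness(otu_counts: Sequence[int]) -> int:
    """Observed richness: number of OTUs with at least one read in a sample."""
    return sum(1 for n in otu_counts if n > 0)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two series of equal length >= 3")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical per-sample OTU read counts (rows = samples) and soil moisture (%)
otu_tables = [
    [12, 0, 0, 3, 0, 0, 1, 0],
    [8, 2, 0, 0, 0, 4, 0, 0],
    [5, 1, 1, 0, 2, 0, 0, 0],
    [3, 2, 0, 1, 1, 1, 0, 0],
    [4, 1, 2, 1, 0, 1, 1, 0],
    [2, 3, 1, 1, 1, 2, 1, 1],
]
moisture = [2.1, 3.4, 5.0, 8.2, 12.4, 15.0]

richness = [otu_richness(sample) for sample in otu_tables]
r = pearson_r(moisture, richness)
print(richness, round(r, 3), round(t_statistic(r, len(richness)), 2))
```

Observed richness depends on sequencing depth, so in practice samples are often rarefied to a common number of reads before richness is compared across sites.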

Sulfate was also strongly linked to variation in microbial community composition. Sulfate can be used as an electron acceptor by a diverse range of anaerobic microorganisms, notably sulfate-reducing bacteria (SRB). In this study, we detected SRB and related anaerobes in the different oil-contaminated soils. For example, the sulfate reducer Desulfosporosinus was detected in all of the samples, as were Geobacter and Desulfitobacterium, strict anaerobes that typically respire electron acceptors other than sulfate, such as Fe(III), sulfite, or halogenated organic compounds. Although aerobic conditions prevail in surface soil, anoxic microenvironments may develop where oxygen is depleted by aerobic respiration and other physicochemical processes, creating suitable habitats for these strict anaerobes. In many oil-contaminated environments, biological sulfate reduction contributes to the degradation of approximately 70 % of BTEX compounds (Kniemeyer et al. 2007). SRB in oilfields could use these hydrocarbons and organic acids as electron donors for sulfate reduction. Specifically, Desulfosporosinus, Geobacter, and Desulfitobacterium, all detected in this study, have frequently been linked to the degradation of petroleum components (Fowler et al. 2014; Kunapuli et al. 2010; Sun and Cupples 2012; Sun et al. 2014a, b; Winderl et al. 2008, 2010). Given that SRB were detected across these oilfields, sulfate amendment could potentially be used to stimulate the growth of hydrocarbon-degrading microorganisms and increase biodegradation rates. In contrast, the relatively short vectors for pH, nitrate, TOC, and total nitrogen in the CCA biplot indicate that these parameters are less strongly correlated with community composition than sulfate and soil moisture.
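The vector lengths discussed above come from the CCA biplot. To illustrate how such vectors arise, here is a minimal NumPy sketch of canonical correspondence analysis in its standard formulation: a weighted multiple regression of the chi-square-standardized abundance matrix on the environmental variables, followed by a singular value decomposition. It is not the software used in this study. The function name, the variable labels, and the randomly generated data are hypothetical, and the sketch assumes that no environmental variable is constant. Each variable's biplot score is taken as its site-weighted correlation with the constrained (linear-combination) site scores, and its arrow length is the Euclidean norm of those scores on the first two axes.

```python
import numpy as np


def cca(Y, X, n_axes=2):
    """Canonical correspondence analysis (sketch).

    Y: sites x taxa non-negative abundance matrix; X: sites x environmental variables.
    Returns (eigenvalues, total_inertia, biplot_scores[n_vars, n_axes]).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Y = Y[:, Y.sum(axis=0) > 0]                  # drop taxa with no reads
    if np.any(Y.sum(axis=1) == 0):
        raise ValueError("every site needs at least one read")

    P = Y / Y.sum()
    r = P.sum(axis=1)                            # site weights
    c = P.sum(axis=0)                            # taxon weights
    expected = np.outer(r, c)
    Qbar = (P - expected) / np.sqrt(expected)    # chi-square residual matrix
    total_inertia = float((Qbar ** 2).sum())

    # Standardize each environmental variable with site weights r
    Xc = X - r @ X
    Xs = Xc / np.sqrt(r @ Xc ** 2)
    Xw = np.sqrt(r)[:, None] * Xs

    # Regress Qbar on the weighted variables, then take the SVD of the fitted values
    B, *_ = np.linalg.lstsq(Xw, Qbar, rcond=None)
    U, s, _ = np.linalg.svd(Xw @ B, full_matrices=False)
    k = min(n_axes, X.shape[1], len(s))
    eigenvalues = s[:k] ** 2

    # Weighted correlation with LC site scores (U / sqrt(r)) reduces to Xw^T U
    biplot = Xw.T @ U[:, :k]
    return eigenvalues, total_inertia, biplot


# Hypothetical data: 12 sites, 20 taxa, 3 environmental variables
rng = np.random.default_rng(0)
n_sites, n_taxa = 12, 20
env = np.column_stack([
    rng.uniform(2, 20, n_sites),     # hypothetical "moisture"
    rng.uniform(7, 9, n_sites),      # hypothetical "pH"
    rng.uniform(0, 500, n_sites),    # hypothetical "sulfate"
])
# Synthetic counts whose composition responds only to the first variable
optima = rng.uniform(2, 20, n_taxa)
lam = 50 * np.exp(-((env[:, [0]] - optima) ** 2) / 20)
counts = rng.poisson(lam) + 1

eig, inertia, bp = cca(counts, env)
arrow_length = np.sqrt((bp ** 2).sum(axis=1))
for name, length in zip(["moisture", "pH", "sulfate"], arrow_length):
    print(f"{name}: {length:.2f}")
```

Because the biplot scores are correlations with orthonormal site-score axes, each arrow's length on the retained axes is at most 1, and axis signs from the SVD are arbitrary. A short arrow therefore indicates a variable that is only weakly associated with the constrained ordination, which is how the pH, nitrate, TOC, and total nitrogen vectors are interpreted above.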

This study illustrates how emerging molecular techniques can substantially advance our knowledge of indigenous microbial communities in oil-contaminated areas. Many different phylotypes exhibited high abundance in all six oilfields compared with uncontaminated soils, suggesting that they may be functionally important in the bioremediation of hydrocarbons and the production of biosurfactants at raw oil-contaminated sites. Analysis of the correlations between physicochemical conditions and microbial communities indicated that soil moisture and sulfate were the factors most strongly associated with microbial community composition in these soils.