1 Introduction

Estuaries are complex coastal ecosystems with constant, dynamic fluxes of fresh and marine water, which influence the microalgal communities that are important primary producers of intertidal, shallow and subtidal sediments (Underwood and Paterson 1993; Badarudeen et al. 1996). Microalgae, mainly diatoms (class: Bacillariophyceae), are ubiquitous in estuaries. Most are bilaterally symmetrical pennate forms, which usually attach to a substrate (plants, sediments, pebbles, etc.), whereas the radially symmetric centric diatoms are predominantly planktonic (Werner 1977). Coastal environments in general, and estuaries in particular, are characteristically rich in dissolved organic matter and harbor most pennate diatoms in their mud and sandflats, tidal pools and marshes. Centric diatoms predominate in open waters, which are characterized by lower concentrations of dissolved organic matter, and occur to a lesser extent in benthic habitats. Diatoms adhering to substrates (known as benthic diatoms) are well-known bio-indicators of the health of an aquatic ecosystem. Over the past few decades, the community ecology of diatoms has gained momentum through environmental monitoring. The spatial distribution and composition of diatom species are mainly influenced by climatic, geological and anthropogenic factors such as land uses in the catchments and varying levels of nutrients (Pan et al. 1996; Townsend and Gell 2005). Diatoms are sensitive to even the slightest changes in their habitat and are therefore aptly used as bio-indicators of aquatic ecosystems. Community composition of benthic diatoms is an outcome of complex interactions between abiotic and biotic factors (Stevenson 1997). Diatom community structure, ecology and distributional patterns in relation to environmental gradients (Soininen and Eloranta 2004) have already been explored, and the current focus is on exploring diatoms as an ideal biofuel feedstock (Graham et al. 2012) under varying nutritional modes, from autotrophic to heterotrophic or mixotrophic. Doing so would help in minimizing the uncertainties associated with cultures under fluctuating water quality. Moreover, relating environmental conditions and species composition would help in optimizing the growth strategy for higher lipid productivity by mimicking actual field conditions.

Many studies are under way on utilizing diatoms, both planktonic and benthic forms, as viable biofuel feedstocks. Biofuels derived from unicellular microalgae are popularly known as third-generation biofuels. About 3000 microalgal strains were isolated, screened and tested for their lipid productivity and potential for biodiesel production under the Aquatic Species Program (ASP), a pioneering effort of the US Department of Energy (DOE), during 1980–1996. In its close-out report in 1998, the ASP recommended 50 promising strains with higher growth rates, lipid productivity and the capability to survive under harsh environmental conditions. Sixty per cent of those 50 promising microalgal species were diatoms, demonstrating their potential as third-generation biofuel feedstocks (Hildebrand et al. 2012; Sheehan et al. 1998). Despite rigorous investigations involving a large number of microalgae with favorable lipid productivity, the ASP anticipated higher costs than for conventional fossil fuels. Biofuel research on microalgae in the 2000s mainly focused on determining and optimizing algal growth, lipid estimation and characterization of single axenic monocultures of marine and freshwater microalgae (both green microalgae and diatoms), either procured from culture banks/repositories or obtained by random isolation of a single local strain, along with optimization of nutrient parameters under laboratory conditions (Yusuf 2007; Widjaja et al. 2009; Liu et al. 2011; Tang et al. 2011). Efforts to increase biomass productivity by reducing predation or contamination by pests led to innovative designs of photobioreactors (PBRs). Variants of PBRs designed for improved biomass productivity are now available (Grobbelaar 2009; Tang et al. 2012; Sforza et al. 2012; Huesemann et al. 2013). 
However, the major hindrance in turning the promise of biofuel into reality is the critical dependence on obtaining higher algal productivity in a less sophisticated, low-cost and sustainable model that reduces fixed and operational costs and thereby improves the economic viability, along with the technical feasibility, of algae-based biofuels. NASA’s Offshore Membrane Enclosures for Growing Algae (OMEGA) project (2010–2012) was innovative and revived interest in algae-based biofuel technology: clean energy is extracted from microalgae cultivated in the open sea, using wastewater in flexible plastic sheets, which would effectively sequester CO2 and treat wastewater while producing biofuel (Wiley et al. 2013). Currently, there is renewed interest in diatom-based biofuels, especially after the unraveling of their multiple benefits, including scope for higher lipid accumulation (Ramachandra et al. 2009; Hildebrand et al. 2012; Levitan et al. 2014; Fu et al. 2015; Vinayak et al. 2015). Many diatom species are considered promising because of their ability to grow on non-arable lands, their ability to sequester CO2, their higher nutrient-hoarding capabilities, their shorter cycling period compared with higher vascular plants (Hildebrand et al. 2012) and, most importantly, their efficiency in removing nutrients from urban domestic wastewater (Kumar 2008; Marella et al. 2017) or aquaculture discharge water (Venkatesan et al. 2006). Recent research in this direction (d’Ippolito et al. 2015; Marella et al. 2017) has focused on screening and growing diatoms using waste/saline waters or domestic/municipal sewage (Ramachandra et al. 2013; Mahapatra et al. 2014; Ramachandra et al. 2015) and even industrial effluents (Chinnasamy et al. 2010; Abdel-Raouf et al. 2012; Kamyab et al. 2016; Idris et al. 2018). 
Such screening of microalgae, technically called “phyco-prospecting” (Chu 2017), coupled with phyco-remediation would help in isolating a strain that can withstand extreme environmental conditions (Abomohra et al. 2017) while reducing the burden of using fresh water and expensive synthetic chemicals/fertilizers for growing diatoms at large scale. Also, for a scaled-up production system running for a good portion of the year, a strain that is consistently productive under a variety of environmental conditions is far more desirable than strains with higher productivity under optimal conditions for a shorter period of time (Hildebrand et al. 2012).

A recent estimate indicates that out of 72,500 strains of microalgae identified so far, only 44,000 species have been described with characteristics (De Clerck et al. 2013; Chu 2017). This shows the number of species left unexplored. Thus, diatom strains for sustainable biofuel production are prioritized based on abundance and productivity, with higher resilience to fluctuating environmental conditions. This approach necessitates an effective screening mechanism involving ecological distribution studies with habitat mapping in diverse ecological zones with varied environmental conditions. Hence, as a preliminary step, field investigations of the spatial distribution and community structure of estuarine benthic epipelic diatoms were carried out with respect to varying water quality. The variation in water quality, or the local nutrient input in each habitat, is heavily influenced by the flora and fauna specific to that habitat. The community structure of benthic diatoms under varied levels of organic loading would provide insights into the tolerance of diatom species to different nutrient levels. The species abundance data from ground conditions, along with the respective lipid profiles, would help in the successful design of economically viable bioreactors. A thorough review of the literature on the lipid content of diatoms grown in nitrogen-replete conditions was carried out. Multivariate statistical analyses were performed to understand the functional relationship between varying nutrient levels and environmental parameters such as light intensity, salinity and pH recorded under laboratory conditions. Actual field parameters, consisting of nutrient and physicochemical parameters, were integrated with the lipid content to determine the decisive environmental variables in the accumulation of lipid. This is a first-of-its-kind study to relate diatom tolerances and the associated nutrient loadings under varied habitat conditions with lipid productivity potential. 
This could be a less time-consuming screening mechanism for exploiting mixed diatom consortia, especially toward phyco-prospecting using conventional wastewaters as a means of integrated decentralized phyco-remediation and energy production systems. The current study was carried out with the following objectives:

  1. To understand the diatom community assemblages and their preferred habitats in relation to hydrological and environmental parameters across different lentic and lotic ecosystems of the Aghanashini estuary.

  2. To understand the relationship of diatom species composition with the environmental variables through multivariate statistical analyses.

  3. To evaluate the biofuel prospects of prioritized algal strains.

2 Materials and methods

2.1 Study area

The Aghanashini (lat. 14°27′53.6″N–14°31′18.8″N, long. 74°29′26.8″E–74°20′54.4″E) is a west-flowing river originating in the central Western Ghats mountain ranges in the Uttara Kannada district of Karnataka state, India. The 121 km long river joins the Arabian Sea through a pristine and highly productive estuary, in which the high tide moves almost 27 km upstream, and farther in the driest months of summer (Boominathan et al. 2008). In total, eight habitats (four lentic and four lotic) were chosen for sampling in this estuary based on their floral and faunal assemblages and water quality. Among these sampling locations, bat roosting site (BRS), egret roosting site (ERS), Kagal-oyster shell bed (KOSB) and mudflats (MF) are estuarine lotic habitats, whereas the other four sites are lentic, with prolonged water retention time despite diurnal tidal variations. The lentic sites are estuarine rice fields (gaznis), which are separated from the estuary by embankments fitted with sluice gates (for adjusting the water levels) and are always in a submerged/water-logged state (Fig. 1).

Fig. 1 Aghanashini estuary with different habitats considered for the study

The gaznis chosen for study are those modified for shrimp farming, a departure from traditional rice cropping. Mangroves, both natural formations and planted stands of different ages, are widespread throughout the intertidal zones. A detailed description of each study site, with its habitat type and characteristic flora and fauna, is given in Table 1.

Table 1 Description of the study site

2.2 Diatom and water sampling

Water and biological samples (in triplicate) were collected during the pre-monsoon season (March–April 2017). Epipelic diatoms (attached to the sediments) were collected from all the stations during low-tide hours (for easy accessibility) using a thin spatula, from the sediment surface to a depth of no more than 0.5 cm, to ensure that only live and motile diatoms were sampled. The samples were immediately fixed with Lugol’s iodine and processed in the laboratory following the standard protocols of Taylor et al. (2007a, b). Acid digestion and pre-processing of diatom samples were carried out following the KMnO4 and hot HCl method (Kelly and Whitton 1995). Species were identified based on morphological features, following standard identification keys (Van Heurck 1896; Simonsen 1968; Patrick and Reimer 1966; Krammer and Lange-Bertalot 1986; Karthick et al. 2013). Species richness and relative abundances across the sampling locations were determined by counting a maximum of 400 valves per sample following the DARES (2004) enumeration protocol. The onsite parameters analyzed were air temperature (AT), water temperature (WT), pH, salinity and dissolved oxygen (DO). Temperature, pH and salinity were measured in situ using probes, whereas DO was measured chemically following Winkler’s method (APHA 2005). Nutrient analysis of water samples was carried out in the laboratory. Nitrates (NO3−), phosphates (PO43−) and reactive silicates (SiO44−) were analyzed following the Standard Methods for the Examination of Water and Wastewater (APHA 2005).

2.3 Diatom tolerance, sensitivity and lipid content

For more than four decades, the tolerance and sensitivity of diatoms have been assessed through various diatom indices that consider environmental parameters and the associated diatom community structure. Several noteworthy diatom-based ecological studies (Kelly and Whitton 1995; Pan et al. 1996; Stevenson et al. 1996; Potapova and Charles 2002; Soininen and Eloranta 2004; Weilhoefer and Pan 2006; Potapova and Charles 2007; Chessman et al. 2007; Taylor et al. 2007a, b; Mitbavkar and Anil 2008; D’Costa and Anil 2010; Tan et al. 2014; Breuer et al. 2016; Hausmann et al. 2016; Rath et al. 2018) provide details on the tolerance and sensitivity of diatom species. These data, integrated with lipid content details, would provide insights into the lipid productivity potential of candidate diatom strains. An effort has been made to compile the information on diatoms and environmental parameters together with lipid content (% dry cell weight). Multivariate analysis techniques such as canonical correspondence analysis, agglomerative hierarchical clustering and regression analysis were performed, with nutrients and other physical conditions as independent variables and lipid content as the dependent variable. This would also aid in estimating lipid content indirectly, provided the physicochemical parameters and the diatom distribution details of a location are known.

2.4 Statistical analyses

One-way analysis of variance (ANOVA) carried out on diatom species composition and nutrient (nitrate and silicate) levels showed significant differences in species composition with respect to nutrient levels, with spatial variability across the study regions. Tukey’s pair-wise comparison was performed to understand the level of significance of the interaction of nutrients with species richness. Shannon’s diversity and Simpson’s dominance indices were calculated for all the study locations. A multivariate hierarchical clustering analysis was performed to understand the similarities among the stations based on their environmental and biological parameters using the Paleontological Statistics software PAST V 3.0 (Hammer et al. 2001). The influence of each environmental factor on the species composition recorded in the present study was determined by a multivariate ordination approach, using canonical correspondence analysis in PAST V 3.0. Agglomerative hierarchical clustering based on the results of the present study as well as the literature-integrated data was done using RStudio version 1.1.423. Multivariate regression modeling of the lipid content against nutrients (nitrates, phosphates and silicates), light intensity, salinity and pH was performed using the vegan package in RStudio version 1.1.423. The regression showed significance at p < 0.05 (Cumming 2013; Demirtas 2018) for all critical independent growth variables. Data cleaning for the multivariate regression analysis was done by removing the scatter (average ± SD) of the physicochemical parameters.

3 Results and discussion

3.1 Hydrological variations of lentic and lotic systems

Estuaries are dynamic ecosystems with diurnal and seasonal variations in hydrological and environmental parameters. However, the in situ parameters of temperature and salinity did not vary drastically, as sampling was done in a single season (pre-monsoon) in both lentic and lotic systems. The average air temperature across the habitats was 30.85 ± 2.60 °C. Water temperature is an important parameter that influences many abiotic chemical processes, such as dissolution–precipitation, oxidation–reduction and adsorption–desorption, as well as the physiology of the biotic community in a habitat (van Aken 2008). It also influences the rate of photosynthesis in an aquatic system (Fatema et al. 2014; Yin 2002). All eight lentic and lotic habitats had warm waters and high salinity, as sampling was done during the summer months. Changes in water temperature are influenced by solar insolation, freshwater influx, evaporation, cooling and mixing with the ebb and flow from adjoining neritic waters (Madhu et al. 2007). The average water temperature recorded across the habitats was 31.21 ± 2.61 °C. In comparison, George et al. (2012) and Martin et al. (2011) reported 31–32 °C during March–April at the Tapi estuarine region, Gulf of Khambhat, and at the Cochin estuary on the Indian west coast. Godhantaraman (2002) reported similar surface water temperatures, with a variation range of 1–2 °C, during the pre-monsoon season at Parangipettai (south–east coast of India). The pH is a master variable that determines many chemical, biological and kinetic processes in natural waters (Millero 1986). The pH of unpolluted natural water bodies can vary broadly between 3 and 11; however, a pH in the range of 5–9 generally supports a diverse assemblage of aquatic species (Alabaster et al. 1984). 
The pH remained alkaline at all the habitats (7.9 ± 0.09), except for the salt marsh sedges (SMS) station, which showed an acidic pH owing to a hypoxic environment with very low DO caused by dead and decaying organic matter. Satpathy et al. (2010) observed a pH range of 7.7–8.3 in the Kalpakkam estuarine waters on the southeast coast of India. Madhu et al. (2007) observed a steady pre-monsoon pH of 7.1 ± 0.14 in the Kochi backwaters, which corroborates the present study.

Salinity defines the relative proportions of fresh and saline waters found in different parts of the estuary. Salinity levels of a region are influenced by various factors such as the location of the sampling station in the estuary, the daily tides and the volume of fresh water flowing into the estuary. Generally, salinity rises during pre-monsoon because of higher temperatures and an increased evaporation rate (Mann 2000; Sumich and Morrissey 2004; Levinton 2017). As sampling was done during the pre-monsoon season, lentic and lotic habitats exhibited similar salinity levels (27.66 ± 1.50 ppt), except the salt pan, which was hypersaline at 58 ppt. Madhu et al. (2007) found a salinity of 30 ppt during the pre-monsoon season in the Cochin estuary, which matches well with the salinity ranges of the present study.

Dissolved oxygen is a critical parameter that determines the ability of an aquatic system to support aquatic biota. The average DO level of the lotic habitats (MF, KOSB, BRS, ERS) was 7.15 ± 0.18 mg/L, slightly higher than that of the lentic habitats. Among the lentic habitats, salt pan (SP) and salt marsh sedges (SMS) had the lowest DO levels, 2.70 ± 0.09 mg/L, owing to extreme salinity at the salt pan and to hypoxic conditions and a very low water level at the salt marsh sedges. Increased salinity is known to reduce the solubility of oxygen in water, thus reducing DO levels drastically. This could be the reason for the very low DO levels at the salt pan station. The DO values in the lotic habitats were higher than those reported in many other Indian estuaries. George et al. (2012) observed a lower DO of 5.8 ± 0.9 mg/L in the Tapi estuary during the pre-monsoon season, which was attributed to higher levels of domestic and industrial effluents. The DO levels of the Aghanashini sites were higher than the DO level recorded in the Tuticorin estuary by Balakrishnan et al. (2017). Lower DO values were also reported earlier for similar ecosystems in tropical estuaries of India and southeast Asia by Madhu et al. (2007), Martin et al. (2011), Ouyang et al. (2006) and Juahir et al. (2011), which could be attributed to the large influx of domestic and industrial effluents in those estuaries. In comparison, the Aghanashini estuary is more pristine, with neither industrial establishments nor any major township in the vicinity. The estuarine paddy field (APF) and Bargi gazni (BG) sites exhibited higher DO levels than the other two lentic habitats, averaging 7.08 ± 1.0 mg/L. This higher level of DO could be due to the inflow of more fresh water, since all the lotic habitat stations were near the upstream part of the estuary.

Nitrates (NO3–N), the oxidized form of nitrogen, are also an indicator of the level of anthropogenic stress. Nutrients, especially nitrates (NO3–N) and silicates (SiO44−), fluctuated widely across the habitats, whereas phosphate variations were comparatively marginal (Table 2). Phosphate values varied within a range of 0.21–2.38 mg/L. Mudflats (MF), egret roosting site (ERS) and bat roosting site (BRS) had very high nitrate levels of 8.41 ± 0.30 mg/L, 9.82 ± 0.97 mg/L and 6.72 ± 0.20 mg/L, respectively, owing to site-specific litter loadings from mangroves and sedges as well as leaf litter and fecal droppings of faunal species. The mudflats form a prominent intertidal region and are the main habitat of Paphia malabarica, the dominant bivalve, which is harvested in bulk during pre-monsoon, along with six other bivalve species in smaller quantities (Boominathan et al. 2008). These bivalves are filter feeders that feed on phyto- and zooplankton, which are more abundant in these regions, particularly in the pre-monsoon. Additionally, large flocks of migratory and wintering shorebirds such as sea gulls and egrets visit these habitats. Thus, organic debris from dead bivalves and phytoplankton, bird droppings and the excreta of the bivalves have contributed to higher nitrates in the mudflats. Bat and egret roosting sites also had higher nitrate levels owing to droppings.

Table 2 Physicochemical parameters of different habitats

The level of silicates is a notable determinant of primary productivity, and the main source of silicates in estuaries is land-based runoff during the monsoon. Silicate (SiO44−) levels in estuarine and coastal regions are driven by factors such as physical mixing of fresh and saline waters, siltation upstream, rock weathering, adsorption from sedimentary particles, nutrient upwelling that results in chemical interaction with deep clayey sediments, and biological fixation by phytoplankton, especially diatoms and silicoflagellates (Richardson et al. 2000; Shah et al. 2008; Prabu et al. 2008; Satpathy et al. 2010). Silicate across all the stations, covering lentic and lotic habitats, ranged from 4.17 to 10.03 mg/L; mudflats exhibited the highest value (10.03 ± 0.48 mg/L), whereas in the other lotic habitats silicates averaged 4.2 ± 0.06 mg/L. The lentic habitats APF and BG showed silicate values averaging 4.4 ± 0.29 mg/L, similar to BRS and ERS. Salt pan and salt marsh sedges had silicate concentrations of 7.34 ± 0.20 mg/L and 5.92 ± 0.13 mg/L, respectively. When the overall seasonality of silicate levels in other estuaries is considered, Martin et al. (2008), Satpathy et al. (2010) and Balakrishnan et al. (2017) reported comparatively lower silicate values during the pre-monsoon. The nutrient values of the lentic and lotic systems in the Aghanashini estuary vary because it is a relatively isolated system with wide variations in habitat conditions and trophic status and with higher nutrient loadings. The physicochemical parameters recorded at the different stations are detailed in Table 2.

3.2 Understanding the species dynamics: a prelude to phyco-prospecting

Prioritizing diatom strains based on their response to fluctuating environmental conditions entails understanding species at the local level. Choosing a diatom strain, or a consortium of strains, that is consistently productive across seasons under fluctuating environmental conditions would be more advantageous in realizing sustainable biofuel production using microalgae (Hildebrand et al. 2012). Hence, understanding species distribution and tolerance levels would give insights for choosing a diatom strain as a potential biofuel feedstock. A total of 80 different species were recorded from all the habitats, with species richness varying from 6 to 29 across sampling sites. The spatial distribution of diatom species in lentic and lotic habitats is depicted in Fig. 2a, b. The similarity of diatom assemblages among the replicates collected from the same station was high. Salt pan (SP) had the lowest diversity, with only six species, while its relative abundance was the highest at 16.67%, followed by the Kagal-oyster shell bed (KOSB) site, where species richness and relative abundance were 7 and 14.28%, respectively. Bargi gazni had a species richness of 12 and a relative abundance of 8.33%.
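As an illustration of how relative abundance follows from valve counts, the sketch below computes each species' percentage share of a sample. The counts are hypothetical and are not data from this study. It also shows a purely arithmetic observation on the figures quoted above: they coincide with an even split of 100% among the species present (100/6 ≈ 16.67%, 100/7 ≈ 14.28% when truncated, 100/12 ≈ 8.33%). The helper names `relative_abundance` and `even_share` are assumptions introduced for this example.

```python
def relative_abundance(counts):
    """Return each species' share of the total valve count, in percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valves counted")
    return {species: 100.0 * n / total for species, n in counts.items()}


def even_share(richness):
    """Percent share per species if abundance were split evenly."""
    return 100.0 / richness


# Hypothetical valve counts for one sample (400 valves in total).
sample = {
    "Amphora salina": 120,
    "Nitzschia obtusa": 80,
    "Pleurosigma angulatum": 200,
}
shares = relative_abundance(sample)

for species, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{species}: {pct:.2f}%")

# Even-split shares for the richness values quoted in the text.
for richness in (6, 7, 12):
    print(richness, round(even_share(richness), 2))
```

The even-split comparison is only a check on the reported numbers under the stated assumption; it does not describe how the values were derived in the study.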

Fig. 2 Spatial mapping of diatoms in lentic and lotic habitats

Of the 80 species, 27 were present in two to five different habitats, whereas the remaining 53 species were each unique to only one of the eight habitats. The 27 more prevalent species could be considered cosmopolitan species that have adapted, with higher resilience, to widely varying nutrient and habitat conditions. Amphora salina, Amphora ovalis, Cyclotella meneghiniana, Navicula forcipata, Nitzschia obtusa and Pleurosigma angulatum exhibited a cosmopolitan nature, being present in at least four of the eight habitats. Species such as Bacteriastrum cosmosum, Achnanthes oblongella, Amphora cymbifera, Cocconeis pelta, Navicula weissflogii, N. amphisbaena, N. scutelloides, Eunotia pectinalis, Licmophora tincta and Sellaphora americana were confined to mudflats, with relative abundances of < 5%. These species could be considered sensitive to habitat changes, their presence being governed by one or more of the prevailing favorable environmental conditions. Figure 3 shows the composition of species distributed across the habitats.
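The split between widely distributed and single-habitat species described above can be sketched as a simple occupancy count over a presence table. The habitat codes follow the text, but the species labels and presence records below are placeholders, not the study's data, and the function names are assumptions made for this example.

```python
from collections import Counter


def occupancy(presence):
    """Map each species to the number of habitats where it was recorded."""
    counts = Counter()
    for species_set in presence.values():
        counts.update(species_set)
    return counts


def split_by_occupancy(presence, min_sites=2):
    """Return (species found in >= min_sites habitats, species found in one)."""
    occ = occupancy(presence)
    widespread = {s for s, n in occ.items() if n >= min_sites}
    unique = {s for s, n in occ.items() if n == 1}
    return widespread, unique


# Hypothetical presence records: habitat code -> set of species seen there.
presence = {
    "MF": {"sp_A", "sp_B", "sp_C"},
    "SP": {"sp_A", "sp_D"},
    "BG": {"sp_A", "sp_B"},
    "ERS": {"sp_E"},
}

widespread, unique = split_by_occupancy(presence)
print(sorted(widespread))  # species recorded in two or more habitats
print(sorted(unique))      # species recorded in exactly one habitat
```

Raising `min_sites` to 4 would pick out the subset the text calls cosmopolitan (present in at least four of the eight habitats).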

Fig. 3 Graph showing species richness and relative abundance across habitats

3.3 Site-wise variation in species dynamics

Habitat condition plays a vital role in determining the presence or absence of a species, or a group of species, in a habitat, depending on its level of dominance and tolerance to the prevailing environmental conditions. Diversity and dominance were determined using the Shannon–Wiener diversity (H′) and Simpson’s dominance (D) indices, based on the species richness data of the epipelic diatoms collected from the sampling locations. The highest diatom diversity was found in mudflats (MF) (H′ = 3.25), followed by bat roosting site (BRS) (H′ = 3.13). The lowest diversity was observed at the salt pan (SP) station (H′ = 1.7), followed by Kagal-oyster shell beds (KOSB) (H′ = 1.82). Salt marsh sedges (SMS), egret roosting site (ERS), abandoned paddy fields (APF) and Bargi gazni (BG) had similar diversity, ranging between 2.42 and 2.91. Simpson’s dominance index yielded higher values for the SP and KOSB stations, indicating the existence of highly tolerant dominant species. The MF and BRS stations had the lowest dominance indices (D = 0.04 and D = 0.05) and rich species diversity (Fig. 4), confirming the inverse relationship between diversity and dominance at the respective sampling locations.
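For reference, the two indices can be sketched as follows, assuming the common definitions H′ = −Σ p_i ln p_i and D = Σ p_i², where p_i is the proportion of species i in a sample. The use of the natural logarithm and the example counts are assumptions; the study's own values come from PAST.

```python
import math


def proportions(counts):
    """Convert raw counts to proportions, ignoring species with zero count."""
    total = sum(counts)
    return [n / total for n in counts if n > 0]


def shannon(counts):
    """Shannon-Wiener diversity H' using the natural logarithm."""
    return -sum(p * math.log(p) for p in proportions(counts))


def simpson_dominance(counts):
    """Simpson's dominance D, the sum of squared proportions."""
    return sum(p * p for p in proportions(counts))


# Hypothetical communities: one even, one dominated by a single species.
even_community = [10, 10, 10, 10, 10, 10]
skewed_community = [97, 1, 1, 1]

for name, counts in (("even", even_community), ("skewed", skewed_community)):
    print(name, round(shannon(counts), 3), round(simpson_dominance(counts), 3))
```

For a perfectly even community of S species these definitions give H′ = ln S and D = 1/S, which is why low dominance and high diversity go together, as noted for the MF and BRS stations.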

Fig. 4 Diversity and dominance indices of diatoms from different habitats

3.4 Influence of nutrients on species presence

The significance of nutrient composition in determining species presence and abundance in a region was evaluated using ANOVA, which gave F = 6.864 (p < 0.05; df = 11.83) for the effect of nitrate and silicate levels on species richness. Tukey’s pair-wise comparisons of nitrate and silicate levels with species richness showed a higher level of significance (p < 0.01) (Krzywinski and Altman 2013) at the 1% significance level, with p = 0.0006 for nitrates and p = 0.001 for silicates. This highlights that diatom species composition depends on the levels of nitrates and silicates in the habitat.
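A minimal sketch of the F statistic behind a one-way ANOVA is given below, assuming the textbook definition (between-group mean square divided by within-group mean square). The groups are made-up numbers, not the study's measurements, and the p-value step, which requires the F distribution, is left out.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical richness values grouped by nutrient level (low, medium, high).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(groups)
print(f_stat, df_b, df_w)
```

A large F means that differences between group means are large relative to the scatter within groups; pair-wise tests such as Tukey's then identify which groups differ.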

3.5 Understanding the influence of habitat conditions on species distribution

Diatom growth at a location depends on salinity, temperature and pH in addition to nutrients. The similarities among the habitats with respect to nutrient fluxes and species composition were assessed through a multivariate hierarchical clustering analysis. The study regions were clustered using the unweighted pair group method with arithmetic mean (UPGMA) in PAST V 3.0. The stations were grouped based on the relative similarity of their environmental conditions and the species they have in common. The salt pan, with extreme salinity and poor nutrient conditions, showed the least similarity with the other stations, which is evident from its highest distance index in the dendrogram, followed by salt marsh sedges, which exhibited hypoxic conditions caused by dead and decaying organic matter. The short arms between Bargi gazni and Kagal-oyster shell bed in the dendrogram highlight the commonness of their environmental parameters and species richness. The hierarchical cluster dendrogram (Fig. 5) illustrates a higher similarity in nutrient ranges and species composition among the bat roosting site and abandoned paddy fields and the egret roosting site and mudflats habitats.
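The clustering step can be illustrated with a small pure-Python sketch of UPGMA (average linkage), in which the distance between two clusters is the mean of all pairwise distances between their members. Euclidean distance and the three-station input are assumptions made for illustration; the distance measure used in PAST is not stated in the text, and real inputs would normally be standardized first.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(c1, c2, points):
    """UPGMA distance: mean pairwise distance between members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(euclidean(points[a], points[b]) for a, b in pairs) / len(pairs)


def upgma(points):
    """Return the merge history as a list of (cluster_a, cluster_b, height)."""
    clusters = [(label,) for label in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], points),
        )
        height = cluster_distance(c1, c2, points)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, height))
    return merges


# Hypothetical stations described by two variables each.
stations = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (10.0, 0.0)}
merges = upgma(stations)
for c1, c2, height in merges:
    print(c1, c2, round(height, 3))
```

Stations that merge at a small height are the most similar; a station that joins last at a large height, as the salt pan does in the dendrogram, is the most distinct.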

Fig. 5 Multivariate hierarchical clustering of different habitats

3.6 Quest for tolerant species

Understanding the interactions of habitat and environmental parameters with species composition is important in determining the tolerance level of a species for a sustainable future biofuel production system. Tolerance develops when a species is exposed to prolonged unfavorable conditions and builds up resilience to withstand harsh environmental conditions. The combination of environmental variables that has a significant influence on the dispersion of the species scores (Nabout et al. 2006) was chosen through the canonical correspondence analysis (CCA) ordination technique. CCA ordination was done using the biological parameters obtained from all eight stations, which have diverse habitat conditions and varied environmental parameters, to understand the level of influence of each environmental parameter and habitat condition on diatom species composition; the results are presented in Fig. 6. Axis 1 represents the environmental variables of phosphates and silicates, while air temperature, water temperature and pH gradients are represented by axis 2 (Table 3). The stations Mudflats (MF) and Kagal-oyster shell beds (KOSB) were strongly oriented toward axis 1, which demonstrates the influence of phosphate and silicate levels on the variations in species composition at these two lotic habitat stations. Navicula lanceolata, Stauroneis pachycephala, Nitzschia acicularis, Surirella striatula and Melosira lineatus were strongly influenced by the levels of phosphates and silicates at the MF and KOSB stations.

Fig. 6 CCA triplot showing relationship between environmental variables and diatom species composition (acronyms of the species names given in Appendix)

Table 3 Axis scores of physicochemical variables in CCA ordination (p < 0.05)

The lentic habitats salt pan (SP) and Bargi gazni (BG) showed a strong positive correlation with salinity, air temperature and water temperature. The presence of Pleurosigma balticum, Melosira species, Nitzschia sigma and Nitzschia spp. at the SP and BG stations was strongly influenced by salinity, air temperature and water temperature. The presence of Pleurosigma angulatum at different stations (KOSB, BRS, ERS and BG) was governed by changes in pH, air temperature and water temperature and moderately influenced by nitrates, DO and salinity. Bat roosting site (a lotic habitat) had a strong negative correlation with air temperature, water temperature and salinity. The diatom species Navicula johnsonii, Synedra ulna, Nitzschia fasciola, Diploneis smithii and Nitzschia obtusa exhibited a strong negative correlation with air and water temperature, pH and salinity. The occurrence of Cyclotella operculata, Cyclotella meneghiniana and Amphiprora alata was strongly driven by the variation in nitrate levels at the different study stations. The cosmopolitan species Amphora salina, Amphora ovalis, Epithemia gibberula and Navicula forcipata were present at more than two sampling stations irrespective of variations in the nutrients and in situ physicochemical parameters. Pleurosigma delicatulum, Stauroneis sp., Navicula longicephala and Amphiprora paludosa were locally unique, being restricted to abandoned paddy fields (APF), and Raphoneis amphiceros, Nitzschia longissima and Sellaphora bacilloides were unique to egret roosting site (ERS). Light micrographs of processed diatom frustules after acid digestion are given in Fig. 7.

Fig. 7

Light micrographs of processed diatom frustules a Diploneis ovalis, b Navicula expansa, c Navicula weissflogii, d Surirella tenera, e Nitzschia obtusa, f Gomphonema sp., g Navicula forcipata, h Achnanthes sp., i Amphiprora sp., j Stauroneis sp., k Cymbella sp., l Cyclotella meneghiniana, m Navicula sp., n Stauroneis sp., o Bacillaria paradoxa, p Cymbella sp., q Epithema gibberula, r Licomophora sp., s Gyrosigma eximum, t Amphora sp., u Navicula cryptocephala, v Gyrosigma Marcum, w Pleurosigma angulatum, x Amphora sp., y Navicula pusila

The diatom species compositions across habitats with varied nutrient levels reveal the preference of tolerant and sensitive species for particular habitats. Species presence, relative abundance and the CCA ordination of species against environmental parameters would aid in estimating species abundance and sensitivity at a given study site. For instance, Amphora salina, Amphora ovalis, Epithema gibberula, Cyclotella meneghiniana, Coscinodiscus subtilis, Nitzschia obtusa, Pleurosigma angulatum, Navicula forcipata, Nitzschia panduriformis and Nitzschia sigma were present in at least four of the eight habitat stations with > 10% relative abundance. This highlights the tolerance and versatility of species endowed with the resilience to survive in fluctuating and dynamic environmental conditions. Earlier studies on Amphora sp. and Nitzschia sp. had indicated heterotrophic abilities. Nitzschia sp. was found to be obligately heterotrophic in habitats strongly favoring heterotrophic growth, such as decaying piles of seaweed with lower light penetration and higher organic substrate (Linkins 1973). The centric Cyclotella meneghiniana is also known to possess heterotrophic capabilities and was found to be the most dominant species in a sewage maturation pond (Schoeman 1972, 1979). Pleurosigma angulatum and Coscinodiscus subtilis, typical marine species, were abundant in more than four habitats, with higher relative abundances of 15.4% and 9.4%, respectively. Desrosieres (1969) reported Coscinodiscus sp. as a strong eutrophic indicator. Round (1991) reported Pleurosigma sp. and Amphora sp. as epipelic diatoms found predominantly on mudflats. Targeting such obligately heterotrophic species that are naturally abundant under rugged environmental conditions for biofuel production would greatly reduce the risk of contamination and yield higher productivity.

In contrast, Sellaphora americana, Sellaphora bacilloides, Nitzschia longissima, Raphoneis amphiceros, Navicula longicephala, Spermatogonia sp., Stauroneis sp., Pleurosigma salinarum and Nitzschia dissipata were found at only one location, with lower relative abundance (< 5%). As these species occurred sparsely and showed insignificant correlation with the varying physicochemical parameters, these diatom taxa are considered sensitive, surviving only in locations with favorable environmental conditions. Insights into the variations in diatom community structure in terms of species abundance, tolerance and sensitivity would aid in selecting an indigenously available candidate strain or consortium for scale-up toward sustainable biofuel production.
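The tolerant/sensitive screening described above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' procedure: it assumes that "tolerant" means presence at four or more stations with a maximum relative abundance above 10%, and that "sensitive" means presence at exactly one station with a relative abundance below 5%. The function names, the "intermediate" label and the thresholds as parameters are all hypothetical choices made for this sketch.

```python
from typing import Dict, List


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts at one station to percent relative abundance."""
    total = sum(counts.values())
    if total == 0:
        return {species: 0.0 for species in counts}
    return {species: 100.0 * n / total for species, n in counts.items()}


def classify_species(
    stations: Dict[str, Dict[str, int]],
    min_stations: int = 4,        # assumed reading of "at least four of eight stations"
    tolerant_pct: float = 10.0,   # assumed reading of "> 10% relative abundance"
    sensitive_pct: float = 5.0,   # assumed reading of "< 5% relative abundance"
) -> Dict[str, str]:
    """Label each species as 'tolerant', 'sensitive' or 'intermediate'."""
    # Collect the relative abundance of each species at every station where it occurs.
    presence: Dict[str, List[float]] = {}
    for counts in stations.values():
        for species, pct in relative_abundance(counts).items():
            if pct > 0:
                presence.setdefault(species, []).append(pct)

    labels: Dict[str, str] = {}
    for species, pcts in presence.items():
        if len(pcts) >= min_stations and max(pcts) > tolerant_pct:
            labels[species] = "tolerant"
        elif len(pcts) == 1 and pcts[0] < sensitive_pct:
            labels[species] = "sensitive"
        else:
            labels[species] = "intermediate"
    return labels
```

Called with a mapping of station codes to per-species counts, `classify_species` returns one label per species that was observed at least once; species that fall between the two criteria are kept apart as "intermediate" rather than being forced into either group.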

3.7 Multivariate analyses of habitat conditions for estimating lipid productivity potential

Lipid productivity and habitat condition details were also compiled from the published literature (Barclay et al. 2007; Sheehan et al. 1998; Potapova and Charles 2002; Weilhoefer and Pan 2006; De La Peña 2007; Taylor et al. 2007a, b; Besse-Lototskaya et al. 2011; Chen 2012; Delgado et al. 2012; Fore and Grafe 2002; Rimet 2009; Scholz and Liebezeit 2013; Tan et al. 2014; D’Ippolito et al. 2004, 2015; Zhao et al. 2016; Hausmann et al. 2016; Fields and Kociolek 2015; Tan et al. 2017). Agglomerative hierarchical clustering (Fig. 8) relating environmental conditions to lipid content indicated four distinct diatom species clusters. Species such as Achnanthes sp., Melosira sp., Navicula sp. and Nitzschia sp., forming one dominant cluster (cluster 4), seem to be the potential lipid accumulators. The next dominant cluster (cluster 2) consists of sensitive species such as Synedra sp., Cocconeis sp., Diploneis sp., Gyrosigma sp. and some sensitive species of Nitzschia and Navicula, which is consistent with the current study. Cluster 3 grouped the planktonic marine diatoms Chaetoceros sp., Thalassiosira sp., Skeletonema sp. and Phaeodactylum tricornutum, indicating a clear distinction in their lipid content and other growth characteristics compared with benthic diatom forms. Cluster 1 is formed by only two species, Amphiprora hyalina and Nitzschia dissipata, which were characterized by their projected lipid productivity at higher silicate concentrations and lower light intensity. Thus, the cluster analysis provided insights toward phyco-prospecting of algal species based on their environmental preferences as well as their lipid content. Multivariate regression of environmental parameters against lipid content gave a probable relationship with a significant coefficient of determination (R), with p < 0.05 for the nutrients nitrates (p = 0.036), phosphates (p = 0.042) and silicates (p = 0.013), as well as salinity (p = 0.029) and pH (p = 0.027). Significance at p < 0.01 was observed for temperature (p = 0.004) and light intensity (p = 0.006). The overall p value of the linear regression was 0.043. The estimated coefficients, standard errors, t and p values of each of the independent variables are given in Table 4.

$$Y = -1.868\,X_{1} + 15.423\,X_{2} - 5.478\,X_{3} + 7.091\,X_{4} - 91.734\,X_{5} + 250.31\,X_{6} + 103.35\,X_{7} - 5372.14 \tag{1}$$

where Y = lipid content in % dry cell weight (dcw); X1 = nitrate concentration in mg/L; X2 = phosphate concentration in mg/L; X3 = silicate concentration in mg/L; X4 = light intensity (µmol m−2 s−1); X5 = salinity (ppt); X6 = temperature (°C); X7 = pH.
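Equation 1 can be evaluated directly from field measurements. The following is a minimal sketch that only transcribes the published coefficients and intercept; the class name `SiteConditions`, the field names and the function name are hypothetical and introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteConditions:
    """Field measurements in the units given for Eq. 1 (names are illustrative)."""
    nitrate_mg_l: float      # X1, mg/L
    phosphate_mg_l: float    # X2, mg/L
    silicate_mg_l: float     # X3, mg/L
    light_umol_m2_s: float   # X4, µmol m^-2 s^-1
    salinity_ppt: float      # X5, ppt
    temperature_c: float     # X6, °C
    ph: float                # X7


# Coefficients of X1..X7 and the intercept, copied from Eq. 1.
COEFFICIENTS = (-1.868, 15.423, -5.478, 7.091, -91.734, 250.31, 103.35)
INTERCEPT = -5372.14


def predicted_lipid_pct_dcw(site: SiteConditions) -> float:
    """Return Y of Eq. 1, the estimated lipid content in % dry cell weight."""
    predictors = (
        site.nitrate_mg_l,
        site.phosphate_mg_l,
        site.silicate_mg_l,
        site.light_umol_m2_s,
        site.salinity_ppt,
        site.temperature_c,
        site.ph,
    )
    return sum(c * x for c, x in zip(COEFFICIENTS, predictors)) + INTERCEPT
```

Because the model is linear with a large negative intercept, the returned value is only interpretable as a percentage for inputs close to the conditions used to fit the regression; the function applies no clipping or range check.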

Fig. 8

Agglomerative hierarchical clustering based on critical growth parameters and lipid content
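The bottom-up grouping behind a dendrogram such as Fig. 8 can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and average linkage, neither of which is stated in the text, and assumes that the feature vectors (growth parameters and lipid content) have already been scaled to comparable ranges. The function name and the input layout are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence


def agglomerative_clusters(
    points: Dict[str, Sequence[float]], n_clusters: int
) -> List[List[str]]:
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Assumptions (not from the source): Euclidean distance, average linkage.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    # Start with every species in its own cluster.
    clusters: List[List[str]] = [[name] for name in points]

    def average_linkage(a: List[str], b: List[str]) -> float:
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist(points[p], points[q]) for p in a for q in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return [sorted(cluster) for cluster in clusters]
```

Requesting four clusters from a table of per-species feature vectors would mirror the four-cluster reading of Fig. 8, although a different linkage or distance measure can change which species end up grouped together.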

Table 4 Coefficients, SE, t and p values of multivariate regression analysis

Equation 1 helps in determining the lipid content of select diatoms from the environmental parameters of a location. It is evident from the literature that only a few species belonging to common diatom genera such as Amphora, Amphiprora, Cocconeis, Chaetoceros, Diploneis, Melosira, Pleurosigma, Navicula, Nitzschia, Gyrosigma, Thalassiosira and Skeletonema have been investigated under different stress conditions, while many species remain unexplored. This empirical model (Eq. 1) aids in estimating lipid content indirectly from the site conditions as well as the species type, and could be used as a screening criterion for candidate strain selection for large-scale biofuel production. The current research highlights unexplored potential diatoms, especially salt-tolerant marine diatoms, which gives scope for further research on these indigenous species for the extraction of both lipids and value-added products. Thus, the study could be considered an essential prelude to phyco-prospecting indigenous diatom strains that are capable of accumulating higher lipids even under fluctuating environmental conditions. The statistical analyses revealed a set of tolerant species that can withstand a wide range of nutrient fluctuations, a characteristic beneficial for large-scale outdoor cultivation. Targeting such tolerant species for scale-up studies, either as a single strain or as a consortium of tolerant strains, would help avert the risks of contamination due to open-air exposure. Moreover, the empirical equation (Eq. 1) would aid in assessing lipid content for screening a potential candidate diatom strain without performing expensive laboratory-scale cultivation using synthetic media.

4 Conclusions

Dwindling stocks of fossil fuels and the threat of climate change have necessitated the exploration of renewable, viable and sustainable feedstocks capable of providing self-reliant energy solutions, especially algae-based biofuels. However, economic viability has posed a major challenge to realizing large-scale production of biofuels from microalgae. In view of this, the present research focused on understanding the ecological characteristics and growth patterns of estuarine benthic diatoms under varying nutrient loads in real-time conditions. Efforts toward understanding the regional dynamics of diatom community structure, habitat preferences and tolerance toward fluctuating environmental conditions would greatly help in gaining knowledge on species-specific abundances, especially those of benthic diatoms in one or more habitats of a complex natural ecosystem like the tropical estuary of the Aghanashini river, in relation to varying nutrient concentrations and other physicochemical parameters. The canonical correspondence analysis of the present study separated the tolerant clusters of diatoms, which can withstand the highly fluctuating environmental conditions prevailing in the lentic and lotic systems, from the sensitive ones. Hierarchical clustering revealed highly productive clusters that are capable of accumulating more lipids than other species under certain environmental conditions. Regression modeling, performed to understand the probable lipid productivity potential by integrating physicochemical and nutrient parameters, provided an empirical equation relating lipid content to the critical factors affecting it. This empirical model is capable of providing lipid content estimates right at the sampling stage without cultivating diatoms under laboratory conditions. Thus, multivariate statistical tools were utilized in the present study to extract valuable leads toward the selection of candidate species or consortia of species for large-scale production in future microalgae-based third-generation biofuel production systems.