1 Introduction

Machine learning is a branch of computer science that allows computers to learn from data without being explicitly programmed (Mitchell 2006). As the name suggests, it lets computers learn from experience much as humans do, improving their accuracy over time using data and algorithms, and it is now widely used in many aspects of life. Machine learning methodologies fall into three categories: supervised, semi-supervised, and unsupervised (Grira et al. 2004). Machine learning models are used to generate precise predictions (Liakos et al. 2018). The purpose of supervised learning, also known as supervised machine learning, is to classify data or predict outcomes efficiently. As input data are fed into the model, its weights are adjusted until the model is adequately fitted (Choi et al. 2018). This is done as part of the cross-validation procedure to avoid overfitting or underfitting the model. Neural networks, naive Bayes, linear regression, logistic regression, and random forest are just a few of the approaches used in supervised learning.

Unsupervised machine learning is used to analyse and cluster unlabelled data sets (Chegini et al. 2019). These algorithms can detect patterns or groups in data without requiring human input. Such techniques are employed in image and pattern recognition, for example for camera trap data and conflict analysis. Unsupervised learning techniques include neural networks, k-means clustering, and probabilistic clustering. Semi-supervised learning falls between supervised and unsupervised learning. During training, a smaller, labelled data set guides classification and feature extraction from a larger, unlabelled data set. Semi-supervised learning can be used to overcome the lack of labelled data (or the inability to afford to label enough data) (Chapelle et al. 2009).

It is now easier than ever to produce precise and unbiased predictions regarding the state of the environment. Three developments occurred simultaneously to cause this. First, previously lacking information about ecosystems is now easily accessible. With the advent of big data, ecology is swiftly transitioning from a period of scarce data to one awash in information, and in recent years there has been a major cultural shift in the scientific community toward making ecological data accessible to the public (Shameer et al. 2021a). Second, recent methodological advances have enabled us to better combine the ever-increasing volumes of data we are collecting with our understanding of how natural systems work. These improvements in process understanding are critical for good ecological forecasting, as we face a future with no-analogue conditions. Finally, the increasing availability of high-performance infrastructure for scientific computing, and an increase in processing capacity in general, serves as the technological foundation for both of the aforementioned trends. The quick uptake of machine learning in ecology can be attributed to these three developments. While machine learning methods were not widely used until recently (Olden et al. 2008), their popularity has skyrocketed in the past several years. Deep learning, by contrast, has so far been put to only a few select uses in ecology.

Soberón (2010) argues that accurate distribution mapping in a sustainable habitat can be achieved by integrating environmental data with species occurrence data. Species distribution modelling (SDM), also known as ecological niche modelling (ENM) (Peterson 2006), helps in the conservation of less well-known species by resolving range assessment and optimal habitat prediction (Whittaker et al. 2005; Warren and Seifert 2011; Fourcade et al. 2014). Using presence data, MaxEnt (Phillips et al. 2006) can model the ecological niche of a wide variety of taxa, including flora and fauna (Raman et al. 2020a, b). Even with limited information, the MaxEnt machine learning algorithm has proven capable of estimating the range, preferred environment, and niche suitability of species (Phillips et al. 2006; Elith et al. 2011). Distribution models give detailed information about the habitat, which helps us learn more about the requirements of a species with a high management plan score and about the less important parts of the ecosystem that affect it. This article aims to provide an overview of prediction models, with numerous examples of how they are used to anticipate the habitat of lesser-known species and how climate change affects species distribution.

2 MaxEnt Modelling

The MaxEnt software implements the maximum-entropy method, a technique for modelling the spatial distribution of biological organisms (Phillips et al. 2006). MaxEnt uses machine learning to predict whether a species will be found in a given area based only on where that species has been recorded before. This method of prediction is gaining popularity since it outperforms similar algorithms in terms of precision. Using the environmental conditions at background points, the software finds the distribution of maximum entropy over the geographical data set for the target species. This is comparable to maximizing the log likelihood of the species presence data minus a penalty term, which is akin to the concept of AIC. Each environmental variable used is assigned a weight based on the amount of complexity it adds, and an empirically derived regularization parameter is integrated into the weighting. The total of these weights defines how strongly the likelihood is penalized to prevent overfitting. MaxEnt’s default regularization parameters are derived from a study conducted by Phillips and Dudik (2008); however, users can adjust this value when the default setting is not suitable.
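The penalized fit described above can be written compactly. The expression below is a sketch in the general form given by Phillips and Dudik (2008); the symbols are introduced here for illustration and do not appear elsewhere in this chapter.

```latex
\hat{\lambda} \;=\; \arg\max_{\lambda}
\left[ \frac{1}{m} \sum_{i=1}^{m} \ln q_{\lambda}(x_i)
\;-\; \sum_{j=1}^{n} \beta_j \,\lvert \lambda_j \rvert \right],
\qquad
q_{\lambda}(x) \;=\; \frac{\exp\!\left( \sum_{j=1}^{n} \lambda_j f_j(x) \right)}{Z_{\lambda}}
```

Here x_1, ..., x_m are the presence localities, f_j are the features built from the environmental variables, \lambda_j are the feature weights, \beta_j is the regularization parameter for feature j, and Z_\lambda is the constant that makes q_\lambda sum to one over the background points. The first term is the average log likelihood of the presence data and the second is the penalty that discourages overly complex models.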

The best model is the one with the highest entropy subject to the given constraints. MaxEnt is the model of choice when it comes to extrapolating species distributions with remarkable exactness (Bosso et al. 2018; Soucy et al. 2018; Zhang et al. 2018). Starting from a uniform distribution, the software runs repeated iterations, each increasing the probability assigned to suitable locations (Merow et al. 2013). The logistic output is often used; it estimates the probability of species presence given the environmental variables (Merow et al. 2013). Using the logistic output, we can discern between the suitability of different sites. The regularization multiplier (rm) can be modified to change the models. To change the model’s complexity, several feature types can be used, such as linear (L), product (P), quadratic (Q), and hinge (H). To account for possible sampling bias in the data, a bias grid can also be created by computing the Gaussian kernel density of the sample localities. Using a subsampling technique with a number of repeats, N% of the occurrence records can be used to train the models and the remaining (100 − N)% to test them. The jackknife method can be used to assess the importance of each environmental variable.
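The repeated subsampling step can be illustrated with a short sketch. It assumes that a chosen percentage of the occurrence records is drawn at random for training in each repeat and the remainder is held out for testing; the function name, the coordinates, and the 75/25 split are hypothetical and only serve as an example, not as MaxEnt's internal implementation.

```python
import random


def subsample_splits(records, train_percent, repeats, seed=0):
    """Return `repeats` random (train, test) partitions of the occurrence records."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    rng = random.Random(seed)  # fixed seed so the partitions are reproducible
    n_train = round(len(records) * train_percent / 100)
    splits = []
    for _ in range(repeats):
        shuffled = list(records)   # copy, so the input order is left untouched
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


# Hypothetical occurrence records as (longitude, latitude) pairs.
occurrences = [(76.0 + 0.1 * i, 10.0 + 0.05 * i) for i in range(20)]
splits = subsample_splits(occurrences, train_percent=75, repeats=10)
train, test = splits[0]
print(len(train), len(test))
```

Each repeat yields an independent partition, so evaluation statistics can be averaged over the repeats rather than relying on a single split.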

3 The Climatic Variables

Any properly prepared environmental variable can be used in MaxEnt habitat suitability modelling. The bioclim variables are a collection of 19 climatic variables offered by the WorldClim database (worldclim.org). They are derived from monthly temperature and precipitation values and are frequently employed in species distribution prediction models. The bioclimatic variables represent annual trends, seasonality, and climatic extremes. Table 3.1 provides information about the bioclimatic variables. Variables 1 to 4 (BIO1 to BIO4) describe the annual mean temperature and its variability (mean diurnal range, isothermality, and temperature seasonality), while variable 12 (BIO12) represents the annual precipitation. Variables 5 to 11 (BIO5 to BIO11) describe monthly temperature extremes, the annual temperature range, and mean quarterly temperatures. A quarter is a period of three consecutive months, and four quarters are considered: the coldest, warmest, wettest, and driest. BIO13 and BIO14 give the precipitation of the wettest and driest months, while BIO15 is the coefficient of variation of precipitation (precipitation seasonality). Precipitation for the four quarters is represented by variables 16 to 19 (BIO16, BIO17, BIO18, and BIO19).
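To make the derivation from monthly values concrete, the following sketch computes a subset of the bioclimatic variables for a single location. The monthly values are invented for illustration, the function name is hypothetical, and the population standard deviation used for BIO15 is an assumption; the exact convention used by WorldClim is not asserted here.

```python
from statistics import mean, pstdev


def bioclim_subset(tmin, tmax, prec):
    """Derive a few bioclimatic summaries from 12 monthly values.

    tmin, tmax: monthly minimum and maximum temperature (deg C)
    prec: monthly precipitation (mm)
    """
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values for each variable")
    tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]
    # Precipitation totals of every 3-month window, wrapping December into January.
    quarters = [sum(prec[(m + k) % 12] for k in range(3)) for m in range(12)]
    bio5, bio6 = max(tmax), min(tmin)
    mean_prec = mean(prec)
    return {
        "BIO1": mean(tavg),        # annual mean temperature
        "BIO5": bio5,              # max temperature of warmest month
        "BIO6": bio6,              # min temperature of coldest month
        "BIO7": bio5 - bio6,       # temperature annual range
        "BIO12": sum(prec),        # annual precipitation
        "BIO13": max(prec),        # precipitation of wettest month
        "BIO14": min(prec),        # precipitation of driest month
        # Precipitation seasonality: coefficient of variation, in percent.
        "BIO15": 100 * pstdev(prec) / mean_prec if mean_prec else 0.0,
        "BIO16": max(quarters),    # precipitation of wettest quarter
        "BIO17": min(quarters),    # precipitation of driest quarter
    }


# Invented monthly values for one site (January to December).
tmin = [18, 19, 21, 22, 22, 21, 20, 20, 20, 20, 19, 18]
tmax = [30, 31, 33, 33, 32, 29, 28, 28, 29, 30, 30, 30]
prec = [10, 15, 30, 110, 250, 650, 700, 400, 250, 300, 150, 40]
bio = bioclim_subset(tmin, tmax, prec)
print(bio["BIO7"], bio["BIO12"], bio["BIO16"], bio["BIO17"])
```

In practice these layers are downloaded as ready-made rasters, so a calculation like this is mainly useful for understanding what each variable measures.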

Table 3.1 Bioclimatic variables used for modelling (See Bioclimatic variables—WorldClim 1 documentation for more details)

4 Climate Change and Habitat Suitability

Numerous websites provide future climate data based on the Representative Concentration Pathways (RCPs). WorldClim, CHELSA, CliMond, ecoClimate, ENVIREM, and MERRAclim are the most important data sources. The RCPs describe possible future climates based on the greenhouse gas scenarios adopted by the Intergovernmental Panel on Climate Change (IPCC). For the prediction and analysis of climate change, four pathways are frequently used: RCP 2.6, 4.5, 6, and 8.5. In RCP 2.6, it is assumed that greenhouse gas emissions will begin to decline by 2020, with carbon dioxide emissions reaching zero by 2100. In RCP 4.5, greenhouse gas emissions peak around 2040 and then fall, whereas in RCP 6, emissions peak around 2080 and then decline. Under RCP 8.5, greenhouse gas emissions continue to rise throughout the twenty-first century (Sharma et al. 2017). Using these scenarios, it is possible to model the habitat suitability of different species under different RCPs. This aids understanding of the shifting distribution ranges of species. The aforementioned databases provide the bioclimatic variables needed to make predictions about the distribution of species.
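One simple way to compare a current prediction with a prediction under an RCP scenario is to threshold both suitability maps and count the cells that are lost, gained, or retained. The sketch below assumes that the two maps are aligned grids stored as nested lists, that missing cells are marked with None, and that a threshold of 0.5 separates suitable from unsuitable habitat; all of these choices, and the function name, are illustrative assumptions rather than the procedure used in the studies cited here.

```python
def range_change(current, future, threshold):
    """Count cells lost, gained, and retained between two suitability grids."""
    counts = {"loss": 0, "gain": 0, "stable": 0}
    for row_now, row_future in zip(current, future):
        for now, later in zip(row_now, row_future):
            if now is None or later is None:
                continue  # skip cells with no data in either map
            suitable_now = now >= threshold
            suitable_later = later >= threshold
            if suitable_now and not suitable_later:
                counts["loss"] += 1
            elif suitable_later and not suitable_now:
                counts["gain"] += 1
            elif suitable_now and suitable_later:
                counts["stable"] += 1
    return counts


# Invented 2 x 3 suitability grids (values between 0 and 1).
current = [[0.8, 0.6, 0.2], [0.7, 0.1, None]]
future = [[0.3, 0.7, 0.6], [0.9, 0.2, None]]
change = range_change(current, future, threshold=0.5)
print(change)
```

Running the same comparison against each RCP layer gives a scenario-by-scenario summary of how much suitable habitat is expected to contract or expand.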

5 Model Appraisal

The area under the receiver operating characteristic curve (AUC) and the true skill statistic (TSS) are two metrics that can be used to assess models. AUC is a threshold-independent metric that evaluates model performance by measuring the model’s ability to distinguish between presence and background locations. Not all models with a high AUC score have great predictive value (Phillips et al. 2006), and evaluations based only on the AUC score are not reliable. TSS is calculated as sensitivity + specificity − 1, where sensitivity and specificity are evaluated at the probability threshold at which their sum is greatest (Allouche et al. 2006).
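Both metrics can be computed directly from predicted suitability scores. The sketch below is a minimal illustration with invented scores: AUC is computed as the fraction of presence and background pairs in which the presence site scores higher (ties count half), and TSS is sensitivity + specificity − 1 at a given threshold. The function names are hypothetical, and background points stand in for absences, which is a common but imperfect substitute.

```python
def auc(presence_scores, background_scores):
    """Fraction of presence/background pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def tss(presence_scores, absence_scores, threshold):
    """True skill statistic: sensitivity + specificity - 1 at one threshold."""
    sensitivity = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    specificity = sum(s < threshold for s in absence_scores) / len(absence_scores)
    return sensitivity + specificity - 1


def max_tss(presence_scores, absence_scores):
    """Return (best TSS, threshold), trying every observed score as a threshold."""
    thresholds = sorted(set(presence_scores) | set(absence_scores))
    return max((tss(presence_scores, absence_scores, t), t) for t in thresholds)


# Invented suitability scores at test presence and background points.
presence = [0.9, 0.8, 0.7, 0.4]
background = [0.6, 0.3, 0.2, 0.1]
auc_value = auc(presence, background)
tss_at_half = tss(presence, background, 0.5)
best_tss, best_threshold = max_tss(presence, background)
print(auc_value, tss_at_half, best_tss)
```

Reporting both values guards against the case noted above, where a high AUC alone gives a misleading impression of predictive value.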

6 The Western Ghats and Climate Change

The Western Ghats (WG) is a 1600-kilometre chain of mountains in India that runs from Gujarat through Maharashtra, Goa, Karnataka, and Kerala to Tamil Nadu. The faulted ridges of an elevated plateau make up the WG, which is not a mountain range in the traditional sense (Bhat 2017). During continental drift, when the Indian subcontinent moved close to Reunion Island 120–130 million years ago, the mountain chain was built by volcanic eruptions. Volcanic eruptions contributed to the extinction of many reptiles, including the dinosaurs, during this period. The Southern WG’s 2000-million-year-old rocks provide evidence of the domal uplift that raised the WG. The fauna and geography of Peninsular India were altered by changes in the region during the Eocene (40–45 million years ago) (Karanth 2006). The high rates of uplift resulted in high elevations, steep slopes, and gorges, which served as a cradle of speciation, resulting in the current level of endemism. This hilly, undulating area with a wide range of landscapes and plants has a big effect on the climate of Peninsular India (Gunnell 1997).

The WG is a delicate and diverse environment that hosts a wide range of rare and endangered species, making it a biodiversity hotspot (Cincotta et al. 2000; Myers et al. 2000; Shameer et al. 2019). The mountain ranges on the west coast of Peninsular India are unique due to their varied topography, altitudes, climates, and habitats. There are humid tropical conditions at lower elevations and a temperate environment with an annual average temperature of 15 °C at higher elevations. Many instances of parapatric and allopatric species have been found at high altitudes where cold climates are prevalent (Vijayakumar et al. 2016). Deforestation, forest encroachment, infrastructural developments, agricultural expansion, hydroelectric projects, mining, timber logging, and the extraction of forest products are some of the human-induced stresses that the WG faces today (Menon and Bawa 1997; Priti et al. 2016; Sen et al. 2016; Raman et al. 2020a, b; Shameer et al. 2021a). Tropical montane ecosystems, like the WG, are undergoing fast change, but the exact rate and pattern of this change remain unknown. Variations in the pattern of land use and land cover have a significant impact on the fragile ecosystem’s biodiversity (Sukumar et al. 1995; Menon and Bawa 1997). Many species have already gone extinct due to habitat fragmentation, tourism schemes that are not based on facts, and the expansion of exotic/invasive species. Changing the landscape by removing shola-grasslands and replacing them with exotics has unpredictable consequences.

Animal metabolism and development are directly influenced by changes in CO2 concentration, temperature, or precipitation (Hughes 2000). Temperature changes may also affect the reproductive requirements, habitat selection, and feeding strategies of species differently, which may represent an extra risk for their survival. According to the Intergovernmental Panel on Climate Change (IPCC), if global temperatures rise by 2–3 degrees Celsius, 20–30% of species are likely to be at increased risk of extinction (Stocker et al. 2013; Warren et al. 2013). Climate change is upsetting the WG’s delicate biological equilibrium, resulting in an increase in dependent fauna and changes in floral composition (Shukla et al. 2003). According to the vulnerability index (Gopalakrishnan et al. 2011), the WG is more vulnerable to climate change than the northeastern forests. Local diversity may suffer as a result of climate change’s negative influence on water supplies (Wagner and Weitzman 2015). Changes in the trophic structure have been observed in locations that are particularly vulnerable to climate change. Under the altered climate, non-native and invasive organisms have an advantage over their native counterparts (Hellmann et al. 2008). Because of the rising impact of humans and the introduction of invasive species, tropical montane ecosystems like the WG host many threatened taxa with restricted distributions that are vulnerable to local extinction (Arasumani et al. 2019). Invasive species (flora and fauna) can spread rapidly in a changing environment because they are able to take advantage of new niches. The successful invaders are projected to be species that are phenologically flexible and occupy the temporal niche of the indigenous species (Moran and Alexander 2014). Alien flora can have the same features as native flora and disperse in the same manner. External variables like climate change play a significant role in reshaping the trophic system. Sukumar et al. (1995) found that fragile mountain ecosystems are especially at risk from climate change because of their complicated topography and biogeographic history.

7 Habitat Suitability Model of an Endemic Mammal

We were able to model the ideal habitat for the Western Ghats’ endemic brown palm civet using MaxEnt (Shameer et al. 2021b). Prediction models are important for reviewing or creating data for a less-known species since they characterize the target species’ core niche. The brown palm civet, an endemic species, is difficult to monitor because of its nocturnal habits and elusive nature (Mudappa 2006; Patou et al. 2010). A thorough understanding of a species’ natural habitat helps researchers and conservationists plan suitable actions and undertake extensive monitoring. It has been suggested that the brown palm civet lives at elevations ranging from 500 to 1300 m above sea level (Rajamani et al. 2002). The Western Ghats’ brown palm civet has only been studied in terms of its occurrence, diet, pelage variation, and taxonomy (Pocock 1933; Hutton 1949; Schreiber 1989; Ramachandran 1990; Ashraf et al. 1993; Ganesh 1997; Rajamani et al. 2002; Mudappa et al. 2010). Knowing a species’ natural history and biology is not enough to devise an effective conservation plan; an in-depth understanding of the species’ range and ideal habitat is even more important (Papeş and Gaubert 2007). Fig. 3.1 (adapted from Shameer et al. 2021b) shows the predicted habitat areas. According to our research, the brown palm civet was previously widespread in the Western Ghats but is now confined to just four isolated blocks. Its habitat was fragmented by the destruction of dense rainforest through intensive human activity.

Fig. 3.1
A map of Western Ghats is color-coded based on the habitat suitability index. The 4 segments are labeled a to d for different regions, along with their enlarged view on the right.

Predicted SDM of endemic brown palm civet in Western Ghats. (a) The Nilgiri region; (b) Anamalai region; (c) Periyar and adjoined reserves; (d) Agasthyamalai region

8 Consequences of Climate Change on Endemic Animals

Many species’ distributions, abundances, and life cycles are directly impacted by climate change as a result of global warming (Thuiller et al. 2006). In order to protect biodiversity for the future, planners and politicians must pay direct attention to the impact of climate change around the globe (Pacifici et al. 2017). Climate change is expected to have a considerable impact on species’ geographic ranges, resulting in a decrease in their abundance (Warren et al. 2013). The ecological processes by which climatic variables influence species niches across space and time are of interest in long-term studies (MacFadyen et al. 2018). The influence of climate variables on species abundance and distribution can therefore be predicted from species’ responses to current climatic conditions. Numerous studies around the world have found that species’ geographic ranges are shrinking as a result of climate change (Walther et al. 2002; Hickling et al. 2006; Priti et al. 2016; Bhattacharyya et al. 2019). Because of their diverse ecological patterns and processes, high-altitude ecosystems, also known as “sky islands,” are particularly vulnerable to climate change (Raman et al. 2020b).

Ecological niche modelling (ENM) has played an essential role in the study of lesser-known species and in the geographical simulation of the prospective effects of future environmental conditions on various species (Guisan and Zimmermann 2000). Two endangered species, the brown mongoose and Salim Ali’s fruit bat, were modelled to examine the effects of various levels of greenhouse gas emissions. The brown mongoose is a data-deficient, high-altitude species that is strongly climate-dependent and has a special geographic affinity for sky islands. Brown mongoose and other related species would experience considerable shifts in range due to climate change. The brown mongoose’s predicted range map is provided in Fig. 3.2 (adapted from Raman et al. 2020b), and that of Salim Ali’s fruit bat in Fig. 3.3 (adapted from Raman et al. 2020a). According to the findings, the brown mongoose’s range will be significantly affected under changing climatic conditions. The change predicted for Salim Ali’s fruit bat reflects an expected shift in its trophic resources, which indicates that the WG’s floral composition is shifting. We therefore expect that the floral and faunal composition of the WG may change as a result of the shifting climatic conditions.

Fig. 3.2
A series of 3 maps depicts the habitat change of brown mongoose in the Western Ghats. The plots are labeled Representative Concentration Pathway 4.5, 6, and 8.5.

Predicted habitat change of endemic brown mongoose in Western Ghats

Fig. 3.3
Three maps showcase the least potential habitat change of the Salim Ali's fruit bat in the Western Ghats, for RCP 4.5, 6, and 8.5, as compared to the other habitats.

Predicted habitat change of endemic Salim Ali’s fruit bat in Western Ghats

9 Paleoclimatic Model and Allopatric Speciation

The shifts in endemic species’ geographic distribution that have led to the current patterns can be traced back to Quaternary climate change (Hewitt 2000; Hewitt and Griggs 2004; Bose 2016; Ray et al. 2018). Climate change has resulted in a shrinking of existing ranges, culminating in the creation of new species from isolated meta-populations (Hewitt and Griggs 2004; Provan and Bennett 2008; Stewart et al. 2010; Bose 2016). During the Eocene, while the Indian plate was migrating, Dravidogecko evolved and became a distinct lineage (Chaitanya et al. 2018). Many species arrived in the WG in the late Miocene to early Quaternary period, when the paleoclimate was suitable, according to Gupta (2010). Insights gained from phylogeography and paleoniche modelling studies (Robin et al. 2010; Ray et al. 2018) shed light on life during and after the ice age, and the WG’s constant precipitation made these locations ideal for human settlement and expansion. Based on paleoclimate theory, the dry glacial epoch may have led to species diversification. Hypotheses about WG species diversification and its causes have been developed using data collected from mountain ranges (Robin et al. 2015). During the glacial and interglacial periods, temperature changes affected forests and grasslands in the high mountains. Species ranges may have expanded during the ice age and contracted during the interglacial period, with the former being more likely. By simulating historical, present-day, and future climate scenarios on the distribution patterns of old endemic reptile genera like Dravidogecko (Fig. 3.4), we were able to test this notion. Our 1-year survey focused on the Nilgiris (Western Ghats) and covered 58 diverse sites. For the distribution model, we employed environmental variables such as diurnal range, isothermality, and altitude. Modelling of past climate suggests that species currently found in the Southern Western Ghats would have existed throughout the WG during Pleistocene times. Anticipating a new species from the Western Ghats, we combined our findings with DNA analysis (Fig. 3.5).

Fig. 3.4
6 maps of Western Ghats depict the distribution of the species Dravidogecko. a to c showcase the distribution based on elevation. d to f showcase the distribution based on RCP values.

Species distribution model of Dravidogecko in Western Ghats. (a) SDM under Pleistocene climate, (b) SDM under current climate, (c) occurrence points on the elevation map of the WG, (d) SDM under RCP 4.5, (e) SDM under RCP 6, and (f) SDM under RCP 8.5

Fig. 3.5
A phylogeny tree. It follows Wayanad, Nilgiris, Anamalai, Munnar, Meghamalai, and Agasthyamalai. It also presents their species classification and the corresponding SDMs.

Connecting speciation and SDM and phylogeny

10 Limitations of Research

Species distribution models are powerful machine learning techniques for predicting and mapping species’ possible habitats in space and time. Consequently, these methods are now acknowledged as instruments for sustainable biodiversity management (Qazi et al. 2022). However, they are not always appropriate and, if their limitations are not acknowledged by decision-makers, can lead to ineffective and costly mitigation and compensation (Carneiro et al. 2016). Robust models that account for species detectability, such as occupancy models (MacKenzie et al. 2006), require repeated presence and absence records. MaxEnt employs presence data only, which has been criticized as a significant limitation. This can be mitigated by providing accurate sample data. In much of the research, the modelling relies on inaccurate secondary data, which poses the greatest problem and leads to inaccurate predictions. Obtaining presence-absence records for lesser-known, uncommon, or elusive species is frequently difficult for researchers. In such cases, MaxEnt is preferable to occupancy models because it can generate a valid species distribution map using only presence data. Accurate sampling is therefore essential for making reliable predictions about where lesser-known species live.

11 Future Prospects

Modelling the spread of different species can be greatly assisted by the recent and future breakthroughs in machine learning and artificial intelligence. Species distribution modelling could benefit from including recent developments in ecological theory. Using algorithms that take into account how prey and predators interact, how competition works, and how niches change over time makes species modelling more effective.

12 Conclusion

The addition of environmental data to the occurrence data of species aids in the precise mapping of their distribution in a plausible habitat. Ecological niche modelling (ENM) is a technique that aids in the conservation of lesser-known species by resolving range assessment and preferred habitat prediction. Based on presence data, MaxEnt predicts the ecological niche of a variety of species, including both plants and animals. MaxEnt’s machine learning algorithm has demonstrated the ability to estimate the geographic distribution, preferred habitat, and niche suitability of species with minimal data. Hence, this method can be used to identify the potential habitats of lesser-known species and to develop long-term conservation plans for them.