Introduction

Currently, habitat loss and fragmentation are the main drivers of the decline in global biodiversity (Newbold et al. 2015), and their impacts on terrestrial ecosystems are expected to increase even more during this century (Haddad et al. 2015; Wilson et al. 2016), especially in the tropics where the largest biological diversity is concentrated (Collen et al. 2008). In tropical countries, loss of natural habitats is strongly related to changes in land use, especially agricultural activities (Gibbs et al. 2010; Newbold et al. 2015; Lewis et al. 2015). Brazil is considered one of the largest producers of agricultural commodities in the world, and as a megadiverse country, it faces the challenge of reconciling economic growth, largely driven by agriculture, with biodiversity conservation (Lemes et al. 2019).

The highest volumes of agricultural production in Brazil are concentrated in the Cerrado region, which was responsible for 55% of the country's grain harvest in 2019 (Conab 2019). The Cerrado covers 23% of Brazilian territory and is the second largest biome in South America, surpassed in area only by the Amazon (IBGE 2020). In addition, it harbours the tropical savanna with the richest flora in the world (Murphy et al. 2016; Borghetti et al. 2019) and is home to more than 13,000 plant species, of which about 5,000 are endemic (Flora do Brasil 2020). Despite being considered one of the world's biodiversity hotspots (Myers et al. 2000), the Cerrado is under continuous pressure from deforestation, which has resulted in the loss of approximately 50% of its original vegetation cover (Alencar et al. 2020), posing an increasing threat to its flora (Strassburg et al. 2017; Velazco et al. 2019).

According to the Brazilian National Space Research Institute (INPE 2020), 91,800 km2 of the Cerrado was deforested in the period 2008–2018, surpassing by nearly 30,000 km2 the area deforested in the entire Brazilian Amazon (62,300 km2), even though the Cerrado is only 50% of the Amazon's size (http://terrabrasilis.dpi.inpe.br). Recent deforestation rates in the Cerrado can be explained by the expansion of the agricultural frontier, most notably in northern Mato Grosso state and in a region known as Matopiba, an acronym formed from the names of four states in the northeastern part of the Cerrado (Maranhão, Tocantins, Piauí and Bahia). In the past few years, Matopiba has been portrayed as the new agricultural frontier of Brazil, given its remarkable potential for large-scale agriculture (e.g., Araújo et al. 2019).

Land conversion in Matopiba has occurred mainly through the replacement of native savannas and grasslands by mechanized agriculture with large-scale production. This process has been facilitated by environmental and topographic conditions favourable to agricultural practices, land availability and fiscal incentives (Bolfe et al. 2016). However, environmental concerns have grown since the Matopiba region also harbours the largest natural remnants of the highly fragmented Cerrado biome (Alencar et al. 2020). Besides being important biodiversity repositories and providers of ecosystem services, remnants of native vegetation in Matopiba are vital for the subsistence of local traditional communities (Schmidt and Ticktin 2012; Borges et al. 2016).

A major challenge in conservation planning is the protection of areas that contain as many species as possible (Tjørve 2010). As conservation implies economic and social costs, a viable and efficient strategy involves focusing on areas with greater diversity, rarity and endemism (Cañadas et al. 2014; Enquist et al. 2019). Therefore, the choice of the most appropriate areas for conservation can be optimized with robust information on species occurrence data and the filling of knowledge gaps.

However, biodiversity conservation planning is often hampered by absent, incomplete or outdated data on species occurrences (Darbyshire et al. 2017). In addition, sampling efforts are generally biased towards regions close to major cities and research centres (e.g., Sousa-Baena et al. 2014). This practice favours the dissemination of misleading biodiversity estimates, owing to biases introduced through the over-representation of well-studied taxonomic groups and regions (McRae et al. 2017).

In the Cerrado, floristic inventories are concentrated in the southern and central regions (Sousa-Baena et al. 2014) with an overrepresentation of surveys focused on tree species occurring in savanna habitats (e.g., Ratter et al. 2003; Françoso et al. 2020). This concentration of biological information in a particular region, life form or habitat creates severe gaps in the knowledge of species distribution, hindering the implementation of adequate species conservation strategies. Botanical surveys carried out thus far in the Matopiba region have been restricted to only a few areas, with a particular preference for woody plants in savanna and grassland habitats (e.g., Antar and Sano 2019). However, a general compilation of the flora for the region as a whole, encompassing all different habitats and life forms, is still missing. Nevertheless, new approaches based on specimen data deposited in large public biodiversity repositories, where millions of species occurrence records are available online, offer new opportunities for assembling comprehensive floristic datasets (e.g., Maldonado et al. 2015). Such approaches can be an efficient way to obtain adequate estimates of species diversity and distribution for a particular region, including areas of conservation concern experiencing increased rates of habitat loss such as the Matopiba. However, limitations regarding data quality remain a major obstacle when using information from large biodiversity repositories (Colli-Silva et al. 2020).

Given the ongoing impacts of the expansion of large-scale agriculture on natural ecosystems in Matopiba, coupled with the lack of robust biodiversity information in the region, we herein investigated (i) the extent of flora sampling in the Matopiba region, (ii) the number of plant species occurring in Matopiba, and (iii) the number of these species endemic to the region, or listed as threatened. To accomplish these goals, we compiled botanical information on plant occurrence in Matopiba based on online public repositories, in order to provide robust data to support effective conservation strategies. The analysis of the data collected enabled us to identify areas of sampling gaps and produce the first compilation of the angiosperm flora in the region, highlighting species of high conservation value, such as threatened and endemic species.

Materials and methods

Study site

In this study, we adopted a delimitation for the Matopiba region that is located between 6° 15'–15° 15' S and 48° 45'–44° 45' W (Fig. 1) and comprises an area of physiographic and climatic homogeneity of about 300,000 km2 within the boundaries of 80 municipalities (Supplementary Material S1). As inclusion criteria for this delimitation, we adopted municipalities from Maranhão, Tocantins, Piauí and Bahia states that have most of their territory included within the boundaries of the Cerrado biome (sensu IBGE 2020) and that also have high potential for agricultural activity, including flat areas with deep soils suitable for the implementation of large-scale agriculture. The definition adopted here differs from a broader delimitation of Matopiba adopted in other studies (e.g., Miranda et al. 2014; Dias et al. 2016). We preferred to restrict the study area to a more homogeneous region in terms of climate and geology, thus concentrating on most of the areas now facing intensive pressure for land use change (Dias et al. 2016; Parente and Ferreira 2018). Therefore, areas with semi-arid climates in the east, as well as lowlands west of the Tocantins River and close to the Amazon border, are excluded from our study site, avoiding an excessively heterogeneous definition in terms of environmental variability and agricultural potential.

Fig. 1

Cerrado biome in Brazil showing the location of the defined study area (DSA) in the Matopiba region (left), which was delimited by the territory of 80 municipalities (see also Supplementary Material S1), and density of occurrence records of angiosperms in the DSA based on 24,312 unique records (right). Each hexagonal grid-cell has an area of 1000 km2, and the colours represent the range of botanical records per grid-cell

The defined study area (hereafter referred to as DSA) used here largely coincides with the Cerrado ecoregion known as “Chapadão do São Francisco”, which was recently defined based on biophysical characteristics and encompasses a region with flat terrain in plateaus with altitudes above 500 m (Sano et al. 2019). This area also broadly overlaps with the physiographic unit called Espigão Mestre do São Francisco (Cochrane et al. 1985), which is a study area considered in many floristic and biogeographic studies (e.g., Mendonça et al. 2000; Felfili et al. 2001).

The main vegetation types of the DSA are the savannas and grasslands that occur preferably on flat areas on well-drained terrain, while gallery forests and palm swamps (veredas) occur in association with water courses. Patches of seasonally dry tropical forests occur scattered throughout the region, often associated with limestone outcrops. The climate is classified as tropical semi-arid/humid with seasonal (summer) rains, annual rainfall ranging from 1000 to 1400 mm, and mean annual temperatures ranging from 23 to 26 °C (IBGE 2020). Predominant soil classes are the dystrophic latosols, as well as sandstone/quartzite-derived soils, which have good permeability, but limited fertility and low pH (Santos et al. 2018). In addition, the DSA includes important water sources that drain into major hydrographic basins, which are often used for centre-pivot irrigation (Landau et al. 2016).

Data acquisition and analyses

We concentrated our analyses on the angiosperms because they are the most diverse and well-studied group of plants (Magallón and Castillo 2009), as well as a preferred target of biodiversity assessments and conservation planning. In order to build a dataset for the flora of Matopiba, we conducted a search for occurrence records of angiosperms available for the DSA in the SpeciesLink (2021; http://specieslink.net) online herbarium repository, which embraces all major herbaria in Brazil. We also searched the Botanical Collection Management System—JABOT (2021; http://jabot.jbrj.gov.br/v3/consulta.php) for botanical data of the BRBA and IBGE herbaria, which house relevant collections from our study area but are not available in SpeciesLink. All repositories were last accessed in September 2021.

Our search criteria were based on the 80 municipalities that comprise the DSA (Supplementary Material S1). Multiple searches were performed by entering the name of each municipality in the search operator “municipality” (SpeciesLink) or “locality” (JABOT). This search resulted in the download of 44,948 records of angiosperms from 106 herbaria (SpeciesLink) and 10,158 records from the BRBA and IBGE herbaria (JABOT). We also obtained 434 occurrence records (observation data) from a dataset on savanna woody species (Françoso et al. 2020) containing surveyed sites that fall within the DSA. This initial compilation comprised 55,540 occurrence records.

After compilation, taxonomic names were checked for spelling errors, and synonyms were updated using the R package flora v. 0.3.4 (Carvalho 2020; R Development Core Team 2020), which retrieves data from the Flora do Brasil 2020 platform. Varieties and subspecies were collapsed to species level. The data were filtered to eliminate specimens with doubtful identifications (incomplete or indeterminate) and botanical duplicates (specimens representing a single collection deposited in several herbaria). Duplicates were detected by manually searching for records with the same collecting date, collector number and collector name, and only one record per collection was kept in the dataset. In the case of duplicates with different identifications, we kept the specimen determined by specialists, or considered the latest determination. We also excluded species that are non-native to the study region, which were identified with the help of the flora package version 0.3.4 (Carvalho 2020). For records lacking original geographic coordinates (c. 34% of all records), we used the centroid of the municipality where the specimen was collected, as provided by SpeciesLink. Although using the municipality centroid, which may be some kilometres away from the locality where the specimen was actually collected, could result in less precise georeferencing, this was not a major issue considering the large geographical scale of our study. Records with imprecise geographic coordinates (i.e., only to degrees or with geographic coordinates not matching collection locality) were manually checked and, where necessary, eliminated. After the cleaning procedures described above, the final dataset contained 24,312 unique records of angiosperms native to the DSA (Supplementary Material S2).
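The duplicate-removal and coordinate-fallback rules described above were applied manually by the authors; the sketch below only illustrates the same logic in automated form. All field names (`collector`, `number`, `date`, `by_specialist`, `det_year`), the example records and the centroid value are hypothetical placeholders, not data from the study.

```python
def collection_key(rec):
    """Identify a collecting event by collector name, collector number and date."""
    return (rec["collector"].strip().lower(), rec["number"].strip(), rec["date"])


def preference(rec):
    """Rank determinations: specialist first, then the most recent year."""
    return (bool(rec.get("by_specialist")), rec.get("det_year") or 0)


def deduplicate(records):
    """Keep a single record per collecting event, choosing the preferred one."""
    kept = {}
    for rec in records:
        key = collection_key(rec)
        if key not in kept or preference(rec) > preference(kept[key]):
            kept[key] = rec
    return list(kept.values())


def fill_coordinates(rec, centroids):
    """Fall back to the municipality centroid when coordinates are missing."""
    if rec.get("lat") is None or rec.get("lon") is None:
        lat, lon = centroids[rec["municipality"]]
        return {**rec, "lat": lat, "lon": lon, "coord_source": "centroid"}
    return {**rec, "coord_source": "original"}


# Made-up records and a placeholder centroid, for illustration only.
records = [
    {"collector": "A. Collector", "number": "101", "date": "2001-05-03",
     "species": "Qualea parviflora", "by_specialist": False, "det_year": 2005,
     "municipality": "Barreiras", "lat": None, "lon": None},
    {"collector": "a. collector ", "number": "101", "date": "2001-05-03",
     "species": "Qualea parviflora", "by_specialist": True, "det_year": 2003,
     "municipality": "Barreiras", "lat": -12.1, "lon": -45.0},
    {"collector": "B. Collector", "number": "7", "date": "1998-11-20",
     "species": "Pouteria ramiflora", "by_specialist": False, "det_year": None,
     "municipality": "Barreiras", "lat": None, "lon": None},
]
centroids = {"Barreiras": (-12.0, -45.0)}  # placeholder, not a real centroid

cleaned = [fill_coordinates(r, centroids) for r in deduplicate(records)]
```

In this toy input the first two records share a collecting event, so only the specialist-determined one survives; the third record has no coordinates and receives the placeholder centroid. Exact string matching on collector names is a simplification, since real herbarium labels often spell the same collector in several ways.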

All occurrence records were mapped onto a grid consisting of 1000 km2 hexagonal cells superimposed on the study area boundaries, using ArcGIS version 10.3.1 (ESRI) software. The density of occurrence records per grid cell was used to assess gaps in botanical knowledge (angiosperm records) in the DSA. Species richness was estimated using rarefaction and extrapolation curves as a function of Hill numbers of order q = 0, according to the method proposed by Chao et al. (2014), as implemented in the iNEXT package, version 2.0.17 (Hsieh et al. 2016; R Development Core Team 2020). The curves were generated with 95% confidence intervals for the DSA, as well as for individual states.
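To make the richness estimation more concrete, the following is a minimal sketch of individual-based rarefaction and extrapolation for Hill number q = 0 (species richness). It is not the iNEXT implementation: it uses the classical rarefaction formula and a Chao1-type estimate of undetected species as commonly described for this framework, it omits the bootstrap confidence intervals, and the function names and the toy abundance vector are assumptions made for illustration.

```python
from math import comb


def rarefied_richness(counts, m):
    """Expected number of species in a random subsample of m records (m <= n)."""
    n = sum(counts)
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and the total number of records")
    total = comb(n, m)
    # A species with x records is missed when all m records come from the other n - x.
    return sum(1 - comb(n - x, m) / total for x in counts if x > 0)


def extrapolated_richness(counts, extra):
    """Expected richness after adding `extra` records to the reference sample."""
    counts = [x for x in counts if x > 0]
    n, s_obs = sum(counts), len(counts)
    f1 = sum(1 for x in counts if x == 1)  # species recorded once
    f2 = sum(1 for x in counts if x == 2)  # species recorded twice
    # Chao1-type estimate of the number of undetected species.
    if f2 > 0:
        f0 = (n - 1) / n * f1 ** 2 / (2 * f2)
    else:
        f0 = (n - 1) / n * f1 * (f1 - 1) / 2
    if f0 == 0:
        return float(s_obs)
    return s_obs + f0 * (1 - (1 - f1 / (n * f0 + f1)) ** extra)


# Toy data: number of records per species (hypothetical, not from the study).
counts = [5, 3, 1, 1]
n = sum(counts)

half_sample = rarefied_richness(counts, n // 2)       # interpolation
double_sample = extrapolated_richness(counts, n)      # sampling effort doubled
```

Setting `extra` equal to the reference sample size corresponds to doubling the sampling effort, which is the scenario referred to when reporting the extrapolated increase in species number.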

Based on our occurrence dataset, we assembled a list of angiosperm species for the DSA, by choosing one representative specimen, preferably determined by a specialist, to represent each species. Species only represented by specimens determined by non-specialists also occurred in our dataset. Those records were checked individually for their taxonomic accuracy, and specimen images were compared with specimens identified by specialists in biodiversity repositories such as SpeciesLink and Flora do Brasil 2020. A total of 352 species determined by non-specialists or without information on the determiner were excluded from our list due to unreliable identifications (Supplementary Material S2). The classification of the families follows the Angiosperm Phylogeny Group IV system (APG IV 2016). Species nomenclature and synonymy, as well as information on life form and geographical distribution, were revised and corrected based on the Brazilian Flora Checklist (Flora do Brasil 2020), using the flora package (Carvalho 2020).

We checked the geographic distribution of each species present in our list in online databases (SpeciesLink and Flora do Brasil 2020) to search for species endemic to the DSA, i.e., species that only occur within the limits of our study area. Some species that occur predominantly in the DSA, but are also recorded along the Serra Geral in Goiás near the border with Bahia, were also included in our list of endemic species. We also searched for the occurrence of threatened species in the DSA based on the Brazilian Red List of the National Centre for Flora Conservation (CNCFlora 2021; http://cncflora.jbrj.gov.br/portal/pt-br/listavermelha), which uses threat categories established by the International Union for the Conservation of Nature and Natural Resources (IUCN): Critically Endangered, Endangered and Vulnerable. Subsequently, the distributions of the endemic and threatened species were superimposed on the protected areas of the region (http://mapas.mma.gov.br) in order to verify whether they occur within areas protected by law. To assess recent botanical discoveries in the region, we also conducted searches in the International Plant Names Index (www.ipni.org) database in order to identify species published in the last six years (2015–2020) that occur within the DSA.

Results

The distribution of angiosperm occurrence records in the study region is very sparse (Fig. 1), with about 39% of grid cells showing fewer than 10 records. Overall, the DSA has 0.08 record/km2, and only three grid cells have a density of > 1 record/km2. Individually, the states with the highest sampling effort are Bahia (0.10 record/km2) and Tocantins (0.08), followed by Maranhão (0.07) and Piauí (0.03).
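As a quick check of how these densities follow from figures given earlier in the text, the sketch below recomputes the overall density and translates the per-cell thresholds into record counts. It assumes the approximate DSA area of 300,000 km2 stated in the study site description; the variable names are illustrative.

```python
TOTAL_RECORDS = 24_312   # unique angiosperm records in the final dataset
DSA_AREA_KM2 = 300_000   # approximate area of the DSA (an approximation from the text)
CELL_AREA_KM2 = 1_000    # area of one hexagonal grid cell


def density(records, area_km2):
    """Sampling density in records per square kilometre."""
    return records / area_km2


overall = density(TOTAL_RECORDS, DSA_AREA_KM2)

# A cell with fewer than 10 records has a density below this value.
sparse_cell_limit = density(10, CELL_AREA_KM2)

# A cell needs more than this many records to exceed 1 record/km2.
dense_cell_records = 1 * CELL_AREA_KM2

print(f"{overall:.2f} record/km2")  # prints "0.08 record/km2"
```

In other words, the sparsely sampled cells hold fewer than 0.01 record/km2, while the three densest cells each contain more than 1000 records.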

The estimate of species richness based on the rarefaction curve of occurrence records indicates that the angiosperm flora of the DSA is still underestimated with an extrapolated increase of around 25% in the number of species, if sampling effort is doubled (Fig. 2A). The same applies to the states analysed individually with Piauí showing the lowest sampling effort and estimates of species richness (Fig. 2B).

Fig. 2

Rarefaction (continuous lines) and extrapolation (dashed lines) curves representing the number of angiosperm species sampled in the defined study area (DSA) in the Matopiba region (a) and in each state that composes it, individually (b). The 95% confidence intervals (shaded area that accompanies the lines) were obtained using the bootstrap method

We found 2,517 species in the study region, belonging to 796 genera and 139 families (the species list, including species authorities, is presented in Supplementary Material S3). Families with highest species richness were Fabaceae (330), Poaceae (196), Asteraceae (152), Cyperaceae (103), Rubiaceae (101), Euphorbiaceae (85), Malvaceae (84), Malpighiaceae (75), Melastomataceae (67) and Apocynaceae (62). The top ten species-rich genera were Paspalum (53), Chamaecrista (37), Mimosa (36), Rhynchospora (33), Cyperus (22), Eugenia (22), Miconia (22), Byrsonima (21), Croton (21), and Hyptis, Ipomoea and Polygala (all with 20). The ten species with the highest number of occurrences were Qualea parviflora (139), Vochysia gardneri (138), Pouteria ramiflora (134), Connarus suberosus (122), Hirtella ciliata (121), Tachigali subvelutina (121), Exellodendron cordatum (115), Dimorphandra gardneriana (114), Rourea induta (110), and Eugenia punicifolia (109), all trees or large shrubs.

We recorded 54 species endemic to the DSA, of which only 15 are reported to occur in protected areas (Table 1; Supplementary Material S4). We highlight the occurrence of Keratochlaena rigidifolia (Poaceae), a species that belongs to a monospecific genus endemic to the DSA. We also report in our compilation 38 threatened species (Supplementary Material S4; Table 1; Fig. 3) included in the Red List of the National Centre for Flora Conservation, seven of which are critically endangered (CR), 17 endangered (EN) and 14 vulnerable (VU); 20 of these species occur in protected areas. Our survey on the taxonomic literature between 2015 and 2020 found 27 new species collected in the DSA (Supplementary Material S5), of which 23 are considered endemic to the region.

Table 1 Summary information on angiosperm diversity compiled for the defined study area (DSA) in the agricultural frontier of the Matopiba in the Brazilian Cerrado
Fig. 3

Endemic and threatened species found in the defined study area (DSA) in the Matopiba region: Annona gardneri (a) Chamaecrista coradinii (b) Ipomoea maranhensis (c) Mandevilla abortiva (d) Mimosa carolina (e) Ouratea acicularis (f) Phyllanthus allemii (g) Qualea hannekesaskiarum (h) and Turnera macrosperma (i). Photo credits: Marcelo Simon (a-g, i) and Maurício Figueira (h). See more details in Supplementary Material S4

Discussion

Owing to its vast stock of mostly unoccupied land, Matopiba appears to be a key region, both in terms of agricultural potential and conservation of biodiversity and natural resources. Therefore, reconciling agricultural and economic activities with preservation of biodiversity is central for the sustainable development of that region. Conservation planning in the DSA should rely, among other factors, on the availability of accurate biological data. However, this is not the case for the DSA since most of its territory is poorly sampled. Overall, the floristic sampling effort reported here (0.08 record/km2) is well below Brazil’s average (0.41 record/km2; Sousa-Baena et al. 2014) and far from what is considered an adequate collection effort in tropical regions (1–3 records/km2; Campbell 1989).

Some of the better-inventoried areas in the DSA (red spots in Fig. 1B) correspond to the municipalities of Barreiras and São Desidério (Bahia), both with more than 1 record/km2. Other intensive botanical surveys carried out in the DSA were associated with the environmental licensing of hydroelectric dams (Estreito Dam, Maranhão/Tocantins; Medeiros et al. 2012), the implementation of large agricultural projects (Balsas, Maranhão; Aquino et al. 2007), and inventories with scientific purposes in western Bahia (Mendonça et al. 2000; Santana et al. 2010; Ribeiro et al. 2020), Maranhão (Conceição and Santos 2014; Silva-Moraes et al. 2019), Tocantins (Proença et al. 2002; Antar and Sano 2019), and Piauí (Castro et al. 1998).

Our compilation revealed that the DSA harbours a considerably rich flora (2,517 species), which represents 20% of the Cerrado angiosperm flora (Flora do Brasil 2020). Although considerable, the number of species compiled here is only comparable to that recorded in the Distrito Federal (3634 species; Flora do Brasil 2020), which has the best-known flora in the Cerrado region but an area ca. 50 times smaller than that of the DSA. Therefore, our dataset certainly underestimates the total number of species in the DSA. With increasing sampling effort, mainly in the gap areas, we estimate that the regional compilation could well surpass 5000 species. We highlight that the list presented here is based on a careful selection of herbarium specimens identified by specialists, and that records with inaccurate/suspicious identifications have been discarded. It is possible, however, that species only represented by old collections lacking precise locality data, which were not retrieved from public online repositories by our search strategy (municipality), may be absent from our list. For example, Goyazia villosa (Gesneriaceae), which was collected only once in 1841, was not retrieved by our initial search, but was later incorporated into our dataset after we came across some records available online by chance. Limitations involving the use of specimens with imprecise locality data, such as those found in old herbarium collections, have been regarded as one of the major shortfalls in biodiversity documentation (Colli-Silva et al. 2020).

In addition to geographic gaps, we also found that collections are strongly biased towards woody species since all top ten species with the highest number of records in our dataset are trees or large shrubs. However, the rich herb-shrub layer that dominates grasslands and savannas and makes up the largest portion of the Cerrado’s plant diversity (Amaral et al. 2017), besides representing about 80% of the endemic species of the DSA, remains poorly sampled. Therefore, sampling of this component, particularly the endemic species known from a single or few collections (e.g., Goyazia villosa), should be intensified and incorporated into broader biodiversity surveys and conservation efforts.

The DSA flora harbours a suite of woody species considered typical of the northeastern part of the Cerrado. The region largely overlaps with the northeastern Cerrado floristic province of Ratter et al. (2003) and also occupies parts of the northeastern and northwestern districts proposed by Françoso et al. (2020). Six out of the ten most frequent species in our database, except for Cenostigma macrophyllum, Eugenia punicifolia, Exellodendron cordatum and Vochysia gardneri, are cited as among the 35 most constant woody species in the northeastern province (Vieira et al. 2019). In relation to the herb-shrub flora, the DSA is contained within phytogeographic region 6 (central-north) of Amaral et al. (2017).

Overall, evidence from previous studies on the woody and herb-shrub flora, together with our results, suggests that the DSA is part of a floristic region located in the northeastern periphery of the Cerrado, which is characterized by a considerable fraction of regionally exclusive species associated with a suite of climatic variables (Vieira et al. 2019; Françoso et al. 2020). The geographic location of the DSA facilitates the interchange of floristic elements with adjacent Amazonia (north) and Caatinga (east) biomes, which could have been influenced by vegetation dynamics caused by climatic oscillations during the Quaternary (Vieira et al. 2019). Typical Amazon taxa in the DSA flora can be mainly found in wet forests associated with water courses, while Caatinga elements often occur in seasonally dry forests on limestone outcrops.

Most taxa endemic to the DSA occur in savannas and grasslands on high-altitude relief (> 700 m). This corroborates the findings that centres of endemism within the Cerrado are correlated with open habitats in higher-altitude areas, such as plateaus and chapadas (Simon and Proença 2000; Echternacht et al. 2011). In a recent survey on Cerrado endemic flora, Vidal et al. (2019) identified two endemism centres (Northern and Central Bahia western plateau) within the DSA. Curiously, none of the endemic species listed here was mentioned in the Vidal et al. (2019) endemic survey.

Besides endemism, we found 38 threatened species occurring in the DSA, including six endemics (Chamaecrista coradinii, Diplusodon gracilis, Hyptidendron conspersum, Janusia christianeae, Ouratea acicularis and Schultesia irwiniana). However, it is important to mention that only 15% of the 2,517 species recorded in the region were evaluated for the degree of threat by Brazil’s red list authority CNCFlora (2021). Considering the number of species that were not assessed, especially in areas of high anthropic pressure, we can infer that the number of threatened species in the DSA is clearly underestimated. Therefore, we urge that a more comprehensive survey be carried out in the region in order to provide sufficiently robust data to allow reliable assessments of species conservation status, particularly for the endemics, in view of the tendency toward habitat loss and fragmentation forecast for the DSA (Barbosa-Silva and Antar 2020). We found that only 28% of endemic species and 53% of threatened species are recorded in protected areas in the region. However, these values might be underestimated since our knowledge of species occurrences within protected areas in the DSA, as well as in other Brazilian reserves, is very limited (Oliveira et al. 2017).

Recent botanical surveys associated with taxonomic work have revealed the DSA to be a promising source of new species discoveries with an average of approximately four new species published every year since 2015 (IPNI 2021). This trend is likely to continue as more areas are surveyed and herbarium specimens are made available to specialists. Among the 27 new species that were published in the last six years, only Dioscorea compacta, Schizachyrium vallsii, and Stigmaphyllon occidentale are not endemic to the DSA, suggesting that most species discoveries are expected to be restricted to the region. It is important to note that more intensive surveys in surrounding areas can reveal that some species referred here as endemic actually have a broader geographical distribution, and therefore may no longer be considered endemic to the DSA.

Conclusion

Although it should be supplemented with additional surveys, our compilation represents a relevant reference for the angiosperm flora of Matopiba, containing a reliable dataset of plant occurrence data that can be used in downstream biodiversity analyses. Plant diversity data gathered here should be used in planning conservation actions in the study area, which is under ongoing habitat loss. Expansion of agricultural activities on native vegetation should consider the existence of plant species of high conservation value in the region. Deforestation in areas where populations of endemic and threatened species occur should be avoided in order to minimize the impact of habitat loss on plant populations.

Future studies should focus on inventorying the flora in the gap areas in order to provide better data on composition and distribution of species, especially those that are rare, endemic and threatened and, hence, considered to be of high conservation value. Floristic inventories covering diverse habitats and all life forms should be carried out in protected areas to determine how much of the DSA’s flora is represented within them.

It is also vital to increase the number of species assessed for their conservation status, especially those reported as endemic, since only a small fraction of species have been evaluated so far. Lastly, it is important to improve support for research institutions and herbaria in order to foster taxonomic and floristic studies in the region, which would result in more robust biodiversity datasets for conservation planning. This is particularly relevant since many new species, which are being found at high rates in the DSA, could be lost before being collected and formally described. We hope that our results will encourage the development of floristic studies in the region, helping to fill gaps in scientific knowledge and support conservation planning in this complex landscape matrix that includes the largest remnants of the Cerrado biodiversity hotspot and an expanding agricultural frontier.