Introduction

Describing all the species on Earth is a task that may never be completed (Scheffers et al. 2012). Yet, estimation of species diversity has been and continues to be of great interest, motivated mainly by the need to provide a reference point for current and future losses of biodiversity (Mora et al. 2011). Estimates of the number of marine species range widely, from 226,000 species at the conservative end (Appeltans et al. 2012) to 10 million or more (Grassle and Maciolek 1992; Costello et al. 2012). A significant contributor to this broad range is the presence of taxonomically cryptic species (Bickford et al. 2007; Pfenninger and Schwenk 2007; Nygren 2014), and methods for delimiting them are one of the most pertinent issues in biological science (Sites and Marshall 2003). Taxonomically cryptic species are broadly defined as two or more genetically distinct taxa that are erroneously classified as a single species (Richards et al. 2016; Sheets et al. 2018; Bongaerts et al. 2021). This misclassification can arise from morphological resemblance despite genetic divergence (Pfenninger and Schwenk 2007; Fišer et al. 2018). Studies on cryptic species have increased exponentially over the past three decades since genetic techniques started to be applied (Bickford et al. 2007; Nygren 2014). Cryptic species are evenly distributed among biogeographical regions (Pfenninger and Schwenk 2007) and can be found in many major metazoan and plant groups (Grundt et al. 2006; Pfenninger and Schwenk 2007). In particular, marine environments are known to host numerous such cryptic species groups, likely due to their species-rich habitats (Willig et al. 2003). Identification of cryptic species is crucial for biodiversity conservation, management and understanding the evolutionary processes that drive speciation (Chadè et al. 2008; Baird et al. 2011; Andrews et al. 2014).
Given that most species remain undescribed, efforts to catalogue and explain biodiversity need to be prioritised (Bickford et al. 2007).

Scleractinian corals (Cnidaria: Anthozoa: Scleractinia) are difficult to classify due to their morphological plasticity (Todd 2008; Paz-García et al. 2015) and pervasive convergent evolution (e.g. Fukami et al. 2004, 2008; Huang et al. 2009). Taxonomically cryptic diversity has been documented in several coral genera, for example, in Acropora Oken, 1815 (Ladner and Palumbi 2012; Richards et al. 2016), Stylophora Schweigger, 1820 (Stefani et al. 2011), Pocillopora Lamarck, 1816 (Souter 2010; Schmidt-Roach et al. 2013; Gélin et al. 2017), and Seriatopora Lamarck, 1816 (Bongaerts et al. 2010; Warner et al. 2015). For centuries, morphology was the sole means to delimit one species from another (e.g. Lamarck 1815; Wells 1956; Chevalier 1975; Lang 1984; Veron 2002). However, integrative studies that combine morphological and multi-locus molecular approaches for phylogenetic reconstruction have proved to be the most accurate (Benzoni et al. 2007, 2011, 2014; Kitahara et al. 2012, 2013; Huang et al. 2014; Arrigoni et al. 2014, 2015). For example, Arrigoni et al. (2019) recently described a new cryptic genus and species, Paraechinophyllia variabilis Arrigoni, Benzoni and Stolarski, 2019, which could not be distinguished from other lobophylliids based on morphological characters alone, and delimited it from other species in the family using six molecular markers (COI, 12S, ATP6-NAD4, NAD3-NAD5, histone H3 and ITS). Notably, species boundaries within the genus Leptastrea Milne-Edwards & Haime, 1849, have been evaluated by applying restriction site-associated DNA sequencing (RADseq) to sample single nucleotide polymorphisms (SNPs) across the genome, complementing detailed morphological diagnoses (Arrigoni et al. 2020).
Such reduced-representation genomic approaches provide high coverage of homologous portions of the genome from multiple individuals for comparatively low cost and effort by sequencing only certain regions of DNA adjacent to restriction endonuclease sites (Toonen et al. 2013).

The zooxanthellate genus Pachyseris Milne-Edwards & Haime, 1849, is widely distributed throughout the Indo-Pacific and can be found in all reef habitats but most commonly on lower reef slopes (Veron 1980, 2000; Scheer and Pillai 1983; Sheppard and Sheppard 1991). In particular, Pachyseris speciosa Dana, 1846, the most widespread nominal species in the genus, is found in the Indo-Pacific from the Red Sea to the Philippines and Tahiti (Veron 2000; Hughes et al. 2013). There are currently six valid species in the genus (Hoeksema and Cairns 2020), which has been well represented in systematic and phylogenetic studies (Fukami et al. 2008; Kitahara et al. 2010; Terraneo et al. 2014). However, the relationships among Pachyseris species, and even within the most widespread P. speciosa, are unclear. Terraneo et al. (2014) reconstructed a Pachyseris phylogeny using Bayesian inference on the mitochondrial intergenic spacer between COI and 16S rRNA. Three lineages were analysed: P. rugosa and P. speciosa were recovered as sister species and both were sister to P. inattesa. Despite the ecological prominence of P. speciosa on Indo-Pacific reefs, much of its evolution and population genetic structure remains obscure. Recently, Bongaerts et al. (2021) used reduced-representation sequencing to show that P. speciosa actually represents a species complex, with three sister species occurring sympatrically throughout Australasia. The genomic differentiation was accompanied by ecological, physiological and reproductive differences, yet there was a lack of morphological characters distinguishing the three lineages.

Members of the P. speciosa species complex are characterised as gonochores with a broadcasting reproductive mode (Kerr et al. 2011; Bongaerts et al. 2021). The ecologically opportunistic nature of P. speciosa is reflected in it being one of the most common species throughout Singapore’s reefs (Guest et al. 2016; Wong et al. 2018) and across depths (Chow et al. 2019). Ship movement stirring up the shallow seafloor of the Singapore Strait (Browne et al. 2014, 2015) and extensive urban coastal development projects are the main causes of the turbid water on the reefs here (Dikou and van Woesik 2006; Sin et al. 2016). Due to this high turbidity and the consequent high light attenuation (i.e. < 1% of surface PAR at ~ 9 m depth; Todd et al. 2004), foliose corals with large areal extents for light capture, such as P. speciosa, are most common at 3–8 m depth (Dikou and van Woesik 2006; Guest et al. 2016; Chow et al. 2019).

In this study, we used a genotyping-by-sequencing approach to characterise the P. speciosa species complex and population genomic variation across seven sites on Singapore reefs (~ 2–15 km apart) situated along the Singapore Strait. We confirm the presence of two distinct and possibly cryptic species, and uncover further genetic substructuring that is not explained by the sampled geographic locations.

Materials and methods

Sample collection and DNA extraction

We focused on the highly urbanised reef system in Singapore, where we targeted seven sampling sites: Raffles, Semakau, Hantu, TPT, Kusu, St. John and Sisters Islands (Fig. 1). Sites were chosen depending on the abundance and availability of Pachyseris speciosa. Colonies were identified to species based on the original description by Dana (1846), supported by subsequent descriptions as well as images of the holotype and live colonies (Veron and Pichon 1980; Veron 2000; Terraneo et al. 2014). From each site, between 18 and 23 samples were taken from 4 to 6 m depth using a cutter (Table S1). Sampled colonies were at least 5 m apart in order to avoid sampling genetically identical colonies. All samples were preserved in 100% molecular grade ethanol and stored at − 80 °C immediately after collection. Genomic DNA was extracted using the Qiagen DNeasy Blood and Tissue Kit. After extraction, gel electrophoresis was carried out to ensure that samples were free of RNA and proteins. DNA quantification was performed using a Qubit 3 Fluorometer.

Fig. 1

A map of all seven sampling sites at the offshore islands of Singapore

NextRAD library preparation and sequencing

All 144 samples were genotyped via nextRAD genotyping-by-sequencing (SNPsaurus, LLC), following library preparation as detailed in Russello et al. (2015). Briefly, DNA (~ 12 ng) was first fragmented and adapter-ligated with the Nextera DNA Library Prep Kit (Illumina, Inc). Fragmented DNA was then amplified with one of the primers matching the adapter and extending nine nucleotides into the genomic DNA with the selective sequence 5′–GTGTAGAGG–3′. Thus, only fragments starting with a sequence that can be hybridised by the selective sequence of the primer would be efficiently amplified. The nextRAD libraries were then sequenced on three HiSeq 4000 lanes for 150-bp single-end reads (University of Oregon).

Genotyping and quality control

The genotyping analysis used custom scripts (SNPsaurus, LLC) that trimmed the reads using bbduk (BBTools package, Brian Bushnell, Walnut Creek, CA, USA) (see Supplementary File 1). Next, all remaining reads were mapped to the Pachyseris speciosa genome (Bongaerts et al. 2021; downloaded from http://reefgenomics.org) (see Supplementary File 1). Genotype calling was performed using Samtools and bcftools (Li et al. 2009), and calls were compiled in Variant Call Format (VCF) files using custom parameters (see Supplementary File 1). The VCF files were filtered to remove alleles with a population frequency of less than 3%. SNPs that were heterozygous in all samples or had more than two alleles in a sample were also removed. PGDSpider (version 2.1.1.5) (Lischer and Excoffier 2012) was used to reformat the VCF files for downstream analyses. The remaining SNPs were evaluated for significant deviations from Hardy–Weinberg equilibrium and linkage using arlecore (version 3.5.2.2), with SNPs that deviated in more than five a priori populations removed as in Bongaerts et al. (2017). The clonecorrect function in R package poppr (version 2.8.1) (Kamvar et al. 2014, 2015) was used to remove potential clones from the dataset, with clonal groups reduced to a single representative per population. Finally, only SNPs with no more than 1% missing data and samples with < 15% missing data were retained to ensure high-quality downstream analyses (see Afiq-Rosli et al. 2021). Three datasets with varying filtering parameters were assembled for analysis: overall dataset (all SNPs), neutral dataset (SNPs under selection removed) and outlier dataset (only SNPs identified as under selection). BayeScan (version 2.1) (Foll and Gaggiotti 2008), with default parameters (see Supplementary File 1) and a Bayes factor cut-off of 0.05, was used to identify SNPs under possible selection.
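
The frequency and missing-data filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (which relied on custom SNPsaurus scripts, arlecore and poppr); the function name, the genotype coding (0, 1, 2 alternate-allele counts, − 1 for missing) and the order in which the filters are applied are assumptions made for illustration only.

```python
import numpy as np

def filter_snps(genotypes, min_maf=0.03, max_snp_missing=0.01, max_sample_missing=0.15):
    """Return boolean masks (keep_samples, keep_snps) for a samples x SNPs matrix.

    Assumed coding: 0/1/2 = alternate-allele count, -1 = missing call.
    """
    g = np.array(genotypes, dtype=float)  # copy so the caller's data is untouched
    g[g < 0] = np.nan
    called = ~np.isnan(g)

    # Keep SNPs whose proportion of missing calls is within the per-SNP threshold.
    keep_snps = (~called).mean(axis=0) <= max_snp_missing

    # Minor allele frequency computed from the non-missing calls at each SNP.
    n_called = np.maximum(called.sum(axis=0), 1)
    alt_freq = np.nansum(g, axis=0) / (2.0 * n_called)
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    keep_snps &= maf >= min_maf

    # Remove SNPs that are heterozygous in every genotyped sample.
    all_het = ((g == 1) | ~called).all(axis=0) & called.any(axis=0)
    keep_snps &= ~all_het

    # Keep samples with less than the allowed missing data across retained SNPs.
    sample_missing = (~called[:, keep_snps]).mean(axis=1)
    keep_samples = sample_missing < max_sample_missing
    return keep_samples, keep_snps
```

In this sketch the per-sample threshold is evaluated only after SNP filtering, so a sample is not penalised for missing calls at SNPs that are discarded anyway.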

Genetic structuring and connectivity analyses

To assess genetic structure for each dataset, Bayesian clustering analysis was performed in STRUCTURE version 2.3.4 for up to seven possible genetic clusters (K) according to the total number of collection sites. We considered correlated allele frequencies in the admixture model, using sampling locations as priors, and ran 10 iterations of 100,000 MCMC repetitions with the first 10,000 as burn-in (Gilbert et al. 2012; Janes et al. 2017). MCMC convergence, where α values reached equilibrium, was examined using the Data plot option in STRUCTURE (Porras-Hurtado et al. 2013). Variation of K values was then summarised and plotted in CLUMPAK (Kopelman et al. 2015). The optimal K was determined by examining the Ln Pr(X|K) and ΔK plots (Pritchard and Wen 2003; Evanno et al. 2005; Janes et al. 2017). Principal component analysis (PCA) was performed in the R package SNPRelate version 1.18 to identify clusters without relying on population genetic models (Jombart et al. 2010).
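
For readers unfamiliar with the ΔK statistic of Evanno et al. (2005), the following Python sketch shows one common way to compute it from replicate Ln Pr(X|K) values. It is an illustration and not the CLUMPAK implementation used here; the function name and input layout are hypothetical, and the use of replicate means for the second-order difference and of the sample standard deviation (ddof = 1) are assumptions.

```python
import numpy as np

def evanno_delta_k(lnp_by_k):
    """Return {K: deltaK} from a dict mapping K to replicate Ln Pr(X|K) values."""
    ks = sorted(lnp_by_k)
    mean = {k: float(np.mean(lnp_by_k[k])) for k in ks}
    sd = {k: float(np.std(lnp_by_k[k], ddof=1)) for k in ks}
    delta = {}
    for k in ks:
        # deltaK is undefined for the smallest and largest K tested,
        # and when all replicates at K give the same likelihood.
        if k - 1 in mean and k + 1 in mean and sd[k] > 0:
            second_diff = abs(mean[k + 1] - 2.0 * mean[k] + mean[k - 1])
            delta[k] = second_diff / sd[k]
    return delta
```

Because ΔK cannot be computed for K = 1, it is usually read alongside the Ln Pr(X|K) plot, as was done in this study.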

An individual-based analysis grounded on detecting deviations from isolation-by-distance (IBD) models (Keis et al. 2013; Tang et al. 2018) was used to characterise barriers to and corridors for dispersal of each lineage using the R package ResDisMapper (Tang et al. 2019). First, the relationship between genetic distance (Nei’s standard genetic distance) and geographic distance (both measured in GenAlEx v 6.5) was modelled using two methods (linear and nonlinear), and the best-fitting method based on the R² value was chosen for IBD residual calculation for each pair of individuals. Resistance values, together with their corresponding statistical significance over the landscape, were then calculated and visualised using default settings.
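
The idea of an IBD residual can be sketched in Python as follows. This is not the ResDisMapper code: the function name is hypothetical, and a regression on log-transformed distance is used here only as an illustrative stand-in for the nonlinear model. Pairs with positive residuals are more genetically distant than expected for their geographic separation, and pairs with negative residuals less so.

```python
import numpy as np

def ibd_residuals(geo_dist, gen_dist):
    """Fit linear and log-distance IBD models; return (name, r2, residuals) of the better fit."""
    x = np.asarray(geo_dist, dtype=float)
    y = np.asarray(gen_dist, dtype=float)
    # The "log" model assumes strictly positive geographic distances.
    candidates = {"linear": x, "log": np.log(x)}
    ss_tot = np.sum((y - y.mean()) ** 2)
    best = None
    for name, predictor in candidates.items():
        slope, intercept = np.polyfit(predictor, y, 1)
        residuals = y - (slope * predictor + intercept)
        r2 = 1.0 - np.sum(residuals ** 2) / ss_tot
        if best is None or r2 > best[1]:
            best = (name, r2, residuals)
    return best
```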

Identification of distinct lineages

In order to assess the relationship of our samples within the recently uncovered P. speciosa species complex (Bongaerts et al. 2021), we employed the cleaved amplified polymorphic sequence (CAPS) assay. This assay was designed to rapidly assign individuals to one of the three cryptic lineages that occur sympatrically in East Australia. Genomic DNA for each sample was amplified with three separate primer pairs (Pspe-GRN, Pspe-BLU and Pspe-RED), and amplicons were digested with restriction enzymes (HhaI, HaeIII and Taqα1, respectively) following reaction conditions described in Bongaerts et al. (2021). The lengths of restriction fragments were then determined via agarose gel electrophoresis and used to assign samples to one of the three clusters (Bongaerts et al. 2021) (see Fig. S1).
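
The scoring of the three digests can be summarised by the short Python sketch below. The function name and boolean inputs are hypothetical, and the reading of each digest (digestion of Pspe-GRN by HhaI indicating “green”, digestion of Pspe-BLU by HaeIII indicating “blue”, and absence of digestion of Pspe-RED by Taqα1 indicating “red”) is an assumption based on the patterns reported in the Results of this study rather than a restatement of the protocol in Bongaerts et al. (2021).

```python
def assign_caps_lineage(grn_hhai_cut, blu_haeiii_cut, red_taqa1_cut):
    """Map three digest outcomes (True = amplicon was cut) to a lineage label."""
    calls = []
    if grn_hhai_cut:       # assumed: HhaI digestion marks the "green" mutation
        calls.append("green")
    if blu_haeiii_cut:     # assumed: HaeIII digestion marks the "blue" mutation
        calls.append("blue")
    if not red_taqa1_cut:  # assumed: lack of Taqα1 digestion marks "red"
        calls.append("red")
    # Samples with no diagnostic pattern, or conflicting ones, stay unassigned.
    return calls[0] if len(calls) == 1 else "unassigned"
```

Under these assumptions, a sample showing none of the three diagnostic patterns is left unassigned instead of being forced into a lineage.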

We also compared representative samples from our main clusters through joint variant calling and a maximum likelihood tree. For this comparison, we included three samples from each of our two main genetic clusters and six samples from each of the six main clusters (“green”, “blue”, “red”, “dark green”, “dark blue” and “purple”) in Bongaerts et al. (2021). RAxML version 8.2.12 (Stamatakis 2014) was used to infer a maximum likelihood phylogeny based on the SNP data with invariant sites removed. We applied an ascertainment bias correction for the omission of invariant sites (Lewis 2001; Leaché et al. 2015) to perform tree searches using 50 random starting trees and bootstrap resampling with 1000 replicates.

Results

Sequencing of the nextRAD libraries resulted in an average of 3.01 million reads per sample that mapped to the reference (n = 144) and genotype calling initially yielded 19,716 biallelic SNPs. After quality control to remove SNPs that were linked, deviated from Hardy–Weinberg equilibrium or had more than 1% missing data per SNP, removing colonies with more than 15% missing data per sample, and following minimal representation filtering, we retained 8590 biallelic SNPs (n = 140). clonecorrect under default parameters did not identify any clones. BayeScan identified 32 outlier SNPs. PCA and STRUCTURE analyses with or without SNPs identified as putatively under selection did not alter assignments of individuals to populations, and because of this, we opted to use the larger dataset for subsequent analyses.

PCA and STRUCTURE clearly showed that samples were clustered into two distinct lineages that were not site specific (Fig. 2), possibly representing cryptic species. The CAPS assay yielded clearly distinct restriction fragment length patterns for the two lineages (Fig. S1; Table S1). Specifically, one of the lineages had the diagnostic mutation for the “blue” cluster described by Bongaerts et al. (2021), as indicated by the restriction digest of the Pspe-BLU amplicon by the HaeIII enzyme into two 100–150 bp bands, whereas no digestion was observed for the other lineage. None of the samples had the diagnostic mutation of the “red” or “green” lineage, as indicated by consistent digestion of the Pspe-RED amplicon by Taqα1 in all samples (not “red”) and no digestion of the Pspe-GRN amplicon by HhaI (not “green”) (Fig. S1; Table S1). It should be noted that the CAPS assay was designed specifically to target Great Barrier Reef (GBR) lineages, and was not diagnostic for the “dark green” and “dark blue” lineages in Okinawa (Bongaerts et al. 2021). We therefore conducted joint variant calling and maximum likelihood analysis with representative samples from Bongaerts et al. (2021), which indicated our lineages to be most closely related to the “dark green” and “dark blue” clusters (which appear to be geographic subclusters of the “green” and “blue” lineages) (Fig. 3; Table S1). We then analysed the two lineages (green and blue) separately, using the same filtering steps as for the overall dataset, in order to detect genetic clustering within each lineage.

Fig. 2

Analysis of 8590 SNPs found across samples of Pachyseris speciosa from Singapore. a STRUCTURE plot of ancestry proportions with K = 2 and K = 4 depicting two main clusters (green and blue) that were not site specific. Within the green cluster, two subclusters could be observed (dark and light green). b Principal component analysis (PCA) of the blue and green clusters showing separation on PC1 (18.9% of variation)

Fig. 3

Maximum likelihood tree of representative samples of the green and blue lineages from Singapore showing clustering with samples from the “dark green” and “dark blue” clusters in Okinawa from Bongaerts et al. (2021)

Based on 6979 and 6562 biallelic SNPs (including 45 and 38 outlier SNPs detected by BayeScan) for the green (n = 70) and blue (n = 47) lineages, respectively, global mean FST was higher in the green lineage (FST = 0.0142) than in the blue lineage (FST = 0.0037). Evanno’s method and PCA showed that the most likely K is two for the green lineage (Fig. 4a, b; Fig. S2a). For this number of populations, STRUCTURE plots showed that St. John consists exclusively of a dark green subcluster (not to be confused with the “dark green” cluster in Bongaerts et al. 2021) and that Semakau and TPT comprise mainly the light green subcluster (Fig. 4a). PCA and STRUCTURE showed no geographic structuring of colonies within the blue lineage (Fig. 4c, d; Fig. S2b). ResDisMapper identified a dispersal corridor at the southern periphery of the islands along the Singapore Strait for both lineages (Fig. 5a, b).

Fig. 4

Analysis of the green and blue clusters with 8590 SNPs. a STRUCTURE plot of ancestry proportions from K = 2 to K = 4 with the best model depicting two main subclusters (dark and light green) that were not site specific. St. John has a 100% proportion of the dark green subcluster. Semakau has the highest proportion of the light green subcluster (54%). b Principal component analysis (PCA) showing separation primarily into two subclusters on PC1 (11.4% of variation) with a few mosaics from Hantu, Sisters and Semakau. c STRUCTURE plot of ancestry proportions from K = 2 to K = 4 with the best model depicting one population across all sites within Singapore’s blue cluster. d PCA showing one cluster with a few mosaics from Kusu and St. John

Fig. 5

Resistance map produced by ResDisMapper for a green lineage and b blue lineage of Pachyseris speciosa. Areas with resistance values that are higher/lower than those from a null distribution with high probability and lie within the red/green contours represent a significant barrier/corridor. Areas within the blue contours have resistance values with high probability of being positive or negative (high “certainty”). Yellow circles indicate sampling points

Discussion

The aim of this study was to characterise the Pachyseris speciosa species complex and its population genetic structure in Singapore using a genotyping-by-sequencing approach (nextRAD; Russello et al. 2015). Based on sequencing of 144 colonies that were identified as P. speciosa based on conventional taxonomy (Dana 1846; Veron and Pichon 1980; Veron 2000; Terraneo et al. 2014) from seven sites 2–15 km apart, we found two clearly distinct lineages (i.e. green and blue) that were sympatric at each of the seven sampling sites (Fig. 2). This separation into two lineages was supported through a CAPS assay and clustering analyses based on shared variants, with the latter indicating relatedness to the “dark green” and “dark blue” clusters from Okinawa identified in Bongaerts et al. (2021) (Fig. 3). Our results greatly extend the geographic range of these recently uncovered P. speciosa species, further confirming their status as distinct and widespread species.

The population structuring within each of the two lineages reveals an interesting pattern: the blue lineage seems to be one panmictic population, with only a few outliers in St. John and Kusu Island, while the green lineage has unexplained substructuring (Fig. 4). The low genetic structure of the blue lineage across all geographic sites is similar to that of other broadcast spawners (e.g. Porto-Hannes et al. 2015; Bongaerts et al. 2017; Eckert et al. 2019), including those studied in Singapore (Tay et al. 2015). Gene flow in coral species depends mostly on life history traits, as populations of broadcast spawners are expected to be more connected than those of brooders due to the long pelagic larval duration of their larvae (Serrano et al. 2016; Underwood et al. 2018). Reproductive patterns of P. speciosa in Singapore have never been documented. Australian P. speciosa species were reported to be gonochoric spawners, and the different lineages showed different timing of gamete release. It is possible that the different genetic patterns of the Singapore lineages could be linked to variations in the timing and mode of gamete release. Furthermore, changes in current regimes associated with tidal cycles, storms and seasonal monsoon along the Singapore Strait (Sin et al. 2016) may influence gene flow and drive the observed differences in genetic structures of the two P. speciosa lineages. The distinct population genetic and gene flow patterns of the two P. speciosa lineages support the species-level divergence inferred from the SNP data in this work and in Bongaerts et al. (2021), which found four distinct lineages including the two found here.

ResDisMapper analyses showing barriers vs. corridors to gene flow between sites indicate that connectivity among the western sites and between eastern and western sites is limited, and could be observed only between the eastern sites and Raffles and Hantu (Fig. 5). The population distinction between the western and eastern populations and among the western populations is concerning and could be due to the massive reclamation and development of the area in the last 60 years. Fringing and patch reefs, mangroves, and sandy and rocky shores were major coastal features in this area prior to development in the 1960s (Chou et al. 2019). Yet, the distribution of coral reefs has been greatly reduced by landfill construction (Chou and Tan 2007), the expanding port facilities and terminals (Sin et al. 2016) and the reclamation and development of offshore islands (Tay et al. 2018). These extensive coastal changes have resulted in rapid declines in the area and quality of reef habitats (Hilton and Manning 1995), especially those adjacent to the western cluster of islands due to their proximity to intensive urban activities. The increase in land area could have reduced water flow at the western islands, or increased the introduction of sediments, nutrients, heavy metals and organic chemicals (Goh and Chou 1997; Sin et al. 2016), leading to filtering for more tolerant genotypes.

Considering that most islands are isolated (i.e. showing barriers between them and the others), these results may need to inform future developments, which could consider enhancing physical and biological connectedness within the western island cluster. Such actions would promote biological connectivity and resilience, especially since coral recruitment is at low levels in the Singapore Strait (Bauman et al. 2015). Relatedly, recent genotyping of the endosymbionts of P. speciosa suggests that the two lineages do not have divergent dominant Symbiodiniaceae types (Jain et al. 2020; Smith et al. 2020). Singapore’s reefs provide habitat for a diverse coral assemblage (Guest et al. 2016), yet a strong selective pressure caused by the turbid water along the Singapore Strait possibly limits the diversity of the associated endosymbiont community. Interestingly, detailed characterisation of the bleaching susceptibility of the two lineages also showed no clear difference, as both lineages have similar responses to bleaching (Jain et al. 2020).

The subclustering within the green lineage (Fig. 4a, b), possibly promoted by the barriers to gene flow (Fig. 5a), may help explain the cryptic lineages of P. speciosa in other geographic sites. If these processes do scale up to the global level, i.e. at larger geographic scales and over longer time, these minute population differentiations could provide more opportunities for isolation of lineages, especially over the 10-million-year history of the three cryptic lineages (Bongaerts et al. 2021). We note that there are three other nominal species currently considered to be junior synonyms of P. speciosa (Veron and Pichon 1980; Scheer and Pillai 1983). Agaricia levicollis Dana, 1846, has a non-specific type locality referring to Southeast Asia (‘East Indies’, Dana 1846, p. 338), while Pachyseris haimei Quelch, 1886, and Pachyseris clementei Nemenzo, 1955, were described from Tahiti (French Polynesia) and Puerto Galera (Philippines), respectively. Localities of Agaricia levicollis and P. clementei are within the range of the green and blue clusters reported by Bongaerts et al. (2021) and in the present study. Detailed taxonomic work integrating these genomic results, morphological differences between these nominal species, and more geographically resolved collections in the Central Indo-Pacific is needed to ascertain potential relationships with these junior synonyms.

Population genetics and connectivity among reefs play a vital role in shaping regional patterns of reef biodiversity and recovery following disturbance (Bongaerts et al. 2010, 2017; Sheets et al. 2018). This research highlights how a small reef system, with all sampled sites less than 15 km apart, can host sympatric cryptic lineages, one of which shows distinct subclustering.