Introduction

Thalassemia, a monogenic hereditary disease, arises from mutations in globin genes that reduce or abolish the synthesis of globin chains. This alteration changes the composition and levels of hemoglobin in peripheral blood, as well as the oxygen-binding capacity of red blood cells, and ultimately causes hemolysis. Consequently, individuals with thalassemia experience symptoms of anemia. Thalassemia is classified according to the mutated globin gene, with subtypes including α-, β-, γ-, and δ-thalassemia. Mutations of the α-globin genes (two adjacent genes, HBA1 and HBA2, encoding identical sequences) and the β-globin gene (HBB) are responsible for the majority of thalassemia cases. α-Thalassemia is classified into four categories according to the number of affected copies of the α-globin gene, namely, α-thalassemia silent (-α/αα or αTα/αα), α-thalassemia minor (--/αα, -α/-α, -α/αTα or αTα/αTα), α-thalassemia intermedia or HbH disease (--/-α or --/αTα), and α-thalassemia major or Hb Bart’s hydrops fetalis syndrome [1]. β-Thalassemia is categorized into β-thalassemia minor (β+N or β0N), β-thalassemia intermedia (β++ or β+0), and β-thalassemia major (β00 or β0+) types; because β+ includes both mildly and severely pathogenic variations, β+/β0 may show characteristics of β-thalassemia intermedia or major depending on the specific pathogenicity of the β+ variation [2, 3]. Notably, a missense mutation in the HBB gene that is prevalent in Asia (c.26G>A), termed βE, produces a pathogenic β-globin chain that constitutes the variant hemoglobin Hb E, which is distinguished from Hb A by a distinct band in hemoglobin electrophoresis [4, 5].

Thalassemia is widely prevalent, ranging from the Mediterranean to the Middle East, Africa, India and Southeast Asia, and thus has significant global health implications [6, 7]. Extensive research [8,9,10] has found a high prevalence of thalassemia in southern China, particularly in the provinces of Guangdong (16.45%) [11, 12], Guangxi (20%) [13], and Hainan (21.03%) [14]. Shenzhen, a metropolitan area attracting migrants in Guangdong, is also affected by thalassemia. Migration has been observed to increase the prevalence of β-thalassemia in certain regions of Europe and North America [15]. The population of Shenzhen comprises diverse ethnic groups, with a significant portion originating from provinces with a high prevalence of thalassemia [16]. With carriers of variations from less investigated areas of southeast China gathering in Shenzhen for long-term work and habitation, the prevalence of thalassemia is constantly changing, and the odds of encountering rare variants may increase accordingly.

Symptoms of thalassemia vary depending on the type and the combination of globin gene defects carried by individuals. Thalassemia major patients may have to endure frequent transfusions to alleviate life-threatening symptoms or undergo costly bone marrow transplantation requiring lifelong immune suppression, placing a great burden on both family and society. However, for less symptomatic carriers of recessive or minor variations, the gene defect will very likely be passed on to the next generation and sustained in the local population. If both partners in a couple carry variations, the possibility of giving birth to a child with thalassemia major drastically increases. Consequently, it is imperative to examine the distribution patterns of thalassemia and conduct screenings for thalassemia mutations among individuals of child-bearing age in Shenzhen.

This study analyzed 22,098 peripheral blood samples obtained from 11,049 couples of child-bearing age in which at least one partner tested positive during preliminary screening. The genotyping outcomes show the spectrum of thalassemia in Shenzhen and provide insights into the thalassemia mutation composition and distribution patterns, as well as a diagnostic basis for thalassemia interventions and control measures in the local population.

Materials and methods

Participants and inclusion criteria

A total of 11,049 couples in which at least one partner tested positive for thalassemia were recruited for this study between January 2017 and September 2022. Participants were considered positive for thalassemia if they exhibited abnormal red blood cell (RBC) indices (MCV ≤ 82 fL and/or MCH ≤ 27 pg, measured with Mindray BC-7500 CRP auto hematology analyzers) or abnormal hemoglobin electrophoresis results (Hb A2% < 2.5, Hb A2% > 3.5, or Hb F > 2, measured with the Capillarys 3 OCTA). All participants were informed of the study details and signed an informed consent form. Whole blood samples were collected from these participants. In total, 22,098 peripheral blood samples were transported to our laboratory and subjected to further screening for α- and β-thalassemia with the approval of the ethics committee.
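The inclusion rule above can be written as a short predicate. The following is only a minimal sketch of the stated cutoffs, not the laboratory's actual software: the function and parameter names are hypothetical, Hb F is assumed to be expressed as a percentage, and treating a missing measurement as "not abnormal" is an assumption made for illustration.

```python
from typing import Optional


def screen_positive(mcv_fl: Optional[float] = None,
                    mch_pg: Optional[float] = None,
                    hb_a2_pct: Optional[float] = None,
                    hb_f_pct: Optional[float] = None) -> bool:
    """Return True if any available measurement meets a stated cutoff.

    Cutoffs are those given in the text: MCV <= 82 fL and/or MCH <= 27 pg,
    or Hb A2 < 2.5%, or Hb A2 > 3.5%, or Hb F > 2 (assumed to be a percentage).
    A value of None means "not measured" and never triggers inclusion
    (an assumption of this sketch).
    """
    rbc_abnormal = (
        (mcv_fl is not None and mcv_fl <= 82.0)
        or (mch_pg is not None and mch_pg <= 27.0)
    )
    electrophoresis_abnormal = (
        (hb_a2_pct is not None and (hb_a2_pct < 2.5 or hb_a2_pct > 3.5))
        or (hb_f_pct is not None and hb_f_pct > 2.0)
    )
    return rbc_abnormal or electrophoresis_abnormal


def couple_eligible(partner_a: dict, partner_b: dict) -> bool:
    """A couple is recruited when at least one partner screens positive."""
    return screen_positive(**partner_a) or screen_positive(**partner_b)


if __name__ == "__main__":
    wife = {"mcv_fl": 90.0, "mch_pg": 30.0, "hb_a2_pct": 3.0, "hb_f_pct": 0.0}
    husband = {"mcv_fl": 78.5, "mch_pg": 25.1}
    print(couple_eligible(wife, husband))
```

Because recruitment is defined at the couple level, a partner with entirely normal indices is still genotyped when the other partner screens positive.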

Thalassemia screening by reverse dot-blot hybridization assays with PCR

The common mutations of both α- and β-thalassemia in China, including 6 α-thalassemia mutations (--SEA, -α3.7, -α4.2, αCS, αQS, and αWS) and 19 β-thalassemia mutations (-30(T-C), -32(C-A), -28(A-G), -29(A-G), Cap + 40–43(-AAAC), Cap + 1(A-C), Int(T-G), CD14/15(+G), CD17(A-T), CD27/28(+C), βE(G-A), CD31(-C), CD41/42(-TCTT), CD43(G-T), CD71/72(+A), IVS-I-1(G-T, G-A), IVS-I-5(G-C) and IVS-II-654(C-T)), were detected using reverse dot-blot hybridization assays with PCR, following the guidelines provided by the manufacturer (Hybribio, ZL 201110117563.5, China).

Rare mutation identification by GAP-PCR and next-generation sequencing

GAP-PCR was performed under a standardized procedure by Hybribio to screen for large deletions around HBA1, HBA2 and HBB. Regions of each globin gene in the α-globin (ζ, ψζ, ψα2, ψα1, α2, α1, θ) and β-globin (ε, Gγ, Aγ, ψβ, δ, β) gene clusters, including exonic, intronic, 5’UTR, 3’UTR, and 5000 bp upstream and downstream regions, were amplified using long-range PCR, which can amplify DNA fragments of up to 10 kb (Takara Bio, Japan), and sequenced on the Ion Torrent Proton platform (Thermo Scientific, USA) following the manufacturer’s standard procedure (performed by Hybribio) [17]. Mutations were identified by comparing the sequencing data with the ITHANET (https://www.ithanet.eu/), HbVar (https://globin.bx.psu.edu/cgi-bin/hbvar/query_vars3), 1000 Genomes, ClinVar, HGMD and ExAC databases. All results were validated by Sanger sequencing (Hybribio). The rare variations were classified as “Pathogenic / Likely Pathogenic”, “Benign / Likely Benign” or “N/A (uncertain/VOUS)” based on annotations from ITHANET (https://www.ithanet.eu/) or HbVar (https://globin.bx.psu.edu/hbvar/hbvar.html).

Statistical analysis of the relationship between phenotype and genotype

For each hematological index, outliers were excluded using the interquartile range (IQR) method, and only the retained values were used for statistical analysis. All hematological indices are expressed as the mean ± SD. The Wilcoxon rank sum test was used to compare hematological indices and hemoglobin features across thalassemia genotype groups with more than 20 cases (n > 20). P < 0.05 was considered statistically significant.
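The two statistical steps described above can be sketched with the Python standard library alone. This is an illustrative sketch rather than the analysis code used in the study: the fence multiplier of 1.5 (Tukey's rule) and the quartile interpolation method are assumptions, since the text does not specify them, and the rank sum test is implemented with the large-sample normal approximation without tie or continuity correction. The function names and the example values are hypothetical.

```python
import math
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is an assumed (conventional) multiplier, not a value from the text.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    # Hypothetical MCV values (fL) for two genotype groups, for illustration only.
    group_a = iqr_filter([81.2, 80.5, 79.9, 82.0, 80.1, 120.0])
    group_b = iqr_filter([68.3, 67.9, 70.2, 66.5, 69.0])
    print(rank_sum_test(group_a, group_b))
```

The normal approximation is reasonable for groups of the size analyzed here (n > 20); for very small groups an exact test would be preferable.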

Results

Spectrum of Thalassemia variants prevalent in the Shenzhen population

In this study, a total of 9948 cases of thalassemia were identified among 22,098 participants. Among these cases, there were 7111 patients with α-thalassemia, including 2376 with α-thalassemia silent, 4590 with α-thalassemia minor, and 145 with α-thalassemia intermedia or HbH disease. A total of 18 genotypes of α-thalassemia were detected, with the --SEA/αα (63.37%), -α3.7/αα (18.66%), and -α4.2/αα (7.31%) forms accounting for the majority of cases (Fig. 1A). Additionally, a total of 3252 cases and 15 genotypes of β-thalassemia were identified in this study, with β41–42N, β654N, and β17N being the three dominant β-thalassemia genotypes, accounting for 34.96%, 28.11% and 13.84% of cases, respectively (Fig. 1B). We also identified 415 cases with coinheritance of α-thalassemia and β-thalassemia variants.
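The counts reported in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes that the α- and β-thalassemia totals each include the 415 coinheritance cases, so that these cases are subtracted once when the totals are combined; the text does not state this explicitly, but the reported figures are consistent with that reading. Variable names are illustrative.

```python
# Counts taken from the paragraph above.
alpha_subtypes = {"silent": 2376, "minor": 4590, "intermedia (HbH)": 145}
alpha_total = sum(alpha_subtypes.values())      # 7111
beta_total = 3252
coinherited = 415                               # assumed to be counted in both totals
participants = 22_098

# Inclusion-exclusion: carriers of alpha OR beta variants.
carriers = alpha_total + beta_total - coinherited
detection_rate = 100 * carriers / participants

# Cumulative share of the three most frequent genotypes in each group.
top3_alpha = round(63.37 + 18.66 + 7.31, 2)
top3_beta = round(34.96 + 28.11 + 13.84, 2)

print(carriers)                   # 9948
print(round(detection_rate, 2))   # 45.02
print(top3_alpha, top3_beta)      # 89.34 76.91
```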

Fig. 1

Spectrum of detected α- and β-thalassemia mutations

Composition of (A) α-thalassemia genotypes in 7111 carriers and (B) β-thalassemia genotypes in 3252 carriers

Differentiation of hematological indices and thalassemic genotypes

According to the number of defective genes, α-thalassemia can be categorized into four subtypes, namely, α-thalassemia silent, α-thalassemia minor, α-thalassemia intermedia, and α-thalassemia major (lethal). Statistical analysis revealed significant variations in hematological parameters, including mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), hemoglobin A percentage (Hb A%), and hemoglobin A2% (Hb A2%), among the four subtypes of α-thalassemia (P < 0.05). Notably, the most substantial differences were observed in MCV and MCH between the α-thalassemia silent and α-thalassemia minor groups (Table 1). Significant variations in hematological indices, such as MCV, MCH, Hb A%, and Hb A2%, were observed in the βEN group compared to other β-thalassemia genotype groups. Moreover, the β−28N, β−29N, and β17N groups exhibited significant differences in MCV and MCH indices. Additionally, the β41–42N group displayed significant differences in Hb A levels, while the β654N group showed significant differences in Hb A2 levels in comparison to the other groups (Table 2).

Table 1 Detected α-thalassemia genotypes and hematological indices
Table 2 Detected β-thalassemia genotypes and hematological indices

Identification of high-risk couples

A total of 970 couples were identified as high-risk couples due to the potential inheritance of two or more thalassemia mutations, resulting in thalassemia intermedia or thalassemia major with severe symptoms. These high-risk couples were further categorized into four groups according to the worst possible phenotypes that might occur in their offspring: the α-thalassemia major (Hb Bart’s) group, consisting of 383 couples; the α-thalassemia intermedia (HbH) group, consisting of 435 couples; the β-thalassemia major group, consisting of 113 couples; and the β-thalassemia intermedia group, consisting of 39 couples (Table 3). Notably, the majority of these high-risk couples carried α-thalassemia mutations, with the predominant genotype combinations being --SEA/αα with --SEA/αα (36.6%) and --SEA/αα with -α3.7/αα (23.3%). For those couples categorized as high risk in our study, follow-up genetic counseling is highly recommended.

Table 3 Genotype combinations of 970 high-risk couples identified
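The assignment of a couple to a "worst possible phenotype" group follows from Mendelian segregation of the parental α-globin haplotypes. The sketch below illustrates this for deletional α-thalassemia only and is not the classification procedure used in the study: it assumes each parent transmits either haplotype with equal probability, counts functional α-globin genes per haplotype (0 for --SEA, 1 for -α3.7 or -α4.2, 2 for αα), and maps the offspring total to the four categories defined in the Introduction. Non-deletional alleles and β-globin variants are deliberately left out, and all names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Functional alpha-globin genes carried on each haplotype (deletional alleles only).
FUNCTIONAL_GENES = {"--SEA": 0, "-α3.7": 1, "-α4.2": 1, "αα": 2}

# Offspring category by total number of functional alpha-globin genes.
PHENOTYPE = {
    4: "normal",
    3: "silent",
    2: "minor",
    1: "intermedia (HbH)",
    0: "major (Hb Bart's)",
}


def offspring_risk(parent1, parent2):
    """Return {phenotype: probability} for one pregnancy.

    Each parent is a pair of haplotypes; every combination of one haplotype
    from each parent is assumed to be equally likely (probability 1/4).
    """
    outcomes = Counter()
    for h1, h2 in product(parent1, parent2):
        genes = FUNCTIONAL_GENES[h1] + FUNCTIONAL_GENES[h2]
        outcomes[PHENOTYPE[genes]] += Fraction(1, 4)
    return dict(outcomes)


if __name__ == "__main__":
    # The two most frequent high-risk combinations reported above.
    print(offspring_risk(("--SEA", "αα"), ("--SEA", "αα")))
    print(offspring_risk(("--SEA", "αα"), ("-α3.7", "αα")))
```

Under these assumptions, a --SEA/αα with --SEA/αα couple has a 1 in 4 chance of an Hb Bart’s pregnancy, and a --SEA/αα with -α3.7/αα couple has a 1 in 4 chance of a child with HbH disease.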

Identification of rare mutations in the α- and β-globin gene cluster

In our study, a total of 42 rare globin gene mutations were identified in 99 patients using a combination of GAP-PCR and NGS (Table 4). These rare mutations were validated by Sanger sequencing. Deletion mutations accounted for 40.4% (40 out of 99) of the cases. Specifically, there were 21 cases of rare α-globin mutations, 60 cases of rare β-globin mutations, 2 cases of rare γ-globin mutations, and 16 cases of coinheritance. The coinheritance cases included 6 cases of HKαα, 1 case of --THAI, 20 cases of Chinese Gγ+(Aγδβ)0, 10 cases of Southeast Asian (Vietnamese) deletion, 3 cases of SEA-HPFH, and 1 case of Taiwanese deletion. Furthermore, a total of 11 mutations, namely HBA1:c.46G>C, HBA1:c.-9G>C, HBA2:c.-24C>G, HBA2:c.300+34G>A, HBB:c.-137C>T, HBB:c.47G>A, HBB:c.92+2T>A, HBB:c.246C>A, HBB:c.315+1G>A, HBB:c.316-184T>C, and HBG1:c.-211C>T (rs7482144), were newly detected in the Chinese population (Supplementary Table S1). Among them, HBB:c.246C>A (rs145669504) is a previously unreported synonymous mutation in the β-globin gene.

Table 4 Detected 42 rare mutations of hemoglobin genes

Ratio curve of thalassemia carriers versus enrolled participants by year

This study started in January 2017 and ended in September 2022, when, following an update of local policy, collected samples began to be analyzed immediately at the community health care facilities of each jurisdiction of Shenzhen instead of being sent to our laboratory for standardized detection. Overall, the number of participants included each year was comparable, except for a sharp increase in 2021 (Fig. 2A). The percentages of thalassemia carriers among all participants in each year were calculated and compared. A steady reduction in the rate of detected thalassemia carriers has been observed since 2017 (Fig. 2B). Similar decreases were also observed in the separate detection rates of α- and β-thalassemia carriers (Fig. 2B). These observations may indicate a recent decline in the frequency of thalassemia mutation alleles among couples of childbearing age in the Shenzhen population.

Fig. 2

Delineation of thalassemia cases detected each year from 2017 to 2022

(A) Number of thalassemia carriers and total participants each year. (B) Percentage of α-, β- and rare thalassemia mutation carriers among all enrolled participants each year

Discussion

While there are various treatments available to alleviate symptoms, a definitive cure for thalassemia remains elusive. Hence, it is of utmost importance for married individuals to undergo thalassemia screening and genetic counseling to prevent the occurrence of thalassemia in their offspring. A comprehensive thalassemia screening program was conducted on 11,049 couples (comprising 22,098 individuals) of reproductive age from January 2017 to September 2022. The results of this study can serve as a valuable resource for informing genetic counseling and prenatal diagnostic interventions.

Shenzhen is a highly industrial city with a large population of young migrants from southern areas of China with a high thalassemia prevalence, raising concerns about local public health. This study encompasses 8 districts of Shenzhen and utilizes a larger sample size than prior studies, enabling a comprehensive depiction of the current state of thalassemia in the region. The detection rate of thalassemia gene mutations in 22,098 participants was found to be 45.02%. This rate is comparable to the rates observed in previous studies conducted in Shenzhen [18, 19], Fujian [20], and Jiangxi [21]. Additionally, it is slightly higher than the rates observed in the Guangming and Longhua districts of Shenzhen [22, 23] but significantly lower than the rates observed in Guangxi [24] and Guangzhou [10]. These variations in detection rates between district-level studies and our own study in Shenzhen may be attributed to different sample sizes, coverages of the city’s population, and the notably intricate composition of the Shenzhen population, wherein migrants from regions with a lower prevalence may dilute the impact of the frequency of mutant alleles. Specifically, we identified 7111 cases of α-thalassemia encompassing 18 genotypes, with the --SEA/αα, -α3.7/αα, and -α4.2/αα genotypes accounting for 89.34% of cases. A total of 3252 cases of β-thalassemia encompassing 15 genotypes were identified, with β41–42N, β654N, and β17N accounting for 76.91% of cases, which is consistent with the findings of Zengjun [19]. Screening for predominant thalassemia alleles remains a vital priority capable of identifying the majority of thalassemia cases in Shenzhen.

The preliminary screening for thalassemia involved testing RBC indices and hemoglobin features. Variations were observed among different types of thalassemia, such as a common decrease in MCV in α-thalassemia cases and a common increase in Hb A2 in β-thalassemia cases. Analysis of the mean and standard deviation of the MCV index indicated a decreasing trend with increasing severity across the α-thalassemia groups. The most prominent decreases were observed in the MCV and MCH indices of individuals with the --SEA/-α3.7 and --SEA/-α4.2 genotypes. In individuals with β-thalassemia, a significant difference was observed in all four hematological indices, particularly the Hb A2 index (3.50 ± 0.09), in the βEN group, which is consistent with a previous study conducted on the population of Chongqing [25]. The βCAP detected in our study comprises two potentially pathogenic variations, namely Cap + 1(A-C) and Cap + 40–43(-AAAC). In the ITHANET database, these two variations are annotated as “pathogenic/likely pathogenic”, and they were therefore combined into a single dot-blot hybridization spot in our testing strategy, although several reports indicate that the phenotypes of these βCAP variations may fall within the normal range [26, 27]. The βCAPN group exhibited the lowest Hb A2 index (2.53 ± 0.01) among the β-thalassemia groups. Therefore, the Hb A2 indices of individuals with the βEN and βCAPN genotypes may fall within the normal range (2.5% < Hb A2 < 3.5%). Hence, it cannot be ruled out that patients with abnormal MCV and MCH but normal Hb A2 indices may have βEN or βCAPN. As for Hb F, the values of most samples were 0 and were therefore unsuitable for statistical comparison. Analysis of other blood test results, hemoglobin electrophoresis data, abnormal hemoglobin, etc. would provide more valuable information. However, only the MCV, MCH, Hb A and Hb A2 data of the samples were accessible to us under the administration protocol, which is a limitation of our study.

Among 11,049 couples of child-bearing age, 970 high-risk couples were identified, representing 8.78% (970/11,049) of the couples. This percentage is lower than that reported by Wenzhong Zhao et al. in 20 regions of Guangdong Province (37.04%) [12]. The majority of high-risk couples belonged to the α-thalassemia major group (39.48%, 383/970) or the α-thalassemia intermedia group (44.85%, 435/970), which is consistent with the distribution pattern observed in a population from Guangxi Province [28]. These couples have a 25% risk of delivering babies with thalassemia intermedia or thalassemia major. Babies with α-thalassemia intermedia, β-thalassemia intermedia, and β-thalassemia major may require lifelong blood transfusions and chelation therapy, while Hb Bart’s fetuses (α-thalassemia major) often do not survive until birth. The current treatments for thalassemia are not cost-effective [29], which further burdens both the families of patients and society as a whole [18, 30, 31]. According to the most recent data [30], the average cost for a single thalassemia major patient can reach 4.8 million RMB (~670,000 US dollars). All 970 high-risk couples could benefit from genetic counseling and prenatal testing, potentially saving billions in thalassemia treatment costs for families and the government. Therefore, it is crucial to screen married couples for specific globin gene mutations before pregnancy and/or during prenatal care to prevent and control thalassemia. It is noteworthy that a rare α-globin gene triplication variation [32], resulting from unequal crossover between two HBA alleles, produces one α-triplication allele and one “-α” allele, which may lead to α-thalassemia in offspring. However, the testing strategy employed in our study is unable to recognize α-triplication variations. Testing for α-triplication should be considered in future screening programs.

This study identified 42 rare mutations, including 13 potentially pathogenic mutations and 11 mutations reported for the first time in Chinese populations. The latter include HBA2:c.300+34G>A, which was initially documented in ITHANET (https://www.ithanet.eu/) by researchers from Malaysia but has not been described in any published work. In addition, we identified a novel mutation in the HBB gene (HBB:c.246C>A). This mutation is located within an exon of the HBB gene and leaves the encoded amino acid unaltered; it is therefore a synonymous mutation. Currently, it is not included in the ITHANET database, but it is documented in the 1000 Genomes database with a T allelic frequency of 0.005 in the East Asian population. Another infrequent mutation, HBB:c.-138C>G, was initially reported in 2018 by our laboratory [33]. The rare mutation HBB:c.-137C>T was identified in the CAP domain, which is an essential component in the regulation of downstream gene expression. A patient with coinheritance of 3 rare HBG mutations was identified in this study. One mutation was rs1554921759, a pathogenic mutation located at position −158 of the HBG1 gene, which has been reported to be associated with elevated Hb F and to cause hereditary persistence of fetal hemoglobin (HPFH) in a patient. Another was rs368698783, a mutation disrupting the Ly1 antibody reactive (LYAR)-binding motif located in the proximal promoter region of the HBG1 gene. The third, rs7482144 [34], located at position −158 of the HBG2 gene, has been associated with elevated levels of Hb F in multiple studies [35]. However, the mechanisms by which rs1554921759 and rs7482144 increase Hb F expression remain unknown. Moreover, the coinheritance of these mutations did not markedly increase the level of Hb F in the patient (Hb F = 4.2%), which was lower than that in patients carrying rs368698783 and rs7482144, as reported in previous studies [36,37,38].
Further investigation is needed to determine whether there are any interactions among rs1554921759, rs368698783, and rs7482144 that restrict the elevation of Hb F. The novel variation HBB:c.246C>A (rs145669504) identified in our study is a synonymous mutation that theoretically does not affect β-globin function. However, the carrier of this mutation also harbors a --SEA/αα variation and therefore manifests the associated symptoms. The impact of this variation on gene transcription awaits further investigation.

The Chinese public health system has made significant advancements in the management and intervention of thalassemia in recent years. However, given its status as a metropolis situated in a prominent thalassemia hotspot region and a highly sought-after destination for migrants, Shenzhen continues to face significant challenges. Consequently, it is of the utmost importance to provide initial screening examinations for thalassemia in married couples. By implementing procedures tailored to thalassemia-positive and high-risk couples, including genetic assays for prevalent mutations, molecular diagnostic assays for uncommon mutations, and reproductive counseling services, the incidence of thalassemia intermedia or thalassemia major in newborns can be largely mitigated, thereby improving the health of the newborn population.

Conclusion

Our study examined the range of thalassemia variations, proportions of high-risk couples and rare globin gene mutations over a period of six years in a population from all regions of Shenzhen, providing a comprehensive understanding of the current prevalence of thalassemia in the local population. These findings could offer valuable clinical reference for the local Centers for Disease Control and Prevention (CDC) in their efforts to control and intervene in thalassemia.