Introduction

Phenylketonuria (PKU) is one of the most common autosomal recessive metabolic diseases. Disorders of phenylalanine (Phe) metabolism in the liver can be caused by variants in the phenylalanine hydroxylase (PAH) gene (Blau et al. 2010). The incidence of PKU in the northwest regions of China is significantly higher than the national average (1/11,188) (Gu and Wang 2004), reaching its highest level in Gansu (1/3420) (Wang et al. 2015). This may be related to the unique geographic location and ethnic composition of the northwest region. The low rates of PKU screening and treatment affect the quality of life of the population and the infant mortality rate. Therefore, early diagnosis and treatment of PKU are particularly important.

The human PAH gene is located on chromosome 12q23.2 and contains 13 exons and 12 introns (Woo et al. 1983). More than 1000 variants have been identified in PAH and recorded in the locus-specific database (LSD) known as PAHvdb (http://www.biopku.org/pah/). The detection of PAH gene variants and correlation analysis between PKU genotype and clinical phenotype may be of great value for PKU genetic counseling and prenatal diagnosis. Because the northwest regions were an important link on the Silk Road, genetic admixture is frequent there, and PAH gene variants may have a unique distribution in this area. Thus, although some overlap is expected with other provinces and autonomous regions of China (He et al. 2015; Qiang et al. 2014; Yan et al. 2009; Zhang et al. 2015), a large-scale analysis of the PAH gene mutational spectrum of PKU patients is of great importance for the prevention and treatment of PKU in this region.

In this study, PAH gene variants were analyzed in a large cohort of 475 PKU patients and their families in Northwest China by Sanger sequencing and multiplex ligation-dependent probe amplification (MLPA). The analysis helps to establish the PAH gene mutation spectrum and to clarify the distribution characteristics of PAH gene mutations in PKU patients in Northwest China. These results provide basic data for prenatal diagnosis and prevention of PKU. Further exploration of the correlation between genotype and phenotype in PKU patients also provides a theoretical basis for the study of gene function.

Materials and methods

Patients and phenotypes

A large cohort of 475 patients with PKU, diagnosed at the Newborn Screening Center and Medical Genetics Center of Gansu Province Maternal and Child Health Hospital from January 2007 to January 2017, was included in this study. All patients were from Gansu, Shaanxi, Ningxia, Qinghai and Xinjiang. This study was approved by the Ethics Committee of Gansu Province Maternal and Child Health Care Hospital, and all subjects included in this study provided written informed consent.

These patients were identified through a neonatal screening program. Phe concentrations were measured by tandem mass spectrometry from dried blood spot (DBS) samples before treatment was started. According to the classification criteria (Yang and Ye 2014), the 475 PKU patients were divided into three types: 253 cases of classic PKU (Phe ≥ 1200 μmol/L), 150 cases of moderate PKU (Phe: 360–1200 μmol/L), and 74 cases of mild hyperphenylalaninemia (MHP) (Phe: 120–360 μmol/L). Tetrahydrobiopterin (BH4) deficiency was excluded in all patients through a BH4 loading test, a urinary pterin analysis, and a dihydropteridine reductase (DHPR) activity assay on DBS samples.
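The classification thresholds above can be written as a short Python sketch. The function name is hypothetical, and because the stated ranges share their boundary values, the handling of exactly 360 μmol/L (assigned here to moderate PKU) is an assumption rather than something the criteria as quoted specify.

```python
def classify_pku(phe_umol_per_l: float) -> str:
    """Assign a phenotype class from the pretreatment Phe level (in μmol/L).

    Thresholds follow the ranges quoted in the text. Treating a value of
    exactly 360 as moderate PKU is an assumption of this sketch.
    """
    if phe_umol_per_l >= 1200:
        return "classic PKU"
    if phe_umol_per_l >= 360:
        return "moderate PKU"
    if phe_umol_per_l >= 120:
        return "MHP"
    # Values below 120 μmol/L fall outside the three classes described.
    return "below MHP threshold"


if __name__ == "__main__":
    for level in (1500, 800, 200):
        print(level, classify_pku(level))
```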

Sanger sequencing

Genomic DNA (gDNA) was extracted from peripheral blood mononuclear cells or DBS samples of patients using Whole Blood Genomic DNA Extraction Kits (Tiangen, Beijing, China) or the Chelex-100 (Bio-Rad) method according to the product instructions. The PAH gene reference sequence (NG_008690.1) and the cDNA sequence (NM_000277.1) were obtained from the UCSC database (Giardine et al. 2007). A total of 13 PCR primer pairs, targeting exons 1–13 and the flanking sequences of the PAH gene, were designed using Primer Premier 5.0 software (Premier Biosoft International, Palo Alto, CA, USA). The BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) was used to generate sequencing products from purified PCR products. Sequencing products were subjected to capillary electrophoresis on an ABI 3500DX DNA analyzer (Applied Biosystems, Foster City, CA, USA). Sequencing results were analyzed using Chromas 2.33 software (Technelysium Pty Ltd., South Brisbane, Australia), and variants were determined using Seqman in the DNASTAR® Lasergene v7.1 software package (DNASTAR, Inc., Madison, USA) by comparing sample sequences with the normal reference sequence.

MLPA analysis

Large-scale deletion/duplication variants of the PAH gene were analyzed in patients with only one variant allele of PAH or with no variant found, using the SALSA MLPA kit P055 PAH (Cat. P055-100R, MRC Holland, Amsterdam, Netherlands). The assay was performed according to the manufacturer’s protocol. Copy number variations of PAH exons were analyzed with the Coffalyser data analysis software (MRC-Holland b.v., Amsterdam, Netherlands). Copy number was indicated by the ratio value: a ratio of 0.7–1.3 was the normal reference range; a ratio of 1.3–1.7 indicated three copies and was judged to be a heterozygous duplication; a ratio of 0.3–0.7 indicated a single copy and was judged to be a heterozygous deletion; and a ratio of 0–0.3 was judged to be a homozygous deletion. Four normal control DNA samples from healthy individuals were used in each run. Copy number variations of PAH exons detected by MLPA were characterized by gap-PCR analysis. Gap-PCR primers for detection of type D1 (Ex1del3758) and type D2 (Ex1del5269ins56) in exon 1 and of the exon 3 deletion were designed based on previous reports (Desviat et al. 2006; Lu et al. 2011). The primers for other deletion sites were designed using Primer 5.0 (Supplementary Table 1). All aberrantly amplified fragments detected by gap-PCR were sequenced to identify the deletion sites.
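The ratio thresholds used to call copy number can be illustrated with a minimal Python sketch. It is not the Coffalyser implementation. The function name is hypothetical, and the assignment of the shared boundary values (0.3, 0.7, 1.3) to one class or the other is an assumption, since the ranges in the text meet at those points.

```python
def classify_mlpa_ratio(ratio: float) -> str:
    """Interpret a normalized MLPA probe ratio using the ranges in the text.

    Boundary handling is an assumption of this sketch: 0.7 and 1.3 are
    counted as normal, and 0.3 as a heterozygous deletion.
    """
    if ratio < 0:
        raise ValueError("an MLPA ratio cannot be negative")
    if ratio < 0.3:
        return "homozygous deletion"
    if ratio < 0.7:
        return "heterozygous deletion"
    if ratio <= 1.3:
        return "normal"
    if ratio <= 1.7:
        return "heterozygous duplication"
    # The text does not define ratios above 1.7.
    return "outside the described ranges"


if __name__ == "__main__":
    for r in (0.05, 0.5, 1.0, 1.5):
        print(r, classify_mlpa_ratio(r))
```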

Analysis of novel variants

All detected PAH gene variants were screened against the 1000 Genomes Project (Sudmant et al. 2015), dbSNP (Hubbard et al. 2016), and ExAC databases (Song et al. 2016) to exclude polymorphic sites in the population, and against the ClinVar (Zastrow et al. 2018), HGMD (Qiang et al. 2014), and PAHvdb databases (Polak et al. 2013) to identify potential unreported variants (novel variants). The nomenclature of novel variants was based on the sequence variant nomenclature of the Human Genome Variation Society (den Dunnen et al. 2016). The novel missense variants and splice variants were identified in accordance with the criteria for interpretation of sequence variants (Richards et al. 2015).

Statistical analysis

SPSS 16.0 software was used for statistical analysis. Comparisons of the distribution frequencies of different variants in different types of patients were performed with the Chi-square test. A P value <0.05 was considered statistically significant.
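As an illustration of the Chi-square comparison, the following Python sketch computes the Pearson statistic for a contingency table of allele counts. It is not the SPSS procedure used in the study, the function names are hypothetical, and the counts in the usage example are invented placeholders rather than study data. The closed-form P value applies only to 2 degrees of freedom, which is the case for a table of 2 rows (variant present or absent) by 3 phenotype groups.

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_df2(statistic):
    """Upper-tail P value of the chi-square distribution with exactly 2 df."""
    return math.exp(-statistic / 2)


if __name__ == "__main__":
    # Placeholder counts (NOT study data): alleles carrying a given variant
    # versus all other alleles, in classic, moderate, and MHP patients.
    example = [[40, 15, 5], [460, 285, 139]]
    stat, df = pearson_chi_square(example)
    print(f"chi-square = {stat:.3f}, df = {df}, P = {p_value_df2(stat):.4f}")
```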

Results

Mutation spectrum of PAH gene

A total of 895 variant alleles were detected among the 950 alleles of the 475 PKU patients, and 128 different pathogenic variants were identified (Table 1). All detected variants were inherited from the parents, and no de novo mutation was found. Among the 895 detected mutant alleles, the 10 most frequent variants represented 50.21% of the total: p.Arg243Gln (14.00%), EX6-96A>G (c.611A>G) (5.58%), p.Tyr356* (4.95%), p.Arg413Pro (4.74%), IVS4-1G>A (c.442-1G>A) (4.32%), p.Val399= (4.11%), p.Arg241Cys (4.11%), p.Arg111* (2.95%), IVS7+2T>A (c.842+2T>A) (2.95%), and p.Ile65Thr (2.53%). Variants were most frequent in exon 7 (33.04%), exon 11 (15.29%), exon 12 (12.7%), exon 5 (10.4%), exon 3 (9.15%), and exon 6 (8.37%).

Table 1 PAH gene mutational spectrum in PKU patients in Northwest China

The 895 variant alleles were found at 128 sites and involved 8 variant types: 83 missense variants, 20 splice site variants, 12 nonsense variants, 7 large deletion variants, 4 small deletion variants, 1 large-scale duplication variant, 1 silent/splicing variant, and 1 5’UTR variant.

Large-scale deletion/duplication analysis

Exon sequencing identified 74 cases without two known pathogenic variants. These 74 patients were analyzed for PAH gene deletion/duplication variants using the MLPA method. Seven large deletion/duplication variants of the PAH gene were detected in 25 patients. Two of the seven were deletions located in the 5’UTR of exon 1 of the PAH gene, four were deletions located in exons 3 to 7, and the last one was a duplication in exon 12 (Table 1, Fig. 1).

Fig. 1

Analysis of deletions and duplications detected by MLPA. a, normal sample; b, deletion in the exon 1 upstream region (D2); c, exon 12 duplication (Ex12dup). A ratio of 0.7–1.3 is the normal reference range; a ratio of 1.3–1.7 indicates three copies and is judged to be a heterozygous duplication; a ratio of 0.3–0.7 indicates a single copy and is judged to be a heterozygous deletion; a ratio of 0–0.3 is judged to be a homozygous deletion

Gap-PCR products were sequenced for the large-scale deletions involving exons 1 and 3, and abnormally amplified fragments were detected (c.1-4163_1-406del3758, c.1-1932_60+3402del5269ins56, and c.169-4940_352+1459del6604ins8). Gap-PCR with primers IVS5-8F and IVS-1R showed that the amplified fragment of the exon 6 deletion variant was 804 bp long, whereas the normal amplified fragment was 8597 bp. Sequencing of the PCR product and comparison with the reference gene sequence confirmed that two G bases replaced a 7793-bp fragment extending from nucleotide −6708 of intron 5 to nucleotide +888 of intron 6. According to the HGVS nomenclature rules (den Dunnen et al. 2016), this variant was named c.510-6708_706+888del7793insGG (abbreviation: Exon6del7793insGG) (Fig. 2).
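The deleted lengths in two of these names can be checked from the coordinates alone. The Python sketch below uses hypothetical helper names and assumes the usual reading of intronic offsets, in which c.510-6708 lies 6708 bases upstream of c.510 and c.706+888 lies 888 bases downstream of c.706. It is only an arithmetic check on the exon 6 deletion and the D1 deletion, not a general HGVS parser.

```python
def spanning_deletion_length(first_exonic, upstream_offset,
                             last_exonic, downstream_offset):
    """Length of c.<first>-<up>_<last>+<down>del.

    Counts the upstream intronic bases, the exonic bases from <first> to
    <last> inclusive, and the downstream intronic bases.
    """
    exonic_bases = last_exonic - first_exonic + 1
    return upstream_offset + exonic_bases + downstream_offset


def same_anchor_deletion_length(far_offset, near_offset):
    """Length of c.<pos>-<far>_<pos>-<near>del, both offsets from one anchor."""
    return far_offset - near_offset + 1


if __name__ == "__main__":
    # c.510-6708_706+888: 6708 + (706 - 510 + 1) + 888 bases deleted
    print(spanning_deletion_length(510, 6708, 706, 888))
    # c.1-4163_1-406: offsets 4163 down to 406 upstream of c.1, inclusive
    print(same_anchor_deletion_length(4163, 406))
```

Under these assumptions both results agree with the lengths stated in the variant names (7793 bp and 3758 bp).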

Fig. 2

Large-scale mutation breakpoint analysis. a, deletion variant in Exon 1 region (Ex1del3758); b, deletion variant in Exon 1 region (Ex1 del5329ins56); c, deletion variant in Exon 3 region (Ex3 del6599ins8); d, deletion variant in Exon 6 region (Exon6del7793insGG)

Novel variants in PAH gene

A total of 20 of the 128 variants had not been reported in the ClinVar, HGMD, or PAHvdb databases (Fig. 3). In addition, 3 of the 128 variants were recorded with extremely low frequency and unknown pathogenicity in the 1000 Genomes, dbSNP, and ExAC databases. Pathogenicity analysis showed that 6 missense variants were likely pathogenic and that 4 missense variants were of unknown clinical significance. In addition, abnormal splicing was predicted for the IVS9+4A>T (c.969+4A>T) variant, which removes the original splice site of exon 9 (Table 2).

Fig. 3

Sequencing results for novel variants. WT: wild type; MT: mutant type

Table 2 Functional predictions of novel mutations detected in the PAH gene

Correlation between variants and phenotype in patients with PKU

The detection rates of variants in the three types of patients (classic PKU, moderate PKU, and MHP) were 96.25%, 94.79%, and 88.89%, respectively (P < 0.05). The frequency of each PAH variant was compared among the three phenotype groups, using only variants with a frequency of more than 1%. The distributions of 7 of the most frequent variants differed significantly among the three types of patients: p.Arg243Gln (P = 0.0057), IVS4-1G>A (P = 0.0323), p.Arg241Cys (P < 0.0001), p.Arg111* (P = 0.0208), p.Arg53His (P < 0.0001), p.Thr418Pro (P = 0.0003), and p.Gln419Arg (P < 0.0001). The p.Phe392Ile and p.Val230Ile variants were detected only in patients with MHP. The p.Leu430Pro variant was detected only in moderate PKU patients. The p.His107Arg variant was detected only in MHP and moderate PKU patients. Finally, the frequencies of the p.Arg53His and p.Gln419Arg variants were clearly higher in MHP patients than in the other two types of PKU patients (Table 1).

Discussion

A large cohort of 475 PKU patients detected by newborn screening over the past 10 years was recruited. The variant detection rate of 94.21% (95% CI, 92.72%–95.70%) confirms the pervasiveness of genetic mutations in the PAH gene. Variants in six exons accounted for 88.61% of the total; these exons are also mutational hot spots in other regions of China (Li et al. 2015; Song et al. 2005). Among the 128 variants detected in this study, those with the highest frequencies are consistent with the results from other large-scale studies in Chinese populations, with the p.Arg243Gln and EX6-96A>G variants being the most common (Gao et al. 2011; Guo et al. 2011). A novel finding of our study is the relatively high frequency of the p.Ile65Thr variant. The similarity of our results to those of previous reports from the Shaanxi (Qiang et al. 2014), Ningxia (Zhang et al. 2015), Qinghai (He et al. 2015), and Gansu (Yan et al. 2009) regions lays a theoretical foundation for the next stage of early screening.
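The detection rate and its confidence interval follow from the allele counts given in the Results (895 variant alleles among 950 alleles). The Python sketch below assumes a normal-approximation (Wald) interval. The text does not state which interval method was used, so this is an assumption, and the function name is hypothetical.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


if __name__ == "__main__":
    rate, lower, upper = wald_interval(895, 950)
    print(f"detection rate {rate:.2%}, 95% CI {lower:.2%} to {upper:.2%}")
```

Under this assumption the computed bounds agree with the reported interval to within rounding.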

One deletion variant (c.1-4163_1-406del3758) was found in compound heterozygous patients and had the same breakpoints as previously reported (Lu et al. 2011). Chen et al. (2002) found that a liver-specific enhancer is located in this deleted region. The deletion severely impairs PAH transcriptional activity and results in the classic PKU phenotype. Our results showed that this harmful variant accounted for 57.14% (16/27) of the large-scale deletion variants and is the most common large-scale deletion variant in the Northwest Chinese population. A potentially unique type of variant in Northwest China was identified in this study: a deletion variant (Exon6del7793insGG). This variant results in a codon deletion in exon 6 of the PAH gene, producing a truncated protein that lacks the key catalytic region, and directly impacts PAH function.

Our predictions showed that the original splice site of exon 9 was disrupted by IVS9+4A>T, resulting in abnormal splicing and classic PKU, whereas two other variants (c.843-6T>C and c.970-3C>T) that were predicted not to affect splicing occurred in classic PKU and MHP patients, respectively. Because the results predicted by the pathogenicity scoring system differed from the phenotype, and given that splicing occurs in vivo, functional studies of splice site variants are still necessary. This study further demonstrated that the detection rates of PAH gene variants differ among the three types of PKU patients (P < 0.05) (Bercovich et al. 2008; Zhao et al. 2016). Most of the MHP patients were compound heterozygous patients with one mutation in the PAH gene. Whether these cases reflect transient hyperphenylalaninemia caused by a slight decrease in enzyme activity due to a single polymorphism, young age, liver immaturity, or another unknown cause remains to be confirmed by further follow-up.

The p.Arg243Gln variant, a severe variant with 13% residual enzyme activity (Pey et al. 2007), was the main variant in classic PKU, with a frequency of 16.01% in this study. We also found that the frequency of the p.Arg53His variant in patients with MHP was 13.89%, the highest of any variant in this type of patient, which is consistent with the results of another large-sample study (Zhao et al. 2016). In vitro expression experiments showed that the residual enzyme activity of the p.Arg53His variant was 79%, and patients with p.Arg53His and other mutations retained a high capacity for phenylalanine metabolism (Polak et al. 2013). Multiple patients have been reported to carry p.Arg53His in cis with other pathogenic variants (Dobrowolski et al. 2011), and some researchers believe that the variant may be a polymorphism in the normal population (Lu et al. 2011).

In summary, Sanger sequencing combined with MLPA characterized the PAH gene mutation spectrum in Northwest China, enriched the PAH gene mutation library, and laid a theoretical foundation for the next step of population carrier screening. At the same time, the comparison of variant frequencies among patients with phenotypes of different severity further clarified the correlation between compound heterozygosity and phenotype and provided a basis for phenotypic prediction.