Introduction

The mucopolysaccharidoses (MPS) are a group of inherited lysosomal storage disorders (LSDs) caused by deficiency of specific enzymes that catalyze the stepwise degradation of glycosaminoglycans (GAGs), leading to accumulation of GAG fragments in the body and subsequent widespread tissue damage [1]. The clinical manifestations in these patients include coarse facial features, spinal deformities, skeletal deformities (dysostosis multiplex), joint contractures, short stature, corneal clouding, inguinal/umbilical hernias, hepatosplenomegaly, recurrent respiratory infections and psychomotor retardation [2,3,4]. Because the clinical presentations overlap, it is difficult to differentiate among the MPS subtypes and to distinguish MPS from other LSDs and rheumatologic disorders [5]. A diagnosis based purely on clinical observations is rarely conclusive; biochemical diagnosis or genetic confirmatory testing is always required. Enzyme assays are considered the most definitive tests, though mutation analysis is often equally effective [6].

The differential diagnosis of MPS remains a clinical challenge because analyzing all screen-positive cases for 11 different enzymes is cumbersome and unaffordable; hence, several researchers have focused on biochemical characterization of GAGs for the differential diagnosis of MPS. One-dimensional electrophoresis showed a high rate of false-negative and false-positive results, limiting its clinical utility. We recently proposed two-dimensional electrophoresis (2-DE) for the differential diagnosis of MPS [7]. However, the spot distribution limits its utility for densitometric measurement, and the information is largely qualitative.

With the advent of mass spectrometric tools, specific quantification of GAGs is possible [8]. However, there is a need to integrate this information using machine learning tools in order to improve its clinical utility in the differential diagnosis. Specific enzyme assays and molecular analysis are well-established modes of confirmatory diagnosis of MPS.

There have been recent advances in the treatment of certain subtypes of MPS with bone marrow or hematopoietic stem cell transplantation (HSCT) and enzyme replacement therapy (ERT) [9,10,11,12,13,14,15]. The prognosis is better if the treatment is started early. It is, therefore, crucial to diagnose as early as possible in order to provide targeted treatment options for MPS patients.

In the current study, an attempt was made to capture liquid chromatography–tandem mass spectrometry (LC–MS/MS) and 2-DE GAG profiles and translate them into classification and regression trees (CART) using machine learning tools. This enables quick decision making when referring patients for specific enzyme assays and targeted mutation analysis.

Materials and methods

Study population

Control urine samples (n = 153) were collected from healthy individuals to establish GAG reference values; the controls had an age range of 0.1–41 (6.0 ± 6.8) years and a male-to-female ratio of 99:54. The experimental group comprised eighty-nine patients (62 males and 27 females), with an age range of 0.1–25.4 (5.6 ± 4.4) years, diagnosed with MPS type I (n = 28), II (n = 17), III (n = 23), IV (n = 12), VI (n = 6), or VII (n = 3). The non-specific quantitative urinary total GAG, the qualitative 2-DE pattern, and the levels of three urinary GAGs on LC–MS/MS, namely dermatan sulfate (DS), heparan sulfate (HS), and keratan sulfate (KS), were evaluated. The diagnosis of the specific MPS type was confirmed by specific enzyme assays in leukocytes and by identification of a pathogenic mutation. None of the patients had received ERT or HSCT before entering this study.

Urinary GAG quantitative and qualitative estimation

Urine samples were used for quantitative estimation of urinary GAG and for 2-DE. Urinary GAG was quantified by the standard dimethylmethylene blue (DMB) dye method, and 2-DE was performed with the Alcian blue reagent method [16]. The GAG/creatinine ratio (milligrams of GAG per millimole of creatinine) was used as a measure of the urinary excretion of GAG. Creatinine was estimated by the standard method [17].
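
The creatinine normalization described above can be sketched as follows. This is a minimal illustration, not the laboratory's actual calculation: the function names are hypothetical, and the optional unit conversion assumes creatinine is reported in mg/dl and uses the molar mass of creatinine (113.12 g/mol).

```python
CREATININE_MOLAR_MASS_G_PER_MOL = 113.12  # C4H7N3O


def creatinine_mg_dl_to_mmol_l(creatinine_mg_dl: float) -> float:
    """Convert creatinine from mg/dl to mmol/l (hypothetical helper)."""
    # mg/dl -> mg/l (x10), then mg/l -> mmol/l (divide by g/mol).
    return creatinine_mg_dl * 10.0 / CREATININE_MOLAR_MASS_G_PER_MOL


def gag_creatinine_ratio(gag_mg_l: float, creatinine_mmol_l: float) -> float:
    """Return urinary GAG excretion in mg GAG per mmol creatinine."""
    if creatinine_mmol_l <= 0:
        raise ValueError("creatinine concentration must be positive")
    return gag_mg_l / creatinine_mmol_l


# Illustrative values only (not patient data): 50 mg/l GAG, 5 mmol/l creatinine.
ratio = gag_creatinine_ratio(50.0, 5.0)
```

Expressing GAG per millimole of creatinine corrects for urine dilution, which is why spot urine samples can be compared across individuals and age groups.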

LC–MS/MS analysis

GAG disaccharide analysis of chondroitin sulfate (CS), heparan sulfate (HS), dermatan sulfate (DS), and keratan sulfate (KS) by LC–MS/MS followed the methods reported by Auray-Blais et al. and Chuang et al. [18, 19]. GAGs were first precipitated from urine using the Alcian Blue (AB) reagent containing sodium acetate. Then, sodium chloride and methanol were used to dissolve the MPS-AB complex (MPS here denoting the mucopolysaccharide, i.e., GAG). Next, sodium carbonate was added to dissociate the MPS from the AB. Finally, ethanol was used to re-precipitate the MPS. After evaporation to dryness, the precipitate was dissolved in water according to the DMB values (mg/l), followed by methanolysis. In 1.5 ml microcentrifuge tubes, 2 µl of the precipitated GAG sample was mixed with 200 µl of 3 N methanolic HCl and 10 µl of 2,2-dimethoxypropane. Samples were dried under an N2 stream at 65 °C for 75 min. Then 100 µl of acetonitrile was added, and the samples were dried again under an N2 stream. Samples were reconstituted with 100 µl of NH4OAc (10 mM). Internal standards, [2H6] DS, [2H6] CS, and [2H6] HS, were prepared in-house by deuteriomethanolysis of CS, DS, and HS [20]. Disaccharides were separated on an Atlantis T3 C18 column (3 µm, 2.1 × 50 mm, Waters Corporation) over 6.5 min at a flow rate of 450 µl/min. Before injection, all clinical samples were passed through a disposable 0.2 µm filter to minimize ion suppression due to endogenous contaminants. Data acquisition was performed by selected reaction monitoring (SRM) using the protonated molecular ion transitions of mass-to-charge ratio (m/z) 384.2 > 161.9 for HS-derived disaccharides and m/z 426.1 > 236.2 for DS- and CS-derived disaccharides. The transitions for [2H6] HS-derived disaccharides were m/z 390.2 > 168, and those for [2H6] CS and [2H6] DS disaccharides were m/z 432 > 239. Quantification was based on peak areas processed using Analyst 1.5.2™ software (AB Sciex).

Enzyme analysis of MPS types

Four milliliters of peripheral blood was collected in a sodium heparin vacutainer. Plasma was separated, and leukocytes were isolated from the sodium heparin whole blood. Leukocytes were stored at −80 °C until processing. All MPS disorders were confirmed by the respective enzyme assays. These assays used artificial fluorogenic (4-methylumbelliferyl) substrates, except for arylsulfatase B, which was assayed by a photometric method using para-nitrocatechol sulfate K2 as substrate [21,22,23,24,25,26,27,28,29,30]. The 4-methylumbelliferyl fluorescence was measured on a multimode reader at λEx of 362 nm and λEm of 448 nm. The absorbance of para-nitrocatechol was measured at 515 nm.

Mutation analysis

Genomic DNA was prepared from peripheral blood leukocytes by high salt extraction. PCR amplification of genomic DNA from MPS patients was carried out using oligonucleotide primers (Table 1). PCR products were purified and sequenced using a DNA sequencer. All amplified fragments flanking the exons were analyzed to identify variations. The resultant sequences were imported into CodonCode Aligner software for alignment, editing, and mutation analysis.

Table 1 PCR primer sequences used in mutation analysis

Development of classification and regression models

For the development of the CART model, the pattern of dermatan sulfate, heparan sulfate, and keratan sulfate was used as the input to derive the type of MPS. From the given set of input variables, the most significant classifier forms the apex of the tree and bifurcates the cases into two classes based on a threshold value. Branching continues until the classes are clearly discriminated based on ‘if’ and ‘then’ rules. At each node, one input variable splits the cases into two branches based on its threshold value. Finally, a decision is made, whose performance can be verified by computing different combinations of input variables to derive the output. To implement such a model in a diagnostic set-up, its performance characteristics, namely overall accuracy, sensitivity, specificity, positive predictive value, and negative predictive value, were assessed.
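
The performance characteristics named above can be computed from predicted and confirmed labels in a one-versus-rest fashion. The sketch below is an illustration under stated assumptions, not the analysis pipeline used in this study: the function and class names are hypothetical, and the label lists are toy values rather than study data.

```python
from dataclasses import dataclass


@dataclass
class Performance:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    # Undefined ratios (empty denominator) are reported as NaN.
    return numerator / denominator if denominator else float("nan")


def performance(tp, fp, fn, tn):
    """Derive the five performance characteristics from a 2x2 table."""
    return Performance(
        accuracy=_ratio(tp + tn, tp + fp + fn + tn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


# Toy labels for illustration only (hypothetical, not study data).
confirmed = ["MPS I", "MPS I", "MPS III", "MPS IV", "MPS III", "MPS I"]
predicted = ["MPS I", "MPS III", "MPS III", "MPS IV", "MPS III", "MPS I"]
result = performance(*one_vs_rest_counts(confirmed, predicted, "MPS I"))
```

Repeating the call for each MPS group gives per-class sensitivity and specificity, while the overall accuracy of a multi-class tree is simply the fraction of cases assigned to their confirmed group.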

In silico analysis

To cross-verify the identified mutations and their impact on the protein, www.mutalyzer.nl was used. Further, each sequence change was submitted to MutationTaster to retrieve the functional impact of the mutation and to check the mutation frequency in the 1000 Genomes and ExAC databases.

Statistical analysis

Fisher's exact test was used to calculate the performance characteristics of the model. In parallel, receiver operating characteristic (ROC) curves were plotted to assess the overall diagnostic utility of the models.
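
For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2 × 2 table of predicted versus confirmed status can be sketched with the standard library alone. This is an illustrative implementation based on the hypergeometric distribution; the function name is hypothetical, the example table is not study data, and the statistical software actually used here is not specified in the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed
    # one; the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Classic illustrative table (not study data): [[3, 1], [1, 3]].
p = fisher_exact_two_sided(3, 1, 1, 3)
```

The exact test is preferred over a chi-square approximation when some cells of the table are small, as is common for the rarer MPS types.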

Results

Urinary quantitative GAG analysis

The reference ranges for GAG by the DMB method were determined from 153 healthy controls (99 males and 54 females) in different age groups (i.e., 0–6 months, 6–12 months, 1–2 years, 2–4 years, 4–6 years, 6–8 years, 8–10 years, and > 10 years). The DMB ratio was high in the neonatal and infantile age group (26.3 ± 14.6 mg/mmol creatinine) and much lower in the adult age group (3.7 ± 1.3 mg/mmol creatinine). The average concentration of urinary CS in healthy controls was 35.5 ± 27.9 µg/ml, with very low concentrations of KS (4.6 ± 3.0 µg/ml), HS (0.3 ± 0.2 µg/ml), and DS (0.2 ± 0.1 µg/ml) (Table 2).

Table 2 Age group-wise reference intervals (n = 153) of glycosaminoglycans using LC–MS/MS and the DMB spectrophotometric method

CART model of 2-DE-GAG

The 2-DE of GAG identifies DS, HS, and KS as bands. The presence or absence of each band was used to construct a classification and regression tree (CART) using purity criteria. Branching of the tree stops when it can clearly classify the cases into groups based on ‘if’ and ‘then’ rules.

Positivity for both DS and HS was characteristic of MPS I, MPS II, and MPS VII. DS positivity with HS negativity was observed in MPS VI. DS-negative cases were segregated based on the presence or absence of the KS band: KS positivity was characteristic of MPS IVA and MPS IVB, whereas the HS band alone was observed in MPS IIIA, MPS IIIB, and MPS IIIC. Based on these criteria, the overall accuracy of this model in differential diagnosis was 96.3% (Fig. 1a).
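
The ‘if’ and ‘then’ rules stated above can be written out as a short function. This is a sketch reconstructed from the text only; the function name and group labels are hypothetical, the split order is inferred from the description rather than taken from Fig. 1a, and the handling of a sample with no band is an assumption.

```python
def classify_2de(ds: bool, hs: bool, ks: bool) -> str:
    """Map presence/absence of 2-DE bands to a candidate MPS group."""
    if ds:
        # DS-positive branch: the HS band separates the two groups.
        return "MPS I / II / VII" if hs else "MPS VI"
    # DS-negative branch: segregate on the KS band first.
    if ks:
        return "MPS IVA / IVB"
    if hs:
        # HS band alone.
        return "MPS IIIA / IIIB / IIIC"
    # Assumption: no band at all is treated as no MPS pattern.
    return "no MPS pattern"


# Example: DS and HS bands present, KS absent.
group = classify_2de(ds=True, hs=True, ks=False)
```

Each returned group narrows the confirmatory work to the enzyme assays relevant to that group instead of the full panel.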

Fig. 1

CART model of 2-DE-GAG and LC–MS/MS-GAG. (a) Two-dimensional electrophoresis (2-DE) of glycosaminoglycans (GAG) identifies dermatan sulfate (DS), heparan sulfate (HS), and keratan sulfate (KS) as bands. The presence or absence of each band was used to construct a classification and regression tree (CART) using purity criteria. Branching of the tree stops when it can clearly classify the cases into groups based on ‘if’ and ‘then’ rules. (b) Tandem mass spectrometric (LC–MS/MS) quantification of GAG was used to establish thresholds of DS, HS, and KS that classify the groups by ‘if’ and ‘then’ rules based on the purity criterion. The CART segregated the classes in a tree-like fashion from the apex to the branches

CART model of LC–MS/MS-GAG

Construction of the CART model established thresholds of HS, DS, and KS for the differential diagnosis of MPS. HS was the key predictor, with a threshold value of 4.93 µg/ml. HS > 4.93 µg/ml and DS > 10.22 µg/ml were characteristic of MPS I, MPS II, and MPS VII. HS > 4.93 µg/ml and DS < 10.22 µg/ml were observed in MPS IIIA, MPS IIIB, and MPS IIIC. HS < 4.93 µg/ml and DS < 0.47 µg/ml ruled out the possibility of MPS. HS < 4.93 µg/ml, DS > 0.47 µg/ml, and KS > 10 µg/ml were characteristic of MPS IVA and MPS IVB. HS < 4.93 µg/ml, DS > 0.47 µg/ml, and KS < 10 µg/ml were characteristic of MPS VI (Fig. 1b). The overall accuracy of this model in the differential diagnosis of MPS was 98.3%. As shown in Fig. 2, HS levels were significantly higher in MPS IIIA, MPS IIIB, and MPS IIIC. DS levels were higher in MPS I, MPS II, MPS VI, and MPS VII. KS levels were elevated in MPS IVA and IVB.
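
The thresholds reported above translate directly into a rule-based classifier. The sketch below uses only the cutoffs stated in the text (all in µg/ml); the function name and group labels are hypothetical, and the assignment of values lying exactly on a cutoff is an assumption, since the text gives strict inequalities on both sides.

```python
HS_CUTOFF = 4.93       # µg/ml, apex split on heparan sulfate
DS_HIGH_CUTOFF = 10.22  # µg/ml, split within the high-HS branch
DS_LOW_CUTOFF = 0.47    # µg/ml, split within the low-HS branch
KS_CUTOFF = 10.0        # µg/ml, split on keratan sulfate


def classify_lcms(hs: float, ds: float, ks: float) -> str:
    """Map LC-MS/MS GAG concentrations (µg/ml) to a candidate MPS group."""
    if hs > HS_CUTOFF:
        if ds > DS_HIGH_CUTOFF:
            return "MPS I / II / VII"
        return "MPS IIIA / IIIB / IIIC"
    # Low-HS branch. Assumption: values equal to a cutoff fall on the
    # 'not greater than' side of each split.
    if ds < DS_LOW_CUTOFF:
        return "MPS unlikely"
    if ks > KS_CUTOFF:
        return "MPS IVA / IVB"
    return "MPS VI"


# Mean values reported for healthy controls: HS 0.3, DS 0.2, KS 4.6 µg/ml.
control_call = classify_lcms(hs=0.3, ds=0.2, ks=4.6)
```

Because the cutoffs may differ between populations and laboratories, they are kept as named constants so that they can be re-estimated from local reference data before the rules are applied.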

Fig. 2

Distribution of heparan sulfate, dermatan sulfate, and keratan sulfate in different MPS types. As illustrated, HS levels were higher in MPS IIIA, MPS IIIB, and MPS IIIC. DS levels were higher in MPS I, MPS II, MPS VI, and MPS VII. KS levels were higher in MPS IVA and MPS IVB. Error bars represent mean ± SD. Statistical significance (p value) between the groups compared is labeled as ****(p < 0.0001) and ***(p < 0.001)

Table 3 compares the advantages of LC–MS/MS over 2-DE in GAG characterization and differential diagnosis of MPS. Total GAG was inversely associated with the age of onset [r = − 0.76, p < 0.0001]. All GAG types (CS, DS, HS, KS) were inversely associated with age of onset in MPS II [DS: r = − 0.64, p < 0.005; HS: r = − 0.59, p < 0.01; KS: r = − 0.57, p < 0.01]. Figure 3 shows the representative 2-DE data and LC–MS/MS spectra used to generate the machine learning tools.

Table 3 Comparison of mass spectrometry and 2-DE techniques for GAG characterization
Fig. 3

Representative two-dimensional electrophoresis and LC–MS/MS mass spectra used to generate machine learning tools. (a) Two-dimensional electrophoresis of GAGs. (b) LC–MS/MS mass spectra of GAGs: a, b mass spectra from the methanolysate of CS, DS and the deuteriomethanolysis product of CS, DS; c, d mass spectra from the methanolysate of HS and the deuteriomethanolysis product of HS; e, f mass spectra from the methanolysate of KS and the deuteriomethanolysis product of KS

Specific enzyme assays

Ten specific enzyme assays were performed to confirm the MPS type. All assays were fluorometric except the one for MPS VI. Total GAG excretion was highest in MPS I, MPS II, and MPS VII. Using the cases doubly confirmed through biochemical and molecular approaches, thresholds of specific enzyme activity diagnostic for each type of MPS were also established. β-Glucuronidase and N-acetylglucosamine 6-sulfate sulfatase activities were < 1.0% of mean normal; β-galactosidase and glucosamine-N-acetyltransferase activities were < 2.0% of mean normal; α-N-acetylglucosaminidase and galactose 6-sulfate sulfatase activities were < 3.0% of mean normal; and the enzymes for the remaining MPS types showed activities < 5.0% of mean normal in homozygous mutants (Table 4).

Table 4 Levels of urinary glycosaminoglycans and leukocyte enzyme activities in MPS patients (n = 89)

Mutation spectrum

Among IDUA mutations, c.1469T > C was the most common, followed by c.784delC, c.532G > A, c.908T > C, and c.1759C > T. Of the 31 different mutations identified in IDUA, 9 were nonsense mutations resulting in premature termination, accounting for 32.3% of cases, i.e., c.436A > T (p.K146*), c.606C > G (p.Y202*), c.895G > T (p.E299*), c.1029C > A (p.Y343*), c.1750C > T (p.Q584*), c.1759C > T (p.Q587*), c.1855C > T (p.R619*), c.1861C > T (p.R621*), and c.1882C > T (p.R628*). Two gross deletions were also observed, i.e., ex8_ex14del and ex9_ex14del. The IDS mutation spectrum was highly heterogeneous, involving 7 different mutations in 8 patients: four missense mutations, i.e., c.253G > A (p.A85T), c.263G > A (p.R88H), c.329G > A (p.R110K), and c.1402C > T (p.R468W); two frameshift mutations, i.e., c.1467_1468insG (p.Y490Vfs*9) and c.474_474delT (p.H159Ifs*54); and one large deletion, i.e., c.982_996del (p.I328_T332del). In MPS IIIB, NAGLU c.1693C > T (p.R565W) and c.1914_1915insT (p.E639*) were the most common, accounting for 6 of 9 cases. The other mutations observed were c.1694G > T (p.R565L) and c.2209C > G (p.R737G). SGSH c.1129C > T (p.R377C) was observed in only one case, diagnosed as MPS IIIA. In the MPS VI cases (n = 2), two different ARSB mutations were identified, i.e., c.479G > A (p.R160Q) and c.1208_1208delC (p.S403Y). Only one case of MPS IVA had the GALNS c.647T > C (p.F216S) mutation (Fig. 4).

Fig. 4

Identified mutations in IDUA and IDS genes. (a) In the current study, all exons of IDUA except exons 1, 2, and 12 had mutations. Of the identified mutations, nine were nonsense mutations. (b) In the current study, four missense mutations, two frameshift mutations, and one large deletion (exon 7) were identified in the IDS gene

Discussion

The utility of machine learning tools in diagnosing Gaucher disease type I based on trabecular bone microarchitecture has been demonstrated in a recent study [31]. However, to date no such tools have been used for the differential diagnosis of MPS. To the best of our knowledge, this is the first study to demonstrate the application of machine learning tools for differential diagnosis of MPS based on GAG pattern. We compared 2-DE and LC–MS/MS-based methodologies to achieve this objective. Both were informative, the first being qualitative and the second quantitative. The LC–MS/MS-based GAG pattern performed better than the 2-DE pattern. This application will have direct utility for the physician in deciding which specific enzyme assays should be performed to arrive at a conclusive diagnosis. Further, this study indicates the percentage of specific enzyme activity that can be considered diagnostic of a specific MPS type.

Although several studies have utilized LC–MS/MS-based GAG characterization, there have been no efforts to translate these data into the specific type of MPS. Auray-Blais et al. [6] applied LC–MS/MS data to investigate changes in GAG during treatment, mainly in MPS I, II, and VI. Our data are consistent with Li et al. in classifying MPS into MPS I, II, III, IV, and VI based on GAG pattern on LC–MS/MS. However, minor cutoff differences were observed between their study and ours, suggesting that there could be population-level differences in GAG that need to be considered before applying the proposed CART model. Mashima et al. [32] have demonstrated that the GAG profile can also be used to differentiate subtypes of MPS, e.g., attenuated versus severe forms of MPS II. Our study is consistent with their observation and indicated an inverse association of HS and DS concentrations with age of onset in MPS II. In other words, the GAG pattern also serves as a severity index. However, no such association was observed in MPS I. Although the accuracy of the machine learning tools was 96.3–98.3%, occasional misdiagnosis remains possible. The machine learning tools are based on classification and regression models, and hence, thresholds of GAGs were used as purity criteria for branching and splitting of the tree. In view of this, a few cases with borderline GAG values are likely to be misdiagnosed. Since this model is dynamic in nature, it can be further improved with larger data sets.

Among the studies from India, our results are consistent with Utterilli et al. [33] in demonstrating a high frequency of the c.1469T > C and c.784delC mutations. However, we observed a total of 31 different mutations of IDUA in 40 patients, indicating a high degree of allelic heterogeneity in MPS I. Similar to Utterilli et al., we observed c.253G > A, c.263G > A, and c.[1402C > T] mutations in the IDS gene. In addition, we observed c.474_474delT, c.982_996del, c.1467_1468insG, and c.263G > A mutations in IDS, which were not reported in their study. The current study is the first to report NAGLU mutations from India. We observed three missense mutations and one insertion mutation in the NAGLU gene. Of the two ARSB mutations observed in the current study, c.479G > A was reported by Mathew et al. [34], while c.1208_1208delC was reported by Utterilli et al. 2016 [33]. The first SGSH mutation identified in an Indian subject was c.613G > C, while we report another mutation, i.e., c.1129C > T, in one of our patients. We have only one case with the GALNS c.647T > C mutation, which was reported earlier in Asian Indians [35].

The major strengths of our study are: (i) three-tier evaluation of MPS cases in terms of GAG characterization, specific enzyme assay, and targeted mutation analysis; (ii) construction and application of machine learning-based CART models for early diagnosis when referring patients for specific enzyme assays; (iii) establishment of the diagnostic thresholds of specific enzyme assays to differentiate affected from unaffected individuals; and (iv) reporting of an Indian-specific mutation spectrum, mainly for MPS I and MPS II.

Conclusion

To conclude, GAG characterization, when coupled with machine learning tools, facilitates quick decision making when referring patients for specific enzyme assays to differentially diagnose the type of MPS, with improved accuracy. Hence, the proposed CART models can be adopted as first-tier testing in GAG-positive MPS types. Compared to 2-DE, LC–MS/MS-based GAG analysis appears to have higher clinical utility in the differential diagnosis of MPS. In specific enzyme assays, residual enzyme activity < 5% was found to be diagnostic of a particular type of MPS. In view of the highly heterogeneous mutation spectrum of MPS-related genes such as IDUA and IDS among South Indians, the CART models of GAG play a significant role in the differential diagnosis of MPS, as demonstrated in the current study.