Introduction

Autism spectrum disorder (ASD) is a complex neuro-behavioral syndrome usually described as a heterogeneous group of neurodevelopmental disorders. It is believed to affect about 1 in 68 children in the US population alone (Christensen et al. 2016) and 1 in 160 children worldwide (Elsabbagh et al. 2012). Standardized ASD diagnostic criteria were published and recently reviewed in the World Health Organization’s International Classification of Diseases (ICD-11 will be published in 2018) and the American Psychiatric Association’s Diagnostic and Statistical Manual, fifth edition (DSM-5) (APA 2013). Central to these criteria are impaired social development, communication deficits, and a pathological lack of flexibility or “insistence on sameness” (Volkmar and Reichow 2013). Current diagnostic practices are based on phenotypic characterizations that rely on standardized scoring systems. These diagnostic methodologies have been vital for advancing clinical practice and research, but fall short of enabling early diagnosis and preclinical disease prediction. ASD-associated deficits are commonly recognized in children during their first 12 to 24 months of life (Zwaigenbaum et al. 2015), but a reliable diagnosis is often made at 3 years of age or later (Woolfendena et al. 2012; Steffenburg et al. 2018; Sharma et al. 2018).

It is widely accepted that early diagnosis provides valuable opportunities for primary intervention and better prognosis in ASD (Boyd et al. 2010; Debodinance et al. 2017). Therefore, it would be particularly advantageous to be able to predict ASD before the onset of prodromal signs and symptoms, a goal that is not currently attainable in the absence of suitable biomarkers (Sices et al. 2017). Biomarkers are measurable outputs that indicate the presence of a disease or an outcome, including biochemical analytes and imaging data. In ASD diagnosis, biochemical analytes are typically measured in body fluids (e.g., blood, urine, saliva, and cerebrospinal fluid); they are easy to measure, cost-effective, and often do not require invasive procedures (Mayeux 2004; Nunes et al. 2015; Beversdorf and Missouri Autism Summit Consortium 2016). Recent years have witnessed an increasing interest in the search for suitable biomarkers for the early diagnosis of ASD (Daniels and Mandell 2014; Uddin et al. 2017; Prata et al. 2017).

It is noteworthy that ASD is caused by the combined action of various genetic, epigenetic, and environmental factors, rather than by a single mutation or a single pathogenetic mechanism (Volkmar and Reichow 2013; Beversdorf and Missouri Autism Summit Consortium 2016). Consequently, even a single, non-polymorphic phenotype might arise from many different underlying mechanisms, so that a treatment that is effective and safe in one patient is not necessarily so in another (Volkmar and Reichow 2013). On the one hand, the remarkable genetic heterogeneity of ASD poses a challenge that may hamper the search for a more general and wider collection of ASD biomarkers (Geschwind and Levitt 2007; Khramova et al. 2017). On the other hand, class-specific biomarkers may guide a better understanding of the underlying mechanisms of ASD, thus providing a tool for tailoring therapeutic strategies to specific classes of ASD patients (Loth et al. 2016).

In the present study, we reappraised and re-analyzed previously published data (Alabdali et al. 2014a, b) using a different mathematical approach to highlight new insights into ASD biomarkers. Unlike in our previously reported investigations, no participant needed to be excluded as a possible outlier in the current study. We used principal component analysis (PCA) to verify the authenticity of the classification of participants based on selected biomarkers and used multiple statistical tests to verify the obtained results. More importantly, we evaluated the effect of using multiple biomarkers simultaneously on the accuracy of predicting disease occurrence, an approach previously suggested as a way to improve prediction accuracy (Gupta et al. 2013; Abruzzo et al. 2015).

Methods

Participants

Participants enrolled in the present study were previously described (Alabdali et al. 2014a, b). Briefly, 58 male autistic patients ranging in age from 3 to 12 years (mean 7.0 ± 2.34 SD) were recruited through the Autism Research and Treatment Centre, Faculty of Medicine, King Saud University, Riyadh, Saudi Arabia. Patients enrolled in the study were diagnosed with ASD according to the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders and further updates (APA 2000; Sharma et al. 2018; Galiana-Simal et al. 2018). Thirty-two age- and gender-matched control participants (mean age 7.2 ± 2.14 SD) were recruited from children who came to the Well Baby Clinic at King Khalid University Hospital for routine check-ups. Control subjects did not show any signs or symptoms of infectious diseases or neuropsychiatric disorders. All participants had normal erythrocyte sedimentation rates and urine analysis results. The Ethical Committee of the Faculty of Medicine, King Saud University approved the present study. Participants’ parents or legal guardians signed informed consent forms before any samples were collected. The experimental design of the whole research study was consistent with the principles of the Declaration of Helsinki (General Assembly of the World Medical Association 2014).

Measures of Disease Severity Among Autistic Patients

Disease severity was measured using the Childhood Autism Rating Scale (CARS) and the Social Responsiveness Scale (SRS) (Chen et al. 2018). To obtain a CARS score, each child was rated on a scale ranging from 1 (normal) to 4 (severely abnormal) with respect to each of 15 criteria (relating to others; imitation; emotional response; body use; object use; adaptation to change; visual response; listening response; taste, smell, and touch responses; fear and nervousness; verbal communication; non-verbal communication; activity level; level and reliability of intellectual responses; and general impressions). A final score was obtained by computing the sum of the 15 individual scores, resulting in a combined score that could range from 15 to 60. Scores below 30 were considered non-autistic, scores of 30–36.5 were considered mild to moderate autism, and scores greater than 36.5 were considered severe autism (Mick 2005). SRS scores were generated from the results of a questionnaire, with scores ranging from 60 to 75 considered mild to moderate autism, and scores of 76 or greater considered severe autism (Constantino et al. 2003). Patients with a history of epileptic seizures, obsessive-compulsive disorder, fragile X syndrome, or any psychiatric or neurologic disorder other than autism were excluded from the study.
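The scoring rules above can be expressed as a short sketch. This is an illustration only, not software used in the study; the function names are hypothetical, half-point item ratings are assumed to be allowed (the 36.5 cut-off suggests so), and the label returned for SRS scores below 60 is an assumption because the text does not name that range.

```python
from typing import Sequence

CARS_ITEM_COUNT = 15  # number of CARS criteria listed in the text


def cars_total(item_ratings: Sequence[float]) -> float:
    """Sum the 15 CARS item ratings, each between 1 (normal) and 4 (severely abnormal)."""
    if len(item_ratings) != CARS_ITEM_COUNT:
        raise ValueError("CARS requires exactly 15 item ratings")
    if any(r < 1 or r > 4 for r in item_ratings):
        raise ValueError("each rating must lie between 1 and 4")
    return float(sum(item_ratings))


def cars_category(total: float) -> str:
    """Map a CARS total (15 to 60) to the categories used in the text."""
    if total < 30:
        return "non-autistic"
    if total <= 36.5:
        return "mild to moderate"
    return "severe"


def srs_category(score: float) -> str:
    """Map an SRS score to the categories used in the text."""
    if score >= 76:
        return "severe"
    if score >= 60:
        return "mild to moderate"
    # Assumption: the text does not label scores below 60.
    return "below the mild-to-moderate range"
```

For instance, a child rated 2 on every item has a total of 30, which falls at the lower edge of the mild to moderate range.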

Biomarker Data Collection

Blood samples were treated as previously described (Alabdali et al. 2014a, b). Briefly, whole blood specimens were collected by venipuncture using heparin as an anticoagulant. Plasma and red blood cells were separated by centrifugation and stored at −80 °C until used. Biomarkers were selected to represent various physiological processes with established links to ASD. Serotonin, gamma-aminobutyric acid (GABA), and dopamine are related to brain neurochemistry; the hormone oxytocin has been shown to improve social interactions in ASD patients (Yatawara et al. 2016); interferon-gamma-inducible protein-16 (IFI16) is associated with neuroinflammation and ASD (Alabdali et al. 2014b); and glutathione-S-transferase (GST), vitamin E, mercury, and lead are markers associated with xenobiotic toxicity, whose scavenging by the detoxification and antioxidant enzyme complex has also been associated with ASD (Alabdali et al. 2014b). All analytes, except for lead and mercury, were measured in plasma. Lead and mercury were measured in red blood cells. Experimental procedures used to measure these analytes have been described elsewhere (Alabdali et al. 2014a, b). Raw data are shown in Tables 1 and 2.

Table 1 Collected blood raw data from control participants of each biomarker investigated in the present study
Table 2 Collected blood raw data from autistic subjects of each biomarker investigated in the present study

Assessing the Accuracy of Prediction

Two methods were employed to evaluate the accuracy of biomarker-based predictions of binary clinical outcomes (e.g., autism versus healthy control or having severe versus mild/moderate disease). One method relies on calculating the area under a ROC curve (AUC). Receiver operating characteristic (ROC) curves are generated by graphing biomarker sensitivity on the vertical axis and specificity subtracted from one (1 − specificity) on the horizontal axis for all possible biomarker values. The aim is to graphically illustrate the trade-off between sensitivity and specificity at all possible cut-off values of a continuous biomarker. A biomarker with perfect sensitivity and specificity yields an AUC of 1.0, while a useless biomarker yields an AUC of 0.5. An AUC of 0.5 indicates that the predictions made using the biomarker are equivalent to chance or random guessing. AUC values below 0.5 indicate that the predictions made using the biomarker are more often false than true (Perlis 2011). The second method is library-based identification, which relies on comparing subjects of unknown classification to a library of subjects of known classification. Therefore, a library must be constructed with subjects organized into units of unique classifications. Each of the libraries used in the current study contained 2 units, one for autistic and the other for healthy control participants. Unknown participants were then submitted for identification by determining the library unit to which the unknown subject was most similar. Similarity can be determined using various coefficients. In the present study, pairwise similarities were calculated using Canberra distances (Eq. (1)), and matching to a library unit was accomplished using the K-nearest neighbor algorithm. Using this algorithm, a user-defined number of top matches is determined for each unknown, and the unknown is simply assigned to the unit containing the largest number of those top matches. 
This number becomes a score that can be used as a measure of confidence in the identification process. In the present study, it was based on the top five most similar library entries, giving rise to scores ranging from 0 to 5.
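The library-based identification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Bionumerics implementation used in the study: the function names and the toy library values are hypothetical, and treating a variable whose two values sum to zero as contributing a distance of zero is an assumed convention.

```python
from collections import Counter
from typing import List, Sequence, Tuple

LibraryEntry = Tuple[str, Sequence[float]]  # (unit label, biomarker profile)


def canberra(x: Sequence[float], y: Sequence[float]) -> float:
    """Canberra distance averaged over variables, as in Eq. (1)."""
    terms = []
    for xi, yi in zip(x, y):
        denom = abs(xi + yi)
        # Assumption: a zero denominator contributes zero distance.
        terms.append(abs(xi - yi) / denom if denom else 0.0)
    return sum(terms) / len(terms)


def identify(unknown: Sequence[float], library: List[LibraryEntry], k: int = 5) -> Tuple[str, int]:
    """Assign the unknown to the unit holding most of its k nearest library entries.

    Returns the unit label and the score (number of top matches in that unit).
    """
    ranked = sorted(library, key=lambda entry: canberra(unknown, entry[1]))
    top_labels = [label for label, _ in ranked[:k]]
    label, score = Counter(top_labels).most_common(1)[0]
    return label, score


# Made-up two-variable profiles, for illustration only.
toy_library: List[LibraryEntry] = [
    ("control", [10.0, 100.0]),
    ("control", [11.0, 110.0]),
    ("control", [9.0, 95.0]),
    ("autistic", [20.0, 50.0]),
    ("autistic", [22.0, 55.0]),
    ("autistic", [19.0, 48.0]),
]
toy_result = identify([10.5, 102.0], toy_library, k=5)
```

With two library units and an odd number of top matches, a tie between units cannot occur, which may be one reason to prefer an odd k.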

Designing Biomarker Profiles

In the present study, data for each of the nine investigated variables (biomarkers) were available for some but not all participants (Tables 1 and 2). To maximize the use of participants and variables, five biomarker profiles were constructed. Profile 1 contained all variables and only those participants with no missing data for any of the nine variables (10 controls, six autistics). Similarly, profile 2 contained eight variables (25 controls, nine autistics), profile 3 contained seven variables (25 controls, 20 autistics), profile 4 contained six variables (25 controls, 21 autistics), and profile 5 contained five variables (30 controls, 40 autistics).

Statistical Analysis

Data were expressed as means ± SD (standard deviations). Statistical analysis of quantitative data was performed using a nonparametric test. An ANOVA with a two-tailed t test was used to determine the significance of differences observed in biomarker values between autistic and control participants. A p value of < 0.05 was considered significant.

PCA and multidimensional scaling (MDS) were performed using Bionumerics version 6.6 (Applied Maths, Austin, TX) or IBM SPSS version 22 as previously described (El-Ansary et al. 2016). Briefly, the inputs into PCA and MDS were a covariance matrix and a similarity matrix, respectively. Similarity matrices were constructed from all possible pairwise similarities calculated using Canberra distances (Eq. (1)). PCA reduces the number of variables by condensing correlated variables. Therefore, at least some of the variables must be correlated for the analysis to be meaningful. The presence of correlated variables was tested by Bartlett’s test of sphericity (Bartlett 1937), with a p value threshold of < 0.001. The Kaiser-Meyer-Olkin (KMO) measure was used to test the adequacy of the sample sizes (Kaiser 1974; Tomlinson et al. 2013). The number of statistically significant components in PCA was determined by parallel analysis (Monte Carlo simulation) using Brian O’Connor’s syntax for SPSS (O’Connor 2000).

$$ D=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|X_i-Y_i\right|}{\left|X_i+Y_i\right|} $$
(1)

where “D” is the Canberra distance metric, “n” is the number of variables, “i” is the ith variable, and “X” and “Y” are the two participants.

Hierarchical clustering was performed using Bionumerics version 6.6 as previously described (El-Ansary et al. 2016). Briefly, pairwise similarities were calculated using Canberra distances, and dendrograms were constructed using the unweighted pair group method with arithmetic mean algorithm. A two-tailed t test was used to determine the significance of differences observed in biomarker values between autistic and control participants. A p value of < 0.05 was considered significant. The t test was performed using GraphPad Prism version 6 (GraphPad Software, Inc., La Jolla, CA). Correlation was estimated using the Spearman correlation coefficient, and a p value was assigned based on permutation analysis. Correlation analyses were performed using GraphPad Prism version 6. For analyses involving computation of a Z-score, Z-scores were calculated according to Eq. (2) using GraphPad software.

$$ \mathrm{Z}=\frac{\left(\mathrm{X}-\upmu \right)}{\upsigma} $$
(2)

where Z is the Z-score, X is the observed value, μ is the mean, and σ is the standard deviation.
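As an illustration of Eq. (2) and of combining variables by summing Z-scores, the sketch below standardizes each variable over the pooled participants and adds the results per participant. It is not the GraphPad procedure used in the study; the function names are hypothetical, and using the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Z = (X - mean) / SD for every value of one variable, as in Eq. (2)."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [(x - mu) / sigma for x in values]


def sum_of_z_scores(table: Dict[str, Sequence[float]]) -> List[float]:
    """Combine variables into one value per participant.

    `table` maps a variable name to its values, with participants in the
    same order for every variable.
    """
    per_variable = [z_scores(values) for values in table.values()]
    return [sum(participant) for participant in zip(*per_variable)]
```

Because the mean and standard deviation are computed over whatever participants are passed in, the resulting Z-scores depend on the group composition of the input.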

Results

The Accuracy of Disease Prediction Using Individual Biomarkers

Consistent with previously published results (Alabdali et al. 2014a, b), all nine biomarkers significantly differed between autistic and control groups (Fig. 1). Individual biomarkers were evaluated for their accuracy in predicting the occurrence of disease and disease severity using the AUC method. Most autistic participants had impaired CARS and SRS scores, but some had a normal score on one of the two scoring methods. Also, a few participants either had a missing score or were too young to be scored by SRS. For this reason, ROC curves were generated separately for autistic participants with impaired CARS scores and for those with impaired SRS scores. In both groups (henceforth referred to as the CARS and SRS groups), all nine biomarkers effectively predicted the occurrence of autism, with AUC values falling between 0.64 and 0.96. Vitamin E was associated with the largest AUC (0.94), followed by dopamine, serotonin, and GST (all > 0.8) in the CARS group, while GST had the largest AUC (0.96), followed by vitamin E, mercury, and dopamine in the SRS group. GABA, mercury, and IFI16 were the only biomarkers able to predict the occurrence of severe autism, as determined by SRS scores, with AUC values ranging from 0.66 to 0.78. None of the tested biomarkers was able to predict the level of CARS impairment (Table 3).
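For readers unfamiliar with the AUC method, the area under a ROC curve equals the probability that a randomly chosen case has a higher biomarker value than a randomly chosen control, with ties counted as one half. The sketch below uses this pairwise formulation; it is an illustration with a hypothetical function name, not the software used to produce Table 3.

```python
from typing import Sequence


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Area under the ROC curve for a biomarker expected to be higher in cases.

    Computed as the fraction of case/control pairs in which the case has the
    larger value; tied pairs count as one half.
    """
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

A marker that is lower in cases than in controls yields a value below 0.5 with this convention; reversing the direction of the comparison gives one minus that value.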

Fig. 1

Statistical comparisons of biomarkers. Nine biomarkers showed significantly different serum values in autistic and control participants. Bar graphs show mean serum values of gamma-aminobutyric acid (GABA), dopamine, serotonin, oxytocin, interferon-gamma-inducible protein-16 (IFI16), glutathione-S-transferase (GST), vitamin E, mercury, and lead in autistic and healthy control participants. Statistical significance was estimated using a two-tailed t test, with p values shown in parentheses. Error bars represent the standard error of the mean. For each comparison, p < 0.05

Table 3 Assessment of prediction accuracy of nine individual biomarkers and five profiles using the area under a receiver operating characteristic (ROC) curve (AUC)

Combining Biomarkers (Variables) into Profiles Improves Disease Prediction

Next, we asked whether grouping the nine biomarkers into profiles could enhance their predictive power. Five profiles were designed as described in the Methods section, the most complex of which consisted of nine variables and the simplest of five. Employing PCA and MDS in testing these five profiles revealed clear segregation of autistic and control participants, with complete segregation achieved using profiles of higher complexity (those with larger numbers of variables). Before moving forward with further analyses, we sought to verify several aspects of the PCA analyses. We used Bartlett’s sphericity test to confirm the presence of correlated variables and showed that the absence of correlations in our datasets was extremely unlikely (p values < 0.0001). To assess the adequacy of sample sizes, the KMO measure of sampling adequacy was employed, giving values of around 0.7. The obtained values were consistent with samples of sufficient size for the analyses to be meaningful (Kaiser 1974; Tomlinson et al. 2013). In terms of the significance of principal components, Monte Carlo simulation demonstrated that the first component (PC1) in the analysis of each of the five biomarker profiles was the only significant component (Fig. 2). PC1 was the principal component responsible for most of the segregation between the autistic and control groups. We then examined the contribution of individual variables to the segregation of autistic and control participants by comparing their contributions to the principal component responsible for most of this segregation, in this case, the first principal component. We found that the markers responsible for most of the separation between the two groups (e.g., dopamine, serotonin, GST, and vitamin E) were the same ones that had shown relatively large AUCs. Conversely, markers with small AUCs (e.g., oxytocin and IFI16) did not contribute nearly as much to separating autistic and control subjects in PCA analysis (Fig. 3). 
To further confirm the authenticity of the segregation between autistic and control participants, we wanted to use a clustering method that differed in principle from PCA and MDS. For this purpose, we used hierarchical clustering, which produced consistent results, further confirming the genuineness of the segregation between autistic and control participants based on our biomarker profiles (Fig. 4). We also compared the AUCs obtained using profiles to those obtained using individual biomarkers. Variables were combined by using either the coordinates of PC1 from PCA or the sum of Z-scores as input in ROC curve analyses. When variables were combined into profiles using PC1 coordinates, we found that complex profiles had AUCs of one (perfect sensitivity and specificity), while simpler profiles had slightly smaller AUCs. In all cases, combining markers led to increased AUCs in both CARS and SRS groups. In our analyses, the sum of Z-scores did not perform as well as individual biomarkers or profiles combined using PC1 coordinates (Table 3).
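Combining variables through PC1 coordinates, as described above, amounts to projecting each participant's centered profile onto the leading eigenvector of the covariance matrix. The sketch below illustrates this with NumPy; it is not the Bionumerics or SPSS procedure used in the study, the function name is hypothetical, and applying no scaling before computing the covariance matrix is an assumption, since the text states only that a covariance matrix was the PCA input.

```python
import numpy as np


def pc1_scores(data):
    """Return each participant's coordinate on PC1 and the share of variance PC1 explains.

    `data` has one row per participant and one column per biomarker.
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last column is PC1.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    pc1 = eigenvectors[:, -1]
    explained = eigenvalues[-1] / eigenvalues.sum()
    return centered @ pc1, explained
```

The sign of an eigenvector is arbitrary, so the resulting coordinates may need to be negated before being used as ROC input so that higher values correspond to cases.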

Fig. 2

The aptness of the use of principal component analysis (PCA), adequacy of sample sizes, and statistical significance of principal components. The suitability of PCA for analyzing various datasets was determined using Bartlett’s sphericity test. The sample size was evaluated using the Kaiser-Meyer-Olkin (KMO) test. Statistical significance of components in PCA was estimated using Monte Carlo simulation. Scree plots show eigenvalues of raw (blue), 50th percentile simulated data (green), and 95th percentile simulated data (yellow). Principal components with greater raw than 95th percentile simulated eigenvalues were considered statistically significant

Fig. 3

Biochemical profiles effectively separate autistic participants from healthy controls. Principal component analysis (PCA) and multidimensional scaling (MDS) were employed to test the segregation of autistic and control participants based on five biochemical profiles. The contributions of variables to the most discriminatory component in PCA are shown, with the top three most contributing variables in bold (bottom right)

Fig. 4

Segregation of autistic (red) and healthy control (green) subjects in hierarchical clustering based on biochemical profiles. Five profiles composed of five to nine variables were tested

Our results, obtained using mathematically different approaches, suggested that complex profiles were better at distinguishing autistic participants from healthy controls. Next, we wanted to rule out possible confounding factors to confirm our findings. Although complex profiles outperformed simpler ones in distinguishing autistic participants from healthy controls, the former were tested on a smaller number of participants (complex profile n = 16–46, simple profile n = 47–71). Consequently, PCA and MDS plots depicting the results of low-complexity profiles contained larger numbers of data points than the ones depicting the results of high-complexity profiles, creating higher density plots for the former compared to the latter (Fig. 3). Higher plot densities could have contributed to the partial overlap between autistic and control groups seen with simple profiles by simply providing more opportunities for overlap due to random chance alone. Thus, further analyses were performed to interrogate this notion.

Additional Testing Confirms that Higher Complexity Profiles Yield Better Separation of Autistic and Control Groups

To investigate whether profile complexity was the principal underpinning of the observed separation between autistic and control subjects, two tests were performed. First, all five profiles were tested using the same number of participants. To do so, we used the small group of participants (six autistic, ten control) for whom we had a complete dataset covering all variables. Second, we used group-specific means as surrogates for missing data points. In other words, variable means within a group (either autistic or control) were used to substitute for missing data points of the corresponding group. The latter approach enabled the use of a larger number of participants (58 autistic, 32 control) compared to the former. Both tests confirmed that profiles of higher complexity enabled better distinction between autistic and control subjects than simpler profiles. This was demonstrated by tighter group clustering and wider inter-group distances in PCA and MDS plots using the 16 participants with no missing data points (Fig. 5). Using the larger group of participants, for whom missing data were replaced by the corresponding group means, better group separation was evident using profile 1 (nine variables) compared to profile 5 (five variables), as demonstrated by PCA, MDS, and hierarchical clustering (Fig. 6). Profiles 2, 3, and 4 were also tested, showing results that supported the same conclusion (data not shown).
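The group-mean substitution used in the second test can be sketched as follows. This is an illustration only, with a hypothetical function name and data layout (one list of values per participant, `None` marking a missing measurement); it is not the procedure as implemented in the study.

```python
from statistics import mean
from typing import List, Optional, Sequence


def impute_group_means(
    rows: Sequence[Sequence[Optional[float]]],
    groups: Sequence[str],
) -> List[List[float]]:
    """Replace each missing value with the mean of that variable within the participant's group."""
    n_variables = len(rows[0])
    group_means = {}
    for group in set(groups):
        for j in range(n_variables):
            observed = [
                row[j]
                for row, label in zip(rows, groups)
                if label == group and row[j] is not None
            ]
            # mean() raises StatisticsError if a group has no observed value.
            group_means[(group, j)] = mean(observed)
    return [
        [group_means[(label, j)] if value is None else value for j, value in enumerate(row)]
        for row, label in zip(rows, groups)
    ]
```

Note that this substitution requires the group label of every participant to be known in advance.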

Fig. 5

In principal component analysis (PCA) and multidimensional scaling (MDS), higher-complexity profiles yield better separation of autistic (red) and control (green) participants. The same autistic and control participants were analyzed based on profiles composed of five to nine variables. To help illustrate the effectiveness of separation in MDS plots, group compactness was measured by the width of the groups (red and green dotted arrows) and the distance separating the groups (blue dotted arrows). A summary of the effectiveness of group separation by MDS is shown in a line graph (bottom right)

Fig. 6

A 9-biomarker profile was found to better segregate autistic patients (red) from healthy controls (green) compared to a 5-biomarker profile. Segregation of 58 autistic and 32 control participants was tested using principal component analysis (PCA), multidimensional scaling (MDS), and hierarchical clustering. Similarity matrices for MDS and hierarchical clustering were calculated using the Canberra metric. Missing data were replaced by the corresponding group means

The next question we wanted to answer was whether our biomarker profiles could be used to predict the occurrence of disease within the population of participants included in the current study. Library-based identification was employed to answer this question.

High-Complexity Biomarker Profiles Predict the Occurrence of Disease with 100% Specificity and Sensitivity

Library-based identification was used to compare the sensitivity and specificity with which autistic patients were identified, within the available sample size, using the five biomarker profiles. Only observed data were used in this test (i.e., group means were not used to fill in for missing data). We showed that high-complexity profiles (profiles 1, 2, and 3) resulted in perfect identification of both autistic and control participants, while the rate of correct identification (RCI) ranged from 83 to 96% using simpler profiles. These results stimulated our interest in testing profiles with fewer than five variables, which we did by modifying profile 5 to generate new profiles consisting of all possible combinations of one, two, three, and four variables. Identification was attempted using each of these profiles, and RCI was averaged over profiles composed of the same number of variables. The results obtained showed a progressive decline in RCI as the number of variables decreased, underscoring the superiority of biomarker profiles over individual markers and that of high-complexity profiles over simple ones (Fig. 7). For diagnostic purposes, it would be useful to have some measure of confidence each time an identification is made, in addition to the predetermined sensitivity and specificity. Using the k-nearest neighbor algorithm in library-based identification generates a score, which we thought might be suitable to serve as this measure of confidence. To test this possibility, we compared the ranges and averages of the scores associated with correct identifications to those associated with incorrect identifications. We found that the average scores associated with incorrect identifications were consistently lower than those associated with correct identifications, and that scores of four or greater were largely associated with correct identifications (Fig. 7). 
Taken together, our results suggest that the use of our biomarker profiles for diagnostic purposes may lead to the development of a novel diagnostic tool for the laboratory diagnosis of ASD. Given the heterogeneity of disease manifestations and their direct implications for treatment, prognosis, and patient’s quality of life, it would be advantageous to develop laboratory methods that can accurately predict various ASD-associated clinical pictures. Therefore, we wanted to explore the utility of our biomarker profiles in differentiating different levels of disease severity.

Fig. 7

Library identification using the Childhood Autism Rating Scale (CARS) and the Social Responsiveness Scale (SRS). Library identification accurately predicts autistic and healthy control participants. Rates of correct identification are shown using profiles 1 through 9 to classify healthy control and autistic participants with impaired scores on the CARS and the SRS (top) or using individual biomarkers to classify healthy controls and either autistic participants with impaired CARS (middle) or those with impaired SRS scores (bottom). The bar graphs depict the rates of correct identification for the healthy controls and autistic participants, and the overall rates of correct identification are indicated on top of the corresponding bars

The Biomarker Profiles Investigated in the Present Study Were Not Able to Predict Disease Severity

In addition to assisting with the initial diagnosis of ASD, having reliable biomarkers to help quantitate disease severity would likely inform treatment decisions, facilitate follow-up, and improve prognosis. Therefore, we wanted to determine whether any of the biomarkers investigated in the current study correlated with either CARS or SRS scores. Neither scoring system correlated with any of the biomarkers studied here, as demonstrated by Spearman correlation (Fig. 8) and multiple regression analysis (data not shown). Also, hierarchical clustering, PCA, and MDS analysis did not show discernible segregation between autistic participants with different disease severity (Fig. 9). Taken together, our data suggest that predicting disease severity, at least based on CARS and SRS scores, using the markers we studied is unlikely to be successful.

Fig. 8

Correlation statistics showed that scores on neither the Childhood Autism Rating Scale (CARS) nor the Social Responsiveness Scale (SRS) correlated with the nine studied biomarkers

Fig. 9

Principal component analysis (PCA), multidimensional scaling (MDS), and hierarchical clustering (HiClust) in disease severity. Disease severity could not be predicted using the biomarkers investigated in the current study. PCA, MDS, and HiClust were used to test the segregation between patients with severe (purple) and those with mild to moderate (peach) disease, as measured by either the Childhood Autism Rating Scale (CARS) or the Social Responsiveness Scale (SRS) scores. Profile 1 (nine variables) was used in the depicted studies, where missing data were replaced by group means

Discussion

In the present study, we examined the potential of nine analytes in distinguishing autistic patients from healthy controls and in distinguishing between severe and mild to moderate impairment of the CARS and SRS scores. The data have been previously analyzed, but in previous analyses, participants with markedly different observed data from the mean were considered outliers and were therefore eliminated from further analyses. It is conceivable that biomarker data may differ widely among autistic patients simply because ASD consists of a diverse group of neurodevelopmental conditions with dramatically different presentations. However, this was not true for all of the nine biomarkers we tested. For example, the variance in the healthy control group was ten times that of autistic participants for serotonin, but the variance for lead was more than three times higher in the autistic group compared to controls. Regardless of the amount of variance, none of the data points stood out as an outlier in dot plots (data not shown). Also, the variance in healthy versus autistic subjects may vary widely in different populations and between different markers. Taken together, we could not develop a convincing rationale for identifying and excluding potential outliers. Therefore, all participants were included in this study, which might explain the lower AUC values obtained in this study compared to previously published work (Alabdali et al. 2014a, b).

Our data show that any of the nine biomarkers tested is likely useful in predicting the occurrence of ASD, with vitamin E and GST being the most useful in predicting both CARS and SRS impairments. We also found dopamine, serotonin, and mercury to be good predictors of the occurrence of ASD. Predicting the severity of CARS and SRS impairments was more challenging: GABA was the most promising predictor of the severity of SRS impairment, and no useful predictors of the severity of CARS impairment were found. It would have been interesting to test the effect of combining GABA with additional biomarkers on prediction accuracy, but we did not have enough participants to test this possibility. We speculate that the biomarkers used in this study, and other biomarkers, might be more useful in predicting the level of impairment of individual components, rather than overall CARS and SRS scores. Additional studies involving larger numbers of participants are needed, however, to test this hypothesis.

We have demonstrated that combining multiple variables into profiles improved prediction accuracy and that increased profile complexity is generally associated with higher accuracy. In the current study, we combined multiple variables using three methods. In the first method, we replaced the observed values of individual variables with each participant's coordinate along the eigenvector (or principal component) that explained the most variance and was responsible for most of the segregation between groups. This gave us a single value for each participant that was computed from the multiple variables included in each analysis. The advantage of this method is the ability to combine variables in a way that is focused on the portion of data variance that is most relevant to the segregation of the groups under study. The caveat, however, is the possible loss of information, which is an inherent disadvantage of data reduction techniques, including PCA and MDS. The second method was taken from the work of Abruzzo et al. (2015) and involved computing a Z-score for each variable and combining the variables by taking the sum of Z-scores (Alessandro Ghezzo, personal communication). Z-scores describe the relationship between the values of a dataset and the mean. Specifically, a Z-score of zero indicates that the corresponding value is equal to the mean, Z-scores greater than zero represent the number of standard deviations the corresponding value is above the mean, and Z-scores lower than zero (i.e., negative Z-scores) represent the number of standard deviations the corresponding value is below the mean. Since the mean of the dataset directly affects the Z-score (see Eq. (2)), input data should contain equal numbers of participants from all groups. Having more participants in one group than another will immediately skew the Z-scores of all variables for which group means are unequal. For the autistic and control groups, means are unequal for all variables tested in this study.
The values produced by both of these methods were used as input to ROC curve analyses. In the third method, which was used in library-based identification, we combined variables using a similarity coefficient. The coefficient we used here, the Canberra metric, was selected because it resulted in the best group separation when compared with other coefficients, such as Pearson correlation, ranked correlation, the cosine coefficient, the Gower coefficient, and the Bray-Curtis coefficient (data not shown). For each variable, the Canberra metric standardizes the absolute difference between a pair of participants by dividing it by the corresponding absolute sum. These standardized differences are then summed over all variables, and the grand sum is divided by the number of variables to obtain a mean summarized distance between the two participants (Eq. (1)).
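
The Canberra computation described above can be illustrated with a minimal sketch. It assumes the common form of the metric, in which each absolute difference is divided by the sum of the two absolute values, and it assumes that a variable for which both participants have a value of zero contributes nothing to the distance. The function names `canberra_mean` and `nearest_label`, the nearest-match rule, and the toy profiles are hypothetical illustrations and are not taken from the analysis software used in this study.

```python
def canberra_mean(a, b):
    """Mean Canberra distance between two equal-length profiles.

    Assumed form: sum over variables of |a_i - b_i| / (|a_i| + |b_i|),
    divided by the number of variables.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of variables")
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        # Assumed convention: a 0/0 term is treated as zero distance.
        if denom > 0:
            total += abs(x - y) / denom
    return total / len(a)


def nearest_label(query, library):
    """Return the label of the library entry closest to `query`.

    `library` is a list of (label, profile) pairs. This nearest-match rule
    is only one plausible reading of library-based identification.
    """
    label, _ = min(library, key=lambda entry: canberra_mean(query, entry[1]))
    return label


# Toy profiles (invented values, two variables per participant).
library = [("control", [1.0, 10.0]), ("autistic", [4.0, 2.0])]
print(canberra_mean([1.0, 0.0], [3.0, 0.0]))   # (2/4 + 0) / 2
print(nearest_label([3.5, 2.5], library))
```

Because each term is a ratio bounded between 0 and 1, variables measured on very different scales contribute comparably to the distance, which is the standardizing property noted above.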

The use of PC1 to calculate AUCs, or of a similarity coefficient in library-based identification, led us to conclude that profiles were superior to individual markers with regard to prediction accuracy. This conclusion is in agreement with that of a previous study in which six biomarkers were combined using the sum of Z-scores. That study showed that prediction accuracy increased when the six variables were combined (Abruzzo et al. 2015). The advantage of using the sum of Z-scores to combine variables was not shown in our study. In fact, doing so lowered prediction accuracy, as demonstrated by the AUCs. It is noteworthy that most of our datasets contained unequal numbers of participants in the two groups being compared. This alone may offer some explanation, since it can easily alter the mean and, thus, the Z-scores, as described above. We conclude that the discrepancy between the study by Abruzzo et al. (2015) and ours may be attributable, at least in part, to the imbalanced groups in our datasets. A clear advantage of similarity measures and eigenvectors over the sum of Z-scores is that summing individual Z-scores may conceivably result in cancelation of group-specific features, which is not the case with the other two methods.
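
Both effects discussed above, the dependence of Z-scores on group balance and the possible cancelation of group-specific features in a sum of Z-scores, can be seen in a small sketch. The numbers are invented toy values, not study data. The sketch assumes that Z-scores are computed against the pooled mean and the population standard deviation; the exact form of Eq. (2) may differ. The function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score of each value against the pooled mean (population SD assumed)."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("Z-scores are undefined when all values are equal")
    return [(v - m) / s for v in values]


def sum_of_z(rows):
    """Sum of per-variable Z-scores for each participant.

    `rows` holds one list of variable values per participant.
    """
    columns = list(zip(*rows))                # one tuple per variable
    z_columns = [z_scores(col) for col in columns]
    return [sum(zs) for zs in zip(*z_columns)]


# Effect of group balance on one variable (controls = 10, cases = 20).
balanced = z_scores([10, 10, 20, 20])         # two controls, two cases
imbalanced = z_scores([10, 20, 20, 20])       # one control, three cases
print(balanced[0], imbalanced[0])             # same raw value, different Z

# Cancelation: cases are high on variable A but low on variable B.
rows = [[10, 20], [10, 20], [20, 10], [20, 10]]
print(sum_of_z(rows))                         # group difference vanishes
```

In the balanced toy set a control value of 10 sits one standard deviation below the mean, whereas in the imbalanced set the same value sits about 1.73 standard deviations below it. In the second example, every participant receives a summed Z-score of zero even though the two groups differ on both variables.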

We also compared the accuracy of predicting ASD occurrence using ROC curves versus using library-based identification. ROC curves are widely used in studies addressing the utility of various biomarkers in clinical practice. One of the greatest advantages of ROC curves is the ability to optimize a cutoff value, taking into account sensitivity, specificity, and clinical considerations specific to each disease. Raising a cutoff value increases specificity, but often at the expense of sensitivity (Akobeng 2007; Hajian-Tilaki 2013). The appropriate trade-off between sensitivity and specificity varies according to the severity of the illness in question, the treatability of this illness, and the consequences of delaying treatment. High sensitivity might be crucial for illnesses known to cause devastating consequences if left untreated; in such cases, the benefit of early detection may outweigh the harm of reduced specificity. Conversely, a harsh treatment decision may require a high level of certainty (or specificity) that such treatment is justified.
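
The trade-off described above can be made concrete with a short sketch using invented scores. It assumes that higher scores indicate disease, so that a participant is called positive when the score is at or above the cutoff, and it computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function names and data are hypothetical.

```python
def sens_spec(case_scores, control_scores, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    true_pos = sum(s >= cutoff for s in case_scores)
    true_neg = sum(s < cutoff for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores (invented, not study data).
cases = [3, 5, 6, 8]
controls = [1, 2, 4, 5]

for cutoff in (3, 6):
    sens, spec = sens_spec(cases, controls, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"AUC={auc(cases, controls):.3f}")
```

With these toy values, raising the cutoff from 3 to 6 moves specificity from 0.50 to 1.00 while sensitivity falls from 1.00 to 0.50, which is the pattern described in the paragraph above.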

Conclusion

Although we find our results compelling and believe they encourage further investigation, we acknowledge the limitations imposed by the small number of participants. Larger-scale studies are warranted to verify our findings and to move the proposed diagnostic tool toward clinical practice.