Introduction

Autism spectrum disorder (ASD) is characterized by impairments in social interaction and communication as well as restricted and stereotyped behavior. This neurodevelopmental disorder develops in early childhood and results in developmental differences in brain anatomy, functioning and connectivity that affect behavior across the lifetime. At present, the diagnosis is typically made through behavioral observation and clinical interviews. Owing to the complex nature of the disorder and the increasing number of individuals diagnosed with ASD, it has attracted the attention of both the scientific community and society.

Disrupted network connectivity between distant brain regions has been reported among individuals with ASD (Wiggins et al. 2011; Cherkassky et al. 2006; Anderson et al. 2011a; Keehn et al. 2013; Lynch et al. 2013; Tyszka et al. 2013; DiMartino et al. 2011; Gotts et al. 2012; Von Hagen et al. 2013). These reports showed both increased and decreased connectivity in brain regions including the default mode network, social brain regions, attentional regions, visual search regions and corticostriatal connections. However, given the specificity of the diagnostic criteria (American Psychiatric Association 2013), there is hope that some (possibly complex) patterns of brain features are unique to the disorder, which makes continued research worthwhile.

Given that the functional connectivity literature in ASD is complex and often inconsistent (Müller et al. 2011; Nair et al. 2014), machine learning (ML) techniques may provide a valuable tool for discovering the aberrant connectivity patterns that characterize ASD. Several studies have applied multivariate classification to functional connectivity MRI (fcMRI) (Van Dijk et al. 2010) for diagnostic classification, i.e. to characterize ASD using features that are predictive of a diagnosis at the level of individuals. A classifier evaluated with leave-one-out cross-validation (Anderson et al. 2011a, b) was applied to a large fcMRI connectivity matrix and achieved an overall classification accuracy of 79%. However, the accuracy was lower in a separate, small replication sample. Uddin et al. (2013) used a logistic regression classifier with features identified by ICA and achieved accuracies of about 60–70% for all but one component, identified as the salience network, for which accuracy reached 77%.

With the advent of the publicly available ABIDE dataset (Di Martino et al. 2014), studies began to investigate the potential diagnostic accuracy of ML algorithms with fMRI data gathered across different scanners, with different field strengths and different acquisition schemes. Nielsen et al. (2013) reported an overall accuracy of 60% for whole-brain classification using a large sample drawn from ABIDE. In another study, Chen et al. (2015) showed that a support vector machine classifier performed modestly (accuracy <70%) whereas a random forest classifier achieved 91% accuracy using the Power atlas (Power et al. 2014). Plitt et al. (2015) implemented several machine learning tools and reported that individuals can be classified as having ASD with an accuracy of 76.67% from their rs-fMRI scans alone, using the Destrieux atlas. In a recent study (Chen et al. 2016), multivariate pattern analysis (MVPA) was used to classify individuals with ASD versus healthy controls (HCs) based on frequency-specific whole-brain functional connectivity, using a cross-site evaluation. Apart from this cross-site evaluation, these studies used aggregated data from all sites and evaluated performance with either leave-one-out CV (LOOCV) or 10-fold CV.

Owing to their large sample size, multi-site data provide the power necessary to identify neuroanatomical differences between individuals with ASD and typical controls. However, aggregated data introduce the additional problem of heterogeneity. Previous studies (Nielsen et al. 2013; Castrillon et al. 2014) revealed that acquisition site has significant effects on image properties. One way to alleviate between-site variation is to use domain adaptation machine learning algorithms (Jiang 2008; Pan and Yang 2010). Domain adaptation is a relatively new approach in machine learning that deals with differences in data distribution between the training and test domains. Recently, this strategy has been used to predict symptom severity in ASD from cortical thickness measures in the ABIDE dataset (Moradi et al. 2016).

Most functional neuroimaging methods are based on voxel (volume element) data, although recent studies have used region of interest (ROI) data. Most often the ROI approach presupposes a prior hypothesis about a specific network, and regions outside this network are not fully explored, leaving effects potentially undetected. In this study, we used 42 bilateral Brodmann areas to examine how functional connectivity among these regions contributes to distinguishing the two groups.

Previous studies did not address whether certain feature selection methods work better than others in combination with particular learning methods in terms of producing models with high prediction accuracy. Relatively little has been published on the combined impact of the choice of feature selection method and learning method on predictive performance in autism research. To address this issue, we empirically evaluated two feature selection algorithms (filter-based and Elastic Net) together with an SVM learner. We addressed classifier generalizability by including subjects from the ABIDE (Di Martino et al. 2014) preprocessed dataset (356 individuals) from scanners located at six sites. Based on this dataset, we aimed to assess the accuracy of ML classifiers for the automated detection of ASD. First, we used a domain adaptation algorithm to address differences in data distribution across acquisition sites. We used partial least squares (PLS) regression with site as the response variable and functional connectivities as predictor variables to reduce between-site variability. We then used the PLS approximation of the predictor variables, together with ASD/control status, as input to the SVM model. We hypothesized that using PLS approximations of the predictor variables can effectively reduce nuisance variation between data from different sites. We performed feature selection within a classification framework using a cross-site evaluation strategy, in which the data from one site served as the test set and the data from the remaining sites served as training data for learning the model parameters.

Methods and Materials

ABIDE Preprocessed Dataset

Data were obtained from ABIDE, an open-access multi-site image repository consisting of structural and rs-fMRI scans from individuals with ASD and typically developing (TD) individuals (Di Martino et al. 2014). The included sites and their sample sizes were New York University (NYU, 149), Stanford University (STANFORD, 40), Olin Center, Institute of Living at Hartford Hospital (OLIN, 35), Kennedy Krieger Institute (KKI, 55), San Diego State University (SDSU, 36) and University of Pittsburgh School of Medicine (PITT, 57). Acquisition parameters and protocol information are available at the ABIDE site http://fcon_1000.projects.nitrc.org/indi/abide/. We used data preprocessed with the Connectome Computation System (CCS) pipeline, as described on the ABIDE site. Subject demographics for individuals satisfying the inclusion criteria are shown in Table 1.

Table 1 Subject demographics from the ABIDE sample

Region of Interest and Connectivity Matrix

Connectivity maps were obtained using the CONN toolbox (http://www.nitrc.org/projects/conn) running under Matlab 8.3 (2014a) (http://www.mathworks.com). Data were band-pass filtered to 0.01 to 0.1 Hz to limit spatial-temporal correlation to the spontaneous brain oscillation power; the Brodmann area ROIs provided by the same toolbox were used as seed regions. Each Brodmann region was analyzed against all other Brodmann regions. Bivariate correlations were calculated between each pair of ROIs as measures of connection strength. The list of 42 bilateral regions provided in the CONN toolbox is given in the supplementary materials. The rs-fMRI network was represented by an 84 × 84 symmetric connectivity matrix whose rows and columns correspond to the nodes (ROIs). We extracted the upper-triangular elements of the functional connectivity matrix, excluding the diagonal, as classification features, i.e. the feature space for classification was spanned by (84 × 83)/2 = 3486-dimensional feature vectors.
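The feature count can be checked with a short sketch. This is an illustration in Python rather than the MATLAB/CONN code used in the study; the function name and the toy matrix are assumptions introduced here, and the matrix values are placeholders, not real correlations.

```python
def upper_triangle_features(conn):
    """Return the strictly upper-triangular entries of a square matrix, row by row."""
    n = len(conn)
    if any(len(row) != n for row in conn):
        raise ValueError("connectivity matrix must be square")
    # j starts at i + 1, so the diagonal (self-connections) is skipped.
    return [conn[i][j] for i in range(n) for j in range(i + 1, n)]


# Toy symmetric matrix standing in for an 84 x 84 ROI-to-ROI correlation matrix.
n_rois = 84
toy = [[1.0 if i == j else 0.01 * abs(i - j) for j in range(n_rois)]
       for i in range(n_rois)]

features = upper_triangle_features(toy)
print(len(features))  # 84 * 83 / 2 = 3486
```

Because the matrix is symmetric, the lower triangle carries no additional information, which is why only one triangle is kept.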

Significance Testing for Brain Connectivity and Site Effect

We performed a two-sample t-test with Benjamini-Hochberg (Benjamini et al. 1995) correction on our dataset to find significant functional connectivity differences between healthy controls and individuals with ASD. A multivariate analysis of covariance (MANCOVA) was used to assess the effect of site on brain connectivities.
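For readers unfamiliar with the correction, the Benjamini-Hochberg step-up procedure can be sketched as follows. This is a generic Python illustration, not the authors' implementation; the function name, the level `q` and the example p-values are assumptions made for the example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max (step-up rule).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values, one per connection, purely for illustration.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(example_p))
```

In the setting described above, one p-value would be computed per connection (3486 in total) and the procedure would flag the connections that survive correction.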

Feature Selection Algorithms

Previous studies did not explore the impact of feature selection in ABIDE studies. We empirically evaluated two feature selection algorithms, a filter-based method (t-test) and the embedded Elastic Net (Tibshirani 1996, Zou and Hastie 2005), together with an SVM learner, and evaluated the impact of these methods on selecting the important connections. The popular LASSO regression method minimizes the residual sum of squares (RSS), as ordinary least squares (OLS) regression does, but imposes the constraint that the sum of the absolute values of the coefficients be less than a constant. This additional constraint is similar to that introduced in ridge regression, where the constraint is on the sum of the squared values of the coefficients. This simple modification also allows LASSO to perform variable selection, because the shrinkage is such that some coefficients can be shrunk exactly to zero. The LASSO computes the model coefficients \( \widehat{\beta} \) by minimizing the function \( R\left(\beta \right)+\lambda {\left|\left|\beta \right|\right|}_1 \), where \( R\left(\beta \right) \) is the mean squared error on the training set and \( {\left|\left|\beta \right|\right|}_1=\sum \limits_{i=1}^p\mid {\beta}_i\mid \). The parameter λ controls the degree of sparsity of the solution, i.e. the number of features selected.

Elastic Net (Zou and Hastie 2005) is similar to LASSO. It differs in that the l1 norm of β is replaced by a combination of the l1 and l2 norms. In this case, we minimize \( R\left(\beta \right)+\lambda {P}_{\alpha}\left(\beta \right) \), where \( {P}_{\alpha}\left(\beta \right)=\frac{\left(1-\alpha \right)}{2}{\left|\left|\beta \right|\right|}_2^2+\alpha\ {\left|\left|\beta \right|\right|}_1 \), for α strictly between 0 and 1 and a nonnegative λ. The λ parameter can be tuned to set the shrinkage level: the higher λ is, the more coefficients are shrunk to 0. Elastic Net is the same as LASSO when α = 1. As α shrinks toward 0, Elastic Net approaches ridge regression. For other values of α, the penalty term \( {P}_{\alpha}\left(\beta \right) \) interpolates between the l1 norm of β and the squared l2 norm of β. The advantage of Elastic Net over LASSO is that the Elastic Net penalty performs automatic variable selection and continuous shrinkage simultaneously, and it can select groups of correlated variables. It is especially useful for large p, small n problems, where the grouped-variables situation is a particularly important concern (Hastie et al. 2001).
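The behavior of the penalty at the two ends of the α range can be verified numerically. The snippet below is a minimal Python sketch of the penalty term only (it does not fit a model); the function name and the coefficient vector are illustrative assumptions.

```python
def elastic_net_penalty(beta, alpha):
    """P_alpha(beta) = (1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1."""
    l1 = sum(abs(b) for b in beta)       # l1 norm
    l2_sq = sum(b * b for b in beta)     # squared l2 norm
    return (1.0 - alpha) / 2.0 * l2_sq + alpha * l1


beta = [0.5, -2.0, 0.0]  # arbitrary coefficients for illustration

print(elastic_net_penalty(beta, 1.0))  # LASSO limit: equals the l1 norm, 2.5
print(elastic_net_penalty(beta, 0.0))  # ridge limit: half the squared l2 norm, 2.125
print(elastic_net_penalty(beta, 0.5))  # a mixture of the two, 2.3125
```

The full objective adds this penalty, scaled by λ, to the training error R(β), so larger λ weights the penalty more heavily relative to the fit.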

Both Elastic Net and univariate t-test-based feature ranking were implemented in MATLAB 2016a.

Partial Least Squares

Partial Least Squares (PLS) regression is based on a linear transformation from a large number of original predictors to a new variable space spanned by a small number of orthogonal factors (latent variables). The advantage of PLS is that it finds components (latent variables) that explain the covariance between the predictor and response variables. This method is particularly suitable for high-dimensional and highly correlated predictor variables. The general underlying model of multivariate PLS is \( X=T{P}^T+E \) and \( Y=U{Q}^T+F \), where X is an n × m matrix of predictors and Y is an n × p matrix of response variables; T and U are n × l matrices that are, respectively, projections of X and projections of Y. P and Q are m × l and p × l orthogonal loading matrices, and the matrices E and F are the error terms, which are assumed to be independent and identically distributed normal random variables. We denote the functional connectivity values by X, where n is the number of subjects and m is the number of connectivities (m = 3486). The response variable Y represents the site information and is denoted by \( Y=\left\{{y}_{1,1},\dots, {y}_{n,p}\right\} \), where p is the number of sites; \( {y}_{i,j} \) is 1 if subject i belongs to site j and 0 otherwise. We used the PLS approximation of the predictor matrix X as our feature set for predicting ASD. In the current work, we used the SIMPLS method, which calculates the PLS factors directly as linear combinations of the original variables. The rationale behind this method is to reduce the overall inter-site variance by using the PLS approximation of the predictor matrix X. We hypothesized that using the PLS approximation would make our classification framework more robust.
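The site response matrix Y described above is a one-hot (indicator) encoding of site membership. A minimal Python sketch of how such a matrix could be built is shown below; the function name and the alphabetical column order are assumptions for illustration, and the PLS fit itself is not reproduced here.

```python
def site_indicator_matrix(site_labels):
    """Build the n x p indicator matrix: entry (i, j) is 1 if subject i is from site j."""
    sites = sorted(set(site_labels))              # column order (assumed alphabetical)
    column = {site: j for j, site in enumerate(sites)}
    Y = [[1 if column[label] == j else 0 for j in range(len(sites))]
         for label in site_labels]
    return sites, Y


# Four hypothetical subjects from three sites.
sites, Y = site_indicator_matrix(["NYU", "KKI", "NYU", "PITT"])
print(sites)  # ['KKI', 'NYU', 'PITT']
for row in Y:
    print(row)
```

Each row sums to 1, since every subject belongs to exactly one site; with the six sites used here, Y would have six columns.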

Classification and Feature Selection Framework

Classification algorithms were implemented using MATLAB 2014a. We used a Support Vector Machine (SVM) classifier (Vapnik 1995), which is a widely used method for binary classification in fMRI studies. We adopted a cross-site validation approach to address the effect of acquisition site. More specifically, a leave-one-site-out CV (LOSOCV) was performed such that the data from each site formed its own fold. In this way, the model is trained on 5 sites and tested on the remaining site, which avoids any double dipping. Within each LOSOCV training fold, we performed another 10-fold CV to determine the best α (Elastic Net) and k (t-test) parameters. Inside each training fold of this 10-fold CV, a further 10-fold CV was performed to determine the parameter C for the SVM and lambda for the Elastic Net. Once we had identified the connectivities from the training subjects with the best alpha and k parameters, we used these connections to train the SVM model and tested it on the hold-out site.
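The outer LOSOCV loop can be sketched as follows. This Python illustration covers only how subjects are split by site, not the nested 10-fold parameter searches or the SVM itself; the function name is an assumption, and the site sizes are those listed in the dataset description.

```python
def leave_one_site_out(site_labels):
    """Yield (held_out_site, train_indices, test_indices), one fold per site."""
    for site in sorted(set(site_labels)):
        test_idx = [i for i, s in enumerate(site_labels) if s == site]
        train_idx = [i for i, s in enumerate(site_labels) if s != site]
        yield site, train_idx, test_idx


# One site label per subject, using the per-site sample sizes reported above.
site_sizes = {"NYU": 149, "STANFORD": 40, "OLIN": 35,
              "KKI": 55, "SDSU": 36, "PITT": 57}
site_labels = [site for site, n in site_sizes.items() for _ in range(n)]

folds = list(leave_one_site_out(site_labels))
for site, train_idx, test_idx in folds:
    # Feature selection and parameter tuning would use train_idx only.
    print(site, len(train_idx), len(test_idx))
```

Keeping every step that looks at labels (feature ranking, choice of α, k, C and lambda) inside the training indices is what prevents information from the held-out site leaking into the model.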

We used the PLS approximation of the predictor variables and the binary outcome of patient status (ASD vs control) as input to the SVM algorithm. For Elastic Net, we selected the features with non-zero Elastic Net coefficients as important features; for t-test feature selection, we ranked features by their t-statistic. The framework is shown in Fig. 1. PLS was implemented with the PLSREGRESS function in MATLAB, with a fixed number of components chosen from the set {5, 10, 15, 20, 25, 30, 35} according to the highest percentage of variance explained by the model. We chose 30 components, which explain most of the variance in the observed response variables.

Fig. 1 Classification framework (LOSOCV)

Consensus Features

Since we used a 10-fold CV strategy, the feature ranking was based on a different training dataset in each cross-validation (CV) fold. Therefore the contributions of features (functional connections) to classification were not evenly distributed. In this study we adopted the concept of consensus functional connectivity (Fair et al. 2012, Bhaumik et al., 2016), which is defined as a functional connectivity feature appearing in the final feature set of each CV iteration. We computed the percentage of occurrences of the features that contributed to the identification of individuals with ASD across all iterations of the cross-validation. The functional connectivities that appeared in more than 70% of the 10 folds were selected for each site and were taken as the most discriminative features between individuals with ASD and TD subjects. In the Results section we report the connectivities that appeared in 3 or more sites (Table 5).
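The two thresholds described here (more than 70% of folds within a site, then 3 or more sites) can be expressed in a few lines. This is a Python sketch under assumed data structures, not the authors' MATLAB code: each feature is represented as a hypothetical pair of ROI indices, and the function names and fold contents are invented for illustration.

```python
from collections import Counter


def consensus_features(selected_per_fold, threshold=0.7):
    """Features selected in more than `threshold` of the CV folds for one site."""
    n_folds = len(selected_per_fold)
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    return sorted(f for f, c in counts.items() if c / n_folds > threshold)


def features_in_min_sites(consensus_per_site, min_sites=3):
    """Consensus features that appear for at least `min_sites` held-out sites."""
    counts = Counter(f for feats in consensus_per_site.values() for f in set(feats))
    return sorted(f for f, c in counts.items() if c >= min_sites)


# Invented selections over 10 folds: (0, 1) in all 10, (2, 5) in 8,
# (3, 4) in exactly 7, and (6, 7) in 3.
folds = []
for k in range(10):
    fold = [(0, 1)]
    if k < 8:
        fold.append((2, 5))
    if k < 7:
        fold.append((3, 4))
    if k < 3:
        fold.append((6, 7))
    folds.append(fold)

print(consensus_features(folds))  # (3, 4) is excluded: 70% is not "more than 70%"
```

Note the strict inequality: a connection selected in exactly 7 of 10 folds does not pass the "more than 70%" criterion as stated.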

Assessment of Classification Result

To evaluate the quality of the classification results, we report three established measures: accuracy, sensitivity and specificity. The accuracy of a classifier is defined as the ratio of the number of correctly classified subjects to the total number of subjects. Sensitivity and specificity measure the ability of a classifier to identify positive and negative instances, respectively.
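These three measures follow directly from the confusion-matrix counts. The Python sketch below assumes that ASD is coded as the positive class (label 1) and controls as 0, which the text does not state explicitly; the function name and the example labels are illustrative.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against a class being absent from the test site.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Invented labels: 4 positives (3 detected), 6 negatives (4 correctly rejected).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
acc, sens, spec = classification_metrics(y_true, y_pred)
print(acc, sens, spec)
```

Reporting sensitivity and specificity alongside accuracy matters for held-out sites with unbalanced groups, where accuracy alone can hide a classifier that favors one class.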

Results

First we employed a two-sample t-test with Benjamini-Hochberg (Benjamini et al. 1995) correction on our dataset to find significant (p < .05) functional connectivity differences between healthy controls and individuals with ASD. Significant connections were computed using data from all sites but one, leaving out each site in turn. We also report the significant connections obtained by combining data from all sites. The connections between Primary Auditory Cortex and bilateral Somatosensory Association Cortex were significant in the four datasets obtained by removing KKI, OLIN, SDSU and STANFORD, respectively, as well as in the combined data (Table 2). Our MANCOVA suggests an overall site effect (F(5,366) = 4.97, P = .0002) on brain connectivities.

Table 2 Significant connections for each site removed and combined sites (t-test with BH correction)

Classification Accuracy

Moderate classification accuracies on the hold-out site were achieved for the different experimental strategies. First we identified the connectivities using 10-fold CV with the best alpha and k parameters and an internal 10-fold CV for the SVM parameter C. We then used these connections to train the SVM model and tested it on the hold-out site. With the Elastic Net strategy, the KKI (64%), OLIN (63%) and SDSU (64%) datasets achieved accuracies above 60%. With the t-test strategy, the PITT and STANFORD datasets reached accuracies of 61% and 70%, respectively (Table 3). Prediction accuracies for all sites except OLIN were lower with LASSO than with the t-test or Elastic Net. A previous study (Chen et al. 2016) also reported similar classification accuracies for the NYU (63%) and SDSU (60%) sites in a cross-site evaluation.

Table 3 Evaluation of SVM for different sites (average and 95% C.I)

Comparing the confidence intervals across sites for overlap, accuracy with the t-test strategy differed significantly between STANFORD and the other sites. A paired t-test showed no significant differences between the t-test and Elastic Net strategies for any site.

Prediction accuracies decreased when we applied the PLS strategy to account for site variation (Table 4). With the t-test strategy, accuracies decreased at the KKI (2%), PITT (19%) and STANFORD (8%) sites. With the Elastic Net strategy, accuracies decreased at the KKI (10%), NYU (7%), OLIN (3%), PITT (9%) and SDSU (3%) sites. However, there were no significant differences between SVM with and without PLS for either feature selection algorithm.

Table 4 Evaluation of SVM for different sites using PLS (average and 95% C.I, underlined significance at .05 level)

Functional Networks Associated with Top Features

The most discriminative connections across all sites, based on consensus functional connections, are shown in Table 5. These connections were selected in more than 70% of the training folds during classification. Only the connections that appeared in 3 or more sites are shown. Several regions participated in two or more informative connections: Dorsolateral Prefrontal Cortex, Somatosensory Association Cortex, Primary Auditory Cortex, Inferior Temporal Gyrus and Temporopolar Area.

Table 5 Consensus functional connectivities

Discussions

To our knowledge, this is the first study to examine the effect of partial least squares (PLS) regression in conjunction with the SVM algorithm using the preprocessed Autism Brain Imaging Data Exchange (ABIDE) dataset, which is available online. Our goal was to overcome the issues of multi-site, multi-protocol variability while taking advantage of the larger sample obtained by combining these sites. We employed a cross-site evaluation strategy to assess how well the classifier generalizes. We further evaluated different state-of-the-art feature selection strategies to find the functional connectivities that are important for classifying ASD versus TD subjects. Previous studies employed an MVPA approach and achieved nearly 90% accuracy (Ecker et al. 2010, Uddin et al. 2011) when the data were collected at a single center. Such results depend on specific scan parameters and are hard to transfer to other datasets. A multi-center study (Nielsen et al. 2013) achieved poorer accuracy (60%) than single-site studies. A recent study (Chen et al. 2016) showed that, with a cross-site evaluation strategy, the highest accuracies obtained for the NYU and SDSU sites were 63% and 60%, respectively. Our results showed similar accuracies for these two sites, 60% and 64%, respectively. To overcome the problem of site heterogeneity we adopted the PLS strategy, and the accuracies obtained for these two sites were 53% and 61%, respectively. Our results indicate that the PLS strategy has an effect on the classifier’s performance.

Our results showed that several regions occurred in two or more informative connections: Dorsolateral Prefrontal Cortex, Somatosensory Association Cortex, Primary Auditory Cortex, Inferior Temporal Gyrus and Temporopolar Area. Autism spectrum disorder has been diagnosed with emphasis on the observed impairments and the wide range of their severity (Tyszka et al. 2013). Several basic processing deficits have been seen in autism, including deficits in social cognition, covering both interpersonal social processes and self-referential thought (Lombardo et al. 2007, Uddin 2011), impaired global feature processing (Anderson et al. 2011a, b), impaired reward processing (Chevallier et al. 2012; Damiano et al. 2012; Lin et al. 2012), impaired motivation (Chevallier et al. 2012), and sensorimotor impairment (Perry et al. 2007). The social and self-referential cognitive processes have been linked with a pair of cortical midline brain regions, the ventromedial prefrontal cortex (VMPFC) and posterior cingulate cortex (PCC), which serve as hubs of the default mode network (DMN) (Greicius et al. 2003; Raichle et al. 2001). A critical component of social communication is the processing of auditory information, which people with autism spectrum disorders typically find difficult. The auditory cortex is the region of the brain responsible for processing auditory (sound) information. As language deficits are a core feature of ASD, the study of auditory processing is essential for understanding the roots of ASD as well as for conceptualizing rational interventions. Investigators have argued that autism is better characterized as a disorder of higher cortical functions, and specifically of the dorsolateral prefrontal cortex (Minshew and Goldstein 1993, Ozonoff et al. 1991, Rogers and Pennington 1991). Studies (Bertone et al. 2005, Dakin and Frith 2005, Tommerdahl et al. 2008, Dinstein et al. 2012) have long associated ASD with sensory abnormalities, including somatosensory ones.
The middle temporal gyrus and inferior temporal gyrus are involved in a number of cognitive processes, including semantic memory processing, language processes (middle temporal gyrus), visual perception (inferior temporal gyrus), and the integration of information from different senses. The temporopolar area is located primarily in the most rostral portions of the superior temporal gyrus and the middle temporal gyrus and is therefore involved in language processes. The other regions are the orbitofrontal cortex, which is responsible for cognitive processing; the angular gyrus, which is involved in a number of processes related to language, number processing, spatial cognition and attention; the supramarginal gyrus, which receives somatosensory, visual and auditory inputs from the brain (Dubac 2014); and the dorsal anterior cingulate cortex (dACC), a region that serves cognition and motor control. Abnormalities in the structural or functional connections of the ACC and its sub-regions contribute to ASD (Zhou et al. 2016). Our results suggest that the functional connections that distinguished the two groups are heavily related to speech and language. Inferior prefrontal cortex, premotor cortex, supramarginal gyrus and auditory cortex together make a strong case for this. These results are expected based on the existing literature.

Our study suggests that leave-one-site-out cross-validation can be a viable strategy for classifying ASD versus healthy controls with moderate accuracy. Without PLS, classification accuracies for two sites, NYU and SDSU, were comparable to those of a recent study (Chen et al. 2016). When PLS was adopted as a correction for site variation, however, accuracies decreased, though not significantly. The higher accuracies obtained without PLS indicate an effect of site variability. In the future, we plan to include data from other ABIDE sites and evaluate these strategies. We would also like to evaluate our proposed domain adaptation strategy with other classifiers and with the different preprocessing strategies used in the ABIDE consortium.

Information Sharing Statement

We provide our entire source code written in Matlab at the link https://github.com/ashishpradhan1008/PredictingASDbyCrossSiteEval. This link contains the dataset and Matlab code used for this paper. A demo code has been provided to guide the users to run the code and reproduce the results. Please keep in mind that the source code has not been optimized for speed and RAM use.