Introduction

Resting-state fMRI is one of the most widely used methods in neuroscience, psychology and psychiatry research. However, artifact removal remains a major challenge, because the BOLD signal represents just 1–3% of the variability in the fMRI timeseries (Bianciardi et al., 2009; Biswal et al., 1995). A variety of methods exist for single-echo fMRI acquisitions, including regression of motion parameters, white matter and cerebrospinal fluid signals, as well as several methods based on independent component analysis (ICA) of single-subject resting-state fMRI data. ICA-based methods, including i) the semi-automated FMRIB’s ICA-based X-noiseifier (FIX) (Salimi-Khorshidi et al., 2014), ii) the fully automated ICA-based Automatic Removal Of Motion Artifacts (ICA-AROMA) (Pruim et al., 2015) and iii) hand classification (Griffanti et al., 2017), distinguish noise from signal with a high degree of accuracy (Dipasquale et al., 2017). In particular, ICA-FIX was able to preserve signal while removing a substantial proportion of noise, thus improving data quality. However, ICA-FIX can be challenging to set up and integrate into workflows: it requires a working installation of FSL, MATLAB, and R, and further requires training the classifier on manually labelled data. While hand classification follows rules proposed in Bijsterbosch et al. (2017) and Griffanti et al. (2017) and returns high-quality data, it is not viable for larger datasets. Following hand classification guidelines, we propose a simple Alternative Labeling Tool (ALT) that provides a user-friendly alternative to FIX or AROMA. ALT uses two features: i) the proportion of any given independent component that falls into a gray matter mask and ii) the distribution of the component's power spectrum.
Here, we use an ageing dataset (OASIS-3) (LaMontagne, 2019) to test whether this denoising approach, with some simple parameter optimization, reduces motion artifact compared with AROMA, FIX, and spike regression, which are common approaches to data denoising (Ciric et al., 2017). We also tested whether ALT classification was consistent with the gold-standard hand classification approach. We expected ALT to improve data quality, which was assessed using four metrics: motion-FC relationships (where FC denotes functional connectivity), their distance dependence, network identifiability and loss of degrees of freedom.

Methods

Installation requirements

ALT requires a single-subject Melodic (or ICA-feat) output from FSL that includes registration files, spatial maps and power spectra for each independent component for each subject. ALT requires a working installation of FSL and either Octave (free and open source) or Matlab (commercial license), with no additional toolboxes needed (Table 1). A gray matter mask is provided based on the Harvard–Oxford cortical and subcortical atlas but alternative masks (in MNI152 space) can be supplied by the end user.

Table 1 Overview of ICA-ALT (Alternative Labeling Tool), compared to ICA-FIX and ICA-AROMA

Algorithm description

ALT includes the following three steps:

  1.

    In the first step, ALT generates the metrics used for labeling in the subsequent step. First, ALT computes the proportion of the component’s spatial map that falls into a gray matter mask. This is achieved by registering the melodic_IC component maps to standard MNI152 space using linear registration; linear registration is used because nonlinear FNIRT registration often fails with older participants or those with neurological conditions. An inclusive gray matter mask was selected in order to capture all signal components. Second, ALT uses MELODIC-generated power spectra to assess the “skewness” of each component's power spectrum. The power of signal components is concentrated in the low-frequency band that captures slower fluctuations, resulting in positively skewed power spectra. We operationalize this metric as the proportion of the total power that is concentrated in the leftmost third of the spectrum.

  2.

    Next, the ALT user is advised to optimize the “hyperparameters” of the cleaning script. Two parameters can be adjusted: the threshold for the gray matter proportion and the threshold for the power spectrum skewness. By default, the gray matter threshold is set to 0.5, meaning that only components for which at least 50% of suprathreshold voxels (z > 2) fall within the gray matter mask are labelled as signal. The default power spectrum threshold is set to 0.6, meaning that only components with at least 60% of their power concentrated in the leftmost third of the spectrum are labelled as signal. Components that fail either criterion, that is, those with less than 50% gray matter voxels or with less than 60% of their power in the leftmost tertile of the spectrum, are labelled as noise. These defaults may work well for some datasets, such as the OASIS-3 data used here, but may need to be tuned for each dataset. Therefore, ALT users are advised to manually label a small number of subjects in their sample (N = 15–20) and use the following evaluation step (3) to assess the performance of the ALT algorithm. Example labels from this classification are shown in Fig. 1.

    Fig. 1

    Example signal (A) and noise (B) components. An example signal component and the two features used to classify it are shown in (A). The upper (A) panel shows a power spectrum, highlighting the leftmost tertile of the graph in green. Components with less than 60% of their power concentrated in this highlighted area were considered noise. The lower (A) panel shows a spatial map of the same component, with the gray matter mask highlighted in green. Components were thresholded at z > 2. If less than 50% of their voxels were inside the gray matter mask, then the component was labeled as noise. Analogous data for a noise component from the same participant is shown in (B). The noise component’s spatial map was located primarily in the ventricles and outside the gray matter. Further, its power spectrum was not positively skewed. Spatial maps were transformed to the MNI standard space and are overlaid on the MNI152 template (axial slices at MNI Z coordinates of -29, -17, -5, 7, 19, 31, 43 and 55)

  3.

    After the classification is completed, evaluation against manual labeling is advised. This is achieved by comparing the ALT labels with manual labels, which are treated as “ground truth”. This step generates a confusion matrix of true positive (TP), false positive (FP), true negative (TN) and false negative (FN) rates, with noise treated as the positive class, calculated as follows:

    $$TP=\frac{\mathrm{Nr}\;\mathrm{of}\;\mathrm{components}\;\mathrm{labeled}\;\mathrm{as}\;\mathrm{noise}\;\mathrm{by}\;\mathrm{ALT}\;\mathrm{and}\;\mathrm{as}\;\mathrm{noise}\;\mathrm{by}\;\mathrm{manual}\;\mathrm{approach}}{\mathrm{Nr}\;\mathrm{of}\;\mathrm{components}\;\mathrm{labeled}\;\mathrm{as}\;\mathrm{noise}\;\mathrm{by}\;\mathrm{manual}\;\mathrm{approach}}$$
    $$FP=\frac{{\mathrm{Noise}}_{\mathrm{ALT}} \& {\mathrm{Signal}}_{\mathrm{manual}}}{{\mathrm{Signal}}_{\mathrm{manual}}}$$
    $$TN=\frac{{\mathrm{Signal}}_{\mathrm{ALT}} \& {\mathrm{Signal}}_{\mathrm{manual}}}{{\mathrm{Signal}}_{\mathrm{manual}}}$$
    $$FN=\frac{{\mathrm{Signal}}_{\mathrm{ALT}} \& {\mathrm{Noise}}_{\mathrm{manual}}}{{\mathrm{Noise}}_{\mathrm{manual}}}$$
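The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ALT implementation (which is distributed as MATLAB/Octave scripts): the function names are hypothetical, the spatial map is assumed to be a z-statistic array already registered to the space of the gray matter mask, thresholding is assumed to be one-sided (z > 2), and the "leftmost third" is assumed to be the first third of the frequency bins.

```python
import numpy as np

def gray_matter_proportion(z_map, gm_mask, z_thresh=2.0):
    """Fraction of suprathreshold voxels (z > z_thresh) inside the gray matter mask."""
    supra = np.asarray(z_map) > z_thresh
    n_supra = supra.sum()
    if n_supra == 0:
        return 0.0  # assumption: a component with no suprathreshold voxels scores 0
    return float((supra & np.asarray(gm_mask).astype(bool)).sum() / n_supra)

def low_frequency_proportion(power_spectrum):
    """Fraction of total power located in the leftmost third of the spectrum."""
    power = np.asarray(power_spectrum, dtype=float)
    cutoff = len(power) // 3  # assumption: tertile defined by number of bins
    total = power.sum()
    return float(power[:cutoff].sum() / total) if total > 0 else 0.0

def label_component(gm_prop, lf_prop, gm_thresh=0.5, ps_thresh=0.6):
    """Signal only if both features reach their thresholds, otherwise noise."""
    return "signal" if (gm_prop >= gm_thresh and lf_prop >= ps_thresh) else "noise"

def confusion_rates(alt_labels, manual_labels):
    """TP/FN are normalised by manual noise counts, TN/FP by manual signal counts."""
    alt_noise = np.array([lab == "noise" for lab in alt_labels])
    man_noise = np.array([lab == "noise" for lab in manual_labels])
    n_noise = man_noise.sum()       # assumed non-zero in this sketch
    n_signal = (~man_noise).sum()   # assumed non-zero in this sketch
    return {
        "TP": float((alt_noise & man_noise).sum() / n_noise),
        "FN": float((~alt_noise & man_noise).sum() / n_noise),
        "TN": float((~alt_noise & ~man_noise).sum() / n_signal),
        "FP": float((alt_noise & ~man_noise).sum() / n_signal),
    }
```

With these definitions TP + FN = 1 and TN + FP = 1, so tuning the two thresholds trades noise detection against signal preservation.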

Application to an ageing dataset (OASIS3)

We showcase ALT’s performance using resting-state data from 275 participants from the OASIS-3 dataset.

MRI data

MRI data were acquired on Siemens 3 T scanners. The most common combination of T1-weighted image parameters was: voxel size = 1 × 1 × 1 mm³, echo time (TE) = 0.003 s, repetition time (TR) = 2.4 s. The most common parameter combination of the resting-state fMRI data was: voxel size = 4 × 4 × 4 mm³, TE = 0.027 s, TR = 2.2 s, scan duration = 6 min (LaMontagne et al., 2015).

MRI processing

We used the FSL processing pipeline (Alfaro-Almagro et al., 2018) on the downloaded fMRI data in BIDS format. For independent component labeling, we applied ALT and compared its performance to ICA-FIX and ICA-AROMA. Briefly, we used FSL (v6.0.1) FEAT (v 3.15) with default settings for fMRI pre-processing (deleting the initial 3 volumes, linear detrending over 100 s, realignment motion correction, and 5 mm Gaussian smoothing) and for ICA using the MELODIC tool (Beckmann and Smith, FMRIB Technical Report TR02CB1). MELODIC generates a number of components with distinct (but sometimes overlapping) spatial maps, power spectra and timeseries. We then used ALT to label these components as signal or noise and subsequently used fsl_regfilt to regress out the noise components, thus obtaining denoised data. As mentioned above, ALT registers melodic_IC components to standard space using boundary-based linear registrations provided by FLIRT (Greve & Fischl, 2009; Jenkinson & Smith, 2001; Jenkinson et al., 2002), since nonlinear registration using FNIRT was less reliable with the older participants in OASIS-3.
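As an illustration of the final regression step, the sketch below assembles an fsl_regfilt call from a list of noise component indices. The helper name and the file paths in the usage note are hypothetical; the `-i`, `-d`, `-f` and `-o` options are the standard fsl_regfilt input, design (mixing matrix), filter list and output options, and component indices are 1-based. Whether the aggressive option was used is not stated in the text, so the sketch leaves fsl_regfilt at its default behaviour.

```python
def build_regfilt_command(func_data, melodic_mix, noise_components, out_file):
    """Return the argument list for an fsl_regfilt call (hypothetical helper).

    noise_components: iterable of 1-based MELODIC component indices labelled as noise.
    """
    component_list = ",".join(str(c) for c in sorted(noise_components))
    return [
        "fsl_regfilt",
        "-i", func_data,       # preprocessed 4D fMRI data
        "-d", melodic_mix,     # MELODIC mixing matrix (component timeseries)
        "-f", component_list,  # comma-separated components to regress out
        "-o", out_file,        # denoised output image
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine with FSL installed, for example with `filtered_func_data.nii.gz` and `filtered_func_data.ica/melodic_mix` from a FEAT output directory as the first two arguments.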

For subsequent comparisons, data were denoised using the strategies described below and registered to MNI152 space using Freesurfer (v6.0, reg-feat2anat, bbregister and mri_vol2vol). Timeseries from a 229-region Power parcellation (Power et al., 2011) were extracted using fslmeants. We did not include task-positive Power networks given that the fMRI data were obtained under resting-state conditions.

Overview of confound regression strategies

We compared ALT performance to a) ICA-FIX, b) ICA-AROMA, and c) Spike regression with regression of 24 motion parameters (Ciric et al., 2017; Parkes et al., 2018).

ICA-FIX

We used ICA-FIX (version 1.06.15) with Standard.RData as the training dataset and the standard threshold of 20 to generate signal and noise labels and fsl_regfilt to generate the denoised images. Note that FIX automatically defaults to nonlinear registration.

ICA-AROMA

We used AROMA (version v0.3-beta) with default settings to generate fMRI images with non-aggressive denoising. We used the linear registration option (-affmat only) with AROMA.

Spike Regression

We generated one spike regressor for each volume with a volume-to-volume root mean square (RMS) displacement greater than 0.25 mm. For each volume with relative RMS > 0.25 mm, a new regressor was created that took a value of 1 for that volume and a value of 0 for all other volumes. These regressors were included alongside 24 motion parameters (6 motion parameters, 6 temporal derivatives, 6 quadratic terms, and 6 quadratic expansions of the derivatives of motion). We did not include Global Signal Regression (GSR), given that the ICA-based denoising methods under consideration also do not include GSR and that denoising with and without GSR has been compared extensively elsewhere (Ciric et al., 2017; Parkes et al., 2018).
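The confound model described above can be sketched as follows. This is an illustrative construction rather than the code used in the study: the function names are hypothetical, the relative RMS trace is assumed to have one value per volume (with 0 for the first volume), and temporal derivatives are assumed to be backward differences with a zero first row.

```python
import numpy as np

def spike_regressors(rel_rms, threshold=0.25):
    """One indicator column per volume whose relative RMS exceeds the threshold."""
    rel_rms = np.asarray(rel_rms, dtype=float)
    flagged = np.flatnonzero(rel_rms > threshold)
    spikes = np.zeros((len(rel_rms), len(flagged)))
    spikes[flagged, np.arange(len(flagged))] = 1.0  # 1 at the flagged volume, 0 elsewhere
    return spikes

def expand_motion_24(motion6):
    """Expand (n_volumes, 6) realignment parameters to the 24-parameter model."""
    motion6 = np.asarray(motion6, dtype=float)
    # Backward difference; the first volume has no predecessor, so its derivative is 0.
    deriv = np.vstack([np.zeros((1, motion6.shape[1])), np.diff(motion6, axis=0)])
    return np.hstack([motion6, deriv, motion6 ** 2, deriv ** 2])

def confound_matrix(motion6, rel_rms, threshold=0.25):
    """Concatenate the 24 motion parameters with the spike regressors."""
    return np.hstack([expand_motion_24(motion6), spike_regressors(rel_rms, threshold)])
```

The number of columns in the resulting matrix is 24 plus the number of flagged volumes, which is the quantity that grows with participant motion.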

Overview of outcome measures

Following established criteria of denoising assessment (Ciric et al., 2017; Parkes et al., 2018), we used four metrics to assess denoising performance: correlations between motion quality control and FC (QC-FC correlations), distance dependence of these QC-FC associations, loss of temporal degrees of freedom, and modularity of networks constructed after denoising. Data from 275 participants in the OASIS-3 dataset were used for these analyses.

QC-FC correlations

Firstly, we show the association between the connectivity of each ROI-ROI pair and motion QC, quantified as mean relative RMS. There were (229 × 229 − 229)/2 = 26,106 unique edges, each of which was correlated with mean RMS across the 275 participants. Significance was determined using false discovery rate (FDR) correction, q < 0.05 (MATLAB mafdr.m function with the Benjamini–Hochberg method (Benjamini & Hochberg, 1995)). We report median absolute correlations and the proportion of edges that were significant.
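A compact NumPy sketch of this analysis is given below. It is not a reimplementation of mafdr; it applies the Benjamini–Hochberg step-up rule directly, and it takes edge-wise p-values as an input (they could be obtained from the correlations with any standard statistics package). The function names are hypothetical and Pearson correlation is assumed.

```python
import numpy as np

def upper_triangle_edges(fc_matrix):
    """Vectorise the unique edges of a symmetric (n_rois, n_rois) FC matrix."""
    iu, ju = np.triu_indices(np.asarray(fc_matrix).shape[0], k=1)
    return np.asarray(fc_matrix)[iu, ju]

def qc_fc(fc_edges, mean_rms):
    """Pearson r between motion and each edge.

    fc_edges: (n_subjects, n_edges); mean_rms: (n_subjects,).
    Assumes no edge is constant across subjects.
    """
    fc = np.asarray(fc_edges, dtype=float)
    qc = np.asarray(mean_rms, dtype=float)
    fc_c = fc - fc.mean(axis=0)
    qc_c = qc - qc.mean()
    return (qc_c @ fc_c) / (np.linalg.norm(qc_c) * np.linalg.norm(fc_c, axis=0))

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of edges declared significant at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.flatnonzero(passed))  # largest rank meeting the criterion
        reject[order[:k + 1]] = True
    return reject
```

The two reported summaries then follow as `np.median(np.abs(r))` and `reject.mean()`.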

Distance-dependence of QC-FC correlations

Centroids of the Power parcellation were obtained using fslstats and the Euclidean distance for each pair of ROIs was calculated (MATLAB norm function), resulting in 26,106 pairwise distances. We report correlations between the QC-FC correlations and the ROI-ROI distance across edges.
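The distance-dependence calculation can be sketched as below. The function names are hypothetical, and Pearson correlation is an assumption, since the text does not state which correlation coefficient was used. Edges are ordered by the upper triangle of the ROI-by-ROI matrix so that distances and QC-FC values line up.

```python
import numpy as np

def pairwise_distances(centroids):
    """Euclidean distance for each unique ROI pair, in upper-triangle order.

    centroids: (n_rois, 3) array of coordinates in mm.
    """
    c = np.asarray(centroids, dtype=float)
    iu, ju = np.triu_indices(len(c), k=1)
    return np.linalg.norm(c[iu] - c[ju], axis=1)

def distance_dependence(qcfc_r, distances):
    """Correlation across edges between QC-FC values and inter-ROI distance."""
    return float(np.corrcoef(qcfc_r, distances)[0, 1])
```

For 229 centroids, `pairwise_distances` returns 26,106 values, matching the number of unique edges.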

Modularity of networks

We used Louvain clustering as part of the Brain Connectivity Toolbox (Rubinov & Sporns, 2010) to generate modularity quality (Q) of the connectomes in the Power parcellation. This measure quantifies the degree to which structured sub-networks were found in the denoised connectomes. Connectomes were not thresholded and included signed data. We report mean Q values across subjects and the correlations between Q and motion (mean RMS) across subjects.

Loss of temporal degrees of freedom

We quantified the loss of temporal degrees of freedom by calculating the sum of the number of regressors and the number of volumes flagged for spike regression.

Comparison with manual labels

In a subset of 30 OASIS-3 participants, we compared the ALT-generated labels with manual labels from two independent raters (GC and PZ), using the evaluation.m script provided as part of ALT (run in MATLAB R2016a). Typically, 50–80 components are generated for a 6-minute fMRI scan. For particularly noisy or high-resolution scans, the number of components may exceed 100. We also compared ALT with FIX, and FIX with manual labels, using the same confusion matrix approach as for the comparison of ALT with manual labeling.

Results

Background demographics and cognitive status

Participants had a mean age of 70.1 (SD = 9.2). Among the full sample of 275 participants, 154 were female and 121 were male. Forty-six participants had a clinical dementia rating score of 0.5, 13 participants had a clinical dementia rating score of 1 and 216 participants had a score of 0.

In the subset of those included in the hand classification comparison, participants had a mean age of 66.5 years old (SD = 9.4). Eighteen participants were female, while 12 were male, and 5 participants had a clinical dementia rating score of 0.5, while the remaining 25 participants had a clinical dementia rating score of 0.

ALT and AROMA perform well across all outcome measures

Both ALT and AROMA significantly reduced the motion artifact, with low median absolute QC-FC correlations (Fig. 2A, 2B) and a very small proportion of connectivities related to motion (< 10%, Fig. 2C). Furthermore, motion artifacts were not distance-dependent after ALT and AROMA (-0.1 < r < 0). Motion artifacts were more strongly associated with the distance between regions of interest after FIX and spike regression with 24 motion parameters (Fig. 3). All methods resulted in good network identifiability, with relatively high modularity (Q) values. ALT showed better network identifiability compared to other methods (Fig. 4). Network modularity was more strongly associated with motion after ALT denoising than after other methods (Fig. 4C). All methods resulted in relatively high numbers of regressors, and consequent loss of degrees of freedom. Smallest losses of degrees of freedom were found after AROMA, with highest numbers of regressors in the spike regression model (Fig. 5). Spike regression included separate regressors for volumes exceeding the motion-QC threshold and thus showed the largest number of additional regressors compared to ALT, AROMA, and FIX.

Fig. 2

Associations between motion and functional connectivity (QC-FC). ALT and AROMA reduced the relationship between connectivity and motion the most compared to FIX using standard training data and spike regression with 24 parameters. Mean absolute QC-FC correlations were low for all methods except FIX with standard training data (A). Distributions of QC-FC correlations across subjects are shown in (B). Finally, only a small proportion of edges in the 229-node network defined by Power et al. (2011) was related to motion after ALT and AROMA denoising (C). Proportions are shown in (C). Smaller number of significant edges suggests better performance of the denoising algorithms

Fig. 3

Distance dependence of the QC-FC correlations between motion and functional connectivity after denoising. AROMA and ALT performed similarly well, outperforming FIX with standard data and spike regression with 24 motion parameters. The QC-FC correlations were only very weakly associated with the Euclidean distance between pairs of regions of interest (A). Generally, nodes that are spatially close to each other show higher impact of motion on connectivity. Datapoints were 26,106 unique edges, calculated as (229 × 229 − 229)/2

Fig. 4

Network modularity (Q) and its association with motion after denoising. Network modularity was similar across all denoising methods, with higher values found after ALT and FIX. Connectivity matrices for the Power networks featuring 229 regions are shown in (B). Nodes were ordered according to their membership in the Power networks. Correlation between modularity and motion (mean RMS) was highest following ALT, and lowest after AROMA. Higher negative correlations suggest that network identifiability is higher in participants showing less motion, indicative of worse denoising performance

Fig. 5

Mean number of confound regressors for each denoising algorithm. Error bars show standard deviations

ALT is highly consistent with a hand classification approach

ALT-generated labels were highly consistent with manual labeling (Fig. 6A, B). In particular, True Positive rates for detecting noise components were on average 86.7% for Rater 1 and 90.4% for Rater 2, with the lowest agreement rate across all of a given subject's components being just over 60%. Correspondingly, False Negative rates were low (13.3% for Rater 1 and 9.6% for Rater 2). Although more variability was seen in the True Negative rates for ALT detecting signal components compared to hand classification, on average strong agreement was found (88.7% for Rater 1 and 85.9% for Rater 2). Inter-rater consistency was very high (True Positive rate of 93.6% and True Negative rate of 90.2%, Fig. 6C).

Fig. 6

Proportions of True Positives (TP), False Negatives (FN), True Negatives (TN) and False Positives (FP) for rater R1 (A), and rater R2 (B) showed high consistency between ALT-generated labels and manual labels. Inter-rater consistency (C) was similar to that of ALT with each of the raters. ICA-FIX also showed high consistency with manual labeling (D, E). FIX and ALT showed strong agreement on which components were classified as noise labels, but only moderate agreement on signal labels

ICA-FIX showed very similar consistency rates with manual labeling compared with ALT (TP = 82.5; FN = 17.5; TN = 92.4; FP = 7.6 for R1 and TP = 86.3; FN = 13.7; TN = 92.4; FP = 7.6 for R2, Fig. 6D and 6E). ALT and ICA-FIX showed high rates of agreement on noise labels (True positive rates for detecting noise of 91.3%) but less agreement on signal labels (TP = 91.3; FN = 8.7; TN = 61.8; FP = 38.2; Fig. 6F).

Discussion

Here, we introduce a computationally simple alternative labeling tool (ALT) for single-subject independent component labeling and showcase its strong agreement with the gold-standard hand classification approach. While other denoising approaches such as FIX may have higher efficacy, they require considerable computational power and user training. ALT is more accessible since it does not have high computational requirements and can be applied to even very large datasets with good efficacy. Motion represents a significant confound in resting-state fMRI studies (Power et al., 2012; Satterthwaite et al., 2013; van Dijk et al., 2012), and we show that ALT significantly reduces motion artifact, QC-FC correlations and QC-FC distance dependence. ALT’s performance is very similar to that of AROMA, and ALT labeling is close to manual labeling. Importantly, ALT is tied to the FSL MELODIC processing suite and can be seen as a lightweight alternative to FIX, alongside AROMA.

ALT’s denoising performance was very similar to that of AROMA, although ALT resulted in slightly better network modularity Q. Network modularity declines with age due to desegregation (Chan et al., 2014), hence across all approaches we find lower mean modularity Q compared to previous studies of younger adults (Ciric et al., 2017). However, ALT showed higher correlations between modularity and motion, suggesting that participants with greater levels of motion showed lower network modularity, although this correlation was relatively weak (r = −0.18). ALT installation is easier for those using FSL and MELODIC, and it is less computationally demanding and quicker to run compared to AROMA. On the other hand, AROMA is integrated in some processing pipelines such as fmriprep (Esteban et al., 2019). While ALT may not replace AROMA, we argue that ALT is a viable alternative to AROMA given that their performance is very similar.

We test the performance of these denoising algorithms in an ageing dataset (OASIS-3) that included participants with very mild cognitive impairment measured using the clinical dementia rating. It is important to validate denoising approaches in datasets that may present unique challenges such as registration or the presence of clinical conditions. ALT alongside AROMA performed well at denoising this fMRI data.

We do not compare these methods to global signal regression, since this debate has been extensively addressed elsewhere (Ciric et al., 2017; Parkes et al., 2018). Our primary aim was to introduce ALT and assess its performance relative to comparable denoising methods that do not include global signal regression.

FIX consists of several steps: spatial ICA, component-wise feature extraction, classifier training (on hand-labelled data), component classification and denoising. The ICA step is performed with MELODIC (Beckmann & Smith, 2004), as in ALT. In contrast to FIX, hand classification in ALT is an optional (but recommended) step to validate the program’s performance in any unique dataset. Not relying on manual labelling is important because it is time-consuming, operator dependent, and requires expert knowledge about the spatial and temporal characteristics of signal and noise fluctuations. AROMA was created to automate ICA-based denoising. Thus, it requires no such hand classification, but it can suffer from elevated signal removal in older adult populations due to motion, and its classification performance is inferior to FIX (Carone et al., 2017). ALT, on the other hand, offers an automated alternative to FIX with the option to adjust performance to match manual labelling if there are concerns regarding the program’s performance in diverse datasets. ALT only uses two features, while FIX uses a random forest algorithm trained on over 150 features. Given that FIX uses additional information about the independent components, it is better suited to identifying subtle patterns that distinguish signal and noise components based on the manual training labels.

A potential limitation of this study is the relatively high quality of fMRI data we used to test ALT, which showed relatively low levels of motion despite the advanced age of the participant population that typically elevates head movement. Thus, in a noisier dataset, ALT may not always perform as well as in the above sample. Moreover, although ALT is a user-friendly alternative to FIX and AROMA, ALT still requires the installation of both FSL and either MATLAB or the free alternative Octave. We hope that ALT will make ICA denoising more accessible and facilitate the assessment of denoising algorithms to ensure the highest quality of resting-state fMRI data possible.