1 Introduction

An affective disorder is a condition in which a person's behavior, cognition, or emotional awareness is disturbed. Such changes are clinically significant as they result in psychological and physiological imbalances. They may also affect various mechanisms that contribute to the development of mental abilities. Mental illness is frequently linked to considerable distress or impairment in personal, professional, and social life [1]. The Diagnostic and Statistical Manual of Mental Disorders (DSM) and the International Classification of Diseases (ICD) are the most extensively used references for classifying different types and categories of mental disorders worldwide. Depression, Schizophrenia, Attention-Deficit Hyperactivity Disorder (ADHD), Mood Disorders, Stress and Anxiety Disorders are some of the prominent types of affective disorders [2,3,4].

Mental illnesses are a rising public health issue that affects millions of individuals worldwide, causing significant misery, incapacity, and even economic disruption. The World Health Organization (WHO) has stated that 33% of Years Lived with Disability (YLD) across the globe are related to mental disorders. As per WHO, more than 325 million people suffer from depression and anxiety disorders [5]. Behavioral and mental health disorders presumably account for around 25% of the worldwide disease burden [6]. Mental illness is often associated with negative consequences like economic burdens, reduced worker wages, poverty, premature mortality, personality disorders, suicide attempts, and high consumption of nicotine, caffeine, and alcohol. It can lead to chronic diseases such as cancer, heart problems, and other physical illnesses [7,8,9,10,11]. Early diagnosis of mental disease may facilitate timely therapeutic measures. Specific preventive measures may also help reduce the incidence and recurrence of mental disorders [12].

Numerous investigations have discovered a direct correlation between mental disorders and behavior. Mental illnesses affect physiological indicators like nerve-to-muscle signal transmission, heart rate, eye movements, skin conductance, and brain waves. Behavioral attributes like text, speech, gesture, and Facial Expressions (FEs) may also indicate a potential mental disorder [13,14,15]. Physiological responses provide a quantitative method for evaluating the relationship between brain signals and cognitive skills. These signals are sensitive to neuronal and circulatory changes caused by brain activation. A few reliable modalities to identify mental disorders are Electromyogram (EMG), Electroencephalogram (EEG), Electrooculogram (EOG), Electrocardiogram (ECG), and parasympathetic electrical characteristics like Electrodermal Activity (EDA), Heart Rate Variability (HRV), changes in pupil dilation, and fluctuation in skin conductance. Behavioral measures like facial expressions, voice, speech, hand gestures, and text are also used for emotion recognition [16,17,18]. Machine learning approaches for mental disorder detection are divided into unimodal and multimodal approaches. Unimodality refers to the use of only one signal for classification. Multimodality refers to the use of more than one signal for problem identification. Features extracted from these modalities are used for classifying different types of mental disorders using different deep learning and machine learning techniques [19,20,21,22]. Successful identification of mental disorders involves finding the best feature extraction strategy for a sizable amount of high-dimensional data [23].

In light of these points, a thorough literature review is necessary to identify contributions representing state-of-the-art research in the field. Additionally, it can be beneficial for scholars to look into various multidisciplinary open research challenges, including the combination of behavioral sciences, computer science, machine learning, psychology, and sociology. The following research questions were developed to achieve the objective of this study:

  • (RQ1) Which behavioral and physiological modalities (signals) could be used for machine learning-based detection of affective disorders?

  • (RQ2) Which machine or deep learning methods for detecting affective disorders best utilize these modalities?

  • (RQ3) What potential challenges can there be when employing these modalities to identify mental health issues?

The answers to these questions are intended to help develop appropriate research in mental disorder detection using machine learning techniques. The rest of this manuscript is organized as follows. Section 2 discusses surveys from the last five years that used various approaches to identify mental disorders. Section 3 describes the various physiological and behavioural signals used in mental disorder detection. Section 4 describes the numerous unimodal and multimodal approaches used in mental disorder detection. Section 5 provides information about features extracted from different modalities for mental disorder classification. Section 6 provides information on the signals and ML classifiers utilized in investigations and a scientometric analysis based on physiological signals, mental disorder detection, and machine learning, followed by challenges. Section 7 concludes the survey findings with future scope.

2 Comparison with contemporary surveys

The current survey explores physiological and behavioural signals for detecting mental disorders and recognising emotions using machine or deep learning techniques. Among contemporary surveys, several used one or more modalities to identify mental illness or recognise emotions using machine or deep learning techniques [24,25,26,27]. Certain studies have attempted to identify a specific condition, such as epilepsy, depression, or seizures, using a single modality. A few of the surveys have used multiple signals focused on mental disease detection, overall wellness, mental health, and emotional well-being based on a multimodal approach [24, 28, 29]. This survey intends to investigate and explain the use of unimodal and multimodal systems for detecting mental disorders. A detailed description of these has been added in Sections 4 and 5, respectively.

Further, this survey also examines micro-expressions used for detecting mental disorders. Existing surveys have primarily ignored the micro-expression-driven investigation of mental disorders. Figure 1 presents an alluvial diagram illustrating the association between several modalities, mental disorders, machine or deep learning methodologies, challenges, and potential outcomes. The study of the last five years, from 2018 to 2022, shows that most work concentrated on EEG signals for mental disease identification, employing both machine learning and deep learning algorithms [30,31,32,33,34]. Training and testing of a system based on machine or deep learning requires an extensive amount of data. Accordingly, benchmarked datasets, data annotation or labelling, and ethical considerations for data collection are the main challenges encountered in this research [35,36,37].

Fig. 1 Alluvial diagram on Mental Disorder Detection using Physiological & Behavioural Signals using Machine Learning

3 Physiological and behavioural signals used for mental disorder detection

Mental health seems to be a significant reflection of human activity in the brain and impacts every part of the human body [38]. A daily monitoring schedule is necessary to identify and gauge the severity of many mental disorders, such as Post-Traumatic Stress Disorder (PTSD), anxiety, stress, depression, ADHD, schizophrenia, bipolar disorder, dementia, and other conditions of a similar nature. This section explains the physiological and behavioral indicators used in mental health research and machine learning algorithms, as shown in Fig. 2.

Fig. 2 Physiological & Behavioural Signals

  a) EEG

EEG signal data is widely used to understand brain functionality and irregular brain activities. It is frequently used to identify mental diseases like depression, epilepsy, schizophrenia, and mental stress due to its high spatial and temporal resolution, noninvasive nature, and compact setup.

Earlier research on classifying neurological disorders (epilepsy, mental and motor imagery tasks) based on automatic feature extraction used the minimum, maximum, mean, mode, median, standard deviation, first quartile, third quartile, and inter-quartile range of EEG signals. A clustering technique-based least square support vector machine (CT-LS-SVM) was used [39]. In a further study, Approximate Entropy (ApEn), Phase Entropy, and Sample Entropy (SampEn) features were used to identify epilepsy. Machine learning algorithms, including Naive Bayes (NB), Fuzzy Sugeno (FS), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), were used to achieve high accuracy [40]. Wavelet entropy, Relative Wavelet Energy (RWE), interhemispheric asymmetry, and band power features were extracted to classify normal and depressed individuals using Linear Discriminant Analysis (LDA), Radial Basis Function Network (RBFN), and quadratic discriminant analysis, together with the deep learning approach Multi-Layered Perceptron Neural Network (MLPNN) [41]. Fractal dimension features, including Higuchi, Katz, and Petrosian, along with band power, autoregressive coefficients, and wavelet features, were used with eyes-open and eyes-closed recordings to classify ADHD and Bipolar Mood Disorder (BMD). Multiple classifiers were applied, such as LDA, Direct LDA, boosted JD-LDA (BJDLDA), and Extended Classifier System for Function approximation with LDA (XCSF-LDA). Findings indicate that BMD patients' brain activity produces higher fractal dimensions than that of ADHD patients, which results in more unstable behavior [42]. The Particle Swarm Optimization (PSO) technique was used to process and extract alpha and theta bands. The hybridization of PSO with Artificial Neural Networks (ANNs) was used to identify unipolar and bipolar patients [43].
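The descriptive statistics listed at the start of this paragraph can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `statistical_features`, the inclusive quartile method, and the toy segment are assumptions for illustration and are not taken from [39].

```python
import statistics

def statistical_features(signal):
    """Summary statistics of one EEG channel segment (a list of floats)."""
    # Quartiles by linear interpolation over the sorted samples (an assumed choice).
    q1, _, q3 = statistics.quantiles(signal, n=4, method="inclusive")
    return {
        "min": min(signal),
        "max": max(signal),
        "mean": statistics.fmean(signal),
        "median": statistics.median(signal),
        "mode": statistics.mode(signal),
        "std": statistics.pstdev(signal),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

# Hypothetical toy segment; real EEG segments contain thousands of samples.
segment = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
features = statistical_features(segment)
```

Each segment is reduced to a fixed-length feature dictionary, which is the form a downstream classifier would typically consume.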

Power Spectrum, Synchronization Likelihood, Wavelet Transform, Fractal Dimension, Fast Fourier Transform (FFT), Detrended Fluctuation Analysis, Lyapunov Exponent, correlation dimension, and entropy were used to detect Autism Spectrum Disorder (ASD). Artifacts were filtered out of the feature data using density-based clustering. KNN and SVM classification techniques were applied to classify the autistic and healthy groups [44]. Subsequently, an investigation used statistical features (mean, median, etc.) and entropy features (Renyi entropy, Log energy, Threshold, and Shannon entropy). Also, alpha, beta, theta, gamma, and delta frequency bands were measured. Multiple filtration methods were applied to remove noise and artifacts caused by improper electrode placement, dirty or hairy skin, ocular activity, and impulses. The combination of Discrete Wavelet Transform (DWT) and Shannon entropy with an ANN classifier performed best at discriminating autistic and normal subjects. The Shannon entropy technique estimates the average information content of the signal, that is, its average uncertainty [45].
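To make the Shannon entropy feature concrete, the following minimal sketch estimates it from an amplitude histogram. The histogram-based estimator, the function name `shannon_entropy`, and the default of 8 bins are assumptions for illustration; the cited study [45] combines entropy with DWT coefficients, which is not reproduced here.

```python
import math
from collections import Counter

def shannon_entropy(signal, n_bins=8):
    """Shannon entropy (in bits) of the amplitude histogram of a signal."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return 0.0  # a constant signal carries no uncertainty
    width = (hi - lo) / n_bins
    # Map each sample to a bin index; the maximum is clamped into the last bin.
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in signal)
    total = len(signal)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

flat = shannon_entropy([1.0] * 10)                      # no variation in amplitude
spread = shannon_entropy([0.0, 1.0, 2.0, 3.0], n_bins=4)  # samples spread evenly over 4 bins
```

A signal whose amplitudes are spread evenly over the bins yields the maximum entropy, while a constant signal yields zero.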

Temporal and spatial features were extracted to distinguish patients with Major Depressive Disorder (MDD) from healthy controls. A one-dimensional convolutional neural network (1DCNN) with Long Short-Term Memory (LSTM) was used [46]. The deep learning techniques Convolutional Neural Network (CNN) and LSTM were used for the identification of depression using right and left hemisphere EEG readings of the brain [47, 48]. Similar studies show that the right electrode values are greater than the left in a depressed patient. The CNN-based DeprNet model was used to classify healthy and depressed persons [49, 50]. Along with CNN, Random Forest (RF) and KNN were used for depression detection with power spectral density features of the theta, alpha, and beta bands of EEG [51]. The EEG-based machine learning algorithms LSTM, CNN, KNN, and SVM were used to distinguish Schizophrenia patients from healthy subjects [52,53,54]. Katz's, Higuchi's, and Petrosian's fractal dimensions, Largest Lyapunov Exponent (LLE), Approximate Entropy, and beta, alpha, gamma, and theta bands, along with CNN, Naïve Bayes (NB), Logistic Regression (LR), KNN, Multilayer Perceptron, and AdaBoost, were used to differentiate between healthy and ADHD children [55,56,57].

Features extracted from EEG signals can be categorized into nonlinear and linear analyses. Wavelet entropy, Detrended Fluctuation Analysis (DFA), Higuchi's Fractal Dimension (HFD), Petrosian, Approximate Entropy, Katz fractal dimension, relative wavelet energy, and correlation dimension are used in nonlinear techniques for the detection of mental disorders. Interhemispheric asymmetry, the power spectrum of alpha, beta, gamma, theta, and delta bands, and amplitude and frequency features of EEG signals are used in linear techniques.
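One feature from each family can be sketched as follows: band power as a linear feature and the Petrosian fractal dimension as a nonlinear one. This is a minimal standard-library illustration, not the pipeline of any cited study; the direct DFT, the band edges (8 to 13 Hz for alpha, 13 to 30 Hz for beta), the 128 Hz sampling rate, and the function names are assumptions.

```python
import cmath
import math

def band_power(signal, fs, f_lo, f_hi):
    """Signal power carried by the DFT bins with frequency in [f_lo, f_hi)."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(signal))
            # Factor 2 accounts for the mirrored negative-frequency bin.
            power += 2 * abs(coeff) ** 2 / n ** 2
    return power

def petrosian_fd(signal):
    """Petrosian fractal dimension based on sign changes of the first difference."""
    n = len(signal)
    diff = [b - a for a, b in zip(signal, signal[1:])]
    sign_changes = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
    return math.log10(n) / (math.log10(n) + math.log10(n / (n + 0.4 * sign_changes)))

# Hypothetical one-second, 10 Hz test tone sampled at an assumed 128 Hz.
fs = 128
tone = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
alpha_power = band_power(tone, fs, 8, 13)
beta_power = band_power(tone, fs, 13, 30)
tone_fd = petrosian_fd(tone)
```

For the test tone, essentially all power falls in the assumed alpha band. In practice a windowed estimator such as Welch's method would usually replace the direct DFT shown here.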

  b) ECG

The most effective and frequently accepted method of utilizing cardiac impulses to predict mental health state is the ECG. Stressful situations influence the heart rate observed in the ECG signal. Numerous physiological adjustments brought on by stress might raise the heart rate. Distinct morphological characteristics of ECG signals include the peaks of the P-QRS-T complex and varying time intervals. A Genetic Algorithm was applied to the ECG signal information to select specific features from unorganized data. The selected feature information was further processed through Variational Mode Decomposition for discrete signal segmentation and complexity reduction. In addition, Artificial Bee Colony (ABC) was applied to the decomposed signal data to find the optimum threshold value of the signal. Particle Swarm Optimization (PSO) was used for the final feature selection of ECG. The SVM classifier was applied to the selected feature data to predict the stress level among normal and stressed subjects [58].

To identify the stress level of an automobile driver, the averages of the QRS, RR, QR, QQ, SS, and RS intervals were calculated. Further, average heartbeat intervals were extracted from the ECG signal using the NetBeans Java platform. Stress levels were divided into classes 0, 1, and 2 for low, moderate, and high, respectively. Multiple machine learning classification algorithms were used, such as Multilayer Perceptron, J48, RF, and Naïve Bayes [59]. Further research used R-peaks and RR intervals and extracted their time and frequency domain features, including Mean-RR, Quartile Deviation (QD), Standard Deviation (SD), Median (MED), Low-frequency (LF) and High-frequency (HF) power of Heart Rate Variability (HRV) and their ratio, and the 20th and 80th percentiles of the RR interval. In HRV, HF energy is related to parasympathetic nerve activity and LF energy to sympathetic nerve activity. Conventional and deep learning classifiers, including XGBoost, KNN, Decision Tree, Linear SVM, AdaBoost, Bayes, Logistic, RF, CNN, and Bidirectional LSTM (BiLSTM), were used for moderate and high psychological stress identification. Deep learning techniques were used for automatic feature extraction and showed better performance with a 10 s ECG window size [60].

KNN and Probabilistic Neural Network (PNN) classification algorithms were used for stress detection by capturing short-term ECG signal data from 60 subjects (male and female) during a Stroop Color Word Test (stress induction task). An Infinite Impulse Response (IIR) filter was utilized for ECG signal filtration and preprocessing, and HRV was extracted through the Pan-Tompkins method. Time-domain HRV statistical features such as SDNN (SD of RR intervals), NN50, PNN50, Root Mean Square of Successive Differences of RR intervals (RMSSD), mean RR, power, and HR were used together with the geometrical feature Triangular Interpolation of the RR interval histogram (TINN). Frequency domain features like LF and HF with band power between 0.04 and 0.15 Hz, TF (HF + LF), and their ratio were used to calculate covariance, kurtosis, 2nd, 3rd, and 4th order cumulants, power, entropy, SD, mean, min, and max for stress classification. The findings demonstrate that the Autonomic Nervous System (ANS) is directly related to stress in both signals employed for stress detection [61]. In subsequent research, along with the Stroop Color Word Test, other stressors like stressful videos, events, and self-introduction tasks were included. Spectral and temporal features of HRV, viz. HR (SD, mean), NN50, TINN, LF & HF (normal, peaks, ratio), RMSSD, SDNN, and PNN50, were used. SVM, KNN, NB, LDA, Generalized Linear Model (GLM), and RF were used for stress detection [62].
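The time-domain HRV features named above follow directly from a list of RR intervals. The sketch below is a minimal illustration that assumes R-peak detection has already been done (for example by a Pan-Tompkins style detector, not shown); the function name `hrv_time_domain`, the millisecond unit, and the toy intervals are assumptions.

```python
import math
import statistics

def hrv_time_domain(rr_ms):
    """Time-domain HRV features from successive RR intervals in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    nn50 = sum(1 for d in diffs if abs(d) > 50)  # successive differences above 50 ms
    mean_rr = statistics.fmean(rr_ms)
    return {
        "mean_rr": mean_rr,
        "mean_hr": 60000.0 / mean_rr,            # approximate beats per minute
        "sdnn": statistics.stdev(rr_ms),         # SD of the RR intervals
        "rmssd": math.sqrt(statistics.fmean([d * d for d in diffs])),
        "nn50": nn50,
        "pnn50": 100.0 * nn50 / len(diffs),      # NN50 as a percentage
    }

# Hypothetical RR series; a real recording would span minutes.
rr = [800, 810, 790, 850, 800]
hrv = hrv_time_domain(rr)
```

Frequency-domain features such as LF and HF power require resampling the RR series and estimating its spectrum, which is beyond this sketch.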

KNN, RF, SVM, NB, C4.5 Decision Tree, and LogitBoost were used to distinguish normal and ADHD subjects through heart rate, health, and motor activity data. Kurtosis, skewness, SD, permutation entropy, continuous wavelet transform, absolute energy, autocorrelation values, and fast Fourier transform features of HRV were calculated. To accurately identify individuals with ADHD, essential features exhibiting strong patterns were obtained using Principal Component Analysis (PCA) [63]. Shimmer, a wearable ECG instrument, was used to collect ECG signal data. The Pan-Tompkins algorithm was used to extract HRV and QRS data from children with mild and moderate ASD while they watched audio and visual aids. Mean, median, kurtosis, and approximate entropy features (emotional) of ECG, QRS, and HRV were used to classify ASD through SVM, KNN, and Ensemble classifiers [64]. RNN and LSTM were used to predict depression from different heartbeat types, such as Premature Ventricular Contraction (PVC), abnormal, and normal beats, measured through the ECG signal [65].

  c) GSR/EDA

GSR, also referred to as Electrodermal Activity (EDA), is regarded as one of the reliable predictors of stress due to its two major components, i.e., Skin Conductance Response (SCR) and Skin Conductance Level (SCL). GSR features such as the number of peaks, mean, maximum peak amplitude, and the difference between the EDA amplitudes and the average amplitude of SCR peaks are used to predict the stress level based on NB, SVM, RF, and KNN classification techniques [66]. A BioRadio 150 device was used to capture EDA signals at 960 Hz from 20 subjects in a controlled simulation environment to detect a driver's anger and stress emotions. A bandpass filter of 0.5 Hz to 2 Hz was used to denoise the EDA signal data. Power Spectral Density (PSD) features of EDA, such as median, variance, and mean, were extracted to identify three states, namely stress-anger, neutral-anger, and neutral-stress, using an SVM algorithm with tenfold cross-validation [67].
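Peak-based GSR features of the kind described in [66] can be sketched with a simple local-maximum search. This is a minimal illustration and not the method of the cited studies; the function name `scr_peak_features`, the `min_rise` threshold of 0.05, and the rule of measuring each peak from the lowest preceding value are assumptions.

```python
def scr_peak_features(eda, min_rise=0.05):
    """Count SCR-like peaks in an EDA trace and summarise their amplitudes."""
    peaks = []
    trough = eda[0]
    for prev, cur, nxt in zip(eda, eda[1:], eda[2:]):
        trough = min(trough, prev)
        # A peak is a local maximum that rises at least min_rise above the trough.
        if prev < cur >= nxt and cur - trough >= min_rise:
            peaks.append(cur - trough)
            trough = cur  # restart the trough search after each peak
    return {
        "n_peaks": len(peaks),
        "mean_level": sum(eda) / len(eda),
        "max_peak_amplitude": max(peaks, default=0.0),
        "mean_peak_amplitude": sum(peaks) / len(peaks) if peaks else 0.0,
    }

# Hypothetical skin conductance samples in microsiemens.
trace = [1.0, 1.0, 1.2, 1.5, 1.3, 1.1, 1.1, 1.4, 1.2, 1.0]
gsr = scr_peak_features(trace)
```

Real pipelines usually separate the slow SCL component from the fast SCR component before peak detection, which this sketch omits.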

A further study aimed to identify the driver's psychological stress at three levels (low, medium, and high), exploring foot EDA signal data with the LDA classification algorithm. Min, max, peak values, and their ratio features were extracted from the signals. The findings show that EDA signals are sufficient for detecting low, medium, and high stress levels [68].

Cognitive load (CL) analysis is required to recognize a child's mental development and the psychology of individuals suffering from various traumatic situations. GSR data from 40 students were collected to assess the CL of individuals solving math problems, Sudoku, and comprehension reading tasks. GSR time domain features like SD, mean, mean of the 1st and 2nd differences of the filtered signal, mean of the 1st difference of the normalized signal, myopulse percentage rate (MYOP), log detector, PSD, and entropy were extracted for evaluating CL parameters such as Lempel-Ziv Complexity, Shannon Entropy, Correlation Dimension, and Hurst Exponent. For the classification of different task loads, RF, NB, and KNN classifiers were used [69]. EDA differential features like skewness of SCL, SD of SCL, non-specific SCR, and mean amplitude of SCL were extracted to differentiate between healthy subjects and MDD patients. EDA signal data was captured during autonomic activation while subjects performed five assigned tasks: baseline, mental arithmetic activity, recovering from stress, relaxing, and recovering from relaxation. SVM-based recursive feature elimination (SVM-RFE) was used to select significant EDA features, and NB, decision tree, SVM, and KNN were applied to achieve higher accuracy for MDD detection [70].

  d) Behavioural

Behavioral biomarkers can be subdivided into verbal biomarkers, like voice and speech, and non-verbal biomarkers, like Facial Expressions (FEs), eye movements, text, and gestures. These biomarkers can be used to identify various developmental and neurological mental disorders. Besides the common FEs, sometimes referred to as macro-expressions, that people encounter on a daily basis, emotions can also be expressed in a specific form called Micro-expressions (MEs) under particular circumstances. MEs are FEs that betray genuine emotional responses during high-stakes circumstances where people attempt to hide their feelings [71]. Different features of FE, such as Local Binary Pattern Histogram (LBPH), localized Gabor, and Histogram of Oriented Gradient (HOG), were used for depression detection manually and automatically. The Facial Action Coding System (FACS) decomposes FEs into Action Units (AUs), which refer to different facial muscles related to a certain emotion. Sixty-six facial landmarks were tracked through Active Appearance Models (AAM), and an SVM classifier was used to predict the severity and risk of depression [72]. In a subsequent study, vocal and facial features were extracted from video based on a Motion History Histogram (MHH), including low-level descriptors (LLDs), Local Binary Patterns (LBP), Edge Orientation Histogram (EOH), and Local Phase Quantization (LPQ) from each frame of video data. LLDs were extracted over windows of 25 to 60 ms with a 10 ms overlap, and the selected features were energy, entropy, Zero Crossing Rate, shimmer, flatness, PSY sharpness, and probability of voicing. Linear regression and the Partial Least Squares (PLS) algorithm were used to represent the association between the depression scale and the feature set [73].

Spatial and temporal variations in facial texture, structure, and expressions are typically observable in depressed patients. A 3D CNN-based Multiscale Spatiotemporal Network (MSN) was used to automatically recognize depression from video frames of the Audio-Visual Emotion Challenge (AVEC) 2013 and AVEC 2014 datasets. A Multi-Task Cascade Convolutional Network (MTCNN) was used to detect and align the faces [36]. A further study utilized facial expressions with CNN to classify individuals with or without depression. The Amsterdam Dynamic Facial Expression Set (ADFES) was used as stimuli while capturing expressions of five emotions: happy, surprised, angry, sad, and disgusted. The task was completed using algorithms that locate facial calibration regions to recognize and mark facial characteristics [74].

Further, the CNN algorithm was used to identify neurological disorders like Alzheimer's, Stroke, and Parkinson's through FE recognition. The Viola-Jones algorithm was employed for accurate face detection and for cropping images to 148 × 148 pixels. The descriptive and analytical findings were validated using the Karolinska Directed Emotional Faces (KDEF) dataset [75]. An ME-based deep learning framework was implemented to identify depression in the elderly. The classification of images was carried out using the Visual Geometry Group (VGG)-16 model. An additional jump connection architecture was introduced to VGG-16 in order to prevent the loss of image features, and feature amalgamation using the output of convolution segments 3–5 was performed [76]. A subsequent investigation employed CNN, RF, and SVM classifiers to detect depression using MEs. Various geometric features, including 2D and 3D landmarks, eye gaze vectors, the presence and intensity of AUs, head orientation, head position, head rotation, and point distribution model parameters, were retrieved from MEs. The results showed varied degrees of accuracy in identifying sad facial emotions using RF, SVM, and CNN with the AlexNet model, as well as CNN with the VGG-16 model using eye region data [77].

Detection of developmental disorders such as ASD, Intellectual Disabilities (ID), Fetal Alcohol Syndrome (FAS), Progeria (PG), Cerebral Palsy (CP), and Down Syndrome (DS) using facial images was explored with Deep CNN and SVM. The AlexNet model was applied to the preprocessed facial data in the CNN model for global and local feature extraction. According to AlexNet's specifications, the image size was reduced to 227 × 227. Mean average recall (MAR) and mean average precision (MAP) were calculated for comparison among normal subjects and subjects with developmental disorders [78]. In a similar study, a CNN-based feature extraction algorithm was used with four facial attributes, AU, Arousal, Valence, and Expressions, for the classification of ASD. The EmotioNet and AffectNet databases, divided into training-validation and training-validation-testing subsets, were used to train the model. F1-score was used for performance evaluation of AU and expressions. The correlation coefficient was used for arousal and valence. LDA, SVM, Logistic Regression, and XGBoost were used for ASD classification [79].

Table 1 lists the benefits and drawbacks associated with different modalities that have been highlighted in various research.

Table 1 Pros and Cons of Physiological & Behavioural Signals

It is important to note that no single method is a definitive diagnostic tool for mental disorders. Instead, a combination of these methods and clinical assessment is often used to comprehensively understand an individual's mental health. Additionally, advancements in technology and research may improve the accuracy and specificity of these methods over time.

4 Machine learning classification techniques

Human behavior and physiological signals offer a variety of biomarkers for various mental states and illnesses. These behavioral cues or signals are called modalities. Data from these modalities can provide vital information to diagnose these diseases. This information can be processed using machine learning algorithms to determine the prevalence, nature, and severity of mental disorders. Modalities can be processed in a unimodal or multimodal way. The following subsections give examples of unimodal and multimodal techniques and how they are used.

4.1 Unimodal approach to detect mental disorders

The term "unimodal" describes the classification of emotions and mental illnesses using only one signal. Numerous studies have focused on one signal for diagnosing and predicting mental disorders [93]. Since unimodal systems are the cornerstone of multimodal systems, they must operate effectively to build effective multimodal systems [86]. Table 2 below lists essential information from earlier publications on diagnosing mental disorders using a unimodal approach.

Table 2 Summary of Unimodal and Machine Learning Techniques used for Detecting Mental Disorders

4.2 Multimodal approach for mental disorder detection & emotion recognition

Multimodality refers to the use of more than one signal for the diagnosis of a mental disorder. Multimodal approaches can be applied either at the feature level or at the decision level [93].

  a) Feature-level Fusion

When a single machine learning classifier is applied to features extracted from more than one modality, it is called feature-level multimodal fusion. For example, features are extracted from facial expressions and physiological signals (ECG, EMG, GSR, etc.), and a single classifier is then applied to the combined feature data to produce the result [95].
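Feature-level fusion can be illustrated by concatenating per-modality feature vectors before a single classifier sees them. The sketch below is a minimal illustration under stated assumptions: the nearest-centroid classifier stands in for whichever classifier a study uses, and the function names, feature values, and labels are hypothetical.

```python
import math

def fuse_features(*modalities):
    """Feature-level fusion: concatenate per-modality feature vectors."""
    return [value for vector in modalities for value in vector]

def fit_centroids(samples, labels):
    """Train a nearest-centroid classifier: one mean vector per class."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [sum(col) / len(rows) for col in zip(*rows)]
            for y, rows in grouped.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

# Hypothetical toy data: (facial features, ECG feature, GSR feature) per subject.
train = [
    fuse_features([0.1, 0.2], [62.0], [0.3]),
    fuse_features([0.2, 0.1], [65.0], [0.4]),
    fuse_features([0.8, 0.9], [95.0], [1.2]),
    fuse_features([0.9, 0.7], [98.0], [1.1]),
]
labels = ["relaxed", "relaxed", "stressed", "stressed"]
centroids = fit_centroids(train, labels)
prediction = predict(centroids, fuse_features([0.85, 0.8], [92.0], [1.0]))
```

Because the fused vector mixes units (for example beats per minute and microsiemens), features would normally be standardised before fusion; that step is omitted here for brevity.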

  b) Decision-level Fusion

When each modality has its own classifier and the outputs are merged to make a decision, it is called decision-level multimodal fusion. The outcome derived through this method is a combination of the prior probabilities calculated by the unimodal systems [96].
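Decision-level fusion can be sketched as a weighted average of the class probabilities produced by each unimodal classifier. This is one common combination rule chosen for illustration, not necessarily the rule used in [96]; the function name, modality names, weights, and probabilities are hypothetical.

```python
def decision_level_fusion(modality_probs, weights=None):
    """Weighted average of per-modality class probabilities, then argmax."""
    names = list(modality_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal trust in every modality
    total = sum(weights[name] for name in names)
    classes = list(modality_probs[names[0]])
    fused = {
        c: sum(weights[n] * modality_probs[n][c] for n in names) / total
        for c in classes
    }
    return max(fused, key=fused.get), fused

# Hypothetical outputs of three independent unimodal classifiers.
probs = {
    "face": {"depressed": 0.40, "healthy": 0.60},
    "eeg": {"depressed": 0.80, "healthy": 0.20},
    "ecg": {"depressed": 0.55, "healthy": 0.45},
}
label, fused = decision_level_fusion(probs)
```

Unequal weights let a more reliable modality dominate the final decision, which is a design choice each study has to justify empirically.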

The evolution of innovative smart applications that strive to detect and analyze affective information included in multimodal sources has been aided by developments in signal analysis and Artificial Intelligence (AI) [86]. Table 3 provides information on previous research done with the combination of signals and their area, findings, limitations and future scope for mental illness identification and prediction.

Table 3 Summary of Multimodal and Machine Learning Techniques used for Detecting Mental Disorders

It has been observed that when multiple algorithms are employed, one consistently surpasses the others regarding classification and prediction accuracy, specificity, and sensitivity. Unimodal and multimodal systems each have their own characteristics. Most of the frameworks used SVM, CNN, RF, NB, and KNN to achieve greater accuracy, as shown in Tables 2 and 3. For instance, in [105], SVM outperformed the KNN and LR algorithms with high accuracy. Heterogeneous EDA characteristics and features gathered from stress-related and relaxation activities were found to be highly significant by feature selection utilising support vector machine recursive feature elimination (SVM-RFE) [70]. In another study [106], the stress identification outcomes obtained by AB and RF classifiers are promising. However, ML algorithms are primarily utilised for binary and multi-class classification. Multiple classes are suggested for detecting various stress levels; however, too many classes may impair classification effectiveness if there are not enough combinations for each class. 1DCNN, 2DCNN, LSTM, and combinations of these classifiers are used to detect MDD, but 1DCNN with LSTM performed better than each method used separately [46]. SVM, RF, and CNN classifiers have all produced varying degrees of accuracy; however, RF leveraging only the eye region produced the highest accuracy. Due to the unoptimized dataset, the SVM algorithm performed worse than the other classifiers [77].
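The three metrics used to compare classifiers here, accuracy, sensitivity, and specificity, all derive from the binary confusion matrix. The following minimal sketch shows the arithmetic; the function name `binary_metrics` and the toy labels are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Specificity: share of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Hypothetical labels: 1 = disorder present, 0 = healthy control.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

On imbalanced clinical datasets accuracy alone can be misleading, which is why sensitivity and specificity are reported alongside it.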

5 Modalities features used in mental disorder detection

Several techniques can extract distinct features from different modalities corresponding to a certain mental health problem. As employed by various researchers, relevant modalities and their characteristics are listed in Table 4.

Table 4 Feature set associated with different modalities

6 Discussion and challenges

A systematic review of mental disorder detection and prediction has been presented in this paper. The current section presents a scientometric analysis based on Scopus data from 2010 to 2022, covering the prominent machine learning algorithms and physiological signals used for mental disorder detection and computerized decision-making. Figure 3 shows the number of documents published from 2010 to 2022. The graph suggests that research was active from 2012 onwards but slowed in 2015 and 2016. Afterward, research output grew from 2017 to 2021, and this growth continued into 2022.

Fig. 3 Publication Trend from the year 2010 till June 2022

From Fig. 4, it can be seen that most research was done on depression compared to other mental disorders. ASD research has gained significant attention as more than sixty percent of research publications are exploring it. Stress and MDD are other prominent topics in research on mental disorders that involves machine learning and physiological signals.

Fig. 4 Mental Disorder Percentage Distribution

Figure 5 displays journals about diagnosing mental disorders and decision-making using machine learning based on physiological and behavioral data. The most renowned journals that have published the relevant data are determined to be Nature, The Lancet and Science.

Fig. 5 Count of Prominent Journals Publication

According to the Scopus data displayed in Fig. 6, Germany, the United States, the United Kingdom, and other notable nations have investigated the assessment of mental disorders through machine learning and behavioral and physiological signals. The United States has contributed the most documents in this area.

Fig. 6 Documents Count by Countries

Figure 7 shows publication data for authors working on the identification of mental disorders using behavioral and physiological data. To be included in this list of prolific authors, a minimum of five documents and citations was required of each author. Based on their publications, the five most active authors are Liu Y., Wang L., Kessler R.C., Li Y., and Wang J.

Fig. 7 Prominent Authors Publication Count

Figure 8 displays how frequently different modalities are used to identify mental disorders with machine learning. EEG was found to be the most frequently used signal for detecting mental disorders. ECG and GSR are also widely investigated. Some studies have demonstrated remarkable results when various modalities are combined.

Fig. 8 Percentage Distribution of Articles using Physiological Signals

In investigations of mental disorders, SVM, KNN, and Random Forest are the most widely implemented machine learning algorithms, as shown in Fig. 9. The studies indicate that CNN achieves the highest accuracy, which is attributed to its automatic feature extraction.

Fig. 9 Prominent Machine Learning Algorithms

6.1 Datasets

With the development of technology, machine learning algorithms and pertinent physiological and behavioral indicators are used to identify mental illnesses. An algorithm requires data to extract information from the signals, and its results are evaluated according to whether it is used for deep analysis, feature extraction, visual recognition, or decision prediction. Numerous datasets from different modalities are used to detect mental disorders. Datasets also help organise unstructured data from various sources to accomplish specific objectives. A training dataset is often the first data provided to a learning system. The training data serves as the basis for further development and determines how effective and beneficial the machine learning system is.

The Department of Epileptology at the University of Bonn in Germany developed the primary EEG dataset used for epilepsy identification, which is publicly available online. Three types of single-channel, 23.6-s-long EEG signals were chosen: standard, pre-ictal epileptic background, and ictal epileptic seizure. A 128-channel amplifier setup was used to record the complete EEG signals, which were then digitised with 12-bit A/D precision and sampled at a frequency of 173.61 Hz [39, 40]. EEG datasets for seizure detection with time- and frequency-domain algorithms are covered in the review article [112]. The King Abdulaziz University (KAU) hospital Brain-Computer Interface (BCI) Group provided the EEG dataset for the detection of autism. The dataset was recorded while individuals were in a relaxed condition and was split into two groups; the first group, termed the "normal group," consisted of ten willing volunteers in good health. The second group has nine participants and is called the "autistic group." The 16 channels of the data acquisition system are labeled according to the international 10–20 acquisition standard [45].
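
The acquisition parameters quoted for the Bonn recordings fix the size of each segment. As a rough sketch, the arithmetic and a simple non-overlapping windowing step could look like the following. The constant and function names are illustrative, and the roughly one-second window length is an assumption rather than something the cited studies prescribe.

```python
from typing import List, Sequence

SAMPLING_RATE_HZ = 173.61  # sampling frequency reported for the Bonn dataset
SEGMENT_SECONDS = 23.6     # duration of each single-channel segment
ADC_BITS = 12              # A/D precision

# Whole samples in one segment and number of quantisation levels.
samples_per_segment = int(SAMPLING_RATE_HZ * SEGMENT_SECONDS)
quantisation_levels = 2 ** ADC_BITS


def split_into_windows(signal: Sequence[float], window_len: int) -> List[List[float]]:
    """Cut a signal into non-overlapping windows, dropping the incomplete tail."""
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    n_windows = len(signal) // window_len
    return [list(signal[i * window_len:(i + 1) * window_len]) for i in range(n_windows)]


# Placeholder signal of the right length (all zeros), cut into windows of about one second.
segment = [0.0] * samples_per_segment
windows = split_into_windows(segment, window_len=int(SAMPLING_RATE_HZ))
```

With these figures each segment holds 4097 samples, and a 12-bit converter distinguishes 4096 amplitude levels.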

The MIT-BIH Arrhythmia Database, which contains 48 half-hour ECG recordings from 47 subjects, is used for stress detection. It was developed at the Beth Israel Hospital laboratory in Boston and validated by Massachusetts Institute of Technology (MIT) experts for the investigation of arrhythmia and arrhythmia-related disorders [58]. The ECG5000 dataset was used for depression prediction through time series classification of heartbeat rate. It contains 5000 heartbeat sequences, each with 140 timesteps. The heartbeats are labeled as unclassified beat, regular, R-on-T Premature Ventricular Contraction (R-on-T PVC), Premature Ventricular Contraction (PVC), and Supraventricular Premature (SP) or Ectopic Beat (EB) [65].

The AffectNet and EmotioNet datasets are used for autism detection through facial expressions. The dataset contains 1,000,000 images, not all of which carry labels for every facial characteristic. It can be used to train algorithms for AUs, six basic facial expressions, valence, and arousal attributes [79]. The Karolinska Directed Emotional Faces (KDEF) dataset was used to identify neurological disorders through facial expressions. It contains images of 35 male and 35 female participants showing seven emotions: happy, neutral, sad, afraid, disgusted, surprised, and angry. The images are distinct, diverse, and high-resolution [75].

Some other work surveyed several datasets for sentiment analysis and emotion recognition based on the numerous modalities employed in various studies [113]. Another study used the Audio/Visual Emotion Challenge (AVEC) 2014 depression dataset to measure depression through vocal and facial expressions. This dataset comprises 340 short videos from 292 participants who completed a Human–Computer Interaction task while being recorded by webcam and microphone in various quiet environments [73, 109]. AVEC datasets have also been used in other studies: AVEC 2013 and 2014 [36]; AVEC 2013, 2014 and 2016 [111]; and AVEC 2014 for mood, depression and emotion recognition [108]. The Amsterdam Dynamic Facial Expression Set (ADFES) dataset is used for depression detection using facial dynamics [74]. Physiological biomarkers such as EMG, ECG, EOG, and EEG, together with the associated datasets, are discussed in review articles on mental disorder identification [23, 24]. Frameworks for detecting depression and recognising emotions are trained and tested using spontaneous datasets of micro-expressions, such as CASME II, Spontaneous Actions and Micro-Movements (SAMM), SMIC, SFEW, and VAM. The number of participants, sample size, frame rate, resolution, and emotion classifications vary across these datasets [114].

6.2 Challenges

Research challenges associated with detecting mental disorders can be categorised into several key areas. These challenges can impact the development of effective detection methods and tools. Categorising them may also help in understanding the primary challenges related to mental health diagnostics.

  a) Ethical Consideration

    Data collection requires ethical clearance and user permission. Ethical procedures must be adopted to collect and use data generated from the physiological and behavioural signals of different subjects for research purposes. Collecting information can be intricate because it must meet acceptable ethical and confidentiality standards. Several medical professionals have raised concerns about the reliability and complexity of applying machine learning prediction systems in real-world diagnostic testing, and about the accuracy and efficacy of automatic identification approaches [25, 35, 108, 115].

  b) Data Collection

    While collecting and annotating data, several essential personal aspects, including an individual's basic mood and native temperament, should also be taken into consideration [25]. Data normalisation is necessary for each person since physiological data varies from subject to subject [69]. Drawing meaningful conclusions from the raw data is difficult because physiological signal data comprises significant peaks and irregular transformation patterns. Feature extraction is necessary to extract and evaluate relevant information from raw signal data [116, 117]. Therefore, at the time of data collection, feature-level or decision-level data fusion approaches require a thorough understanding of the features of various modalities [95, 108].

  c) Technological Challenges

    Machine learning approaches are greatly dependent on data. Removing outliers, handling missing data values, and eliminating undesirable features are all parts of the data-preprocessing procedure that must be carried out accurately. While training the model, underfitting and overfitting are also significant challenges. Predictions made with fewer training parameters may be inaccurate or biased [115, 118,119,120]. Some applications, particularly in healthcare, have succeeded with deep learning approaches. However, the potential benefits of deep learning algorithms for mental health have not been extensively researched. Such studies are essential if researchers are to persuade the medical community to use predictive mental disorder detection systems [24, 121].

  d) Experimental Setup

    Collecting physiological data from a person with a mental disorder might be challenging using generic equipment. If the patient is uncooperative, physiological signals may vary due to intense emotions experienced by the person. Small samples are prevalent in the mental health sector because of the high expense of data gathering involving human subjects and because research procedures for variable circumstances are still being developed [26, 122].

  e) Limited Benchmarked Datasets

    There are relatively few standard, open-access datasets available for mental illnesses. While some datasets can be accessed under user license agreements, not all data is accessible. The main obstacle to creating a data model for detecting mental illness is the inability to access training and testing data due to confidentiality and ethical constraints [35,36,37, 123].

  f) Sample Size

    The majority of the assessed studies used a small sample size. Machine learning algorithms can be robust while analysing enormous sample sizes, depending on how the predictive algorithm is set up. The domain of mental health employs small samples because of the expense of data gathering involving human subjects and the ongoing development of experimental guidelines for multiple scenarios [25, 62, 92, 124, 125].

  g) Validation

    Several research categories remain at the proof-of-concept stage due to limited sample sizes and the lack of adequate external validation. Studies using institutional cognitive imaging are typically conducted on individuals with a history of mental illness. Physiological changes to the brain may therefore represent a cause of the illness, a symptom of it, or an undetermined risk factor [46]. The clinical validity of the findings could be affected by ambient noise, inadequate data collection, and poor compliance. Researchers may collaborate with a clinical expert because it is essential to provide details like validation, ground truth, and biases. This might result in better data analysis, increase accuracy, and mitigate deployment uncertainties [24, 126].

  h) Manipulation of Behavioural Signals

    Facial expressions are the most prevalent means of identifying mental states. Facial expressions and speech can both be altered in a particular mood. Behavioural data can be manipulated by certain individuals with less severe mental conditions [95, 102, 103].

  i) Interdisciplinary Deployments

    The multidisciplinary nature of these studies makes it indispensable that all disciplines remain in regular communication with one another. Interdisciplinary research presents opportunities and challenges, particularly at the interface between computer science and medicine [127]. Understanding the goals, ethical constraints, publication culture, experimental designs, and infrastructure requirements is essential for interdisciplinary research [128]. Applications for mental healthcare include system interfaces and supervision tools for clinicians and caregivers. The infrastructure must be secure, scalable, reliable, and robust in terms of software and hardware. Machine learning models should be used for prediction or detection only after validation [129].
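
Returning to the data collection challenge in item b), the need for per-subject normalisation can be illustrated with a short sketch. It assumes z-score normalisation, which is one common choice and not necessarily the method used in [69]; the function name and the readings are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def normalise_per_subject(readings: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Z-score each subject's signal using that subject's own mean and standard deviation.

    Assumes every subject has at least one reading.
    """
    normalised: Dict[str, List[float]] = {}
    for subject, values in readings.items():
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # A constant signal carries no variation, so map it to zeros.
            normalised[subject] = [0.0 for _ in values]
        else:
            normalised[subject] = [(v - mu) / sigma for v in values]
    return normalised


# Hypothetical skin conductance readings for two subjects with different baselines.
readings = {"s01": [2.0, 4.0, 6.0], "s02": [10.0, 20.0, 30.0]}
z_scores = normalise_per_subject(readings)
```

After normalisation the two hypothetical subjects have the same relative pattern despite their different baselines, so a classifier sees the change in the signal and not the subject-specific offset.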

7 Conclusion and future scope

This review has provided a current state-of-the-art overview of the existing literature on machine learning algorithms used for detecting and predicting mental disorders with the help of physiological and behavioral modalities. Reviewing such works may uncover the critical parameters used to assess the state of neurological developmental impairments, emotions, and their effect on daily life. EEG is the most widely used modality. The combination of multiple modalities in several studies suggests that more research into signal data fusion is needed to obtain improved accuracy. In the future, deep learning approaches can be used to extend the success of machine learning techniques for mental health prognosis. Researchers therefore need to explore and analyze data using multiple classification models to select the approach with the highest accuracy. The challenges and limitations that researchers face must also be managed effectively to obtain results that could improve therapeutic interventions and prediction.