1 Introduction

The Survey of Health, Ageing and Retirement in Europe has reported a substantial rise in the number of ageing adults with depression. Further, a significantly high risk of suicidal behaviour is associated with depression. The dynamic global economy and intense competition for employment opportunities have resulted in a considerable rise in depressive symptomatology. Anxiety and depression affect nearly twenty percent of the world population [1, 2]. As internalizing disorders, anxiety and depression can impair the socio-emotional development of an individual [3, 4]. Such internalizing factors may lead to health and social problems such as psychopathology, suicide, drug abuse, and functional impairment [5, 6, 9, 11]. Depressive disorder is also associated with many impairments in social functioning, which may persist even during remission [12, 13, 20]. Depression is a leading cause of ill health and disability worldwide; the World Health Organization (WHO) projected that depression would become the second leading cause of disability and death by 2020. Further, a missed diagnosis or misdiagnosis of depression may result in a high recurrence rate [14].

Contemporary diagnosis depends primarily on self-report or clinician rating scales. Self-report scales and inventories are highly vulnerable to subjective factors of the reporting individual and are insufficient to support a diagnosis of depression [15, 16]. Clinician rating scales require high-end clinical skills, professional knowledge, and training. Many researchers have related depression to medical domains such as neuroanatomy, endocrinology, and physiology [2, 17]. Although there is ample literature exploring the neuroanatomy, neuroendocrinology, and neurophysiology of depression, researchers have not yet established a robust clinical test for it. No laboratory test available so far can be used as a diagnostic tool for this disease; the available laboratory tests lack the required sensitivity and precision and hence are not confirmatory [15]. This situation warrants evaluation methods that can not only diagnose depression but also estimate its severity.

To comprehend the underlying mechanisms that disrupt the social behaviour of depressive patients, researchers have turned to social cognition. Social cognition relates to the metacognitive abilities of human beings necessary to decode mental states [10, 18, 19]. Mastering such mental states may help individuals understand mutual thoughts, feelings, and intentions [21]. Research suggests that fifty-five percent of human communication happens through expressions, while just seven percent of information is transmitted through language [22]. Expressions are the language of emotions; the face, eyes, and hand gestures are their communication channels. Studying interrelated expressions and the state of the eyes may substantially reveal hidden, yet observable, external manifestations of depressive episodes. Such manifestations are routinely observed by experienced medical practitioners during personal contact with patients with depressive disorder. The Diagnostic and Statistical Manual of Mental Disorders (DSM-5) recommends the study of facial expressions and human behaviour to infer depression [17]. This supports the premise that there are directly observable differences between the behaviour and facial expressions of depressive patients and non-depressive people.
Affective computing has significantly contributed to the understanding of human behaviour, thinking patterns, and decision making. It is an amalgamated study of facial expressions, posture analysis, and speech recognition, along with the correlation between emotion and expression. Depression analysis through facial encoding and emotive analysis has recently gained the attention of the research community, and numerous studies have explored the possibility of understanding and identifying depressive disorder through visual cues. To date, however, no computer model has been trained using the facial cues rendered by patients with depressive disorder. This paper explores the feasibility of differentiating the facial expressions rendered by a patient suffering from depressive disorder from those of a non-depressive person against a given stimulus. The outcome of this research is intended to help doctors identify potential depressive patients and make an early diagnosis. The rest of this paper is organized as follows: Section 2 reviews the related literature and the assumed hypothesis, Section 3 details the experimental methods, Section 4 presents the results, and Section 5 concludes the paper.

2 Related work

This section comprises relevant theories, methods, and techniques proposing the fusion of depressive-disorder research and information technology. The first part briefs the assistive technologies proposed for early identification of depressive or psychiatric disorders; significant platforms, frameworks, and systems for identifying signatures of depressive disorder comprise the second part. Numerous attempts have been made to predict and explain facial patterns in relation to the behaviour displayed by patients suffering from depressive disorder. The proposed theories were based upon emotional and behavioural dimensions and needed a testable hypothesis based on facial structure; this was attained by linking the contractions of facial muscles to the expression of emotions [23]. Subsequently, many emotional dimensions were explored [24,25,26, 111]. The mood facilitation hypothesis was proposed to understand the association between moods and emotions: it states that a mood increases the likelihood and intensity of matching emotions, while decreasing the probability and severity of opposing emotions. As a pervasively negative mood is a prime symptom of depression, this hypothesis predicts that a depressed mental state will produce facial expressions with negative valence [27].

Technological advancements have greatly affected established practices of clinical psychology and psychiatry. There are numerous cyber resources imparting knowledge on psychiatric conditions, assessment procedures, and diagnosis [28, 29, 31]. Recently, the research community has explored the use of smartphones for collecting sentiment data [32], which has increased the reach and propagation of therapeutic assistance for patients with depressive disorder. Many smartphone apps dealing with mental health concerns, such as depression and stress, are freely available over the internet [33, 34]. Internet-delivered treatments are collectively known as Internet-delivered Cognitive Behaviour Therapy (ICBT) [35]. Different versions of ICBT are driven by software platforms that integrate assessment instruments, treatment materials, and technologies to facilitate early diagnosis of depression [36]. Such platforms require regular administration to monitor progress and symptom severity and to anticipate the risk of self-harm [37]. Security of patient data is also a crucial concern for such platforms [38]. A primary challenge for translational clinical psychology and psychiatry is to deploy machine learning paradigms for diagnosis, prognosis, treatment prediction, and the detection and monitoring of potential biomarkers of cognitive disorders [39, 75]. Machine learning has subsequently been explored to enhance computer-aided psychotherapy [40]. Accurate predictions are attainable from any form of quantitative data, and because estimates of model performance are determined empirically, few distributional assumptions, such as normality or homogeneity of variance, are imposed. Machine learning techniques are designed for the multivariate analysis of data sets with high dimensionality, even when the ratio of cases to variables is limited [41].

Early machine learning studies examined whether diagnostic divisions between individuals could be summarized using high-dimensional data such as structural and functional neuroimaging [30, 42]. Early researchers tried deploying machine learning on mental ailments such as Alzheimer's disease, depression, and schizophrenia [43,44,45,46,47]. Recently, machine learning has been extended to a broader diagnostic spectrum ranging from anxiety disorders and drug addiction to anorexia and other phobias [48,49,50, 69]. The literature suggests that machine learning can identify patients with psychiatric disorders with an accuracy of 75% [42, 51, 75]. Researchers have also successfully deployed machine learning to segregate cases of bipolar and unipolar depression [52,53,54]. Subsequently, machine learning has assisted researchers in differentiating depression from schizophrenia and psychosis [44, 75]. These studies have found machine learning paradigms extremely useful for diagnostic studies and clinical decisions wherever diagnostic circumstances are unclear. This appears to be a promising research direction, focused on enhancing clinical utility in cases where diagnoses are vague. Whenever diagnostic assessments are complicated, time-consuming, or costly, machine learning could be the best option to look forward to [47].

Psychological traits of individuals can be identified through digital footprints with a high degree of accuracy. Behaviour on social networking platforms can reveal individual characteristics such as sexual orientation, political orientation, and ethnicity. There are algorithms that can make personality predictions even better than acquaintances can [55,56,57]. Studies reveal that generic facial recognition algorithms can distinguish between heterosexual and homosexual orientation with a precision ranging from 71 to 81 percent. Further, the linguistic content shared in social media posts can predict personality traits and gender [58,59,60,61]. A variety of papers have documented the possibility of detecting mental health states from social media [60, 62], as well as physical health issues of communities, such as heart disease.

Technology can facilitate continuous monitoring of the psychoemotional state of individuals at high risk. The world community is looking for smart solutions to reliably identify indices of stress-associated cardiometabolic risk. Semiotics is one such project, funded by the European Union, to develop a system for early assessment of depressive symptoms based on visual cues and facial expressions [63, 64]. Depressive disorder can also be revealed through numerous non-verbal signs [65, 66]. Sudden variance in the tonic activity of facial muscles, skin conductance, or pulse rate often signifies intense emotion, which may help to identify depressive behaviour. A few researchers have noticed depressive-disorder patterns even within electroencephalographic recordings [67], and functional near-infrared spectroscopy (fNIRS) has also been experimented with [68, 70]. Speech is a prominent nonverbal channel depicting the mental state of the speaker, and may carry dominant attributes of depressive behaviour [71]. Depression, being a prominent mood disorder, predominantly affects facial appearance as well as body posture [65, 66]. Salient facial features such as the eyes, mouth, and eyebrows are predominant in assessing depression. The study of pupil dilation is one of the prominent areas of research: pupillary responses to positive stimuli were found to be faster in non-depressive participants than responses to negative stimuli [72]. Contrarily, depressed patients display slow pupillary responses to positive stimuli with reduced cognitive load [73, 74, 76,77,78, 108]. Subsequently, attentional and pupillary biases have also been investigated to predict depression symptoms. Saccadic eye movements, in terms of latency and duration, also differ between depressed and healthy participants [79, 80]. Further, facial action units have been studied in terms of frequency of occurrence, mean duration, and onset/offset ratios for the assessment of depression [81, 82]. Numerous studies have reported promising results on the application of action units to automatic depression assessment [7, 8, 83,84,85,86,87,88,89,90,91,92,93]. Typical facial expressions of varying intensity are associated with the basic emotions of happiness, surprise, sadness, disgust, anger, and fear; the measured intensity and frequency of these primary expressions have been found to be low in depressed individuals [77, 83, 84, 91, 94,95,96,97]. Further, facial features such as the eyes and mouth have been studied along with gaze direction, averted eye contact, lazy eyelid activity, reduced blinking, reduced iris movement, reduced smile intensity and duration, and mouth animation [98,99,100,101,102,103,104].

From the literature cited, it was observed that the algorithms proposed to identify depressive individuals not only offer satisfactory results but also have better predictive power than friend ratings or other established procedures, and they provide an effective way to collect data for machine learning. However, the published research lacks recorded intensities of the various expressions of depressed patients. There thus appears to be a need to record the emotive intensities of depressive patients and validate them against those of non-depressive individuals.

3 Methods

A sound methodology is crucial for any experiment concerning the identification and treatment of Major Depressive Disorder (MDD). An Attention Deficit Hyperactivity Disorder (ADHD) questionnaire [105, 106] was briefed to a group of four hundred and one volunteers pursuing engineering education, aged between nineteen and twenty-three years; there were 254 male and 147 female volunteers. The ADHD questionnaire was then distributed among the volunteer participants, and the filled-in responses were studied by practicing clinical psychologists to find potential patients of depressive disorder. Among the three hundred and eighty-seven questionnaire responses received, seventy-two respondents were identified as potential patients of depressive disorder. Of these anticipated depression patients, thirty-eight were called for a personal assessment/interview as per the DSM-IV [17] criteria. The assessment of the thirty-eight volunteers by clinical psychologists confirmed eighteen cases of depressive disorder, while twenty were declared not depressive. The eighteen confirmed patients were shown the Amsterdam Dynamic Facial Expression Set (ADFES) as a stimulus. ADFES offers annotated faces displaying five basic emotions. Images of angry, happy, sad, surprised, and disgusted expressions were presented successively on a computer screen for 500 ms each, followed by an immediate blank screen. The faces had variable intensities of each emotion, from 0% to 100% of full emotion in 25% steps. Neutral facial expressions were also presented, giving a total of 120 facial presentations. Sample images from ADFES are shown in Fig. 1. Participants were instructed to press one of six labeled buttons on a response pad (the five emotions and neutral) as quickly and as accurately as possible. Their facial expressions and the relevant emotions were video-recorded while they responded to the ADFES stimuli.

Fig. 1

Sample images from ADFES dataset used as the stimulus
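
The trial structure described above (500 ms image presentation, an immediate blank, then a six-way forced-choice response) can be reproduced with standard stimulus-presentation tooling. The following is a minimal sketch using PsychoPy; the image paths, key mapping, and window settings are illustrative assumptions, as the paper does not specify the presentation software.

```python
# Minimal PsychoPy sketch of one trial block: show each ADFES image
# for 500 ms, blank the screen, then wait for one of six responses.
# Paths, keys, and window size are assumptions for illustration.
from psychopy import visual, core, event

win = visual.Window(size=(1280, 720), color='grey', units='pix')
response_keys = ['1', '2', '3', '4', '5', '6']   # five emotions + neutral

stimulus_files = ['adfes_angry_25.png', 'adfes_happy_50.png']  # example subset

for path in stimulus_files:
    visual.ImageStim(win, image=path).draw()
    win.flip()                     # stimulus onset
    core.wait(0.5)                 # 500 ms presentation
    win.flip()                     # immediate blank screen
    keys = event.waitKeys(keyList=response_keys, timeStamped=core.Clock())
    print(path, keys)              # log the chosen label and latency

win.close()
```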

Subsequently, the same procedure was repeated for the twenty proven non-depressed individuals. The frequency of the recorded facial expressions and quantitative estimates of the emotions expressed against the given stimulus were used to train a computer model. Facial emotion recognition of participants was performed using a Convolutional Neural Network (CNN), while facial features were identified using the Dlib machine learning library [107]. The architecture of the CNN model used is (Conv 5 × 5, MaxPool 2 × 1, Dropout, Conv 3 × 3, MaxPool 2 × 1, Dropout, Conv 3 × 3, MaxPool 2 × 1, Dense, Softmax). This experiment does not consider the analysis of vocal responses or other biometrics such as electrocardiography, skin conductance, pulse rate, eye-gaze tracking, or electroencephalography; these contribute to the future scope of this experiment.
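
The layer sequence above can be expressed as the following minimal Keras sketch. The input resolution, filter counts, dropout rates, and optimizer are assumptions for illustration; the paper specifies only the kernel and pooling shapes and the softmax output over the six emotion classes.

```python
# A minimal Keras sketch of the CNN layer sequence stated in the text.
# Filter counts, dropout rates, and the 48x48 grayscale input are assumed.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (5, 5), activation='relu', input_shape=(48, 48, 1)),
    layers.MaxPooling2D(pool_size=(2, 1)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 1)),
    layers.Dropout(0.25),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 1)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(6, activation='softmax'),   # six emotion classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```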

In the experiment, a Hikvision DS-2CE1AD0T-IRPF 2MP (1080p) camera was used to record the facial expressions. The photo resolution of the camera is 1920 × 1080 pixels, and it records video at 1080p, 25 fps. The experiment was conducted in a well-lit room. The participants were seated at a distance of 1 m in front of the camera. A monitor screen, which played the experimental content such as basic facial expressions and emotional pictures, was placed in front of the subjects. The camera was beside the monitor, with a deviation angle of less than 10 degrees, and the participants' heads and the recording equipment were kept at the same height.

Facial landmarks can prove vital for facial expression analysis, as shown in Fig. 2. Landmarking is driven by algorithms that localize fiducial points of the face to mark and identify facial features. The Constrained Local Models method, Active Shape Models, the Active Appearance Model, 3D Landmark Model Matching, Elastic Bunch Graph Matching, and the Landmark Distribution Model are a few prominent methods for the identification of facial landmarks and emotion recognition [16, 109]. Further, facial landmark data have been analysed as time-series data: landmark coordinates, along with their displacement, velocity, and acceleration, have been used as features for depression analysis, with the displacement of each landmark from the mid-horizontal axis signifying motion-relevant features. Action Units (AUs) represent the coordinated activity of facial muscles while communicating actions and expressing emotions, and can be used to measure the variability of facial expressions. These AUs are fundamental to EMFACS (Emotional Facial Action Coding System) [110]. This research follows the opposite approach to the identification of depressive disorder: instead of directly asking a dedicated set of questions, we anticipate the behavioural attributes of depression patients in response to a video stimulus. The recorded emotions, their intensities, and their frequencies for depressive patients against the ADFES stimulus were subsequently used to train yet another computer model. The remaining thirty-four potential depression patients, identified during the survey of three hundred and eighty-seven volunteers but not called for personal evaluation by practicing psychologists, were called for examination using the CNN-trained computer model. These thirty-four individuals watched the ADFES stimulus while their facial expressions were simultaneously recorded; they did not need to press any button to provide feedback. Their recorded videos were then analysed in terms of the emotive responses recorded against the given stimulus.

At times when the availability of expert psychologists is limited, diagnosis of depressive disorder can become a major concern. The contemporary world is not only competitive but also demanding; individuals often struggle to share their mindset and opinions, and hence become depressive. This situation warrants the need for a reliable, efficient, affordable, and believable system for early detection of depression symptoms. An algorithmic description of the proposed methodology is given below in Algorithm 1.

Algorithm 1
Fig. 2

Facial features identified using Dlib machine learning library
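
As a concrete illustration of the landmark pipeline described above, the following sketch localizes Dlib's 68 fiducial points on each video frame and derives the per-frame displacement and velocity features mentioned in the text. The video path and the pretrained predictor file are assumptions; Dlib's standard 68-point shape predictor is used.

```python
# Sketch: extract 68 facial landmarks per frame with Dlib, then compute
# per-landmark displacement between consecutive frames (a motion feature).
# The video path and predictor file are illustrative assumptions.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])

cap = cv2.VideoCapture("respondent_video.mp4")
series = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    pts = landmarks(frame)
    if pts is not None:
        series.append(pts)
cap.release()

coords = np.stack(series)                 # (frames, 68, 2) landmark series
displacement = np.diff(coords, axis=0)    # frame-to-frame displacement
velocity = displacement * 25              # pixels/second at 25 fps
```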

Fig. 3

Recorded emotion intensities of a respondent suffering from depressive disorder

4 Results

This experiment was undertaken in two phases. The first phase dealt with showing the ADFES stimulus to depressive and non-depressive individuals and recording their emotive facial responses. The recorded facial videos of confirmed patients of depressive disorder and of non-depressive individuals, along with the results of the emotion recognition task (ERT), were used to train an ERT_CNN model. Facial emotions were classified using the Dlib-ML library as given below:

4.1 Facial emotion recognition

Each subject performed the emotion recognition task on the ADFES dataset for 6 s at 25 fps, giving a total of 25 × 6 = 150 frames per person. Each frame was analysed for six emotions and their intensities:

  • f_i: the ith frame in the sequence

  • In-Ha(f_i): happy emotion intensity of the ith frame

  • In-Sa(f_i): sad emotion intensity of the ith frame

  • In-Su(f_i): surprise emotion intensity of the ith frame

  • In-An(f_i): angry emotion intensity of the ith frame

  • In-Di(f_i): disgust emotion intensity of the ith frame

  • In-Ne(f_i): neutral emotion intensity of the ith frame

  • Intensity: the prediction probability (softmax output) of the CNN
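
Under this notation, the overall intensity of each emotion for a subject is the mean of its per-frame softmax probabilities over the 150 frames (the formula used with Tables 1 and 2). The following is a minimal sketch of that computation, assuming the trained CNN exposes a standard predict method returning one six-way probability vector per frame:

```python
# Sketch: per-frame emotion intensities from CNN softmax outputs,
# averaged over the 150 frames of one subject. `model` (the trained
# emotion CNN) and `frames` (a (150, 48, 48, 1) array of preprocessed
# face crops) are assumed to exist.
import numpy as np

EMOTIONS = ['happy', 'sad', 'surprise', 'angry', 'disgust', 'neutral']

probs = model.predict(frames)          # shape (150, 6), one row per frame
assert probs.shape == (150, 6)

# Overall intensity per emotion X: sum of In-X(f_i) over i, divided by 150.
overall = probs.sum(axis=0) / 150      # equivalently probs.mean(axis=0)
print(dict(zip(EMOTIONS, overall.round(3))))
```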

The recorded emotions of respondents, as identified by the Dlib-ML library, are shown in Figs. 3 and 4. Training- and validation-phase accuracy and loss are depicted in Figs. 5 and 6.

Fig. 4

Recorded emotion intensities of a non-depressive respondent (plotted using Origin software)

Fig. 5

CNN accuracy for training and validation

Fig. 6

CNN loss for training and validation

The emotions and their measured intensities from the recorded videos of patients suffering from depressive disorder and of non-depressed respondents are shown in Tables 1 and 2, respectively. The overall intensity per emotion X is calculated as the sum of the per-frame intensities In-X(f_i) over the 150 frames, divided by 150, i.e., E(X) = Σ In-X(f_i)/150. Figures 7 and 8 correspond to the graphical representation of the recorded emotional intensities.

Fig. 7

Graphical representation of observed emotions through facial analysis of depressive respondents

Fig. 8

Graphical representation of observed emotions through facial analysis of non-depressive respondents

Table 1 The recorded emotional intensity of eighteen respondents suffering from a depressive disorder
Table 2 The recorded emotional intensity of twenty non-depressive respondents

Owing to the required brevity of the manuscript, the emotive responses expressed by depressive patients and non-depressive individuals against the ADFES stimulus are shown in the following tables. As stated previously, the ADFES stimulus is categorized into five sets of annotated emotive intensities ranging from 0% to 100% of full emotion, with an increment of 25% in every set. Tables 3, 4, 5, 6 and 7 correspond to the recorded responses of proven patients of depressive disorder, while Tables 8, 9, 10, 11 and 12 correspond to the recorded responses of non-depressive individuals against the ADFES stimulus.

Table 3 Emotions recorded for confirmed depressive patients at zero percent intensity of emotive stimulus using ADFES dataset
Table 4 Emotions recorded for confirmed depressive patients at twenty-five percent intensity of emotive stimulus using ADFES dataset
Table 5 Emotions recorded for confirmed depressive patients at fifty percent intensity of emotive stimulus using ADFES dataset
Table 6 Emotions recorded for confirmed depressive patients at seventy-five percent intensity of emotive stimulus using ADFES dataset
Table 7 Emotions recorded for confirmed depressive patients at hundred percent intensity of emotive stimulus using ADFES dataset
Table 8 Emotions recorded for confirmed non-depressive individuals at zero percent intensity of emotive stimulus using ADFES dataset
Table 9 Emotions recorded for confirmed non-depressive individuals at twenty-five percent intensity of emotive stimulus using ADFES dataset
Table 10 Emotions recorded for confirmed non-depressive individuals at fifty percent intensity of emotive stimulus using ADFES dataset
Table 11 Emotions recorded for confirmed non-depressive individuals at seventy-five percent intensity of emotive stimulus using ADFES dataset
Table 12 Emotions recorded for confirmed non-depressive individuals at hundred percent intensity of emotive stimulus using ADFES dataset

Training (18 subjects: proven depressed)

Training (20 subjects: proven non-depressed)

The ERT_CNN classifier was trained using the data collected in Tables 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12. Based on the recorded responses, this classifier is programmed to identify any respondent as depressive or non-depressive; the intensity and frequency of the recorded emotions were used to calculate an emotive score. Among the seventy-two potential depressive patients identified during the pilot survey per the ADHD questionnaire, only thirty-eight were initially called for examination by qualified professionals. The remaining thirty-four individuals were subsequently called and made to repeat the process of responding to the ADFES dataset while their facial expressions were recorded in parallel. Their recorded responses at the various intensities of emotive stimulus from the ADFES dataset are shown in Tables 13, 14, 15, 16 and 17.
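
The paper does not detail the internals of the classification step, so the following is an illustrative stand-in showing the data flow only: per-subject feature vectors built from the tabulated emotion intensities, a binary classifier fitted on the 38 labelled subjects, and predictions for the 34 held-out respondents. The feature layout (six emotions at five stimulus levels) and the choice of logistic regression are assumptions, not the authors' ERT_CNN.

```python
# Illustrative stand-in for the classification step (not the authors'
# exact ERT_CNN): fit on the 38 labelled subjects, predict the 34
# held-out respondents. Random placeholders stand in for Tables 3-17.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N_FEATURES = 6 * 5            # six emotions x five ADFES intensity levels

depressed = rng.random((18, N_FEATURES))        # placeholder for Tables 3-7
non_depressed = rng.random((20, N_FEATURES))    # placeholder for Tables 8-12

X_train = np.vstack([depressed, non_depressed])
y_train = np.array([1] * 18 + [0] * 20)         # 1 = depressed

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

X_test = rng.random((34, N_FEATURES))           # placeholder for Tables 13-17
print(clf.predict(X_test))                      # 1 = potentially depressed
```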

Table 13 Emotions recorded for non-confirmed depressive patients at zero percent intensity of emotive stimulus by ADFES dataset
Table 14 Emotions recorded for non-confirmed depressive patients at 25 percent intensity of emotive stimulus by ADFES dataset
Table 15 Emotions recorded for non-confirmed depressive patients at fifty percent intensity of emotive stimulus by ADFES dataset
Table 16 Emotions recorded for non-confirmed depressive patients at seventy-five percent intensity of emotive stimulus by ADFES dataset
Table 17 Emotions recorded for non-confirmed depressive patients at hundred percent intensity of emotive stimulus by ADFES dataset

Testing (34 subjects)

Data collected from the thirty-four individuals were fed into the ERT_CNN classifier so as to classify each respondent as potentially depressed or non-depressed; this data was used for testing the accuracy of the proposed method. To ascertain the accuracy and precision of ERT_CNN, all of the respondents were subsequently examined by practicing psychologists, as shown in Table 19. It was noticed that, among the fifteen respondents classified as depressed, ten were actually found to be suffering from a depressive disorder. Twelve of the remaining respondents were correctly identified as non-depressed, whereas seven were wrongly classified as non-depressed despite being depressive. Table 18 corresponds to the observed emotional intensities of the unconfirmed patients of depressive disorder, and a graphical representation of the observed intensities is shown in Fig. 9. The proposed method thus attained an overall accuracy of about 64 percent (22 of 34 respondents correctly classified) in the identification of patients suffering from depressive disorder, as given in the confusion matrix shown in Table 19. Though the observed results are not outstanding, they are promising; for an experimental stage, the accuracy attained by the proposed method seems satisfactory. Imbuing facial emotions with time-proven techniques such as the ADHD questionnaire for diagnosing depressive disorders is not only novel but also has ample scope for improvement.

Fig. 9

Graphical representation of observed emotions through facial analysis of potentially depressive respondents

Table 18 The recorded emotional intensity of thirty-four potential patients of depressive disorder recorded during the testing phase of the experiment
Table 19 Confusion matrix: count of depressive and non-depressive respondents post-examination by expert psychologists

Confusion matrix of ERT_CNN

Total test subjects: 34

Classified as depressed: 15 (10 correct)

Classified as non-depressed: 19 (12 correct)

Accuracy = (10 + 12)/34 ≈ 0.65; per-class precision = (10/15 + 12/19)/2 = (0.67 + 0.63)/2 ≈ 0.65
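
As a quick arithmetic check, the figures above can be recomputed directly from the confusion counts stated in this section (10 of 15 respondents classified as depressed were true positives; 12 of 19 classified as non-depressed were true negatives):

```python
# Recompute accuracy and per-class precision from the confusion counts
# reported in Section 4 (a consistency check, not part of the pipeline).
tp, fp = 10, 5        # classified depressed: 10 correct, 5 incorrect
tn, fn = 12, 7        # classified non-depressed: 12 correct, 7 incorrect

accuracy = (tp + tn) / (tp + fp + tn + fn)      # 22/34 = 0.647
prec_dep = tp / (tp + fp)                       # 10/15 = 0.667
prec_nondep = tn / (tn + fn)                    # 12/19 = 0.632
macro_precision = (prec_dep + prec_nondep) / 2  # 0.649

print(f"accuracy={accuracy:.3f}, macro precision={macro_precision:.3f}")
```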

5 Conclusion

This study concludes that the facial expressions rendered by a patient suffering from depressive disorder differ from those of a normal person against a given psychological stimulus. Further, it was concluded that the facial expressions rendered by a respondent against an annotated, quality stimulus such as the ADFES dataset can provide results comparable to those of the ADHD questionnaire. Facial analysis of respondents revealed that sad and neutral were the most prominent emotions displayed by depressive patients. By contrast, non-depressive respondents appeared to enjoy responding to the ADFES stimulus: their facial analysis identified happy as the most prominent emotion displayed, followed by neutral and surprised. It was also noticed that depressive patients wrongly identified the lower-intensity emotional stimuli from the ADFES dataset; their responses improved somewhat with the high-intensity emotional stimuli, but the overall emotional intensity displayed by any depressive patient was far lower than that of any non-depressive respondent. This experiment is oriented towards diagnosing depressive disorder based upon the emotive display of respondents; at this stage, it cannot differentiate between depression and anxiety. Further, emotions displayed through facial movements alone may not suffice to improve the precision of the proposed methodology. The future scope of this study lies in the deployment of other technologies, such as electroencephalography, galvanic skin response, and pulse sensors, alongside the understanding of emotions through facial analysis, to diagnose a person with depressive disorder. The proposed method could be used for self-diagnosis of depressive disorder among the masses, or it could be of great help to a practicing psychologist in better understanding the emotive state of a prospective patient seeking help.