1 Introduction

The World Health Organization (WHO), in 2017, reported that more than 300 million people worldwide (approximately 4.4% of the global population) were suffering from depression [34, 37]. Similarly, according to the Australian Institute of Health and Welfare, $9.1 billion was spent on mental health-related services in 2016–17, and 2.5 million people (approximately 10% of the Australian population) received Medicare-subsidised mental health-specific services in 2017–18 [1]. These statistics highlight the severity and the widespread nature of mental health issues, and with the growing awareness of the problem, there has been a significant increase in research and funding for the detection and prediction of mental health issues.

Leightley et al. [21], while focusing on the identification of post-traumatic stress disorder (PTSD) in a United Kingdom military cohort, also assessed the impact of mental health on the day-to-day duties of serving and ex-serving soldiers, specifically on their retention and productivity. Similarly, Walsh et al. [36] outlined the significance of psychological distress in adolescents, with suicide being the second leading cause of death in that age group. For each suicide in the United States, there are 100–200 non-fatal attempts [36]. Mental health and psychological distress are thus global issues that cost our health care systems billions of dollars each year, and they clearly do not discriminate by age, occupation or demographic.

The application of machine learning (ML) approaches to mental health and psychological distress problems is an ongoing research endeavour. Several studies have successfully built prediction models for psychological distress using ML techniques such as the Support Vector Machine (SVM), Artificial Neural Network (ANN), Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbour (KNN), Decision Tree (DT) and Random Forest (RF) [13, 23, 28, 29, 34, 36]. Despite this growing body of research on improving mental health and psychological distress diagnosis with ML, the recurring theme is the use of historical records or user-reported surveys to train the ML models. Although different ML classification techniques can be used to accurately predict psychological distress, the vast majority rely on people self-presenting for assessment, or self-identifying their condition, before the key features are available for analysis [20, 34]. There is thus a void with regard to generalised prediction from ecological factors alone [16]. We propose that the use of ecological factors would provide a proactive approach to generalised prediction. The few studies that have been conducted using ecological factors (e.g., see [25]) are based only on formulated questionnaire responses rather than scrutinised psychological assessment and screening tools. Therefore, this study aims to bridge this gap in the literature and supplement existing modelling research by providing a strategy to predict psychological distress based on ecological factors.

More specifically, the primary objective here is to bridge the gap between real-time ecological factors and existing psychological distress research. For this to be successful, the ML model should accurately and reliably categorise a specific person’s psychological distress based solely on their ecological factors. All measurements, such as the accuracy, precision, recall, F-measure (F1), and area under the curve (AUC), should be comparable to or outperform those of similar ML techniques in the referenced literature. It should be noted that, in the context of this study, a high recall is essential: failing to predict positive cases of psychological distress would be a major shortcoming. If successful, the developed model could be complemented by other ML classification techniques, or used independently in real-time software to predict and report psychological distress, providing a proactive rather than reactive approach to mental health. Ultimately, such a proactive approach could be used to offer alternative content, or even to alert a third party to provide more intensive intervention, before the person reaches the state of potential self-harm or suicide.

Mor et al. [25] conducted an exhaustive literature review to identify the ecological risk factors for PTSD. They used this information to develop a questionnaire, which was then used to generate the dataset for their study. Once the ecological risk factors and psychometric properties of a questionnaire are established, further research is generally required to validate the questionnaire and verify its performance against existing screening tools. Considering the limited time frame of our study, rather than devising a new screening tool, we developed an ML model to predict screening results based on existing and real-time ecological risk factors.

The K10 and K6 are two screening scales commonly utilised for assessing psychological distress [20], and both contain the psychometric properties required to quickly and efficiently categorise a person’s psychological distress. Although other screening tools exist in the literature, such as the General Health Questionnaire (GHQ-12), research has shown that the K10 and K6 surveys perform better and are more informative in recognising or ruling out target disorders [9, 15]. Considering that the K6 survey is a reduced version of the K10 survey (using six of the ten questions), this study uses only the K10 screening scale. The aim here is to propose an ML-based model to efficiently predict a K10 score, or psychological distress classification, based on ecological factors. This base model can potentially be extended to incorporate other ML aspects, such as facial recognition [28] and text analysis [34], to further enhance its efficiency and effectiveness as a real-time, proactive prediction tool. It can also easily be included as part of a real-time tool or mobile application for predicting psychological distress.

The rest of this paper is organised as follows. In Sect. 2, we first discuss the relevant literature and how existing research has contributed to the psychological prediction space. The research methodology used in this study is then described in Sect. 3, and the results of the study are presented in Sect. 4. Finally, we draw our conclusion in Sect. 5, along with proposed future work.

2 Related Work

Mor et al. [25] conducted a study in 2018 to evaluate an ML approach for identifying individuals at risk of PTSD using ecological risk factors. Initially, they generated a list of ecological risk factors, which resulted in a 37-question survey. The questionnaire was distributed to 1,290 residents of southern Israel who had been exposed to terror attacks. An ML model was then trained, using 10-fold cross-validation, on the provided ecological risk factors with a value indicating whether or not the study participants had previously reported a PTSD diagnosis. Their best model yielded an AUC of 0.91 and an F1 score of 0.83. This study was one of the few to include ecological factors; however, these factors were identified in the context of the study itself and then used to assess a population of the same specific demographic. Although good results were achieved, the model could have been validated more generally by utilising commonly scrutinised psychological assessment tools and applying it to a broader demographic.

A similar study in 2019 screened a total of 470 seafarers for anxiety and depression using ML [33]. This study also used a range of ecological factors, such as age, educational qualifications, marital status and income, as feature inputs to target a known Hamilton Anxiety and Depression rating. All five classification techniques tested produced high accuracy (>0.75) and AUC scores (>0.8, except for the SVM at 0.759) [33]. This study thus successfully predicted anxiety and/or depression from ecological factors. Its results could have been further validated by including a control set of people from the general population, outside the seafarers’ occupation and demographic.

Kessler et al. [20] tested ML algorithms to predict the persistence and severity of major depressive disorder. This study consisted of an initial survey of 5,877 participants, and a re-survey of 5,001 of those participants 10–12 years later. The study used ensemble regression trees and 10-fold cross-validated penalised regression to generate a model, which was then compared against the self-reported results 10–12 years after the baseline. In the results, 34.6–38.1% of respondents with high persistence, and 40.8–55.8% of those with severity indicators, fell within the top 20% of the baseline ML-predicted risk distribution. Interestingly, the ML model also showed that the 20% of respondents with the lowest predicted risk accounted for only 0.9% of all hospitalisations, making the model useful both for predicting high risk and for ruling out low risk. This study successfully outlined the benefits of using ML algorithms in psychological prediction. Re-assessing after 10–12 years allowed the ML model to be validated against real-world data, instead of just data subsets. The disadvantage of this approach, however, is the reliance on self-assessment for reporting: with self-assessment, there is no way of verifying whether a respondent is reporting based on a prior diagnosis, reporting a false diagnosis, or failing to report a positive diagnosis.

In 2020, Priya et al. [29] applied ML algorithms to predict anxiety, depression and stress in modern life. They focused on these mental health factors by collecting responses to the Depression, Anxiety and Stress Scale questionnaire (DASS 21) from 348 participants of varying age, gender and demographic. The questionnaire results were then classified using the DT, RF, NB, SVM and KNN models, with results ultimately measured by F1; the dataset was divided 70:30 into training and testing subsets. NB classification resulted in the best overall accuracy, ranging from 0.73 to 0.85 across anxiety, depression and stress. RF classification, however, produced the best F1 scores (0.47–0.76). As in the work of Kessler et al. [20], all classification models also produced good results for negative cases. However, this study focused specifically on the self-reported DASS 21 questionnaire; although accurate models can be trained in this way, comparing the results against generalised, ecologically inspired models is difficult in practice.

Trotzek et al. [34] addressed the early detection of depression using ML models based on messages published on the social platform Reddit. The study compiled a range of 10 to 2,000 messages collected from a total of 135 depressed users and a random control group of 752 users [34]. The 135 users were identified as depressed because they had posted language such as “I was diagnosed with depression”. As with other studies based on self-reporting or self-diagnosis, such identification puts the validity of these messages into question. Without any context for a message, or some form of sentiment analysis (e.g., see [7]), it is possible that people in the depressed category were posting carelessly, or that people in the control group who may actually be depressed simply did not use depressive language in their comments.

A 2018 study by Walsh et al. [36] aimed to use ML to predict suicide attempts in adolescents. This retrospective study used data from 974 adolescents with non-fatal suicide attempts, 496 adolescents with other self-injury, 7,059 adolescents with depressive symptoms, and 25,081 adolescent general hospital controls [36]. Using a range of ML classification techniques, several accurate predictive models were obtained. Although ecological factors were not prioritised, this study still demonstrated the significance of medical history in prediction analysis, and suggested that a generalised model should take a holistic approach.

Studies have also been conducted on psychological distress prediction by analysing MRI images [23], relating whole-brain activity patterns to facial expressions [28], and further analysing text-based comments on social platforms [13]. All of these contribute to the research space; however, they fail to fill the void around proactive prediction that does not rely on historical data. Understandably, supervised ML requires historical data to train models, so such data will always play a role in this field of research. The gap in the literature, and the one this study aims to address, is to use these known classification techniques to create a generalised prediction model based entirely on ecological factors.

Numerous studies on psychological distress prediction have also been conducted outside the ML domain. For example, Brooks et al. [3] conducted a study on self-reported psychological distress following a concussion incident among children and adolescents. Participants were assessed 4 and 12 weeks post-concussion using multiple psychological categorisation scales, and logistic regressions were used for prediction. Loula and Monteiro adopted a game theory-based model for predicting depression due to frustration in competitive environments [22]. This study introduced a game relating investment in formal education to professional success, and proposed that an individual becomes depressed when the difference between their earnings and those of their neighbours in the game is above a threshold. Despite this research outside the ML domain, we have chosen ML for this study because of the motivating examples and existing work in the ML field, as well as the potential for future work in using a trained model in real-time applications.

3 Methods

This work was broken into three phases: (1) Dataset and Targets, (2) Model Creation, and (3) Classification Analysis.

3.1 Dataset and Targets

In order to test the performance of the model proposed in this study, we used a public dataset by Every-Palmer et al. [12], which consists of 2 numeric and 15 categorical features, as shown in Table 1. This dataset was chosen because it contains quantifiable ecological factors as well as a K10 score, and is therefore highly suited for the purpose of our study.

Table 1. Normalised dataset dictionary

A number of metrics from the dataset were dropped from our study, such as internal identifiers and duplicate groupings; for example, the numeric age metric was used instead of the categorical age-range grouping. Given that we used only the K10 score in this study, the remaining psychological distress scales, including the WHO-5 and GAD-7 scores, were also dropped. Specific COVID-19 metrics, such as infection and test results, were dropped as well, since we wanted to generalise our study outside of the COVID-19 context: even though the proposed model would have trained successfully with this data, the aim of this work is to create a generalised psychological prediction model. Scaling was applied to the numeric age and alcohol consumption metrics. One-hot encoding was used to normalise the remaining categorical metrics, which mostly consist of 3 or 4 pre-determined string-formatted answers. Before normalisation, any missing value in a categorical column was replaced with the string “no value” to prevent encoding errors. Following this, any rows still containing missing data were dropped entirely.
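A minimal sketch of this preprocessing is shown below; the column names and file name are hypothetical stand-ins, as the Every-Palmer et al. dataset uses its own identifiers:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical names; the actual dataset uses its own identifiers.
NUMERIC_COLS = ["age", "alcohol_consumption"]
TARGET_COL = "r8.6_k10_fct2"  # two-level K10 variable, kept aside as the target

df = pd.read_csv("dataset.csv")  # hypothetical file name

# Replace missing categorical answers with a sentinel before encoding.
cat_cols = [c for c in df.columns if c not in NUMERIC_COLS + [TARGET_COL]]
df[cat_cols] = df[cat_cols].fillna("no value")

# One-hot encode the categorical metrics (mostly 3-4 fixed string answers).
df = pd.get_dummies(df, columns=cat_cols)

# Drop any rows still containing missing data, then scale the numeric metrics.
df = df.dropna()
df[NUMERIC_COLS] = StandardScaler().fit_transform(df[NUMERIC_COLS])
```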

3.2 Model Creation

The ‘r8.6_k10_fct2’ variable in the dataset is a two-level variable based on the K10 score, where the range 0–11 represents none/low/moderate and the range 12–40 represents high/very high [12]. Based on these K10 levels, we created binary targets: Low Distress (0–11) and High Distress (12–40). In this study, five single ML classifiers – the LR, SVM, ANN, NB and DT – and three ensembles – the RF, Adaptive Boosting (AB) and Gradient Boosting (GB) – were used for modelling and prediction analysis. The LR, SVM, ANN, NB, DT and RF were selected based on their demonstrated success in related studies [6, 13, 25, 29, 34, 35], whereas the AB and GB were selected because they performed well in our previous work [4].
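A sketch of the target construction follows on from the preprocessing above; the column name comes from [12], while the exact string label compared against is an illustrative assumption about the dataset's coding:

```python
# Map the two K10 levels to a binary target:
# none/low/moderate (K10 0-11) -> 0, high/very high (K10 12-40) -> 1.
# The string label below is an assumption about the dataset's coding.
y = (df["r8.6_k10_fct2"] == "high/very high").astype(int)
X = df.drop(columns=["r8.6_k10_fct2"])
```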

The LR classifier [24] is a generalised linear model [17, 26]. Generalised linear models overcome a key limitation of linear models, namely the requirement that the dependent variable be continuous and normally distributed, which is not always desirable, by allowing non-normal dependent variables [10, 11]. In LR, the dependent variables can be either unordered or ordered polytomous, while the independent predictor variables can be either interval/ratio or dummy variables [24].
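For a binary target such as ours, the standard logistic formulation (not reproduced from [24], but standard in the literature) models the probability of the positive class via the logistic function:

$$\begin{aligned} P(y = 1 \mid \mathbf {x}) = \frac{1}{1 + e^{-(\beta _0 + \boldsymbol{\beta }^{\top } \mathbf {x})}} \end{aligned}$$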

The SVM is a supervised learning model that learns from training data and performs classification on new data. It separates the classes with a hyperplane and maximises the separation distance (the margin) as much as possible: the larger the margin, the lower the error generated by the classifier [5].
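In its standard hard-margin form (again, the textbook formulation rather than anything specific to [5]), this amounts to the optimisation problem below, where each label takes the value +1 or -1:

$$\begin{aligned} \min _{\mathbf {w},\, b} \; \frac{1}{2} \Vert \mathbf {w} \Vert ^2 \quad \text {subject to} \quad y_i (\mathbf {w}^{\top } \mathbf {x}_i + b) \ge 1 \;\; \text {for all } i \end{aligned}$$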

The ANN is a feedforward neural network that uses supervised learning. This algorithm continually computes and updates all the weights in its network to minimise error. Training consists of two phases: a feedforward phase, in which the training data is propagated through to the output layer; and a backpropagation phase, in which the difference between this output and the desired target (the error) is propagated backwards to update the weights of the network [32].
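In the standard gradient descent formulation of this update, each weight moves against the gradient of the error, with a learning rate controlling the step size:

$$\begin{aligned} w \leftarrow w - \eta \, \frac{\partial E}{\partial w} \end{aligned}$$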

The DT classifier is based on Hunt’s algorithm [18], and was developed by Quinlan [30]. It builds a tree-like decision model for classification and prediction, and is a useful explanatory tool for expressing a cause-and-effect chain [31]. It is also typically used as a base classifier for ensemble models (e.g., the RF and AB).

The NB classifier is the simplest form of Bayesian network classifier, as it assumes that all features are independent of one another given the class. Despite this strong assumption, many applications have successfully implemented NB, and it is included among the top 10 data mining algorithms [19].
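Under this independence assumption, the class posterior factorises as below, and prediction simply selects the class with the highest value:

$$\begin{aligned} P(y \mid x_1, \ldots , x_n) \; \propto \; P(y) \prod _{i=1}^{n} P(x_i \mid y) \end{aligned}$$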

The RF is an ensemble of DT predictors in which each tree is independently trained using a random vector. The generalisation error of the RF depends on the strength of the individual trees and the correlation between them. This ensemble model is relatively robust to outliers and noise [2].

The AB ensemble algorithm iteratively combines multiple weak classifiers over several rounds. It starts with equal weights for all training data points. When training data points are misclassified, their weights are boosted, and a new classifier is trained using the updated, unequal weights. This process is repeated to produce a set of classifiers [38].
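In the standard discrete AdaBoost formulation, the classifier trained in round t, with weighted error rate ε_t, receives the vote α_t below, and each misclassified point has its weight scaled up before the next round (the bracket equals 1 for a misclassified point and 0 otherwise):

$$\begin{aligned} \alpha _t = \frac{1}{2} \ln \frac{1 - \epsilon _t}{\epsilon _t}, \qquad w_i \leftarrow w_i \, e^{\alpha _t [h_t(x_i) \ne y_i]} \end{aligned}$$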

The GB classifier is an ensemble of gradient-boosted regression trees, which produces a robust, competitive and interpretable algorithm for classification and regression, even on noisy (“dirty”) data. For binary classification, it induces only a single regression tree per boosting iteration [14].
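As a sketch, the eight classifiers can be instantiated with their Scikit Learn defaults, matching the constant-hyperparameter setup described in Sect. 4; apart from the MLPClassifier named later in the paper, the specific class choices are our assumptions about how each technique maps onto the library:

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Default hyperparameters throughout, as in Sect. 4. Apart from MLPClassifier
# (named in the paper), these class choices are our assumptions.
classifiers = {
    "LR": LogisticRegression(),
    "SVM": SVC(),
    "ANN": MLPClassifier(),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "AB": AdaBoostClassifier(),
    "GB": GradientBoostingClassifier(),
}
```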

As the dataset used in this study is relatively small (n = 2,010, reduced to 1,985 after normalisation), the 10-fold cross-validation technique was applied. This technique requires the dataset to be randomly partitioned into 10 equal subsets; 10 model-building and test runs are then completed, each utilising a different arrangement of nine subsets for training and one subset for testing [25]. The entire experiment workflow is shown in Fig. 1. All the code for the experiments was written and run on Google Colaboratory using Scikit Learn [27].
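A minimal sketch of this evaluation loop, reusing the classifier dictionary above and assuming the feature matrix X and binary target y constructed in Sect. 3.1:

```python
from sklearn.model_selection import KFold, cross_validate

# Random partition into 10 equal subsets, as described above.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=cv, scoring=scoring)
    means = {m: scores[f"test_{m}"].mean() for m in scoring}
    print(name, means)
```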

Fig. 1. An overview of the experiment workflow

Five metrics, namely the accuracy, precision, recall, AUC, and F1 score (the harmonic mean of precision and recall), were used for analysing the results of the proposed model, as well as for comparing them with the results obtained using other models from the literature. Accuracy was calculated as the proportion of correct predictions on the test set. Precision was calculated using Eq. 1, where tp is the number of true positives and fp is the number of false positives [27]. Recall was calculated using Eq. 2, where fn is the number of false negatives [27]. Equation 3 then combines the precision and recall values to give the F1 score.

$$\begin{aligned} precision = \frac{tp}{tp + fp} \end{aligned}$$
(1)
$$\begin{aligned} recall = \frac{tp}{tp + fn} \end{aligned}$$
(2)
$$\begin{aligned} F1 = \frac{2 \times precision \times recall}{precision + recall} \end{aligned}$$
(3)
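As a sketch, the same quantities can be computed for a single test fold with Scikit Learn’s metric functions; here y_test, y_pred and y_score (the fold’s true labels, predicted labels, and positive-class scores) are assumed to be available:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Confusion-matrix counts for a binary target: tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

precision = tp / (tp + fp)                            # Eq. 1
recall = tp / (tp + fn)                               # Eq. 2
f1 = 2 * (precision * recall) / (precision + recall)  # Eq. 3

# The library implementations agree with the manual formulas above.
assert abs(precision - precision_score(y_test, y_pred)) < 1e-9
assert abs(recall - recall_score(y_test, y_pred)) < 1e-9
assert abs(f1 - f1_score(y_test, y_pred)) < 1e-9

accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)  # needs scores, not hard labels
```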

3.3 Classification Analysis

Model validation is critical in ML training in order to guard against over-fitting to a specific training set. Hence, during the final phase of this study, the results of the classification models were compared against each other. As in the model creation phase, the accuracy, precision, recall, AUC and F1 scores were used as the primary metrics to measure the performance of each classifier.

4 Experiments and Results

All experiments were run using the 10-fold cross-validation technique, with results averaged over the 10 runs. Hyperparameters were kept constant at the defaults of their respective implementations.

Table 2 shows the average accuracy, precision, recall, F1 and AUC scores of each classifier. The results indicate that the LR model provided the best overall results. Its AUC score of 0.730 means that it ranks a randomly chosen positive case above a randomly chosen negative case 73% of the time, well above chance level. Similarly, with a recall score of 0.918, the LR model correctly identified the large majority of positive cases. Therefore, despite the potential for improvement, we can conclude that our model can accurately predict psychological distress using ecological factors alone. Among the other single classifiers, the ANN also performed well, with an accuracy of 0.807 and a better precision, but a lower recall, than the LR. The NB classifier had the best precision but a lower accuracy, meaning that while its positive predictions were rarely wrong, it misclassified more cases overall. The DT and SVM, while still providing accuracies above 70%, seem less suitable for psychological distress prediction than the other models.

The ensemble models tested in our study (AB, GB and RF) also generally performed well, with the RF providing slightly worse results than the others. Given that the GB ensemble uses a regression tree as its base classifier, and that the regression-based LR was the best classifier overall, its good results are unsurprising. The DT, which is the base classifier of the AB and RF, performed only moderately on its own; in other words, these ensemble models performed well despite their base classifier performing only moderately. This suggests that better results can be achieved by continuously boosting weak classifiers (as in the AB) or by combining many weak single classifiers (e.g., DTs), as in the RF.

Table 2. Results obtained with different classification methods

As the input dataset was mapped 1:1 to the feature layout of the model, it is possible that further manipulation could enhance the performance of the model. Such manipulation may involve adjusting feature weights to correct for bias, removing non-dominant features, or implementing feature crosses that provide a better depiction of the data in the given context. Additionally, the performance of the model could also be enhanced through tuning of the hyperparameters.
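One illustrative way to identify non-dominant features for removal, sketched below with an impurity-based importance ranking (our choice of technique, not one used in this study):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Rank the one-hot encoded features; those at the bottom of the ranking
# would be candidates for removal in a follow-up experiment.
rf = RandomForestClassifier(random_state=0).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```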

Our experiments utilised the MLPClassifier class (i.e., a multilayer perceptron) of the Scikit Learn library to implement the ANN [27]. Related studies have shown that neural networks can successfully predict psychological distress [34]. Therefore, results may be further improved by re-implementing the ANN model, or by implementing additional neural network models, using specialised neural network frameworks such as TensorFlow Keras [8].
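A minimal sketch of such a re-implementation in TensorFlow Keras; the layer sizes and training settings are illustrative assumptions, as no specific architecture is prescribed here:

```python
import tensorflow as tf

# Illustrative architecture: two hidden layers and a sigmoid output
# for the binary distress target. Sizes and epochs are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
model.fit(X.to_numpy(dtype="float32"), y.to_numpy(dtype="float32"),
          epochs=50, batch_size=32, validation_split=0.1)
```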

Further work on the actual ecological metrics included in the dataset would also be necessary to optimise the models. Removing metrics with little impact on K10 scores, and adding further metrics with a known influence on K10 scores, would likely improve the model’s performance. It is also important to note that the data sample used in this study was generated in the context of the COVID-19 pandemic, for the purpose of a study in that context. Therefore, we believe that a different data sample, collected with a more holistic, generalised view, may also be beneficial.

5 Conclusion

In this paper, we proposed an ML-based model for psychological distress prediction using only ecological factors. Of the eight classifiers implemented using Scikit Learn [27], our LR classifier produced the best results, with an AUC of 0.73. Although below the 0.8 target, its accuracy of 0.811, precision of 0.835 and recall of 0.918 suggest that the model can accurately predict positive cases of psychological distress. Our results indicate that, although it is possible to create an ML model to predict psychological distress, the challenge lies in finding suitable ML model parameters and ecological features. Future work in this area would be to further analyse and tune parameters to enhance the current models. Accuracy may also be improved by incorporating alternative ecological factors as metrics in order to provide a more holistic view.

Once an accurate model has been built, it can be used to bridge the gap in the existing literature, and can also be incorporated into real-world software or mobile applications. This could include integration with brain activity data [28], text sequence classification [34], or possibly with wearable devices that provide sleep, activity and heart rate information. With an enhanced model using metrics from multiple areas, some collected in real time, it will then be possible to provide the proactive approach required to deal effectively with this mental health crisis.