1.1 Introduction

According to an AHRQ report [60], mental health disorders are among the five most costly medical conditions in the U.S., with expenditures almost doubling from 1996 to 2006. The number of people with mental health disorders increased to a similar degree. This has made mental illness one of the largest public health burdens in most societies. The major challenges in mental health care can be grouped as follows: (1) making health care available to remote and underserved populations; and (2) making health care more cost-effective and adequate. Mental health informatics technologies developed so far include traditional information management systems, such as electronic repositories of patient information, electronic clinical summaries, and patients’ care plans. These technologies have become critical as networks of varied organizations (day hospitals, mental health centres, and small residential units) are involved in the management of mental health patients, including chronically disabled patients [30]. One of the main challenges for mental health information management systems is the secure transfer of patient information.

The most notable support for the advancement of mental health informatics has come from the Internet, in particular Web 2.0. The massive production of social media enabled by Web 2.0 drives a wealth of clinical knowledge and provides an efficient platform for patients to support each other. According to a 2008 survey, most Americans (up to 80 %) rely on the Internet to find health information in order to make their health care decisions [22]. By 2008, the Internet rivalled physicians as a source of health information [56]. An increasing number of patients (and families of patients) are also relying on the Internet for emotional support and to find clinical knowledge for self-care.

Slow but steady progress has been made on technologies for automated or assisted assessment of mental health, ranging from image-based automated assessment of stress and anxiety [13] and acoustic speech analysis of depressed patients [40, 44] to text analysis for the assessment of autism [59]. We do not expect to see fully automated mental health assessment and treatment systems in the near future, but many clinical trials indicate their potential to be integrated into relatively well established telepsychiatry (e.g., [68]).

Except for health social networks (e.g., Health 2.0), which are mainly driven by communities, more extensive and structured clinical trials for quantitative analysis are required to bring about wider acceptance, as most studies still report anecdotal qualitative analysis [68] or studies based on small numbers of subjects.

In this paper, we review current approaches to mental health informatics focusing on automated assessment methods, as we believe this is the current bottleneck for providing more objective and efficient mental health care.

1.2 Telemental Health

Telemedicine is the use of communication technologies, such as telephone, video conferencing, or the Internet, to provide and support health care to remote regions. Telemedicine was initially developed to provide health care to rural and underserved populations [4], but distance is no longer the major factor that defines the term. In fact, its applications have been extended to all types of health care, including psychiatry consultations in inner cities [34, 35, 68]. While the greatest benefits come from its use in rural and underserved populations, as it makes otherwise unavailable health care available to many patients, its driving force is now more about making health care more cost-effective, affordable, accessible, efficient, and convenient for both health care providers and consumers. For instance, Harrison et al. [33] evaluated teleconferences between doctors for improving communication between primary and secondary health care, and [68] reported case studies of telepsychiatry via videoconferencing and their potential to improve patient care and satisfaction and to reduce emergency department overcrowding. Frantzidis et al. [28] developed a remote monitoring system for the elderly and chronically ill. Tang et al. [63] used a multimedia system to improve medication adherence in elderly care. The growing size of the aged population indicates that this form of telemedicine will become an important research area for many developed countries.

Almost all technologies we review here could be used in conjunction with tele-mental health technology. For instance, it will be possible to use automated speech analysis methods (e.g., [24, 40, 59, 65]) while the patients are being interviewed by psychiatrists to provide objective analysis of speech and language disorder or to monitor improvements in mental health conditions.

Much needed research is now being devoted to the development of more sophisticated tele-mental health technologies that integrate various technologies. This chapter reviews some of these technologies, with the aid of quantitative evaluation, rather than anecdotal qualitative evaluations [68]. Chapters 2, 3, and 4 of this book examine the use of modern communication technologies for tele-mental health and contemporary issues in mental health, such as how technology is changing the way health care services are delivered and how the relationship between patients and health care providers is changing.

1.3 Automated Assessment Systems

Many mental health assessment methods are time consuming and highly subjective. To improve efficiency as well as to provide more objective assessments, various automated mental health assessment methods have been developed. The methods can be broadly classified based on what type of data are collected and analyzed.

1.3.1 Image and Behaviour Analysis

Cowie et al. [13] reviewed the use of image analysis techniques in detecting emotional cues from facial expressions (still face images) and gestures (movements of facial features). According to [13], there are two emotion detection paradigms: one detects (seven) archetypal expressions and the other detects nonarchetypal expressions (e.g., modulated, falsified, or mixed expressions). The majority of the existing literature and available data clearly focus on detecting archetypal expressions (e.g., [48, 51, 67]). Much work needs to be done on the detection of richer nonarchetypal emotional expressions and gestures. These approaches now focus on the detection of action units (AU) defined by Ekman et al. [21]. For instance, Valstar and Pantic [66] used a facial point detector based on Gabor-feature-based boosted classifiers, achieving an AU recognition rate of 95.3 %. As for applying facial expression detection to the diagnosis of mental health problems, Stone and Wei [61] used a facial expression detector to measure cognitive load, which is conventionally assessed using EEG or EMG measurements. However, there is little research on the use of facial emotion detection methods for clinical diagnosis of mental health problems. This may be because image processing techniques are more difficult than other techniques, such as sound processing, and because large variations in emotional expressions make it difficult to develop reliable measurements that are correlated with underlying psychological problems.

Instead of relying on face detection methods, Nambu et al. [45] developed a simple image processing technique for monitoring behaviours and diagnosing poor health in the elderly. The researchers developed an automatic diagnosis system that can assess physical as well as mental conditions from the TV-watching behaviour of elderly people. They used only the times at which the TV was switched on, obtained from a running monitor of the television. Initially, they tried to utilize various other sensors, such as running monitors of electric appliances or door switches, but found the resulting data difficult to analyze objectively, as the data from the different sensors did not appear to be correlated with each other. The start times of watching TV are recorded by dividing each day into 15 min intervals. An interval is given a certain value, say 1, when an elderly participant starts the TV in the interval and another value, say 0, otherwise. This way, a 30 × 96 pixel image could be constructed for visualizing the TV-watching behaviour of each participant. Their hypothesis was that the subjects watched the television at roughly fixed times if they were healthy. To measure the regularity of the TV-watching behaviour, the researchers measured the non-randomness of the patterns in the visualized behaviour using the maximum entropy method (MEM). The image compression rate was measured as an indication of the health condition. A lower compression rate (a larger compressed image) was taken as an indication of an unhealthy condition.
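The idea of using compressibility as a proxy for regularity can be illustrated with a minimal sketch. It is not the method of [45]: general-purpose zlib compression stands in for the image compression and MEM analysis used there, the 30 × 96 image is assumed to mean 30 days by 96 fifteen-minute intervals, and the two simulated subjects are hypothetical.

```python
import random
import zlib

DAYS, SLOTS = 30, 96  # assumed layout: 30 days x 96 fifteen-minute intervals


def raster(start_slots_per_day):
    """Build a DAYS x SLOTS binary image: 1 where the TV was switched on."""
    image = [[0] * SLOTS for _ in range(DAYS)]
    for day, slots in enumerate(start_slots_per_day):
        for slot in slots:
            image[day][slot] = 1
    return image


def compressed_fraction(image):
    """Compressed size divided by raw size; larger means less regular."""
    raw = bytes(pixel for row in image for pixel in row)
    return len(zlib.compress(raw, 9)) / len(raw)


rng = random.Random(0)
# Hypothetical regular subject: TV on at 07:00, 12:00 and 19:00 every day.
regular = raster([[28, 48, 76] for _ in range(DAYS)])
# Hypothetical irregular subject: three random start times each day.
irregular = raster([rng.sample(range(SLOTS), 3) for _ in range(DAYS)])

print(compressed_fraction(regular), compressed_fraction(irregular))
```

A subject with fixed viewing times yields an image that compresses to a much smaller fraction of its raw size than one with scattered viewing times, which is the contrast the compression-based indicator relies on.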

1.3.2 Biosignals

Biosignals, such as skin conductance, EEG, EMG, and EGA measurements, have been used to monitor various physiological and psychological conditions. Since the introduction of affective computing [53], applications of bio-signals have attracted a lot of research interest, but mainly on detection of emotion for their application to human computer interactions. However, there are a few studies that tried to combine HCI and biosignals in medical applications (e.g., [5, 39, 41]). Here we review some examples.

Tarvainen et al. [64] used features derived from galvanic skin responses (GSR) to detect emotional responses to an external stimulus (surprising sounds). GSRs are produced by changes in the electrical properties of the skin caused by sweat gland activity in response to different kinds of stimuli. In a sense, GSRs are similar to skin conductance responses (SCR). Tarvainen et al. [64] derived the features using principal component analysis (PCA), and a clustering technique was applied to build a classifier that can distinguish whether a response pattern belongs to a control or a psychotic patient. Their method was tested using measurements from 33 participants: 20 healthy controls and 13 psychotic patients. It was observed that healthy controls showed clearer response patterns to the external stimulus. The system achieved an overall correct classification rate of 82 %. As GSR measures are much easier to obtain than EEG signals, they have been used on wearable computers. For instance, Lisetti and Nasoz [39] used GSR in conjunction with heart rate and temperature monitors mounted on wearable computers to monitor certain emotions (sadness, anger, fear, surprise, frustration, and amusement) noninvasively.

Electrodermal activity (EDA) is another term for GSR. It also results from sympathetic neuronal activity. Critchley [14] provides a detailed review of the relationship between the central nervous system, the periphery, and EDA measures, indicating that a vast amount of information on mental and physiological conditions can be extracted from EDA measurements.

Frantzidis et al. [28] developed a method of detecting early psychological disorders by detecting frequent mood changes using EEG and EDA signals. Their research was based on the assumption that affective states of depression, anxiety, and chronic anger negatively influence the human immune system [32]. This is a neurophysiology-oriented approach that monitors the subjects’ biosignals, providing a more objective mental health assessment. In their approach, EEG and EDA signals are recorded and features are computed. ERP, ERS, and ERD features are computed from the EEG data, and SCR features are computed from the EDA data. Attribute selection is applied to the EEG features using BestFirst search. The C4.5 data mining algorithm is applied to the selected EEG features to generate a valence discrimination function. The subject’s gender and the valence information extracted from the decision tree are then used to select one of four Mahalanobis distance functions (positive or negative pictures for male subjects, positive or negative pictures for female subjects) for arousal discrimination. The Mahalanobis distance of the vector of selected features from the centre of the selected Mahalanobis distance function is then used to estimate the arousal level. The result of the detection is one of four classes: LL (low valence, low arousal), LH (low valence, high arousal), HL (high valence, low arousal), and HH (high valence, high arousal). Their average reported accuracy was 77.68 % for the discrimination of the four emotional states, which differ in both their arousal and valence dimensions.
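The arousal step rests on the Mahalanobis distance, d(x) = sqrt((x − μ)ᵀ S⁻¹ (x − μ)), where μ and S are the centre and covariance of a reference group. The sketch below shows that computation for two-dimensional feature vectors. It is an illustration only: the feature values are invented, and assigning the class with the nearer centre is an assumption, since the chapter does not state how [28] maps distances to arousal levels.

```python
import math


def fit_class(samples):
    """Return (centre, covariance) for a list of 2-D feature vectors."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(x, centre, cov):
    """Mahalanobis distance of x from centre, with the 2x2 inverse written out."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - centre[0], x[1] - centre[1]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))  # guard against tiny negative rounding


# Hypothetical training vectors (e.g., two SCR-derived features per trial).
LOW = fit_class([(1.0, 2.0), (1.5, 1.8), (0.8, 2.4), (1.2, 2.1)])
HIGH = fit_class([(3.0, 4.0), (3.4, 4.5), (2.8, 3.9), (3.1, 4.4)])


def arousal(x):
    """Assumed decision rule: pick the class whose centre is nearer."""
    return "low" if mahalanobis(x, *LOW) <= mahalanobis(x, *HIGH) else "high"


print(arousal((1.1, 2.0)), arousal((3.1, 4.2)))
```

In the scheme described above, one such pair of reference groups would be kept for each combination of gender and detected valence, and the matching pair would be selected before the distances are computed.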

1.3.3 Language Use in Patients with Schizophrenia

Speech and language disorders (SLD), such as incoherent discourse, occur in various mental disorders, such as mania, depression, and schizophrenia [1, 19, 36]. SLD is often termed thought disorder by schizophrenia researchers. Disorganized speech, such as incoherent discourse, has been considered one of the important aspects in diagnosing schizophrenia [1]. Indeed, incoherent discourse is considered an important symptom in diagnosing many psychiatric and neurological conditions [24]. DSM-IV [20] also uses disorganized speech as one of the diagnostic criteria for schizophrenia. Therefore, various formal methods have been proposed in an attempt to characterize disordered discourse related to mental disorders. The following are some of the well validated rating scales of thought disorder: the Scale for the Assessment of Thought, Language and Communication (TLC) [3], the Clinical Language Disorder Rating Scale (CLANG) [9], the Communication Disturbances Index [17], and the Thought Disorder Index (TDI) [58].

SLDs occur in various mental disorders and provide critical information for assessing mental disorders. However, these formal methods are subjective, time consuming, and costly. Therefore, various automated SLD assessment methods for diagnosing mental disorders have been developed. Examples include statistical analysis of letter and category fluency [8], Latent Semantic Analysis (LSA) based speech incoherence analysis [24, 25], Ex-Ray [16], and Discriminant-Words [59].

SLD is also considered to be associated with cognitive impairments [15, 24] and to be an early vulnerability indicator for schizophrenia [6, 42, 49, 52]. This is because schizophrenia is considered a neurodevelopmental disorder that arises early in development, long before observable symptoms appear, rather than a neurodegenerative disorder [18]. This suggests that deficits in brain function may be present early in life [24], and hence that SLD analysis may support early diagnosis of schizophrenia.

We can broadly classify automated SLD analysis tools into three categories:

  1. Statistical analysis of letter and category fluency.

  2. Latent Semantic Analysis (LSA) Based Analysis of Speech Disorder.

  3. Machine Learning Approaches.

Statistical analysis of letter and category fluency is an earlier approach that measures word statistics of spoken or written content. LSA-based analysis approaches go further into word-to-word, word-to-sentence, or sentence-to-sentence semantic relations. Machine learning approaches use derived features from both word statistics and semantic relations to build models of underlying mental health conditions. Recent advances include generation of rules for explaining diagnosis results in terms of the input features.

1.3.3.1 Statistical Analysis of Letter and Category Fluency

The statistical analysis of letter and category fluency is one of the earliest SLD approaches to diagnosing schizophrenic patients. It measures performance on verbal fluency and semantic fluency tasks. These are measured by counting the number of words generated in response to letter cues or category cues within a certain time period [8]. Bokat and Goldberg [8] reported that patients with schizophrenia underperformed on both category and letter fluency tasks. In addition, they reported that schizophrenic patients demonstrated greater impairments on semantic fluency than on letter fluency. However, these approaches may provide very limited information on disordered discourse, as they rely mostly on linguistic features, such as the vocabulary size of patients [24]. Some researchers (e.g., [24]) suggested that communicative features, such as tangentiality (e.g., semantic similarity between questions and answers), provide more relevant information as to the underlying conditions.

1.3.3.2 Latent Semantic Analysis (LSA) Based Analysis of Speech Disorder

Elvevag et al. [25] used Latent Semantic Analysis (LSA) [38] to measure the coherence of speech, and reported its performance in discriminating speech generated by schizophrenia patients from that of controls. They also examined how their research participants’ performance correlated with a standard clinical measure of thought disorder (ThD), the Scale for the Assessment of Thought, Language and Communication (TLC) [3].

The authors defined “coherence” of speech as the semantic similarity between words or sentences and other words or sentences. The researchers measured discourse coherence in schizophrenia patients to assess abnormalities in the use of language and to provide an objective and reliable assessment of coherence in the discourse of these patients. Their method was able to discriminate schizophrenia patients with both low and high ThD from controls. In particular, they showed that LSA could be used to identify which part of speech production is incoherent and to estimate the level of incoherence.
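The notion of coherence as similarity between adjacent stretches of speech can be sketched as the mean cosine similarity between each utterance and the next. The sketch below is a simplification and not the procedure of [25]: raw word-count vectors stand in for LSA vectors, which would first project the counts into a lower-dimensional semantic space learned from a large corpus. The example utterances are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Word-count vector for one utterance (stand-in for an LSA vector)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coherence(utterances):
    """Mean cosine similarity between each utterance and the next one."""
    vectors = [vectorize(u) for u in utterances]
    pairs = list(zip(vectors, vectors[1:]))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


on_topic = [
    "the dog chased the ball",
    "the ball rolled into the garden",
    "the dog ran into the garden",
]
off_topic = [
    "the dog chased the ball",
    "my uncle sells insurance",
    "clouds are made of water",
]
print(coherence(on_topic), coherence(off_topic))
```

With raw counts, two utterances that share no words score zero even if they are related in meaning; capturing such relations is the reason for using LSA vectors in place of counts.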

In their approach, four language tasks were designed: a word coherence test, a verbal fluency test, a two-way dialog coherence test (between utterances), and a similarity test between participants (for identical questions compare semantic similarity of responses between participants). The first two tests were compared with blind human ratings on the global ThD assessment. The third test was compared with the average scores for tangentiality, content, and organizational structure ratings of TLC. The last test was compared with the average scores on tangentiality, coherence, and content ratings of TLC. These tests were designed to test choice of words, expression of meaning, relatedness of discourse, and coherence.

The problem with these instruments lies in the choice of questions and of which measures (words, sentences, window size) are best for increasing sensitivity to group differences. LSA was trained on a corpus of written texts, which are somewhat different from spoken discourse [38]. However, LSA was able to discriminate the groups in spoken discourse [24, 25, 37].

In their follow-up study [24], they showed that a similar method can be used to assess relatives of patients with schizophrenia, as schizophrenia is considered to be heritable. Transcribed free-speech samples were obtained from four groups of participants: schizophrenia patients, family members of schizophrenia patients, normal controls, and family members of normal controls. Although it is not exactly clear how the features were represented, three types of features were reported: surface features, statistical language features, and semantic features. Linear Discriminant Analysis (LDA) was used for feature selection and classification. The overall classification accuracy for not-well versus well participants was 77.1 %. The overall classification accuracy for well family participants versus control non-family participants was 90.0 %.

1.3.3.3 Ex-Ray on Schizophrenia

Ex-Ray [16] and Discriminant-Words [59] apply machine learning techniques, such as support vector machines, to a text classification task to diagnose mental health problems using transcribed speech samples from “structured-narrative tasks”. The task of Ex-Ray is to determine whether the speech belongs to a schizophrenia patient or not. Ex-Ray uses the bag-of-words feature representation, where a fixed size vocabulary (e.g., 1,100 words in [16]) is used to compute feature values, and each feature value is the frequency with which a word in the vocabulary appears in a speech sample. The feature values are also normalized by the length of the documents. Ex-Ray and Discriminant-Words use Support Vector Machines (SVMs) [11] for the classification task. Ex-Ray achieved 80 % accuracy on a schizophrenia-versus-control classification task. Discriminant-Words achieved 75 % accuracy for an autism-versus-control similarity-measure task.
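The length-normalized bag-of-words representation described above can be sketched in a few lines. This is an illustration under assumptions, not the Ex-Ray implementation: tokenization by whitespace and the tiny vocabulary are hypothetical choices, and the resulting vectors would then be passed to an SVM, which is not shown.

```python
from collections import Counter


def bag_of_words(sample, vocabulary):
    """Frequency of each vocabulary word, divided by the sample length."""
    tokens = sample.lower().split()
    counts = Counter(tokens)
    length = len(tokens) or 1  # avoid division by zero for empty samples
    return [counts[word] / length for word in vocabulary]


# Hypothetical vocabulary; [16] reports a vocabulary of about 1,100 words.
vocabulary = ["the", "cat", "bird"]
print(bag_of_words("the cat the dog", vocabulary))  # [0.5, 0.25, 0.0]
```

Words outside the vocabulary (here "dog") contribute to the sample length but have no feature of their own, so every sample maps to a vector of the same fixed size regardless of how long it is.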

In Tilaka et al. [65], the performance of Ex-Ray was compared with two SLD scales: the Thought, Language and Communication Scale (TLC) [2] and the Clinical Language Disorder Rating Scale (CLANG) [9]. The same bag-of-words feature representation was used. Tilaka et al.’s SVM classifier achieved 98 % accuracy for a schizophrenia versus control task. This was comparable to the SLD scales on the same samples: TLC (98 % accuracy) and CLANG (97 % accuracy).

In a similar study by Felin et al. [26], it was also shown that a language model can be built from only one group of participants (schizophrenia) and used for discriminating them from other groups (e.g., normal). In this study, a one-class SVM classifier was built using 27 transcribed speech samples and tested against a total of 66 speech samples, comprising samples from 27 patients and 39 controls. The one-class SVM classifier achieved 81 % accuracy, whereas the two-class classifier trained on the entire database achieved 98 % accuracy. These findings suggest that machine learning approaches may be very practical, as there are often times when an appropriate control group is difficult to obtain. The one-class SVM was also compared to TLC and CLANG in diagnosing disorganized speech and was shown to perform better.

Goh et al. [31] used Ex-Ray technology to classify autism assessment reports into positive and negative autism cases, using term frequency features. The empirical evaluation was done on two sets of autism assessment reports: reports confirmed as autism and reports confirmed as normal. The autism diagnosis reports were obtained from mental health clinics and comprised a total of 236 reports: 217 positive cases and 19 negative cases. The autism text documents are represented as attribute-value vectors (“bag of words” representations), where each distinct word corresponds to a feature whose value is the frequency of the word in the text sample. The Ex-Ray method was evaluated against well established validation tools: the Gilliam Autism Rating Scale (GARS-2) [29], the Social Communication Questionnaire (SCQ) [7, 55], and the Social Responsiveness Scale (SRS) [10]. In this study, SVM-2C achieved an ROC area of 0.938, whereas GARS-2, SCQ, and SRS achieved ROC areas of 0.750, 0.810, and 0.810, respectively.

1.3.4 Acoustic Analysis of Speech

Speech carries nonlinguistic information, such as voice quality and prosody, that provides information on the emotional state of the speaker. Acoustic analysis of the human voice can therefore be used to detect various mental health problems that affect the emotional states of affected individuals [57]. For instance, Ellgring and Scherer [23] have shown that simple acoustic properties of speech, such as increased speech rate and decreased pause duration, are effective indicators of the level of improvement in depression. Automated voice analysis for the diagnosis of mental health dates back to 1930, when a German psychiatrist, Zwirner, developed a fundamental frequency (F0) tracking device to measure voice changes in depressed patients [47]. Automated acoustic feature analysis of speech has regained interest only recently, due to the increasing number of mental health patients [27], the limitations of subjective approaches to diagnosing mental health conditions and the effects of treatments [23, 44], and the amount of resources required for clinical assessments [27].

Earlier research focused on prosodic features: fundamental frequencies (F0) [23, 47, 54], pause duration [23], and speech rates [23, 47]. Later studies utilized more complicated features: power spectral density measures [27], voice quality [12], vocal tract features [44], glottal features [40, 44], and TEO-based features [40]. Almost all approaches combine more than one category of features and report on the best combinations of features for the diagnosis of each mental health condition. However, almost all approaches include prosodic features as part of their feature sets. This indicates that prosodic features can be effective indicators in the detection of emotion, stress, and depression. Table 1.1 summarizes developments in features and their uses. We now describe previous and current approaches in more detail.

Table 1.1 Summary of acoustic speech analysis methods for mental conditions

Pope et al. [54] investigated the association of acoustic features of speech with anxiety and depression in 10 min monologues. Anxiety was positively related to rate of verbal productivity and speech disturbance, and negatively related to silent pauses. Depression was negatively related to rate of productivity and filled pauses and positively to silent pauses. A positive relationship was found between anxiety and resistiveness in speech and a negative relationship between depression and superficiality.

Nilsonne [46] and Nilsonne et al. [47] used the rate of change of fundamental frequencies (F0) to measure the level of clinical improvements in depression. Based on their study involving 16 depressed patients, they found that the standard deviation of F0 (SDF0), the standard deviation of the rate of change of F0 (SDF0RC), and the average speed of change of F0 (AF0S) were significantly greater after recovery from depression.
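The three F0 measures can be illustrated from a sampled F0 contour. The sketch below is a plausible reading and not the definitions used in [46, 47]: the rate of change is taken as the first difference divided by the frame step, and AF0S is assumed to be the mean absolute rate of change. The contours and the 10 ms frame step are hypothetical.

```python
import statistics


def f0_statistics(f0_hz, frame_step_s):
    """Summary measures of an F0 contour sampled every frame_step_s seconds."""
    # Rate of change of F0 between consecutive frames, in Hz per second.
    rates = [(b - a) / frame_step_s for a, b in zip(f0_hz, f0_hz[1:])]
    return {
        "SDF0": statistics.stdev(f0_hz),                    # spread of F0
        "SDF0RC": statistics.stdev(rates),                  # spread of its rate of change
        "AF0S": statistics.mean([abs(r) for r in rates]),   # assumed: mean |rate|
    }


flat = f0_statistics([120.0, 120.0, 120.0, 120.0, 120.0], 0.01)
lively = f0_statistics([110.0, 130.0, 105.0, 140.0, 115.0], 0.01)
print(flat)
print(lively)
```

A monotone contour gives zero on all three measures, while a more variable contour gives larger values, which matches the direction of the reported change after recovery.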

Ellgring and Scherer [23] and Low et al. [40] have shown that there are large gender differences in depressive speech behaviour. For instance, the study by Ellgring and Scherer [23] showed that the minimum fundamental frequency (F0) of speech in females was a good predictor of mood improvement.

France and Shiavi [27] analyzed prosodic features (F0, formant measures, amplitude modulation) and spectral features (power spectral density measurements) in discriminating depressed individuals. To generate features, [27] segmented speech samples into frames and calculated a set of statistics (range, variance, mean, skewness, kurtosis, and coefficient of variation) for each frame. Their study involved 48 female participants (10 controls, 17 dysthymic patients, and 21 major depressed patients) and 67 male participants (24 controls, 21 major depressed patients, and 22 high-risk suicidal patients). Formant measurements (features derived from F1, F2, and F3) and power spectral density measurements (the percentage of total power in four 500 Hz sub-bands) were found to be the best discriminators for both male and female participants. For male subjects, amplitude modulation (AM) features were found to be strong class discriminators. Features describing F0 were generally ineffective discriminators in both sexes.
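The per-frame statistics listed above can be sketched as follows. This is a generic illustration and not the feature extraction of [27]: non-overlapping frames, population (divide-by-n) moments, and non-excess kurtosis are all assumptions, and the input values are hypothetical.

```python
import math


def frames(signal, size):
    """Split a sequence into consecutive non-overlapping frames of equal size."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def frame_features(frame):
    """Range, variance, mean, skewness, kurtosis and coefficient of variation."""
    n = len(frame)
    mean = sum(frame) / n
    var = sum((x - mean) ** 2 for x in frame) / n  # population variance
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in frame) / (n * sd ** 3) if sd else 0.0
    kurt = sum((x - mean) ** 4 for x in frame) / (n * sd ** 4) if sd else 0.0
    return {
        "range": max(frame) - min(frame),
        "variance": var,
        "mean": mean,
        "skewness": skew,
        "kurtosis": kurt,  # non-excess: 3.0 for a normal distribution
        "cv": sd / mean if mean else math.nan,
    }


track = [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 2.0, 2.0, 2.0, 2.0]
for frame in frames(track, 5):
    print(frame_features(frame))
```

Each frame of a measurement track (for example, F0 or a formant frequency over time) is thus reduced to six numbers, which can then be compared across groups.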

Cowie et al. [13] reviewed relationships between prosodic features (pitch, intensity, speech rate, voice quality) and emotion for use in their human computer interaction framework (the PHYSTA project), which is a hybrid system capable of using information from faces and voices to recognize people’s emotions. Their review consolidated relevant knowledge from both psychology and linguistics.

Moore II et al. [44] provided a detailed statistical analysis of the use of a combination of glottal, prosodic, and vocal tract features for clinical analysis of emotional disorders, in particular for discriminating depressed speech. Based on their study involving 15 males (9 controls, 6 patients) and 18 females (9 controls, 9 patients), Moore II et al. [44] reported that the combination of glottal and prosodic features produced better discrimination overall than the combination of prosodic and vocal tract features.

Low et al. [40] used acoustic correlates of depression in adolescents (aged 13–20 years) for early detection of depression. It was argued that emotional disturbances caused by depression affect the acoustic properties of the speech of people with depression. They applied various time-series signal processing techniques to extract features, such as TEO-based features, Mel-frequency cepstral coefficients (MFCCs), and prosodic, spectral, and glottal features, in order to distinguish the speech of nondepressed adolescents from that of clinically depressed adolescents. They achieved classification accuracies of 87 % for males and 79 % for females using only TEO-based features, suggesting that, among the features and combinations tested, TEO-based features correlate most closely with depression in adolescents. The combination of glottal features with prosodic and spectral features was also found to be closely correlated with depression, achieving accuracies of 69 % for males and 75 % for females.
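TEO refers to the Teager energy operator, whose standard discrete form is Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below shows only this basic operator on a synthetic tone. The TEO-based features used in [40] are derived from it through further processing that is not reproduced here, and the signal parameters are hypothetical.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal: a pure tone with amplitude 2 and
# angular frequency 0.3 radians per sample.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]
energy = teager_energy(tone)
print(energy[0], (amplitude * math.sin(omega)) ** 2)
```

For a pure tone the operator returns the constant A²·sin²(ω), so it tracks both amplitude and frequency from only three neighbouring samples, which makes it sensitive to rapid changes in how the voice is produced.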

1.3.5 Knowledge Based Approaches

Panagiotakopoulos et al. [50] developed an integrated medical record management system that provides a series of applications and personalized services for patients with anxiety disorder. The system manages patient data collection and automatically assesses stress levels based on user modelling to assist personalized treatments. Their experimental data were collected using a stress monitoring test comprising survey questions (e.g., “where were you at this time?”, “what were you doing at this time?”). Panagiotakopoulos et al. [50] used Bayesian modelling to build association rules for the prediction of the stress levels of participants. The classifier was built and evaluated on data from 10 anxiety patients over 30 days. It achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.841.
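Several results in this chapter are reported as ROC areas. As a reminder of what that number measures, the sketch below computes the AUC through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are hypothetical and unrelated to the data of [50].

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical stress labels (1 = high stress) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value such as 0.841 means that in roughly 84 % of positive and negative pairs the positive case is ranked higher.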

1.4 Online Support and Information Management

Other relevant technologies for mental health informatics are newly-emerging online support and care technologies, such as online health systems [43], health social networks (e.g., PatientsLikeMe and the IBM Patient Empowerment System), and online forums.

A health social network is an online information service which facilitates information sharing between closely related members of a community. Also known as social media on the Internet, or Health 2.0, a health social network empowers patients and health service providers by promoting collaboration between patients, their caregivers, and clinicians [56]. At its basic level, a health social network provides emotional support by allowing patients to find others in similar health situations. They can also share information about conditions, symptoms, and treatments [62]. Other services include physician Q&A and self-tracking of conditions, symptoms, treatments, and other biological information [62]. The self-supporting community is particularly important for future sustainability in the case of lifelong conditions, such as autism.

Other forms of online support systems are being integrated into general support systems, such as web sites for supporting student services. For instance, McKay and Martin [44] discuss the use of information and communications technologies (ICT) in higher education as effective interventions for people with mental health issues, such as Web-mediated courseware design to meet the specific needs of people recovering from mental illness. As ICT tools have been successfully used for providing distance education, the same approach could be used to provide non-interrupted education for students with chronic mental health conditions as well as providing assessment and support services [44].

More traditional forms of health informatics are the uses of information systems to solve the data management problems arising from different forms and data requirements involving various stakeholders including patients. One of the challenges of mental health informatics is the data privacy issue, as the perceived lack of privacy is the main barrier to the growth of tele-mental health [69].

1.5 Conclusion

Mental health informatics is a multi-disciplinary endeavour incorporating many areas of research in information technology, communication, computer science, neuroscience, physiology, psychiatry, and psychology. Telepsychiatry and health social networks seem to be relatively well established, with a great deal of industry and government support. In recent years, many web-based online psychology counselling services have also emerged. Still, there remains much to be done; in particular, more extensive and structured trials of the proposed methods seem warranted. Nevertheless, many studies indicate the potential of each of the approaches for solving current mental health care problems.