Vocal acoustic analysis and machine learning for the identification of schizophrenia

Espinola, Caroline Wanderley; Gomes, Juliana Carneiro; Pereira, Jessiane Mônica Silva; dos Santos, Wellington Pinheiro

doi:10.1007/s42600-020-00097-1

Original Article
Published: 29 September 2020

Volume 37, pages 33–46 (2021)

Research on Biomedical Engineering

Caroline Wanderley Espinola^1,2,
Juliana Carneiro Gomes³,
Jessiane Mônica Silva Pereira³ &
…
Wellington Pinheiro dos Santos ORCID: orcid.org/0000-0003-2558-6602¹

Abstract

Purpose

Psychiatry still lacks objective biomarkers. Schizophrenia is associated with speech abnormalities such as tangentiality, derailment, alogia, neologisms, poverty of speech, and aprosodia, and there is growing interest in speech signal features as possible indicators of the disorder. This article aims to develop an intelligent tool for the detection of schizophrenia using vocal patterns and machine learning techniques. The main advantages of this type of solution are its low cost, high performance, and non-invasiveness.

Methods

Thirty-one individuals over 18 years old were selected: 20 with a previous diagnosis of schizophrenia and 11 healthy controls. Their speech was audio-recorded in naturalistic settings: during a routine medical assessment in the case of psychiatric patients, and in different environments in the case of healthy controls. Recordings were pre-processed to exclude non-participant voices. We extracted 33 features and used the particle swarm optimization algorithm for feature selection.

Results

The classifiers’ performance was analyzed with four metrics: accuracy, sensitivity, specificity, and kappa index. The best results were achieved when considering all 33 extracted features. Among the machine learning models, support vector machines (SVM) provided the best classification performance, with a mean accuracy of 91.76% for the PUK kernel. Our results outperform those of most studies published so far on the detection of schizophrenia based on acoustic patterns.

Conclusion

The use of machine learning classifiers based on vocal parameters, in particular SVM, has proven very promising for the detection of schizophrenia. Nevertheless, further experiments with a larger sample will be necessary to validate our findings.

Introduction

Current clinical practice in psychiatry depends on diagnostic criteria built entirely on expert consensus, instead of relying on objective biomarkers (Bzdok and Meyer-lindenberg 2018). Such criteria, described in the Diagnostic and Statistical Manual, 5th Edition (DSM-5), and in the International Classification of Diseases (ICD-10), are still considered the gold standard for diagnosis in psychiatry (American Psychiatric Association 2013). Nevertheless, those diagnostic systems have been criticized for their lack of clinical predictability and neurological validity (Bzdok and Meyer-lindenberg 2018), and for their poor diagnostic stability (Baca-Garcia et al. 2007). This ultimately leads to trial-and-error treatment (Petzschner et al. 2017). While other medical fields have markers of disease presence and severity, such as tumor volume measurement and biochemical blood tests, psychiatry still lacks routine objective tests (Bedi et al. 2015; Mundt et al. 2012).

Assessment and treatment in psychiatry are historically based on reports from patients and on clinical evaluation (Mundt et al. 2007). This makes diagnosis and therapeutic decisions extremely sensitive to memory and subjectivity biases (Jiang et al. 2018). In this context, there has been an intense search over the last decade for biomarkers for the diagnosis and follow-up of psychiatric patients (Iwabuchi et al. 2013; Mundt et al. 2012). However, most candidate biomarkers are expensive and invasive (Higuchi et al. 2018). Therefore, despite all efforts, objective measures for the assessment of mental disorders are still unknown (Mundt et al. 2007).

Another major challenge psychiatry faces is that nosology and clinical practice have not benefited from advances in the neurosciences. These difficulties can be tackled by computational psychiatry, which applies machine learning (ML) with a focus on clinical applications and single-subject treatments (Bzdok and Meyer-lindenberg 2018; Petzschner et al. 2017). Machine learning has been successfully applied to problem-solving tasks in several medical fields, for example as supportive diagnostic tools based on neuroanatomical structures for Alzheimer’s disease (dos Santos et al. 2009; W. P. dos Santos et al. 2007), breast cancer (Cruz et al. 2018; de Lima et al. 2016; de Santana et al. 2018), and multiple sclerosis diagnosis (Commowick et al. 2018).

Schizophrenia is a group of severe psychotic disorders with heterogeneous etiologies, clinical presentations and responses to treatment (Sadock et al. 2017). It is characterized by hallucinations, delusions, thought and behavior disorder or catatonia, and “negative symptoms,” such as diminished emotional expression and avolition (American Psychiatric Association 2013). Since the first descriptions of this disorder, speech/language deficits have been described as remarkable features of schizophrenia, and are often associated with core negative symptoms and social impairment (Alberto et al. 2019). These symptoms comprise poverty of speech, disorganized speech, derailment, tangentiality, neologism, incoherence, mutism, perseveration, echolalia, thought blocking (Mac-Kay et al. 2018), and inappropriate affective prosody or aprosodia (Chakraborty et al. 2018a; Covington et al. 2012; Elite et al. 2014). Also known as flattened speech intonation, aprosodia consists of diminished vocal emphasis (Alpert and Anderson 1977); reduced inflection and fluency (Alpert et al. 2000); and prosody comprehension deficits, such as difficulties in recognizing intonation patterns (Elite et al. 2014). Overall, these speech abnormalities result from disruptions in cognitive processes and contribute to the frequent communication deficits in schizophrenia (Mac-Kay et al. 2018).

In this framework, computational psychiatry has proven to be a promising method for dealing with the complexity of psychiatric diagnosis, translating neuroscientific advances into clinical applications. Its data-driven approach applies machine learning techniques to high-dimensional data in order to improve diagnostic classification, treatment selection and even treatment outcomes (Huys et al. 2016). ML models are appropriate for individual-level predictions, which could support personalized therapeutic decisions in the future (Bzdok and Meyer-lindenberg 2018). Moreover, they may also enable mobile monitoring of patients and telemedicine applications that are accessible for clinical use (Cohen et al. 2016). In the context of speech-language deficits, vocal acoustic analyses using ML classifiers appear to be a promising avenue for understanding their role within mental disorders (Cohen et al. 2012).

With this in mind, this work proposes the application of ML techniques to audio recordings to perform binary classification. To this end, we collected data from 31 participants, divided into two groups: patients diagnosed with schizophrenia, and a control group composed of healthy individuals. We pre-processed all recordings in order to minimize environmental noise. After that, we extracted 33 features from each 10 s window of the signals. Finally, multiple classifiers were tested. Our goal is to provide an intelligent tool that performs accurate and non-invasive schizophrenia diagnosis with low computational cost.

This paper is organized as follows: Section 2 describes studies related to the characterization of schizophrenia based on vocal parameters. In Section 3, an instrument for the detection of schizophrenia is introduced and implemented. Results are presented and discussed in Sections 4 and 5, respectively. Section 6 states our conclusions with suggestions for future studies on this subject.

Related works

As speech-language abnormalities are a hallmark in schizophrenia, several related studies have been published, most of which on natural language processing and semantics/syntax (Bedi et al. 2015; Chakraborty et al. 2018a; Elvevåg et al. 2010; Kayi et al. 2017; Tovar et al. 2019), and a limited number of studies about vocal patterns in schizophrenia (Tahir et al. 2019).

Patients with schizophrenia tend to show slowed speech, reduced pitch variability, a significantly increased number of pauses, and decreased variability in syllable timing compared with healthy individuals. These characteristics were observed in a semi-automatic analysis of vocal pitch, or fundamental frequency (F0), during an emotionally neutral reading task performed by Martínez-sánchez et al. (2015). In a sample of 80 subjects, they reported a discrimination accuracy of 93.8% between schizophrenic patients and controls using signal processing algorithms. They also observed remarkable intergroup differences, with patients exhibiting slowed speech, low volume, and many pauses.

Likewise, Rapcan et al. (2010) compared vocal pitch, temporal, and energy parameters of 39 schizophrenic patients and 18 healthy controls during an emotionally neutral reading task. Their results demonstrated significant differences between groups, with patients showing decreased mean utterance duration, and increased values in number of pauses, proportion of silence, mean pause duration, total length of pauses, and relative variation in energy. On the other hand, no statistical significance was reported for total length of utterances and relative variation in vocal pitch. However, because the groups were not matched for educational level, the use of a reading task may represent an important limitation to their findings: different educational status may translate into different reading speed and fluency between patient and control samples.

Vocal acoustic analysis is also capable of measuring the severity of negative symptoms such as aprosodia. Compton et al. (2018) analyzed audio recordings of schizophrenic patients with aprosodia, schizophrenic patients without aprosodia, and healthy controls, and compared variability in pitch (F0), first (F1) and second (F2) formants, and intensity/loudness. Their results showed significant differences among groups, with the group with aprosodia showing reduced variability in pitch, F2, and intensity/loudness compared with the other groups.

Similarly, Covington et al. (2012) analyzed F0, F1, and F2 of 25 video-recorded interviews. They investigated tongue movement as an indicator of the severity of negative symptoms in first-episode schizophrenia-spectrum patients. Their study concluded that F2, a measure of variability of tongue anterior or posterior position, was significantly correlated with the severity of negative symptoms.

Chakraborty et al. (2018b) employed low-level speech signals (or low-level descriptors, LLD) alone or in combination with body movements to predict negative symptoms of schizophrenia using automatic classifiers. For that purpose, they applied support vector machines (SVM), a supervised machine learning technique widely used in classification problems (Russell and Norvig 2016). They reported a classification accuracy of 79.49% using low-level speech signals alone, and of 86.36% for their combination with body movements.

Likewise, Tahir et al. (2019) investigated conversational and prosodic features as objective measures of negative symptoms in schizophrenia. Conversational features relate to duration of speech, speaking turns, interruptions, and interjections, while prosodic features comprise F0, F1, F2, and F3; mel frequency cepstral coefficients (MFCCs); and amplitude (minimum, maximum and mean volume, entropy). The performance of some ML algorithms in discriminating between patients and healthy controls was evaluated in their article: SVM, multilayer perceptron (MLP), random forest (RF), and ensemble (bagging). The best results were reported for MLP (accuracy = 81.3%), with speaking rate, frequency, and volume entropy showing significant differences between groups.

In a meta-analysis of 46 papers about acoustic patterns in schizophrenia, Alberto et al. (2019) compared three categories of study design: qualitative ratings, quantitative univariate analyses, and multivariate ML investigations. Machine learning studies provided superior results, with overall out-of-sample accuracy of 76.5–87.5%, and appeared to be more promising. They also identified remarkable differences in acoustic patterns between schizophrenic patients and healthy controls, with the patient group showing decreased proportion of spoken time, reduced speech rate, and increased duration of pauses. These abnormalities were directly related to flat affect and alogia. Additionally, they observed that studies with dialogical and free speech provided the greatest differences between groups, in contrast with studies using constrained monologs.

Methods

In this study, a sample of 31 volunteers over 18 years old was selected and divided into two subsamples:

Healthy control: 11 healthy participants (6 males) were selected through the Self-Reporting Questionnaire (SRQ-20), a screening instrument for common mental disorders (Gonçalves et al. 2008; K. O. B. Santos et al. 2010);
Schizophrenia: 20 patients previously diagnosed with schizophrenia (12 males) were assessed using the Brief Psychiatric Rating Scale (BPRS; Overall and Gorham 1962), one of the most widely used instruments for the evaluation of symptom severity in schizophrenia (Leucht et al. 2005).

All individuals from the schizophrenia sample (mean age = 36.00; SD = 12.39; 60.0% male) fulfilled DSM-5 diagnostic criteria for schizophrenia and were previously diagnosed by an independent psychiatrist. Data for this group were collected at outpatient settings and at inpatient psychiatric units in Hospital das Clínicas, Federal University of Pernambuco, and in Hospital Ulysses Pernambucano, both in Recife, Northeast Brazil. Participants with coexistent neurological disorders or who made professional use of their voices were excluded.

Meanwhile, the control sample (mean age = 30.09; SD = 12.58; 54.5% male) was matched with the patient sample for age, gender and region of origin (Brazilian Northeast). The same exclusion criteria were applied to this group. Participants from both groups were literate, but the control sample had a higher educational level (p < 0.001). Unfortunately, it was not possible to match the subsamples for educational level, as this was a challenging covariate to match in this particular population. Although this represents a limitation of our study, a similar approach was taken in some previous studies (Cannizzaro et al. 2005; Cohen et al. 2008; Rapcan et al. 2010). Sample characteristics are presented in Table 1.

Table 1 Sample characteristics. The 31 participants were divided into two groups: a control group composed of healthy individuals, and a group of people diagnosed with schizophrenia. In both groups, there is a predominance of males. The average age of the control group is 30 years, while that of the schizophrenia group is 36 years

The SRQ-20 was used to exclude participants with current mental illnesses from the control sample, with a cut-off score of 6/7 (Santos et al. 2010). In the schizophrenia sample, participants with a prior diagnosis were included irrespective of their BPRS score. The mean BPRS score of schizophrenic patients in this sample was 44.55, which corresponds to moderate illness severity (Leucht et al. 2005). All participants gave written consent, and this study was conducted only after approval by a local Research Ethical Board.

Acquisitions of voice samples

A Tascam™ 16-bit linear PCM recorder was used, at a 44.1 kHz sampling rate, in WAV format, without file compression. Audio recordings of the schizophrenia sample were acquired during an interview with a psychiatrist in naturalistic settings, i.e., patients were recorded during a routine medical assessment at outpatient offices or inpatient units. After each interview, a trained clinician assessed their symptoms using the BPRS. Meanwhile, healthy controls were audio-recorded in different environments (e.g., office, classroom, gym). Participants from this sample were asked to answer the SRQ-20, as this questionnaire is self-applied. No duration limit was set for the recordings. As conversations were recorded in their entirety, voices from the clinician and possible third parties were also captured and had to be removed afterwards. The total duration of the recordings of both samples was 407.3 min (6.79 h). The process of data acquisition is summarized in Fig. 1.

Audio editing

After data collection, voice signals from the interviewer and any potential companion were manually removed using Audacity audio software (version 2.3.2). This process yielded 222.6 min of recorded audio from participants (3.71 h) as follows: 96.9 min for the control sample and 125.7 min for the schizophrenia sample. Recording duration of both samples after audio editing is shown in Table 2.

Table 2 Recording duration after audio editing

Feature extraction

All recordings were submitted to vocal feature extraction in GNU Octave™, a free open-source signal-processing software. Rectangular windows with a frame length of 10 s were used. In order to determine the window overlap percentage, three overlap sizes were tested: 10% (1 s), 25% (2.5 s), and 50% (5 s). For this, the random forest classifier was used. We performed these experiments 30 times, using 10-fold cross-validation in the Weka environment. The boxplots in Fig. 2 show the accuracy results for these three scenarios. As shown in the figure, the 50% overlap outperforms the others: it reached a higher mean accuracy as well as less dispersion.
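To make the windowing step concrete, the following is a minimal Python sketch and not the Octave code used in this study. The function name `frame_signal`, its parameters, and the choice to drop a trailing partial frame are assumptions made for illustration; the 10 s frame length and the fractional overlap follow the description above.

```python
def frame_signal(samples, sample_rate, frame_s=10.0, overlap=0.5):
    """Split a 1-D sample sequence into fixed-length, overlapping frames.

    A rectangular window applies no weighting, so each frame is a plain slice.
    Dropping a trailing partial frame is an assumption of this sketch.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    frame_len = int(round(frame_s * sample_rate))   # samples per frame
    hop = int(round(frame_len * (1.0 - overlap)))   # step between frame starts
    if frame_len < 1 or hop < 1:
        raise ValueError("frame length and hop must be at least one sample")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


# Toy usage: 30 "seconds" sampled at 1 Hz, 10 s frames, 50% overlap.
toy = list(range(30))
frames = frame_signal(toy, sample_rate=1, frame_s=10.0, overlap=0.5)
print(len(frames), frames[1][0])
```

With a 44.1 kHz recording, a 10 s frame holds 441,000 samples and a 50% overlap advances the window by 220,500 samples, so each extra 5 s of audio contributes one more frame.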

As raw audio data were used, no filtering process was applied. Consequently, background noise was also captured. However, we believe such noise would not be able to interfere significantly, given the homogeneous spectral behavior of the acoustic features selected for extraction. At this stage, the following 33 parameters were extracted: skewness; kurtosis; zero crossing; slope sign changes; variance; standard deviation; mean absolute value; logarithm detector; root mean square; average amplitude change; difference absolute deviation; integrated absolute value; mean logarithm kernel; simple square integral; mean value; third, fourth and fifth moments; maximum amplitude; power spectrum ratio; peak frequency; mean power; mean frequency; median frequency; total power; variance of central frequency; first, second and third spectral moments; Hjorth parameter activity, mobility and complexity; and waveform length. The corresponding mathematical expressions of these attributes are presented in Table 3.
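As a hedged illustration of how some of these parameters can be computed for one window, the Python sketch below implements a handful of them using common textbook definitions. These definitions are assumptions: they may differ in detail (normalization, thresholds) from the expressions in Table 3, and the function name `time_domain_features` is hypothetical.

```python
import math
from statistics import pvariance


def _diff(x):
    """First difference of a sequence."""
    return [b - a for a, b in zip(x, x[1:])]


def time_domain_features(x):
    """Compute a few of the named time-domain features for one window."""
    if len(x) < 3:
        raise ValueError("need at least three samples")
    dx = _diff(x)
    ddx = _diff(dx)
    activity = pvariance(x)      # Hjorth activity: variance of the signal
    var_dx = pvariance(dx)       # variance of the first difference
    if activity == 0 or var_dx == 0:
        raise ValueError("Hjorth mobility/complexity undefined for this window")
    mobility = math.sqrt(var_dx / activity)
    # Complexity: mobility of the first difference divided by mobility of x.
    complexity = math.sqrt(pvariance(ddx) / var_dx) / mobility
    return {
        "mean_absolute_value": sum(abs(v) for v in x) / len(x),
        "root_mean_square": math.sqrt(sum(v * v for v in x) / len(x)),
        "waveform_length": sum(abs(d) for d in dx),
        "zero_crossings": sum(1 for a, b in zip(x, x[1:]) if a * b < 0),
        "hjorth_activity": activity,
        "hjorth_mobility": mobility,
        "hjorth_complexity": complexity,
    }


# Toy usage on an alternating signal.
features = time_domain_features([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
print(features["zero_crossings"], features["waveform_length"])
```

Applying such a function to every 10 s window would yield one feature vector per window, which is the form of input the classifiers described later operate on.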

Table 3 Equations of the 33 extracted parameters

The choice of the above parameters relies on their accurate representation of input signals to computational models, since the decision-making process of machine learning classifiers is not associated with human interpretation. Additionally, attributes from different domains (e.g., temporal and spectral) were selected so as to avoid feature selection biases. Furthermore, such parameters have already been successfully used for representing other biomedical signals, such as electroencephalography. Subsequently, the most relevant parameters were selected using particle swarm optimization (PSO), a feature selection method for dimensionality reduction within classification problems (Xue et al. 2012).

Feature selection using particle swarm optimization

Particle swarm optimization (PSO) algorithms were created in 1995 by James Kennedy and Russell Eberhart, respectively a social psychologist and an electrical engineer (Kennedy and Eberhart 1995). PSO is inspired by the behavior and movement of groups of animals, such as schools of fish and flocks of birds. It is therefore grounded in theories that describe animal social behavior, and it shares elements with genetic algorithms and evolutionary programming (Eberhart and Kennedy 1995; Kennedy and Eberhart 1995; Santos and Assis 2013).

Similar to genetic algorithms, PSO is initialized with a random population. However, while in genetic algorithms the individuals in this initial population are represented by chromosomes, in PSO a position vector and a velocity vector are associated with each individual. In addition, PSO has no mutation or selection of individuals. Thus, at each iteration, only the positions and velocities of the individuals are adjusted in the direction of the best global position and the best individual position, as determined by an objective function, according to the following canonical expression (Eberhart and Shi 2011; Chuanwen and Bompard 2005; Van der Merwe and Engelbrecht 2003; Hu et al. 2003; Trelea 2003; Shi and Krohling 2002):

$$ {\boldsymbol{x}}_i\left(t+1\right)={\boldsymbol{x}}_i(t)+{\boldsymbol{v}}_i\left(t+1\right), \tag{1} $$

where

$$ {\boldsymbol{v}}_i\left(t+1\right)=w{\boldsymbol{v}}_i(t)+{c}_1{r}_1\left[{\boldsymbol{p}}_i(t)-{\boldsymbol{x}}_i(t)\right]+{c}_2{r}_2\left[{\boldsymbol{p}}_g(t)-{\boldsymbol{x}}_i(t)\right], \tag{2} $$

for 1 ≤ i ≤ m, where m is the number of particles in the swarm; w is the inertia factor, with 0 < w < 1; r₁(t) and r₂(t) are random numbers uniformly distributed in the interval [0, 1]; c₁ and c₂ are constriction constants, also called acceleration coefficients, chosen so that c₁ + c₂ = 4 (typically, c₁ = 2 + D and c₂ = 2 − D, where D ≈ 0); c₁ is the weight given to the particle’s individual (or local) consciousness, depending on the implementation, while c₂ is the weight given to global awareness; x_i is the position and v_i the velocity of the i-th particle; p_g is the best global position; and p_i is the best individual (or local) position found by the i-th particle.
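The update in Eqs. (1) and (2) can be sketched in a few lines of Python. This is an illustrative sketch and not the Java/Weka implementation used in this study; the function name `pso_step` and the default values w = 0.7 and c₁ = c₂ = 2 are assumptions chosen only to satisfy 0 < w < 1 and c₁ + c₂ = 4.

```python
import random


def pso_step(positions, velocities, personal_best, global_best,
             w=0.7, c1=2.0, c2=2.0, rng=random):
    """Apply Eqs. (1) and (2) once to every particle and return new lists."""
    new_positions, new_velocities = [], []
    for x, v, p in zip(positions, velocities, personal_best):
        # One uniform draw per particle for r1 and r2, as written in Eq. (2).
        r1, r2 = rng.random(), rng.random()
        v_next = [w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)
                  for xj, vj, pj, gj in zip(x, v, p, global_best)]
        x_next = [xj + vj for xj, vj in zip(x, v_next)]   # Eq. (1)
        new_velocities.append(v_next)
        new_positions.append(x_next)
    return new_positions, new_velocities


# Toy usage: one particle at the origin, pulled towards bests at (1, 1).
pos, vel = pso_step([[0.0, 0.0]], [[0.0, 0.0]], [[1.0, 1.0]], [1.0, 1.0],
                    rng=random.Random(0))
print(pos, vel)
```

A particle that already sits on both its personal best and the global best receives no attraction terms, so only the inertia term w·v_i(t) moves it; this is why the inertia factor controls how long a particle keeps exploring.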

Local and global best positions are determined according to local and global maxima of a given objective function, whilst the position x_i defines the i-th solution candidate. In this classification problem, we defined x_i as an n-dimensional binary vector in which each coordinate is associated with the presence (“1” values) or absence (“0” values) of the corresponding feature. Therefore, each solution candidate is associated with training and test sets composed of dimension-reduced feature vectors. As the objective function, we used a J48 decision tree returning classification accuracies. The parameters w, c₁, c₂, r₁, and r₂ were all set to 0.33. We used a population of 20 individuals evolving over 500 generations. This solution was implemented in Java using the Java machine learning library Weka (Moraglio et al. 2007; García-Nieto et al. 2009).
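The binary encoding of solution candidates can be illustrated with the short Python sketch below. It is a sketch under stated assumptions: `apply_mask` and `fitness` are hypothetical helper names, and `evaluate_accuracy` stands in for whatever routine returns the classification accuracy of the wrapped classifier (a J48 decision tree in this study); here it is replaced by a dummy function.

```python
def apply_mask(rows, mask):
    """Keep only the feature columns whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


def fitness(mask, rows, labels, evaluate_accuracy):
    """Score one candidate subset; an empty subset gets the worst score."""
    if not any(mask):
        return 0.0
    return evaluate_accuracy(apply_mask(rows, mask), labels)


# Toy usage with three features and a dummy evaluator (an assumption):
# it simply rewards smaller subsets so the example stays self-contained.
rows = [[0.1, 5.0, 2.0], [0.2, 4.0, 1.0]]
labels = ["control", "schizophrenia"]


def dummy_accuracy(reduced_rows, _labels):
    return 1.0 - 0.1 * len(reduced_rows[0])


print(apply_mask(rows, [1, 0, 1]))
print(fitness([1, 0, 1], rows, labels, dummy_accuracy))
```

In a wrapper approach of this kind, every fitness evaluation trains and tests a classifier on the reduced feature set, so the cost of the search grows with the population size and the number of generations.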

Classification

Both databases (with all features extracted and after PSO selection) were balanced through the addition of artificial instances in the Weka™ artificial intelligence environment. This is essential to avoid computational biases towards the more heavily represented class, in this case the schizophrenia sample. Edited audio samples were submitted to classification experiments using the following ML algorithms in Weka™: multilayer perceptron (MLP), logistic regression, random forest (RF), decision trees, Bayes net, Naïve Bayes, and SVM with different kernels (linear, polynomial, radial basis function or RBF, PUK, and normalized polynomial). Given the relatively small number of subjects in each sample, experiments were performed with 10-fold cross-validation in order to maximize training samples. Figure 3 illustrates the steps of the prediction system.
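For readers unfamiliar with k-fold cross-validation, the following Python sketch shows how instance indices can be partitioned into 10 folds so that each instance is used for testing exactly once. It is not the Weka routine used in this study; the function names are hypothetical, and stratifying the folds by class label is an assumption of the sketch.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign instance indices to k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)      # deal indices round-robin
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        yield train_idx, test_idx


# Toy usage: 20 instances per class, 10 folds.
toy_labels = ["schizophrenia"] * 20 + ["control"] * 20
for train_idx, test_idx in train_test_splits(toy_labels, k=10, seed=1):
    print(len(train_idx), len(test_idx))
```

With 10 folds, each model is trained on 90% of the instances and tested on the remaining 10%, and the reported metric is the average over the 10 test folds.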

Results

Initially, computational experiments were performed using classifiers in their default settings. Subsequently, different setups were tested for all algorithms with adjustable settings (MLP; SVM with polynomial, normalized polynomial, and PUK kernels; and random forest). The best performances for each classifier type are presented in the boxplots of Figs. 4 and 5, which show the accuracy and kappa index values, respectively. The figures also compare the classifiers’ performances using all 33 extracted attributes (white boxplots) and using the attributes selected by the PSO method (gray boxplots). Using PSO, 12 attributes were selected, which are listed in Table 4. As can be seen in Figs. 4 and 5, most classifiers perform better when considering all attributes. The only exceptions are the classifiers based on Bayes’ theorem. However, these classifiers performed poorly on this problem and were therefore not chosen. Furthermore, Table 5 presents accuracy, kappa index, sensitivity, and specificity values for the best classifiers.

Table 4 List of 12 attributes after selection with particle swarm optimization

Table 5 Classification performances of machine learning models (schizophrenia vs. healthy control). SVM with PUK kernel presented the best results in all four evaluated metrics (accuracy, kappa index, sensitivity, and specificity). It achieved an average accuracy of 91.76%, mean kappa index of 0.8352, sensitivity of 91.9%, and specificity of 91.6%

The results above demonstrate that classification accuracy for SVM models varied considerably (72.93–91.76%), depending on which kernel was used. SVM with the PUK kernel achieved a mean accuracy of 91.76% (sensitivity 91.9%; specificity 91.6%), the best performance of all classifiers used in this study. The confusion matrix of this model is shown in Table 6. SVM with the normalized polynomial kernel also achieved accuracy above 90%. The strong performance of several SVM kernels on this dataset supports findings from previous studies, which possibly indicate the superiority of this algorithm for classification tasks using vocal parameters.
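The four metrics can be derived from the counts of a binary confusion matrix, as in the Python sketch below. The function name `binary_metrics` and the example counts are hypothetical and are not taken from Table 6; the kappa index is computed as Cohen's kappa, which is assumed to be the statistic reported here.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and Cohen's kappa from counts.

    tp/fn: positive-class instances classified correctly/incorrectly.
    tn/fp: negative-class instances classified correctly/incorrectly.
    """
    total = tp + fn + fp + tn
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column marginals.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    if p_e == 1.0:
        raise ValueError("kappa undefined when chance agreement is 1")
    kappa = (accuracy - p_e) / (1.0 - p_e)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa}


# Toy usage with made-up counts for a balanced two-class problem.
print(binary_metrics(tp=90, fn=10, fp=10, tn=90))
```

When the two classes are exactly balanced, the chance agreement is 0.5 and kappa reduces to 2 × accuracy − 1; the reported pair of 91.76% accuracy and 0.8352 kappa is consistent with this relation.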

Table 6 Confusion matrix for the model with the highest performance (SVM PUK): 91.59% of instances from the control group were correctly classified, while 91.89% of instances from the schizophrenia group were correctly classified

Discussion

This paper presents a study on discriminating schizophrenic patients and healthy subjects based on vocal features and machine learning classifiers. The process of data acquisition was designed to provide high translational power, as this is the first study to collect audio-recordings during actual psychiatric interviews. A feature extraction algorithm has been locally developed for the reliable extraction of 33 acoustic features, which have successfully been used for modeling classification problems in neurology and psychiatry. Some machine learning models tested in this paper have achieved high performances; in particular, SVM with PUK kernel yielded high classification accuracy both for schizophrenic patients and healthy controls. With the exception of Martínez-sánchez et al. (2015), our results outperformed those from similar studies using vocal parameters for the detection of schizophrenia.

Nevertheless, although promising, findings reported in this article should be considered preliminary due to limitations in study design. For instance, the small sample size and not controlling for possible confounding factors, such as smoking history and use of medications, may limit statistical analyses. Additionally, an important caveat is the difference in educational level between samples, given the fact that educational background is related to speech fluency. In future studies, we aim to address these limitations and perform the same experiments on a larger number of subjects. In Table 7 below a comparative analysis between some of the studies mentioned in this article and this study is presented.

Table 7 Comparative analysis of previous studies and this paper

Conclusion and future works

Current psychiatric diagnosis still lacks objective biomarkers and relies mostly on specialist opinion based on diagnostic systems. Nevertheless, these criteria have been criticized due to their lack of correlation with the neurobiology and etiopathogenesis of mental disorders, leading to trial-and-error treatments. In this context, patients with schizophrenia may present with vocal acoustic abnormalities that may be used as objective parameters for the identification and assessment of this disorder.

Therefore, this paper aimed at the development of objective measures of schizophrenia to aid clinical practice in the future. For this purpose, we extracted vocal acoustic features and performed experiments using different automated classification techniques based on machine learning. Some of the most widely used machine learning classifiers were tested in this work. Our results demonstrate the viability of an inexpensive and non-invasive tool for the detection of schizophrenia based on vocal acoustic analysis through machine learning algorithms. In future studies, we intend to perform the same experiments in a larger sample, and also with gender-based datasets. We would like to evaluate whether schizophrenia affects the vocal acoustic properties of men and women differently and, if so, how these differences influence the performance of automated classifiers.

References

Alberto P, Arndis S, Vibeke B, Riccardo F. Voice patterns in schizophrenia: a systematic review and Bayesian Meta-analysis. Voice Schizophrenia Rev Meta-anal. 2019;1–40.
Alpert M, Anderson LT. Imagery mediation of vocal emphasis in flat affect. Arch Gen Psychiatry. 1977;34(2):208–12.
Alpert M, Rosenberg SD, Pouget ER, Shaw RJ. Prosody and lexical accuracy in flat affect schizophrenia. Psychiatry Res. 2000;97:107–18.
American Psychiatric Association. (2013). DSM-5 - Manual Diagnóstico e Estatístico de Transtornos Mentais. Artmed (5.). Porto Alegre: Artmed. 1011769780890425596.
Baca-Garcia E, Perez-Rodriguez MM, Basurte-Villamor I, Fernandez Del Moral AL, Jimenez-Arriero MA, Gonzalez De Rivera JL, et al. Diagnostic stability of psychiatric disorders in clinical practice. Br J Psychiatry. 2007;190(MAR):210–6. https://doi.org/10.1192/bjp.bp.106.024026.
Bedi G, Carrillo F, Cecchi GA, Slezak DF, Sigman M, Mota NB, et al. Automated analysis of free speech predicts psychosis onset in high-risk youths. Nature Partner Journals. 2015;1:15030. https://doi.org/10.1038/npjschz.2015.30.
Bzdok D, Meyer-lindenberg A. Machine learning for precision psychiatry: opportunities and challenges. Biologic Psychiat Cognit Neurosci Neuroimag. 2018;3:223–30. https://doi.org/10.1016/j.bpsc.2017.11.007.
Cannizzaro MS, Cohen H, Rappard F, Snyder PJ. Bradyphrenia and Bradykinesia both contribute to altered speech in schizophrenia: a quantitative acoustic study. Cogn Behav Neurol. 2005;18(4):206–10. https://doi.org/10.1097/01.wnn.0000185278.21352.e5.
Chakraborty D, Xu S, Yang Z, Han Y, Chua V, Tahir Y, et al. Prediction of negative symptoms of schizophrenia from objective linguistic, acoustic and non-verbal conversational cues. In: IEEE 2018 international conference on Cyberworlds prediction; 2018a. p. 280–3. https://doi.org/10.1109/CW.2018.00057.
Chapter Google Scholar
Chakraborty, D, Yang, Z, Tahir, Y, Maszczyk, T, Dauwels, J, Thalmann, N, … Lee, J (2018b). Prediction of Negative Symptoms of Schizophrenia From Emotion Related Low-Level Speech Signals. IEEE, 6024–6028.
Chuanwen J, Bompard E. A hybrid method of chaotic particle swarm optimization and linear interior for reactive power optimisation. Math Comput Simul. 2005;68(1):57–65.
Article MathSciNet Google Scholar
Cohen AS, Alpert M, Nienow TM, Dinzeo TJ, Docherty NM. Computerized measurement of negative symptoms in schizophrenia. J Psychiatr Res. 2008;42:827–36. https://doi.org/10.1016/j.jpsychires.2007.08.008.
Article Google Scholar
Cohen AS, Mitchell KR, Docherty NM, Horan WP. Vocal expression in schizophrenia: less than meets the ear. J Abnorm Psychol. 2016;125(2):299–309. https://doi.org/10.1037/abn0000136.
Article Google Scholar
Cohen AS, Najolia GM, Kim Y, Dinzeo TJ. On the boundaries of blunt affect/alogia across severe mental illness: implications for research domain criteria. Schizophr Res. 2012;140(1–3):41–5. https://doi.org/10.1016/j.schres.2012.07.001.
Article Google Scholar
Commowick O, Istace A, Kain M, Laurent B, Leray F, Simon M, et al. Objective evaluation of multiple sclerosis lesion segmentation using a data management and processing infrastructure. Sci Rep. 2018;8(1):1–17.
Article Google Scholar
Compton MT, Lunden A, Cleary SD, Pauselli L, Alolayan Y, Halpern B, et al. The aprosody of schizophrenia: computationally derived acoustic phonetic underpinnings of monotone speech. In: Schizophrenia Research; 2018. p. 1–8. https://doi.org/10.1016/j.schres.2018.01.007.
Chapter Google Scholar
Covington MA, Lunden SLA, Cristofaro SL, Wan CR, Bailey CT, Broussard B, et al. Phonetic measures of reduced tongue movement correlate with negative symptom severity in hospitalized patients with first-episode schizophrenia-spectrum disorders. Schizophr Res. 2012;142:93–5.
Article Google Scholar
Cruz T, Cruz T, Santos W. Detection and classification of lesions in mammographies using neural networks and morphological wavelets. IEEE Lat Am Trans. 2018;16(3):926–32.
Article Google Scholar
de Lima SM, da Silva-Filho AG, dos Santos WP. Detection and classification of masses in mammographic images in a multi-kernel approach. Comput Methods Prog Biomed. 2016;134:11–29.
Article Google Scholar
de Santana MA, Pereira JMS, da Silva FL, de Lima NM, de Sousa FN, de Arruda GMS, et al. Breast cancer diagnosis based on mammary thermography and extreme learning machines. Res Biomed Eng. 2018;34(1):45–53.
Article Google Scholar
dos Santos WP, De Assis FM, De Souza RE, Mendes PB, De Souza Monteiro HS, Alves HD. A dialectical method to classify Alzheimer’s magnetic resonance images. Evol Comput. 2009;473.
dos Santos, WP, de Souza, RE, & dos Santos Filho, PB (2007). Evaluation of Alzheimer’s disease by analysis of MR images using multilayer perceptrons and Kohonen SOM classifiers as an alternative to the ADC maps. In 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 2118–2121).
Eberhart, R, & Kennedy, J (1995). A new optimizer using particle swarm theory. In MHS'95. Proceedings of the Sixth International Symposium on Micro Machine and Human Science (pp. 39-43). IEEE.
Eberhart RC, Shi Y. Computational intelligence: concepts to implementations. Amsterdam: Elsevier; 2011.
MATH Google Scholar
Elite A, Pedrão LJ, Zamberlan-Amorim NE, Carvalho AMP, Bárbaro AM. Comportamento comunicativo de indivíduos com esquizofrenia. Rev CEFAC. 2014;16(4):1283–93.
Article Google Scholar
Elvevåg B, Foltz PW, Rosenstein M, DeLisi LE. An automated method to analyze language use in patients with schizophrenia and their first-degree relatives. J Neurolinguistics. 2010;23(3):270–84. https://doi.org/10.1161/CIRCULATIONAHA.110.956839.
Article Google Scholar
García-Nieto J, Alba E, Jourdan L, Talbi E. Sensitivity and specificity based multiobjective approach for feature selection: application to cancer diagnosis. Inf Process Lett. 2009;109(16):887–96.
Article MathSciNet Google Scholar
Gonçalves DM, Stein AT, Kapczinski F. Avaliação de desempenho do Self-Reporting Questionnaire como instrumento de rastreamento psiquiátrico: Um estudo comparativo com o Structured Clinical Interview for DSM-IV-TR. Cad Saude Publica. 2008;24(2):380–90. https://doi.org/10.1590/S0102-311X2008000200017.
Article Google Scholar
Higuchi M, Tokuno S, Nakamura M, Shinohara S. Classification of bipolar disorder, major depressive disorder, and healthy state using voice. Asian J Pharm Clin Res. 2018;11(3):89–93. https://doi.org/10.22159/ajpcr.2018.v11s3.30042.
Article Google Scholar
Hu, X, Eberhart, RC, & Shi, Y (2003). Engineering optimization with particle swarm. In Proceedings of the 2003 IEEE Swarm Intelligence Symposium. SIS'03 (cat. No. 03EX706) (pp. 53-57). IEEE.
Huys QJM, Maia TV, Frank MJ. Computational psychiatry as a bridge from neuroscience to clinical applications. Nat Neurosci. 2016;19(3):404–13. https://doi.org/10.1038/nn.4238.
Article Google Scholar
Iwabuchi SJ, Liddle PF, Palaniyappan L. Clinical utility of machine-learning approaches in schizophrenia: improving diagnostic confidence for translational neuroimaging. Front Psych. 2013;4(August):1–9. https://doi.org/10.3389/fpsyt.2013.00095.
Article Google Scholar
Jiang H, Hu B, Liu Z, Wang G, Zhang L, Li X, et al. Detecting Depression Using an Ensemble Logistic Regression Model Based on Multiple Speech Features. Comput Math Methods Med. 2018;2018:6508319. https://doi.org/10.1155/2018/6508319.
Article MATH Google Scholar
Kayi, ES, Diab, M, Pauselli, L, Compton, M, & Coppersmith, G (2017). Predictive linguistic features of schizophrenia. Proceedings Ofthe 6th Joint Conference on Lexical and Computational Semantics, 241–250.
Kennedy, J, & Eberhart, R (1995). Particle swarm optimization. In Proceedings of ICNN'95-International Conference on Neural Networks (Vol. 4, pp. 1942-1948). IEEE.
Leucht S, Kane JM, Kissling W, Hamann J, Etschel E, Engel R. Clinical implications of Brief psychiatric rating scale scores. Br J Psychiatry. 2005;187(2):366–71. https://doi.org/10.1016/j.physbeh.2017.03.040.
Article Google Scholar
Mac-Kay A, Jerez I, Pesenti P. Speech-language intervention in schizophrenia: an integrative review. Rev CEFAC. 2018;20(2):238–46. https://doi.org/10.1590/1982-0216201820219317.
Article Google Scholar
Martínez-Sánchez F, Muela-Martínez JA, Cortés-soto P, José J, Meilán G, Antonio J, et al. Can the acoustic analysis of expressive prosody discriminate schizophrenia? Span J Psychol. 2015;18(86):1–9. https://doi.org/10.1017/sjp.2015.85.
Article Google Scholar
Moraglio A, Di Chio C, Poli R. Geometric particle swarm optimisation. In: European conference on genetic programming. Berlin, Heidelberg: Springer; 2007. p. 125–36.
Chapter Google Scholar
Mundt JC, Snyder PJ, Cannizzaro MS, Chappie K, Geralts DS. Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology. J Neurolinguistics. 2007;20:50–64. https://doi.org/10.1016/j.jneuroling.2006.04.001.
Article Google Scholar
Mundt JC, Vogel AP, Feltner DE, Lenderking WR. Vocal acoustic biomarkers of depression severity and treatment response. Biol Psychiatry. 2012;72(7):580–7. https://doi.org/10.1016/j.biopsych.2012.03.015.Vocal.
Article Google Scholar
Overall JE, Gorham DR. The Brief Psychiatric Rating Scale. Psychol Rep. 1962;10:799–812.
Article Google Scholar
Petzschner FH, Weber LAE, Gard T, Stephan KE. Review computational psychosomatics and computational psychiatry : toward a joint framework for differential diagnosis. Biol Psychiatry. 2017;82:1–10. https://doi.org/10.1016/j.biopsych.2017.05.012.
Article Google Scholar
Rapcan V, D’Arcy S, Yeap S, Afzal N, Thakore J, Reilly RB. Acoustic and temporal analysis of speech: a potential biomarker for schizophrenia. Med Eng Phys. 2010;32:1074–9. https://doi.org/10.1016/j.medengphy.2010.07.013.
Article Google Scholar
Russell SJ, Norvig P. Artificial Intelligence: A Modern Approach (third). Harlow: Pearson Education; 2016.
MATH Google Scholar
Sadock B, Sadock V, Ruiz P. Compêndio de Psiquiatria: Ciência do Comportamento e Psiquiatria Clínica (11.). Porto Alegre: Artmed; 2017.
Google Scholar
Santos WP, Assis FM. Algoritmos dialéticos para inteligência computacional. Recife: Editora Universitária UFPE; 2013.
Google Scholar
Santos KOB, Araújo TM, Pinho PS, Silva ACC. Avaliação de um Instrumento de Mensuração de Morbidade Psíquica. Revista Baiana de Saúde Pública. 2010;34(3):544–60.
Article Google Scholar
Shi, Y, & Krohling, RA (2002). Co-evolutionary particle swarm optimization to solve min-max problems. In Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (cat. No. 02TH8600) (Vol. 2, pp. 1682-1687). IEEE.
Tahir Y, Yang Z, Id DC, Thalmann N, Thalmann D, Maniam Y, et al. Non-verbal speech cues as objective measures for negative symptoms in patients with schizophrenia. PLoS One. 2019;14:1–17. https://doi.org/10.1371/journal.pone.0214314.
Article Google Scholar
Tovar A, Fuentes-Claramonte P, Soler-Vidal J, Ramiro-Sousa N, Rodriguez-Martinez A, Sarri-Closa C, et al. The linguistic signature of hallucinated voice talk in schizophrenia. Schizophr Res. 2019;206:111–7.
Article Google Scholar
Trelea IC. The particle swarm optimization algorithm: convergence analysis and parameter selection. Inf Process Lett. 2003;85(6):317–25.
Article MathSciNet Google Scholar
Van der Merwe, DW, & Engelbrecht, AP (2003). Data clustering using particle swarm optimization. In The 2003 Congress on Evolutionary Computation, 2003. CEC'03. (Vol. 1, pp. 215-220). IEEE.
Xue B, Zhang M, Member S, Browne WN. Particle swarm optimization for feature selection in classification: a multi-objective approach. In: Ieee Transactions on Cybernetics; 2012. p. 1–16.
Google Scholar

Download references

Acknowledgments

We are grateful to the Brazilian research-funding agency CNPq, for the partial support of this research.

Author information

Authors and Affiliations

Departamento de Engenharia Biomédica, Universidade Federal de Pernambuco, Recife, Brazil
Caroline Wanderley Espinola & Wellington Pinheiro dos Santos
Serviço de Emergências Psiquiátricas, Hospital Ulysses Pernambucano, Recife, Brazil
Caroline Wanderley Espinola
Núcleo de Engenharia da Computação, Escola Politécnica da Universidade de Pernambuco, Recife, Brazil
Juliana Carneiro Gomes & Jessiane Mônica Silva Pereira

Corresponding author

Correspondence to Wellington Pinheiro dos Santos.

Ethics declarations

Conflict of interest

Authors do not have any conflicts of interest to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Espinola, C.W., Gomes, J.C., Pereira, J.M.S. et al. Vocal acoustic analysis and machine learning for the identification of schizophrenia. Res. Biomed. Eng. 37, 33–46 (2021). https://doi.org/10.1007/s42600-020-00097-1

Received: 16 May 2020
Accepted: 22 September 2020
Published: 29 September 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s42600-020-00097-1
