1 Introduction

Mood disorders take a sizable toll on the world’s population, affecting more than 1 in 20 people annually and nearly 1 in 10 over the course of their lifetime (Steel et al. 2014). Bipolar disorder, which alone accounts for at least 1% of years lived with disability globally (GBD 2017), is a mood disorder that causes patients to alternate between manic episodes of abnormally elevated mood and energy levels, and depressive episodes marked by diminished mood, interest, and energy (APA 2013). Compared to major depressive disorder (MDD), bipolar disorder can be harder to diagnose, and even when an accurate diagnosis is made, it is often delayed. The depressive episodes of the two disorders share the same diagnostic criteria, and individuals with bipolar disorder on average spend more time in the depressive phase than in mania. In particular, bipolar disorder type II, a subtype differentiated by attenuated mania-like symptoms (termed hypomania), is difficult for non-specialists to diagnose because it can be challenging to distinguish from recurring unipolar depression. The presence of mood episodes with mixed features, i.e., those exhibiting characteristics of both mania and depression, can further complicate diagnosis (Phillips and Kupfer 2013).

1.1 Current State of Diagnosis and Monitoring of Bipolar Disorder

Clinical approaches to diagnosing and monitoring bipolar disorder usually start with careful history-taking by the clinician (detailed interviews with patients and their family members as well as probing for a family history of the disorder), followed by the frequent use of self- and clinician-administered rating scales that assess for a history of possible mania or hypomania in patients with depression. Even with these tools at their disposal, it is often difficult for clinicians to ascertain whether any noted changes in mood, sleep, or energy are within normal ranges—or whether they are evidence of, say, a manic/hypomanic episode (Wolkenstein et al. 2011). Achieving inter-rater reliability between administered assessments and scales poses its own challenges.

After a correct diagnosis has been made, monitoring of symptoms commonly relies upon self-reports that may include mood charting and self-ratings or clinician-rated scales. These scales can only assess the severity of symptoms experienced by the patients and cannot actually screen for mania or hypomania; patients in manic states also may not be cognizant of their manic symptoms, casting doubt on the validity of some of these assessments (NCCMH 2018).

Ecological momentary assessments (EMA) have been used for supplementary monitoring in mood disorders with varying degrees of success (Ebner-Priemer and Trull 2009; Asselbergs et al. 2016; Kubiak and Smyth 2019). Asselbergs and colleagues reported that the clinical utility of self-report EMA is too often limited by the heavy response burden that is imposed upon respondents—which can result in large dropout rates after an initial period of activity—and furthermore, that the predictive models constructed using unobtrusive EMA data were inferior to existing benchmark models.

In recent years, other techniques including neuroimaging (Phillips et al. 2008; Leow et al. 2013; Ajilore et al. 2015; Andreassen et al. 2018) and genomics (Hou et al. 2016; Ikeda et al. 2017) have also been used in attempts to discover biomarkers for bipolar disorder. Although they may not currently be feasible either for diagnosis or for monitoring on an individual level, in the near future we may begin finding immense value in these and related methods beyond their immediate research applications.

In addition to its affective components, bipolar disorder also influences cognitive ability (APA 2013). Among the most severely impaired domains of cognition are attention, working memory, and response inhibition (Bourne et al. 2013). These provide another avenue to further aid in distinguishing a possible diagnosis of bipolar disorder from other mood disorders and assessing its course and treatment.

1.2 Passive Sensing in Physical Health

Smartwatches, fitness trackers, and associated physical health and fitness apps in general have to a large extent enabled and encouraged users to self-manage chronic medical conditions and attempt to take better care of their physical health (Anderson et al. 2016; Canhoto and Arp 2017; Messner et al. 2019). The Apple Watch, for instance—which uses photoplethysmography to passively sense atrial fibrillation—and the associated Apple Heart Study (Turakhia 2018) have already been credited with saving several lives by alerting enrolled users to the onset of life-threatening conditions and directing them to seek immediate medical attention (Feng 2018; Perlow 2018).

1.3 What About Passive Sensing for Mental Health?

Portable sensors that track the health of the rest of the body have so far proven easier to develop than those that can track brain health. As yet, there are no portable functional magnetic resonance imaging (fMRI) scanners or brain-computer interfaces (BCI) that can unobtrusively analyze brain functioning. Science fiction has proposed examples of each: cowboy hats that conduct brain scans to map wearers’ cognition in television shows such as Westworld (Avunjian 2018), and biomechanical computer implants called neural lace in Iain M. Banks’ series The Culture (Banks 2002, 2010). Science may in fact someday deliver equivalents in the shape of the startup Openwater’s fMRI-replacing ski hats, purportedly being designed to use infrared holography to scan oxygen utilization by the wearer’s brain (Jepsen 2017; Clifford 2017), and implantable electronic circuits capable of neural communication, such as those being developed by Neuralink and others (Fu et al. 2016; Chung et al. 2018; Sanford 2018).

Until these nascent technologies mature, there is a need for passive sensing tools that can bridge the divide and perhaps eliminate the need for more onerous means of sensing altogether. Smartphones are already ubiquitous and offer a wide array of sensors which, when used in concert with mHealth and digital phenotyping tools, put a greater degree of precision medicine in the hands of users, researchers, and healthcare providers than ever before. Indeed, the very use of smartphones, and of mobile social networking apps in particular, has been found to be associated with structural and functional changes in the brain (Montag et al. 2017). The corollary, that smartphone usage patterns can be used to quantify the presence of established biomarkers, has also been explored by Sariyska and colleagues (2018) in a preliminary study that examined the feasibility of probing molecular genetic variables corresponding to individual differences in personality and linked social traits (in this case, a variant in the promoter region of the gene coding for the oxytocin receptor) while simultaneously surveying participants’ real-world behavior as reflected in the myriad ways and purposes for which they used their phones over the course of the day.

The proliferation of touchscreen smartphones with software keyboards has, at least for the time being, tilted the balance of telecommunications in favor of typed rather than spoken messages (Shropshire 2015). Combined with the data provided by a phone’s accelerometer, gyroscope, and screen pressure sensors, keystroke dynamics can be used to build mathematical models of a person’s mood and cognition based only on how, and not what, they type.

Voice itself, of course, remains a valuable instrument for gaining insight into the speaker’s mood state, and will only continue to become more so as the tide eventually turns toward speech-based interactions with both intelligent voice assistants and other human users of connected devices. Using similar statistical modeling and machine learning techniques, the acoustic features of speech are just as well-suited for analysis as typing kinematics (Cummings and Schuller 2019).

As more and more computing is offloaded from personal devices to Internet of Things (IoT) devices and the cloud, and ambient computing becomes the norm, we expect techniques like keystroke analysis to be supplanted by speech meta-feature analysis, facial emotion recognition (for more information on FER software, see Chap. 3 by Wilhelm and Geiger in this book), and altogether novel passive mood sensing tools. For the present, being aware of the increasing ubiquity of algorithms and their influence on data analytics, digital architectures, and digital societies (Dixon-Román 2016), and mindful of the absence of a codified analog of the Hippocratic Oath in the current practice of artificial intelligence in medicine and other applications (Balthazar et al. 2018), we nevertheless stand to learn a great deal from leveraging currently used input methods to derive models for sensing users’ inner states.

2 Mobile Typing Kinematics

In the first known study of its kind, researchers from the University of Illinois at Chicago (UIC), the University of Michigan, the Politecnico di Milano, Tsinghua University, and Sun Yat-sen University used passively obtained mobile keyboard usage metadata to predict changes in mood state with a significant degree of accuracy. The team recruited subjects from the Prechter Longitudinal Study of Bipolar Disorder at the University of Michigan as part of the BiAffect-PRIORI consortium for its pilot study, which was based on an Android mobile keyboard and associated app. After winning the grand prize in the Mood Challenge, supported by Apple and sponsored by the New Venture Fund of the Robert Wood Johnson Foundation, UIC is currently conducting a full-scale study on the iOS platform using an app built on the open source ResearchKit mobile framework, enrolling both people with bipolar disorder and healthy controls from the general population.

The BiAffect study (https://www.biaffect.com/) involves the installation of a companion app containing a custom keyboard that is cosmetically similar to the stock system keyboard. The app includes mood surveys; self-rating scales; and active tasks such as the go/no-go task and the trail-making test (part B) to measure reaction time, response inhibition, and set-shifting as part of executive functioning, all overlapping domains of cognition identified by Bourne and colleagues (2013) as the most affected in bipolar disorder.

All data collected by the app and keyboard are first encrypted and then transmitted and stored on secure study servers; these were hosted at UIC for the Android pilot app, whereas study management services are being supported by Sage Bionetworks for the ongoing iOS study with the data being hosted on their Synapse platform. The Android pilot phase, which has concluded data collection, involved the keyboard, trail making test, Hamilton Depression Rating Scale (HDRS), Young Mania Rating Scale (YMRS), and slider-based daily self-rating scales for mood, energy, impulsiveness, and speed of thoughts; the main iOS study included each of these [with the notable substitution of the clinician-rated HDRS and YMRS with the self-reported Patient Health Questionnaire (PHQ) and the Altman Mania Rating Scale, respectively] as well as a daily self-rating scale querying ability to focus, and the aforementioned reaction time task. Metadata collected for keyboard usage include timestamps associated with each keystroke, residence time on each key, intervals between successive keystrokes, and accelerometer readings over the course of all active typing sessions. The actual character corresponding to any given keypress is not recorded, apart from noting whether it was a backspace, alphanumeric, or symbol key. In addition to backspace usage, instances of autocorrection and autosuggestion invocations are also logged.
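To make the shape of these metadata concrete, the sketch below models the kind of privacy-preserving keypress record just described: timing, key category, and accelerometer samples are kept, while the character itself is discarded. All class and field names are hypothetical illustrations, not BiAffect’s actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

class KeyCategory(Enum):
    ALPHANUMERIC = "alphanumeric"
    BACKSPACE = "backspace"
    SYMBOL = "symbol"

@dataclass
class KeypressEvent:
    timestamp_ms: int      # when the key went down
    duration_ms: float     # residence time on the key
    category: KeyCategory  # category only; the character itself is never stored

@dataclass
class TypingSession:
    keypresses: List[KeypressEvent]
    accel_xyz: List[Tuple[float, float, float]]  # accelerometer samples during the session

    def interkey_delays_ms(self) -> List[float]:
        """Intervals between successive keypresses within the session."""
        t = [k.timestamp_ms for k in self.keypresses]
        return [float(b - a) for a, b in zip(t, t[1:])]

    def backspace_rate(self) -> float:
        """Fraction of keypresses that were backspaces."""
        n = len(self.keypresses)
        return sum(k.category is KeyCategory.BACKSPACE for k in self.keypresses) / n if n else 0.0
```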

Table 13.1 summarizes the literature that has been published thus far based on analyses of data collected during the pilot phase of the study, which included 40 participants; data from between 9 and 20 of them were used in any given analysis, depending on the number of days of metadata logged, the participant’s diagnosis, and other requirements. Up to 1,374,547 keystrokes and 14,237,503 accelerometer readings across 37,647 sessions were incorporated into some of the resulting models. Data collection for the main arm of the study is ongoing and has already yielded over 8000 cumulative hours of active typing sessions from hundreds of users.

Table 13.1 A summary of analyses published by researchers using data from the BiAffect study

Zulueta and colleagues (2018) built mixed-effects linear models correlating keyboard activity metadata from the week preceding each administration of the mood rating scales with the corresponding HDRS and YMRS scores. A representative sampling of these metadata over several weeks from one study participant is illustrated in Fig. 13.1, while Fig. 13.2 compares the scores predicted by these models against actual scores for both mood scales. Autocorrect rates were positively correlated with depression scores, probably because error awareness becomes impaired during depression (Fig. 13.3a). Backspace usage rate was negatively correlated with mania scores, possibly because it reflects decreased self-monitoring and impaired response inhibition (Fig. 13.3b). Accelerometer activity was positively correlated with both depression and mania scores, possibly because study subjects were experiencing depression with mixed features or agitated/irritable depression. The trail making test, which consists of circles containing alternating consecutive numbers and/or letters that respondents are directed to connect in the correct order, is a standard neuropsychological assessment of processing speed and task-switching, both good indicators of cognitive functioning; Fig. 13.4 shows that typing kinematics data were just as predictive as trail making test results in establishing cognitive ability.
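As an illustration of this style of analysis (not the authors’ code), a mixed-effects model with a random intercept per participant could be fit with statsmodels as below; the column names are hypothetical stand-ins for the week-level features named above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical file: one row per participant-week of aggregated keyboard features
df = pd.read_csv("weekly_features.csv")

model = smf.mixedlm(
    "hdrs ~ autocorrect_rate + backspace_rate + interkey_delay"
    " + accel_displacement + session_count",
    data=df,
    groups=df["participant_id"],  # random intercept for each participant
)
result = model.fit(reml=False)    # ML fit so fixed-effects structures can be compared
print(result.summary())
```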

Fig. 13.1
figure 1

Adapted from Zulueta et al. (2018)

An example of the deep personalized sensing possible with BiAffect showing the number of keystrokes, corresponding accelerometer readings, and the time between successive keypresses logged for an individual participant over the duration of the pilot study phase.

Fig. 13.2
figure 2

Adapted from Zulueta et al. (2018)

Mixed-effects modeling accounted for 63% of the variability in Hamilton Depression Rating Scale scores (conditional R² = 0.63, marginal R² = 0.41, χ²(7) = 17.6, P = 0.014). Ordinary least squares modeling accounted for 34% of the variability in the natural log of Young Mania Rating Scale scores (multiple R² = 0.34, adjusted R² = 0.26, F(7,56) = 4.1, P = 0.0011)

Fig. 13.3
figure 3

Adapted from Zulueta et al. (2018)

Significant predictors of Hamilton Depression Rating Scale scores included accelerometer displacement (P = 0.0017), interkey delay (P = 0.022), autocorrect rate (P = 0.0036), and session count (P = 0.0025). Significant predictors of the natural log of Young Mania Rating Scale scores included accelerometer displacement (P = 0.003) and backspace rate (P = 0.014).

Fig. 13.4
figure 4

Adapted from Zulueta et al. (2018)

Comparison of the predictiveness of keystroke data with that of trail making test results for assessing cognitive ability. Processing speed, as measured by trail making test (part A) scores, was significantly correlated with average interkey delay (i.e., time since last key, r = 0.5, P < 0.001) and keys/second (r = −0.54, P < 0.001). Set shifting, as measured by trail making test (part B) scores, was highly associated with average time since last key (r = 0.68, P < 0.00001) and keys/second (r = −0.62, P < 0.00001).

Stange et al. (2018) took a different approach, constructing multilevel models based on instability metrics calculated for EMA ratings and daily typing speeds (Fig. 13.5) using the root mean square of successive differences (rMSSD), a time-domain measure that takes into account the magnitude, frequency, and temporal order of intra-user fluctuations (Ebner-Priemer et al. 2009), as sketched below. Greater instability in baseline mood EMA ratings was significantly predictive of elevated future symptoms of both depression (Fig. 13.6a) and mania, whereas instability in energy ratings was predictive of future mania but not depression; other affective EMA ratings were not significantly predictive of either. Typing speed instability was predictive of elevated prospective symptoms of depression (Fig. 13.6b) but not of mania. Interestingly, as little as 5–7 days of data provided predictiveness comparable to that of longer collection windows, perhaps because this period is a representative enough snapshot to capture day-to-day typing variability (Fig. 13.7). Turakhia and colleagues (2019) have since demonstrated the feasibility of exploiting variability in similarly irregular, noncontinuous data streams to identify, predict, and prevent potentially serious episodes, in their case atrial flutter and fibrillation in an app- and wearable-based study of cardiac arrhythmia.
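The rMSSD itself is simple to compute. The sketch below, with illustrative data, shows why it captures instability that a plain standard deviation misses: reordering the same values changes the rMSSD but not the SD.

```python
import numpy as np

def rmssd(series: np.ndarray) -> float:
    """Root mean square of successive differences of a 1-D daily series."""
    diffs = np.diff(series.astype(float))       # successive day-to-day changes
    return float(np.sqrt(np.mean(diffs ** 2)))  # RMS of those changes

# two series with the same values (hence the same SD) but different temporal order
steady = np.array([1, 2, 3, 4, 5, 6, 7])        # monotonic drift -> rMSSD = 1.0
volatile = np.array([1, 7, 2, 6, 3, 5, 4])      # oscillation -> rMSSD ~ 3.9
print(rmssd(steady), rmssd(volatile))
```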

Fig. 13.5
figure 5

Adapted from Stange et al. (2018) and reproduced with permission from the publisher

An individual participant’s a self-rated ecological momentary assessment scores, b passively collected daily typing speeds and c baseline and future course of depression symptom severity.

Fig. 13.6
figure 6

Adapted from Stange et al. (2018) and reproduced with permission from the publisher

Comparison of actual scores with those predicted by multilevel instability models for an individual participant’s a Hamilton Depression Rating Scale and b Young Mania Rating Scale.

Fig. 13.7
figure 7

a Reliability of active and passive assessments of instability depending on number of days of assessment. b Predictive utility of active and passive assessments of instability depending on number of days of assessment. Adapted from Stange et al. (2018) and reproduced with permission from the publisher

Cao and colleagues (2017) were among the first to model keystroke dynamics data using deep learning. Their method, DeepMood, compared the predictive performance of a multi-view machine layer architecture (Fig. 13.8) to that of other late fusion approaches, such as factorization machines and conventional fully connected layers, as well as to early fusion strategies like tree boosting systems, linear support vector machines, and logistic ridge regression models. For the uninitiated, a review of current applications of deep neural networks in psychiatry by Durstewitz et al. (2019) may serve as a primer. DeepMood’s early fusion approaches align each of the data views (alphanumeric characters, special characters, and accelerometer values) with their associated timestamps (Fig. 13.9), and then immediately concatenate the multi-view time series per session. However, this does not properly account for unaligned features in certain views, such as special characters, that have no corresponding data points in other views like acceleration or inter-key distance. This shortcoming is addressed by the late fusion approach, in which each of the multi-view series is first modeled separately by a recurrent neural network (RNN), and then fused in the next stage by analyzing first-, second-, and third-order interactions between each view’s output vectors, as in the sketch below. Cao and colleagues established that their late fusion approach significantly outperformed early fusion in predicting mood disturbances and their severity (Fig. 13.10), with the multi-view machines demonstrating the highest accuracy at 90.31%, followed by the factorization machines at 90.21%.
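The sketch below illustrates the late-fusion idea in simplified form, assuming PyTorch: each view gets its own GRU encoder, and only the per-view summary vectors are fused. The published multi-view machine layer models up to third-order interactions between those vectors; plain concatenation stands in for it here, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LateFusionMood(nn.Module):
    def __init__(self, view_dims=(3, 2, 4), hidden=16, n_classes=2):
        super().__init__()
        # one recurrent encoder per data view
        self.encoders = nn.ModuleList(
            nn.GRU(d, hidden, batch_first=True) for d in view_dims
        )
        self.classifier = nn.Linear(hidden * len(view_dims), n_classes)

    def forward(self, views):
        # views: list of (batch, seq_len_v, dim_v) tensors; sequence lengths may
        # differ per view, which late fusion tolerates and early fusion does not
        encoded = []
        for gru, x in zip(self.encoders, views):
            _, h = gru(x)              # h: (1, batch, hidden) final state
            encoded.append(h.squeeze(0))
        fused = torch.cat(encoded, dim=-1)  # fuse per-view summary vectors
        return self.classifier(fused)

model = LateFusionMood()
batch = [torch.randn(8, 40, 3), torch.randn(8, 5, 2), torch.randn(8, 200, 4)]
logits = model(batch)                  # (8, 2): per-session mood-state logits
```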

Fig. 13.8
figure 8

Adapted from Cao et al. (2017)

DeepMood machine learning architecture with a multi-view machine layer for late data fusion.

Fig. 13.9
figure 9

Adapted from Cao et al. (2017) and reproduced with permission from the publisher

A representative sample of the multi-view metadata collected in a time series.

Fig. 13.10
figure 10

Adapted from Cao et al. (2017) and reproduced with permission from the publisher

Comparison of the improvements in accuracy of different DeepMood architectural approaches over the course of successive training epochs.

In a subsequent analysis, Huang et al. (2018) found that an early fusion approach integrating both convolutional and recurrent deep architectures and incorporating users’ circadian rhythms allowed their model, dpMood, to attain even greater predictive performance and to make more precise personalized mood predictions that better accounted for an individual’s biological clock and unique typing patterns. Their approach used convolutional neural networks (CNNs) to analyze local features in typing kinematics over small periods of time, in conjunction with a special type of RNN called a gated recurrent unit (GRU) to model longer-term temporal dynamics (Fig. 13.11); a simplified sketch follows below. GRUs address the vanishing gradient problem, i.e., the inherent inability of simpler RNNs to effectively learn parameters that cause only very small changes in the network’s output, and moreover have fewer parameters than comparable ameliorative approaches, allowing them to perform better on smaller datasets (Cho et al. 2014) such as the keystroke kinematics collected by BiAffect. This early fusion approach aligned features from multiple views so as to retain information about temporal relationships between data points that would otherwise be lost in late fusion models. In the final analysis, the dpMood architecture with the best predictive performance and lowest regression error was the one that combined CNNs and RNNs to learn local patterns as well as temporal dependencies, learned each user’s individual circadian rhythm, and retained accelerometer values that had no contemporaneous alphanumeric keypresses by filling the unaligned alphanumeric features with zero values instead of dropping the accelerometer values altogether. Accelerometric and time-based analyses elucidated both daily (Figs. 13.12 and 13.13) and hourly (Fig. 13.14) variations in keyboard use; the notably smaller Z-axis accelerations that help pinpoint when a phone is being used in a supine position were observed predominantly in the evenings (Fig. 13.14c) and on weekends (Fig. 13.13d). Modeling each individual’s circadian rhythm as a sine function, with parameters learned automatically by gradient descent and backpropagation, resulted in one of these parameters clustering conspicuously by diagnosis, permitting dpMood to classify users as participants with bipolar I disorder, participants with bipolar II disorder, or healthy controls (Fig. 13.15). Together, these techniques can provide remarkably insightful mood-sensing tools to users and precision medicine practitioners alike.
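A simplified, hypothetical PyTorch rendering of these ingredients follows: a 1-D convolution for local kinematic patterns, a GRU for longer-range dynamics, and per-user sine parameters, learned jointly by backpropagation, that calibrate the prediction to the user’s circadian time. This is a sketch of the described ideas, not the published dpMood architecture.

```python
import math
import torch
import torch.nn as nn

class DpMoodSketch(nn.Module):
    def __init__(self, in_dim=4, hidden=16, n_users=20):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, hidden, kernel_size=5, padding=2)  # local patterns
        self.gru = nn.GRU(hidden, hidden, batch_first=True)              # longer-term dynamics
        self.head = nn.Linear(hidden, 1)
        # per-user amplitude and phase of a circadian calibration sine,
        # learned by gradient descent along with the rest of the network
        self.amp = nn.Parameter(torch.ones(n_users))
        self.phase = nn.Parameter(torch.zeros(n_users))

    def forward(self, x, hour, user_id):
        # x: (batch, seq_len, in_dim) typing kinematics; hour: (batch,) time of day
        z = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        _, h = self.gru(z)
        score = self.head(h.squeeze(0)).squeeze(-1)  # raw mood-score regression
        omega = 2 * math.pi / 24.0                   # 24-hour period
        calib = self.amp[user_id] * torch.sin(omega * hour + self.phase[user_id])
        return score + calib                         # circadian-adjusted prediction

model = DpMoodSketch()
pred = model(torch.randn(8, 60, 4), torch.rand(8) * 24, torch.randint(0, 20, (8,)))
```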

Fig. 13.11
figure 11

Adapted from Huang et al. (2018) and reproduced with permission from the publisher

dpMood machine learning architecture based on early data fusion, stacked CNNs and GRUs, and time-based calibrations.

Fig. 13.12
figure 12

Adapted from Huang et al. (2018) and reproduced with permission from the publisher

Distribution of daily typing hours visualized as a 7 day × 24 h matrix.

Fig. 13.13
figure 13

Adapted from Huang et al. (2018) and reproduced with permission from the publisher

Day-to-day fluctuations over the course of a week in a duration of a keypress, b time between successive keypresses, c acceleration along Y-axis, and d acceleration along Z-axis.

Fig. 13.14
figure 14

Adapted from Huang et al. (2018) and reproduced with permission from the publisher

Circadian rhythm mediated fluctuations in a duration of a keypress, b time between successive keypresses, and c acceleration along Y- and Z-axes.

Fig. 13.15
figure 15

Adapted from Huang et al. (2018) and reproduced with permission from the publisher

Visualizations of each individual’s calibration sine functions for a Hamilton Depression Rating Scale scores and b Young Mania Rating Scale scores.

Preliminary analysis of study participants’ performance on the go/no-go task indicates that reaction times vary both within and between individuals (Fig. 13.16a) and continue to change over time (Fig. 13.16b). Variations in BiAffect users’ daily typing patterns have been found to correlate with their performance on the go/no-go task, and concurrent analyses of both data streams are now under way to examine their interrelationships and interactions with mood and cognition.

Fig. 13.16
figure 16

a Go/no-go reaction time varies between and within individuals. b Average reaction time changes over the course of time

Vesel and colleagues (2020) investigated the effects of mood, age, and diurnal patterns on intraindividual variability (IIV) in typing behaviors recorded in the iOS dataset, correlated against participants’ responses to the PHQ. Interkey delay (IKD) was calculated as the time difference between two consecutive keypresses, with analysis restricted to IKDs between character-to-character keypress events; a session’s typing speed was operationally inferred from its median IKD. Typing variability at the session level was quantified using the median absolute deviance of IKDs, as sketched below. Typing mode (the use of one or two hands when typing) was classified using a novel approach based on linear regression. Growth curve mixed-effects (multilevel) models were fit by maximum likelihood to examine the dependent variables of session-level typing speed, typing variability, typing accuracy, and session duration and their relationships to other session-level features and demographics (Fig. 13.17).
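Under stated assumptions about the raw schema (the column names here are hypothetical), the session-level features described can be derived with pandas roughly as follows.

```python
import pandas as pd

kp = pd.read_csv("keypresses.csv")  # hypothetical: one row per keypress
kp = kp.sort_values(["session_id", "timestamp_ms"])
kp["ikd_ms"] = kp.groupby("session_id")["timestamp_ms"].diff()

# keep only character -> character transitions, as in the analysis described
char_to_char = (kp["category"] == "character") & (
    kp.groupby("session_id")["category"].shift() == "character"
)
ikd = kp.loc[char_to_char, ["session_id", "ikd_ms"]]

def median_abs_dev(x):
    """Median absolute deviance: median distance of IKDs from their median."""
    return (x - x.median()).abs().median()

features = ikd.groupby("session_id")["ikd_ms"].agg(
    median_ikd="median",             # typing-speed proxy for the session
    ikd_variability=median_abs_dev,  # session-level typing variability
)
```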

Fig. 13.17
figure 17

Overview of BiAffect data collection and feature extraction process. a Keypress-level typing metadata are collected via the BiAffect keyboard and stored by Sage Bionetworks. b Interkey delays for keypress transitions from character to character are aggregated at a session level to compute median absolute deviance alongside typing accuracy and session duration. c An example for the hourly typing activity over multiple days from 2 active users is presented as an illustration of the potential patterns captured via continuous, unobtrusive collection. The blue dashed line highlights the different levels of activity at night, with user B exhibiting a more irregular activity pattern than user A. Size of the marker is proportional to the number of characters typed per hour

Typing speed was found to slow with age, while pauses between keypresses and variability in typing speed increased with age. The relationship between keystroke dynamics features and mood was supported by the significantly higher variability in IKDs observed with more severe depression, consistent with reported findings of higher IIV in task performance in mood disorders. Typing accuracy, encoded using session-level autocorrect rates, was also found to decrease in more depressed individuals. Finally, sessions corresponding to elevated depressive symptoms were shorter in duration, suggesting decreased smartphone keyboard use during more severe depression.

Ross et al. (2021) evaluated the efficacy of using smartphone typing dynamics together with mood scores for cognitive assessment, as an adjunct to formal in-person neuropsychological assessment with trail making tests. In addition to using the Android pilot app keyboard, participants were administered the pencil-and-paper version of the trail-making test, part B (pTMT-B), at the beginning and end of the study, completed digital TMT-Bs (dTMT-B) on their smartphones throughout the study, and responded to the Hamilton Depression Rating Scale (HDRS) and Young Mania Rating Scale (YMRS) in weekly phone interviews. For analysis, time windows were selected such that each contained one dTMT-B, one HDRS-17 score, and multiple keypresses, as shown in Fig. 13.18.

Fig. 13.18
figure 18

Schematic outlining how keypresses were assigned to each digital trail making test part B (dTMT-B) to account for missing data

Intraclass correlations between the digital and paper-based forms of the TMT-B were calculated to assess consistency between the two modalities; comparison of the first dTMT-B to the paper TMT-B showed adequate reliability (a sketch of this kind of reliability check follows below). Longitudinal mixed-effects models were then used to analyze daily dTMT-B performance as a function of typing and mood. Participants who typed more slowly took longer to complete the dTMT-B, a trend also seen in individual fluctuations in typing speed and dTMT-B performance (Fig. 13.19). Moreover, participants who were more depressed completed the dTMT-B more slowly than less depressed participants (Fig. 13.20).
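A reliability check of this kind could be reproduced with the pingouin library, as in the hedged sketch below; the participants, modalities, and completion times are all illustrative, not study data.

```python
import pandas as pd
import pingouin as pg

# illustrative completion times (seconds) for four participants on both modalities
long_df = pd.DataFrame({
    "participant": ["p1", "p2", "p3", "p4"] * 2,
    "modality": ["paper"] * 4 + ["digital"] * 4,
    "seconds": [61, 75, 52, 88, 66, 79, 58, 91],
})

# intraclass correlation treating modality (paper vs. digital) as the "rater"
icc = pg.intraclass_corr(data=long_df, targets="participant",
                         raters="modality", ratings="seconds")
print(icc[["Type", "ICC", "CI95%"]])
```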

Fig. 13.19
figure 19

Digital trail-making test part B completion time as a function of grand mean centered (a) and subject centered (b) typing speed with ribbons showing the 95% confidence interval

Fig. 13.20
figure 20

Digital trail-making test part B completion time as a function of grand mean centered (a) and subject centered (b) Hamilton Depression Rating Scale score with ribbons showing the 95% confidence interval

Depression severity was associated with dTMT-B completion time at both the inter- and intrasubject levels, and typing speed was likewise associated at both levels: faster typists completed the dTMT-B more quickly, and participants’ individual fluctuations in typing speed mirrored their fluctuations in dTMT-B performance over the course of the study. A diagnosis of bipolar disorder remained a significant predictor of dTMT-B completion time after controlling for depression score and typing speed.

Zulueta et al. (2021) analyzed participants’ responses to the Mood Disorder Questionnaire (MDQ) and self-reported birth year against features derived from smartphone kinematics, which were used to train random forest regression models to predict age. Data were split into training and validation sets (75:25), and two random forest regression models were trained using the caret and randomForest packages for R; the mtry value that minimized the root mean square error (RMSE) was selected for the final models. The models were constructed in a stepwise fashion, with the first model including only typing-related features and the second including all features from the first plus gender and MDQ screening status. Each model’s performance was assessed on the validation set using RMSE, Breiman’s pseudo R-squared, and median absolute error, and differences in performance between the models were assessed using paired Wilcoxon tests of their absolute errors. Feature importance was assessed using out-of-bag changes in mean square error (MSE), and Accumulated Local Effects (ALE) plots were constructed for features that appeared important or interesting. Within-model differences in performance between participants based on MDQ screening status were assessed using Wilcoxon tests comparing raw and absolute prediction errors. A sketch of this modeling pipeline follows below.
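The published analysis used R’s caret and randomForest packages; the sketch below reconstructs the same pipeline shape in Python with scikit-learn, where max_features plays the role of mtry. File and feature names are hypothetical, and scikit-learn’s impurity-based importances stand in for the out-of-bag importances described.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, median_absolute_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("kinematics_features.csv")  # hypothetical: one row per participant
X, y = df.drop(columns=["age"]), df["age"]
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

# grid over mtry (scikit-learn's max_features), keeping the RMSE-minimizing value
best_rmse, best_rf = float("inf"), None
for mtry in range(1, X.shape[1] + 1):
    rf = RandomForestRegressor(n_estimators=500, max_features=mtry, random_state=0)
    rf.fit(X_tr, y_tr)
    rmse = float(np.sqrt(mean_squared_error(y_va, rf.predict(X_va))))
    if rmse < best_rmse:
        best_rmse, best_rf = rmse, rf

pred = best_rf.predict(X_va)
print("validation RMSE:", best_rmse)
print("median absolute error:", median_absolute_error(y_va, pred))
# impurity-based importances, a stand-in for R's out-of-bag MSE importances
print(pd.Series(best_rf.feature_importances_, index=X.columns).sort_values(ascending=False))
```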

Compared to participants with positive MDQ screens, participants with negative screens reported a diagnosis of bipolar disorder at a lower rate, reported no history of bipolar disorder at a higher rate, and left the diagnosis history unanswered at a lower rate. Participants with negative screens also tended to have lower MDQ scores than those with positive screens and a greater total number of keypresses. Plots a–d of Fig. 13.21 depict the ALE plots of four of the most important features: the median of mean interkey times, the mean session length, the sample entropy of the backspace rate, and the mean backspace rate. Many of the most important features are different summaries of the same essential quantity (e.g., interkey time). Based on these plots, increased interkey time and session length are both generally associated with increased age, whereas increased sample entropy of the backspace rate is associated with younger age, and the association between age and the mean backspace rate is not monotonic. Plots e and f of Fig. 13.21 depict the interactions between the median of mean interkey times and the mean session length, and between the mean backspace rate and the sample entropy of the backspace rate, respectively. In these plots, the existence and directionality of linear trends between predicted age and a given feature depend on the range of a second, associated feature, highlighting the complexity of the relationship between typing behaviors and predicted age.

Fig. 13.21
figure 21

Accumulated Local Effects plots for the second model. a–d depict the effects of individual features on age prediction; e, f depict the interaction of the two indicated effects on age prediction

The tendency to underestimate the chronological age of participants screening negative for bipolar disorder, compared to those screening positive, is consistent with the finding that bipolar disorder may be associated with brain changes reflective of pathological aging. It could also reflect that those who screened negative for bipolar disorder and engaged in the study were more likely to have higher premorbid functioning. This work demonstrates that age-related changes may be detectable via a passive digital biomarker based on smartphone kinematics.

3 Speech Dynamics

Research on keystroke kinematics was inspired by the work of colleagues at the University of Michigan’s Heinz C. Prechter Bipolar Research Program on the Predicting Individual Outcomes for Rapid Intervention (PRIORI) project, which analyzes voice patterns in participants enrolled in the longest-running longitudinal research study of bipolar disorder; BiAffect aims to infer mood from typing metadata just as PRIORI does from the acoustic meta-features of speech. Participants were enrolled in the PRIORI study for periods ranging from 16 to 48 weeks and were provided a rooted Android smartphone with a preinstalled secure recording application that captured audio of the participant’s end of every phone call. Study staff called participants weekly to administer HDRS and YMRS mood assessments; these calls were labeled separately from personal calls. The dataset has accumulated over 52,000 recorded calls totaling more than 4,000 h of speech from 51 participants with bipolar disorder and 9 healthy controls.

Karam et al. (2014) used a support vector machine (SVM) classifier to perform participant-independent modeling of segment- and low-level features extracted by the openSMILE audio signal processing toolkit, and were able to separate euthymic speech from hypomanic and depressed speech using an average of 5 to 8 judiciously selected features; a sketch of this general pipeline follows below. In a later study, Gideon et al. (2016) used a declipping algorithm to approximate the original audio signal and performed noise-robust segmentation to improve inter-device audio recording comparability. Rhythm features were classified using multi-task SVM analysis, then transformed into call-level features, and finally Z-normalized either globally or individually by subject. Declipping and SVM classification were found to improve predictive performance for mania but not for depression, whereas segmentation and normalization significantly improved both. Khorram et al. (2016) captured subject-specific mood variations using i-vectors, and utilized a speaker-dependent SVM to classify both these i-vectors and rhythm features. Fusing the subject-specific model, built using unlabeled personal calls, with a population-general system significantly improved predictive performance for depressive symptoms compared to the earlier approach of Gideon and colleagues (2016). Khorram et al. (2018) went on to develop an ‘in the wild’ emotion dataset collating valence and activation annotations made by human raters drawing only upon the acoustic characteristics, and not the spoken content, of recordings from both personal and assessment calls.
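A minimal sketch of the general extract-features-then-classify pipeline, assuming openSMILE’s Python wrapper and scikit-learn, is shown below; the feature set, file names, and labels are illustrative assumptions, and this is not the PRIORI codebase.

```python
import opensmile
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# one acoustic functionals vector per recording; the feature set is illustrative
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

files = ["call_001.wav", "call_002.wav"]  # hypothetical call recordings
labels = ["euthymic", "depressed"]        # hypothetical mood-state labels

X = [smile.process_file(f).iloc[0].values for f in files]
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))  # z-normalize, then SVM
clf.fit(X, labels)
```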

Ongoing analyses, confounding challenges, and proposed solutions related to voice analysis have been outlined in a concise review by the PRIORI team (McInnis et al. 2017); their current focus is to isolate elements in the speech signal that are most strongly correlated with incipient disturbances in mood, enabling the development of on-device analytical systems without compromising limited mobile phone battery life.

4 Future Directions

The eventual goal of these projects is to generate an early warning signal when changes in users’ patterns of typing, speech, and behavior indicate that they are at risk for an imminent manic or depressive episode. This would allow for just-in-time adaptive interventions that could circumvent, or at least reduce the acuteness of, the episode and any resulting hospitalization, medication adjustment, or self-harm (Rabbi et al. 2019).

It has not escaped our attention that these passive sensing techniques can have applications in conditions other than bipolar disorder, and indeed beyond mood disorders altogether; we have been investigating the use of a voice-enabled intelligent agent that is responsive to users’ mood in order to provide emotionally aware education and guidance to patients with comorbid diabetes and depression (Ajilore 2018), and exploring the effectiveness of keystroke dynamics modeling in disparate conditions ranging from neurodegenerative processes such as Alzheimer’s disease to cirrhotic sequelae such as hepatic encephalopathy.

The BiAffect keyboard has not only proven adept at enabling digital phenotyping of its users’ affective and cognitive states, but is also sensitive enough to their unique typing patterns to serve as an effective behavior-based biometric identification and authentication tool. Sun et al. (2017) created DeepService, a multi-view, multi-class deep learning method that can use data collected by the BiAffect keyboard to identify users with an accuracy of over 93% without any cookies or account information. Until recently, the use of keystroke kinematics on hardware personal computer keyboards had been limited to similar continuous authentication applications, but physical keyboard sensing techniques are now expanding in scope to include identifying and measuring digital biomarkers as well (Samzelius 2016).

The mass development and deployment of such applications raises myriad potential concerns related to user privacy, data security, and ethics, as does drawing conclusions from findings generated using a relatively small number of smartphone users from a handful of geographic regions (Lovatt and Holmes 2017; Martinez-Martin and Kreitmair 2018). Mindful of these concerns, and remaining particularly cognizant of the clinical imperative to use only methods informed by established transtheoretical frameworks (the overarching lack of which may have contributed to the current replication crisis in psychology and the medical sciences; Muthukrishna and Henrich 2019), the research teams investigating BiAffect data streams have adopted a deliberately paced approach that harmonizes the latest developments in cognitive science, psychological theory, nosology, and treatment with state-of-the-art deep learning techniques and statistical methods. By paying close attention to safeguarding the individual privacy and protected health information of its users, and by adopting the most transparent possible model of sharing research techniques and findings in order to prioritize the use of digital phenotyping data for ethical medical applications, the BiAffect platform has been built on the twin paradigms of open source and open science, as an invitation to collaborators from around the world to replicate, validate, amend, or correct our hypotheses.

Perhaps one day we will all sport brain scanning ski caps that tell us how we feel, and install BCI implants to communicate wordlessly with our gadgets and with one another, while our IoT devices infer our emotions by analyzing our behavior at a distance; in the meantime, there is already no dearth of data streams readily available for passively mining users’ mood, cognition, and much more with greater preservation of privacy and potential for predictiveness.