
1 Introduction

English Language Learners (ELLs) report more anxiety about speaking than about other language skills such as reading, writing, or listening [9, 14], because presentation performance is accompanied by both Public Speaking Anxiety (PSA), a form of social anxiety (e.g., fear of the audience's attention) [17], and Foreign Language Anxiety (FLA) (e.g., fear of making mistakes when using a foreign language) [1, 8, 11]. Even though ELLs struggle with these subtypes of speaking anxiety, many studies and educators focus on external properties in training [4, 6, 12, 15] rather than on a careful examination of the discrete anxieties [2, 7] that influence performance. To improve performance, ELLs need emotional clarity, which refers to the ability to identify the origins of one's emotions [3]. By first clearly identifying and distinguishing their speaking anxieties, they can choose emotion regulation strategies, such as adapting to changing conditions, to cope with them [16]. In this context, this study notes the potential of physiological arousal measured through electrodermal activity (EDA), often considered a biomarker of individual anxiety levels [5, 13], to support augmented emotional clarity in ELLs. The main research question of this study is: “Can EDA features extracted from wearable sensors classify the main source of speaking anxiety (PSA or FLA) among English language learners during an oral presentation in English?”

2 Method

Thirty-three students (16 male, 17 female) with intermediate English proficiency were recruited from speaking classes in the English Language Institute (ELI) at the University of Maryland, Baltimore County (UMBC). Participants ranged in age from 19 to 43 years (mean age \(\pm 5.67\) years). The experimental protocol was approved by the university's Institutional Review Board. The investigators adopted the presentation task from the ELI instructors to provide an authentic experimental setting. To elicit natural performances from participants, the audio-video recording device was offset slightly so that presenters would be less conscious of the camera and of being recorded.

3 Analysis

As shown in Fig. 1(a), we developed a framework of four sources of anxiety based on manual behavioral annotation of the 33 audio-video recordings: eye contact, linked primarily to PSA as a social anxiety (P); the number of pauses and filler words (i.e., “um” and “ah”), linked primarily to FLA (F); Both anxieties (B); and No anxiety (N).

Fig. 1. (a) The four-anxiety framework, with behavioral annotation data shown as: participant ID (ratio of eye contact (%), number of pauses and filler words (%)). (b) Ten features extracted from each of the phasic and tonic components of the EDA signal

Based on the annotation, the students were divided into two groups, Look (low PSA) and Not Look (high PSA), using a 50% ratio of eye contact with the audience as the threshold. Each of these groups was then divided into two subgroups based on the accumulated behavioral annotation of the number of pauses and filler words, labeled High pauses and filler words (high FLA) and Low pauses and filler words (low FLA). The threshold for this second division was 25%, which corresponded with the interviewees' statements.
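For concreteness, the grouping rule can be expressed as a minimal sketch. The function and variable names, and the handling of boundary values, are our own assumptions; only the 50% and 25% thresholds come from the annotation scheme described above.

```python
def label_anxiety(eye_contact_ratio: float, pause_filler_ratio: float) -> str:
    """Map a participant's annotation ratios (in %) to one of the four anxiety labels.

    Hypothetical helper illustrating the thresholds described in the text;
    boundary handling (>= vs >) is an assumption.
    """
    high_psa = eye_contact_ratio < 50.0    # "Not Look" group (high PSA)
    high_fla = pause_filler_ratio >= 25.0  # high pauses and filler words (high FLA)
    if high_psa and high_fla:
        return "B"  # both anxieties
    if high_psa:
        return "P"  # predominantly public speaking anxiety
    if high_fla:
        return "F"  # predominantly foreign language anxiety
    return "N"      # no anxiety
```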

The EDA data collected from the 33 participants underwent multiple cleaning and feature extraction steps. To reduce the severity of artifacts in the EDA data, we applied a smoothing method based on a Hann window with a size of 1 s. After removing artifacts from the EDA signal, we applied range normalization to the EDA data of all participants to mitigate individual differences in signal level between subjects and reduce bias. Once the data was cleaned, we extracted two sets of features. One set was derived from the phasic and tonic components of the one-dimensional EDA data, and the other consisted of time-frequency (TF) and energy-distribution features extracted with the Hilbert-Huang Transform (HHT). From the phasic and tonic components we extracted the mean, standard deviation, minimum and maximum values, the locations of the minimum and maximum values, the mean peak amplitudes, the number of peaks, the slope, and the area under the curve, as shown in Fig. 1(b). These EDA features were computed over a sliding window of 10 s with an overlap of 5 s, which yielded 5326 windows. The TF features were extracted using the same sliding-window method.
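A minimal sketch of this preprocessing and windowing pipeline is given below, assuming a raw one-dimensional EDA array sampled at a fixed rate. The sampling rate, the phasic/tonic decomposition, and the peak-based features are not specified in this section, so they are marked as assumptions or omitted.

```python
import numpy as np
from scipy.signal.windows import hann

FS = 4  # assumed sampling rate in Hz; not stated in the paper

def hann_smooth(x: np.ndarray, fs: int = FS, win_s: float = 1.0) -> np.ndarray:
    """Smooth the raw EDA signal with a 1 s Hann window to suppress artifacts."""
    w = hann(max(int(win_s * fs), 3))
    return np.convolve(x, w / w.sum(), mode="same")

def range_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale a participant's signal to [0, 1] to reduce between-subject bias."""
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def window_features(component: np.ndarray, fs: int = FS,
                    win_s: float = 10.0, step_s: float = 5.0) -> np.ndarray:
    """Statistical features per 10 s window with 5 s overlap, computed on a
    phasic or tonic component (peak-based features omitted for brevity)."""
    win, step = int(win_s * fs), int(step_s * fs)
    rows = []
    for start in range(0, len(component) - win + 1, step):
        seg = component[start:start + win]
        auc = ((seg[:-1] + seg[1:]) / 2.0).sum()        # trapezoidal area under the curve
        slope = np.polyfit(np.arange(win), seg, 1)[0]   # linear trend within the window
        rows.append([
            seg.mean(), seg.std(), seg.min(), seg.max(),
            int(seg.argmin()), int(seg.argmax()),       # locations of extrema
            slope, auc,
        ])
    return np.asarray(rows)
```

The phasic/tonic split itself would be produced by a separate decomposition step (e.g., a dedicated EDA toolbox), which the paper does not name.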

3.1 Model Development

To understand the importance of different EDA features in ELL anxiety classification, we divided the dataset into multiple subsets based on features and labels. One subset consists of all features, i.e., the tonic and phasic components of the EDA signal together with the time-frequency features from HHT. The other subsets include either the tonic-phasic features or the HHT features. To classify each ELL into one of the four anxiety categories, we adopted five machine learning algorithms: Decision Tree, Auto Multilayer Perceptron, Gradient Boosted Tree (GBT), Random Forest, and Support Vector Machine. All models were validated using 10-fold cross-validation, which uses nine subsets of the data for training and one for testing and iterates until predictions have been made for all samples in the dataset. All classification algorithms in this study were developed in the RapidMiner data science platform [10]. This study also identifies the features that play a significant role in model prediction using a LIME-based feature importance method.
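Because the models were built in RapidMiner, the following is only an illustrative equivalent of the validation protocol in scikit-learn; the hyperparameters are placeholders, not the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             precision_score, recall_score)

def evaluate_gbt(X: np.ndarray, y: np.ndarray) -> dict:
    """10-fold cross-validation of a gradient boosted tree classifier,
    reporting the four metrics used in the paper."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    y_pred = cross_val_predict(GradientBoostingClassifier(), X, y, cv=cv)
    return {
        "accuracy": accuracy_score(y, y_pred),
        "kappa": cohen_kappa_score(y, y_pred),
        "recall": recall_score(y, y_pred, average="macro"),
        "precision": precision_score(y, y_pred, average="macro"),
    }
```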

4 Results

The performance of each classifier was evaluated using four metrics: Accuracy, Cohen's Kappa, Recall, and Precision. Comparing these metrics across classifiers and datasets, the gradient boosted tree (GBT) algorithm outperformed the other classifiers, as shown in Table 1. We also developed binary classifiers for the 18 ELLs (2622 samples) who belong to either the PSA or the FLA anxiety type. The GBT classifier performed well in predicting ELL anxiety type across the different input feature sets; Table 1 shows that its performance is highest with all features (HHT + phasic-tonic). Finally, we extracted the feature importance of both the multiclass and binary GBT models with varying inputs using the LIME method described in the previous section. Table 2 shows the top three supporting features for each classifier.

Table 1. Performance of the multi-class and binary-class gradient boosting classifiers on different feature inputs.
Table 2. Top three supporting features of the GBT algorithm on different data subsets, based on the LIME method.
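As a sketch of how per-instance LIME explanations can be aggregated into top supporting features of the kind reported in Table 2: the paper does not name a specific LIME implementation, so the use of the lime Python package, a fitted model with predict_proba, and the aggregation over positive weights are all assumptions.

```python
from collections import Counter
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

def top_supporting_features(model, X_train: np.ndarray, X_test: np.ndarray,
                            feature_names: list, class_names: list, k: int = 3):
    """Aggregate per-instance LIME weights and return the k most supportive features.
    `model` is a fitted classifier exposing predict_proba (assumption)."""
    explainer = LimeTabularExplainer(X_train, feature_names=feature_names,
                                     class_names=class_names, mode="classification")
    support = Counter()
    for row in X_test:
        exp = explainer.explain_instance(row, model.predict_proba,
                                         num_features=k, top_labels=1)
        label = exp.available_labels()[0]
        for name, weight in exp.as_list(label=label):
            if weight > 0:  # keep only features that support the predicted class
                support[name] += weight
    return support.most_common(k)
```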

5 Conclusion and Future Work

Our findings demonstrate the potential of using EDA to develop a classification model that identifies subtypes of speaking anxiety (PSA and FLA). Our future work will focus on developing and evaluating an interactive education system in which ELLs can identify their predominant speaking anxiety and apply emotion regulation strategies to cope with it.