1 Introduction

Autism spectrum disorder (ASD) is a lifelong developmental disability. Patients typically have social and communicative impairments, including poor social eye contact, lack of shared enjoyment, little response to their name, difficulty producing speech to communicate with others, and limited use of gestures. These impairments in turn lead to a failure to develop age-appropriate peer relations in play and communication. Most school-aged children with autism struggle with peer interactions and friendships, which play a critical role in social learning across the lifespan.

In the 1990s, researchers found that intensive behavioral intervention leads to significant improvement and lasting positive impact when a child is between 2 and 5 years old (Maurice et al. 1996). Many groups and organizations now provide autism support services with more individual attention and specialized instruction tailored to each child’s needs. However, the current manual early intervention services are costly for parents and teachers (Buescher et al. 2014). Moreover, the prevalence of ASD appears to be gradually increasing and has been reported to be as high as 1 in 68 in the United States (Baio 2014) and 11.8 per 10,000 individuals in China (Sun et al. 2013).

To address this problem, many technologies have been developed in recent years to support special education services for children with autism (Kientz et al. 2013). These technologies include tablets (Hourcade et al. 2013; Chen 2012), construction toys with tangible user interfaces (Farr et al. 2010), virtual agents (Tartaro and Cassell 2008), and games (Bartoli et al. 2014). Such multi-sensory technologies have shown great potential to help children with autism in their rehabilitation training (Par et al. 2005).

Among these technologies, motion-based games have attracted increasing attention because of their natural and effective interaction experience (Parry et al. 2014; Chang et al. 2013; Bartoli et al. 2013). Chia and Li first designed a Kinect-based virtual environment featuring virtual dolphins for children with autism (Chia and Li 2012). De Greef et al. then developed a Kinect-based game system that served as a platform to study motor and social behavior in children with autism and to target specific skills such as self-awareness, body schema, posture, communication, and imitation (De Greef et al. 2013). Their experimental sessions, held either with one player at a time or in the presence of an instructor or therapist, evaluated the activities of the children in separate games. However, mental activity, which is one of the critical factors in rehabilitation training, was rarely considered in those studies.

To address this gap, our research considers electroencephalography (EEG) signals, which numerous studies have shown to correlate with actual or imagined movements (Lang et al. 1996). Many EEG-based systems have been proposed and have shown high performance in tracking mental events (Wolpaw et al. 2000; Gerwin et al. 2004; Lotte et al. 2007). In recent years, EEG-based features have been used to classify passengers’ motion sickness level (Yu et al. 2010), hand movement (Akrami et al. 2005), foot movement (Yasunari and Junichi 2013), auditory and visual perception processes (Putze et al. 2014; Acqualagna et al. 2015), and intention in relevant sentence reading (Dong et al. 2015).

To further explore this concept, we focus on detecting the motion- and mental-based activities of children with autism while they play games in the presence of peers, mediated by a teacher. Although manual intervention is reduced, the training characteristics of children with autism should still be considered: even when a movement lacks accuracy, the activity should be encouraged.

In this paper, we propose novel activity classification frameworks for children with autism in their rehabilitation training. Motion and EEG features are used for training. Two support vector machines (SVMs) are integrated to perform a frame-based classification procedure. The experimental results show the effectiveness of the frameworks in both movement and mental evaluation in children’s rehabilitation training.

Section 2 describes the proposed activity classification frameworks, including the experiment setting in Sub-Sect. 2.1 and the data recording in Sub-Sect. 2.2. Sub-Sects. 2.3 and 2.4 present the overall structure of the proposed frameworks and the details of the feature extraction. The proposed methods and the evaluation are described in Sub-Sects. 2.5 and 2.6. Section 3 presents the details of the experimental results. Finally, the conclusion is given in Sect. 4.

2 Activities classification

2.1 Experiment setting

The aim of this work is to explore a better evaluation that considers motion and mental activities jointly. Therefore, a cost-effective multi-sensory system, which serves as a platform for interactive activities, was set up in a classroom with a Kinect and a MindWave.

As shown in Fig. 1, a 60-inch digital television was installed at the front of the classroom and connected to a desktop computer. A Kinect and an HD camera were set up on top of the digital television to acquire motion signals and to record the behaviors of the children, who were equipped with the MindWave Mobile as they interacted with the activities.

Fig. 1 The experiment setting in the classroom

In the experiment, the digital TV played an interactive version of a story, which included gestures, images, and a soundtrack similar to the original. The player was encouraged to make specific movements or gestures to animate elements of the story on the screen. At specific moments, the story would pause and wait for the player to perform the target movement; when the system recognized it, the animation would play and the story would proceed. All the motion and mental data of those target moments were recorded by the desktop PC and evaluated by the teachers.

2.2 Data recording

Data were recorded from five children (two with autism; 9 ± 1.7 years old). The teacher gave five movement instructions: nod head, clap, akimbo, wave hand, and jump. All the movement and mental data of the children in the experiments were recorded and then given subjective scores by the teachers, which were used as annotations to train our system. Over a very short time, the motion and mental data of the children with autism differed little from those of the typically developing children, although their movement accuracy was lower. This suggests that the same activity classification system can be used for training both typically developing children and children with autism.

2.3 Overall structure

The overall system consists of the following steps. First, noise is removed from the raw skeleton and EEG signals using the 1€ Filter (Casiez et al. 2012), a first-order low-pass filter with an adaptive cutoff frequency that is suitable for event-driven systems requiring high precision and responsiveness. Second, the filtered signals are segmented into frames of constant size. Third, the motion and EEG features are extracted from the segmented signals. In the training stage, SVM training produces support vectors for four classes: good movement, NG movement, positive engagement, and negative engagement. In the testing stage, the SVM model classifies the input feature vectors into the different movement or engagement modes. The training and classification system is shown in Fig. 2.
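The denoising step can be illustrated with a minimal sketch of the 1€ Filter for a single scalar channel (for example, one joint coordinate). The sketch follows the published filter equations; the class name and the parameter values (`min_cutoff`, `beta`, `d_cutoff`) are illustrative assumptions, since the settings used in this work are not reported.

```python
import math


class OneEuroFilter:
    """Minimal 1-euro filter for one scalar channel sampled at a fixed rate.

    Parameter defaults are placeholders, not the values used in the paper.
    """

    def __init__(self, rate_hz, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.dt = 1.0 / rate_hz          # sampling period in seconds
        self.min_cutoff = min_cutoff     # cutoff (Hz) when the signal is still
        self.beta = beta                 # how fast the cutoff grows with speed
        self.d_cutoff = d_cutoff         # cutoff (Hz) for the derivative
        self.x_prev = None               # previous filtered value
        self.dx_prev = 0.0               # previous filtered derivative

    def _alpha(self, cutoff):
        # Smoothing factor of a first-order low-pass filter.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / self.dt)

    def __call__(self, x):
        if self.x_prev is None:          # first sample passes through
            self.x_prev = x
            return x
        # Low-pass filter the derivative of the signal.
        dx = (x - self.x_prev) / self.dt
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Adapt the cutoff: fast motion raises it and reduces lag.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev = x_hat
        self.dx_prev = dx_hat
        return x_hat


# Usage: one filter instance per channel, fed sample by sample.
smooth = OneEuroFilter(rate_hz=30.0, min_cutoff=1.0, beta=0.01)
filtered = [smooth(v) for v in [0.00, 0.02, 0.01, 0.03, 0.50]]
```

With `beta = 0` the filter reduces to a fixed-cutoff low-pass filter; a larger `beta` trades some jitter suppression for lower lag during fast movements.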

Fig. 2 Block diagram of the activities classification system for the children with autism

2.4 Feature extraction

For the movement and engagement classification tasks, Kinect-based motion features and MindWave-based EEG features are considered. As the motion and EEG recordings were annotated per second by the teachers, each feature vector was computed from one second of motion and EEG signals. Each second-wise annotation indicates both the movement situation (good or NG) and the engagement situation (positive or negative).

2.4.1 Kinect based motion features

Kinect is a motion-sensing input device for a video game console (Parry et al. 2014) that can estimate the positions of 20 anatomical landmarks (the human skeleton) at a frequency of 30 Hz when placed around 2–5 m away. The skeleton is described by the following joints: head, neck, right shoulder, left shoulder, right elbow, left elbow, right hand, left hand, torso, right hip, left hip, right knee, left knee, right foot, and left foot.

While the Kinect was designed to recognize gestures in gaming applications, it has the capacity to determine the position of the center of specific joints, using a fixed and rather simple human skeleton. This allows for the measurement of joint motion.

To uniquely identify the direction of a joint vector, two angles are measured for each vector.

$$\begin{aligned} \theta_{Y} = \cos^{ - 1} \left( {\frac{{v_{iy} }}{{\left| {v_{i} } \right|}}} \right) \hfill \\ \theta_{XZ} = \cos^{ - 1} \left( {\frac{{v_{ix} }}{{\left| {v_{i} } \right|}}} \right), \hfill \\ \end{aligned}$$
(1)

where θ_Y is the angle between the joint vector v_i and the positive Y-axis, and v_iy and v_ix denote the Y and X components of v_i. θ_XZ is the angle between the projection of the vector on the XZ-plane and the X-axis, as shown in Fig. 3. The movement classification model can therefore be built from the following motion features:

Fig. 3 Angle calculation for a vector v_2

  1. Mean differentiation of joint angle;

  2. Median differentiation of joint angle;

  3. Standard deviation of the differentiation of joint angle.

In the proposed method, a set of fourteen vectors is selected from the twenty joint points. Some parts of the body (such as the head, wrists, and feet) are not considered in the calculation, as they contribute little to classifying the movement.
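As an illustration, the two angles of Eq. 1 can be computed for one joint vector as sketched below. The sketch implements Eq. 1 as printed, with both components normalized by the full vector length. The function name and the convention that a joint vector runs from one joint to an adjacent joint are assumptions, as the fourteen joint pairs are not listed here.

```python
import math


def joint_vector_angles(joint_a, joint_b):
    """Return (theta_y, theta_xz) in radians for the vector joint_a -> joint_b.

    Each joint is an (x, y, z) position. The pairing of joints into vectors
    is an assumption made for illustration.
    """
    vx = joint_b[0] - joint_a[0]
    vy = joint_b[1] - joint_a[1]
    vz = joint_b[2] - joint_a[2]
    norm = math.sqrt(vx * vx + vy * vy + vz * vz)
    if norm == 0.0:
        raise ValueError("joints coincide; the vector direction is undefined")
    # Clamp the ratios to [-1, 1] to guard against rounding error in acos.
    theta_y = math.acos(max(-1.0, min(1.0, vy / norm)))
    theta_xz = math.acos(max(-1.0, min(1.0, vx / norm)))
    return theta_y, theta_xz


# Usage with made-up coordinates (metres): a vector pointing straight up.
theta_y, theta_xz = joint_vector_angles((0.0, 1.0, 2.0), (0.0, 1.3, 2.0))
```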

Each one-second period is divided into eight equal epochs. For each epoch, 14 × 3 features are calculated. The results are concatenated to form a 336-dimensional feature vector (14 × 3 × 8) for each second. These feature vectors were then collected, along with their corresponding annotations, to train our classification models.
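The construction of the 336-dimensional motion feature vector can be sketched as follows. Several details are assumptions made for illustration: one angle series per joint vector, "differentiation" taken as the first difference between consecutive frames, the population standard deviation, and near-equal epochs (30 frames per second cannot be split into eight epochs of identical length).

```python
import statistics


def split_epochs(seq, parts):
    """Split seq into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(seq), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks


def diff_stats(values):
    """Mean, median, and standard deviation of the first differences of values."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if not diffs:                       # fewer than two samples in the epoch
        return [0.0, 0.0, 0.0]
    return [statistics.fmean(diffs),
            statistics.median(diffs),
            statistics.pstdev(diffs)]


def motion_features(angle_series, epochs=8):
    """Build one feature vector from one second of joint-angle series.

    angle_series: list of per-vector angle sequences (14 in the paper),
    each covering the same one-second window.
    """
    features = []
    per_series_epochs = [split_epochs(s, epochs) for s in angle_series]
    for e in range(epochs):
        for chunks in per_series_epochs:
            features.extend(diff_stats(chunks[e]))
    return features


# Usage with synthetic data: 14 series of 30 frames (one second at 30 Hz).
series = [[0.1 * t for t in range(30)] for _ in range(14)]
vector = motion_features(series)        # 14 x 3 x 8 = 336 values
```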

2.4.2 MindWave based mental features

In clinical practice, 21 electrodes are used to identify five fundamental waves, but such devices are very expensive (Valer et al. 2007), which greatly restricts their application. Recently, NeuroSky Inc. developed MindWave, a low-cost single-channel EEG device. This device offers good portability and flexibility and is suitable for acquiring children’s mental data.

The signal includes the power spectra of eight basic EEG waves: delta (0.5–2.75 Hz), theta (3.5–6.75 Hz), low-alpha (7.5–9.25 Hz), high-alpha (10–11.75 Hz), low-beta (13–16.75 Hz), high-beta (18–29.75 Hz), low-gamma (31–39.75 Hz), and mid-gamma (41–49.75 Hz). The mental classification model can therefore be built from the following EEG features:

  1. Mean differentiation of basic waves;

  2. Median differentiation of basic waves;

  3. Standard deviation of the differentiation of basic waves.

For each second, 8 × 3 = 24 features are calculated. These feature vectors and their annotations were then collected to train our classification models.

2.5 Proposed methods

Two frameworks are considered in our experiments. In framework I, the motion and EEG features are used for the training and classification of movement and engagement, respectively (Fig. 4a). In framework II, the motion and EEG features are both used for the SVM training and classification of movement and engagement (Fig. 4b). Framework I assumes that the motion features suit the movement classification task and the EEG features suit the engagement classification task, whereas framework II builds a classification model that integrates the motion and EEG features.
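The difference between the two frameworks lies in which features reach each SVM, which the following sketch makes explicit. It assumes that "integrating" the features means plain concatenation, which is not stated in the text; under that assumption each classifier in framework II receives 336 + 24 = 360 values. The function name and dictionary keys are hypothetical.

```python
MOTION_DIM = 14 * 3 * 8   # 336 motion features per second (Sub-Sect. 2.4.1)
EEG_DIM = 8 * 3           # 24 EEG features per second (Sub-Sect. 2.4.2)


def classifier_inputs(motion_vec, eeg_vec, framework):
    """Return the input vector of each SVM under framework "I" or "II"."""
    if len(motion_vec) != MOTION_DIM or len(eeg_vec) != EEG_DIM:
        raise ValueError("unexpected feature vector length")
    if framework == "I":
        # Motion features -> movement SVM, EEG features -> engagement SVM.
        return {"movement": list(motion_vec), "engagement": list(eeg_vec)}
    if framework == "II":
        # Assumed integration: both SVMs see the concatenated vector.
        combined = list(motion_vec) + list(eeg_vec)
        return {"movement": combined, "engagement": list(combined)}
    raise ValueError("framework must be 'I' or 'II'")


# Usage with placeholder feature values.
inputs_i = classifier_inputs([0.0] * MOTION_DIM, [0.0] * EEG_DIM, "I")
inputs_ii = classifier_inputs([0.0] * MOTION_DIM, [0.0] * EEG_DIM, "II")
```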

Fig. 4 The training and classifying system

2.6 Evaluation

In the evaluation scheme, classification was performed independently on the motion and EEG data recorded by the Kinect and MindWave, and its performance was compared in terms of sensitivity, specificity, and accuracy:

$$\begin{aligned} Sen = \tfrac{TP}{TP + FN} \times 100\% \hfill \\ Spe = \tfrac{TN}{TN + FP} \times 100\% \hfill \\ Acc = \tfrac{TP + TN}{TP + TN + FP + FN} \times 100\% , \hfill \\ \end{aligned}$$
(2)

where TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative classified segments, respectively. Note that TP and TN refer to the number of correctly classified movement segments and the number of correctly classified positive engagement segments, respectively.
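For reference, the three measures can be computed from the segment counts as in the short sketch below, with accuracy taken over all four counts (TP + TN + FP + FN). The function name is hypothetical and the counts in the usage example are made up for illustration; they are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) in percent."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Usage with illustrative counts only.
sen, spe, acc = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```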

3 Results

To obtain adequate results from the data of the five children (two with autism), the proposed system was tested in four independent experiments: good movement (good for short), NG movement (NG for short), positive engagement (positive for short), and negative engagement (negative for short). For the testing data, episodes of the four kinds of events (good movement, NG movement, positive engagement, and negative engagement) were randomly extracted from each recording and manually annotated, and the data from the other subjects were used for training.

The proposed approaches were evaluated on the datasets shown in Table 1. The data length for each class is given in the table, and the length in each cell was converted into frames using a frame size of 1 s with 50% overlap. The four experiments were performed independently, and the training and testing data for each experiment did not overlap. The motion and EEG features for the experimental datasets were produced and applied to the SVM for training and classification.
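The conversion of a recording into frames can be sketched as follows for a frame size of 1 s with 50% overlap. The function name is hypothetical, and the 30 Hz rate in the usage example is the Kinect sampling rate given in Sub-Sect. 2.4.1; trailing samples that do not fill a whole frame are dropped, which is an assumption.

```python
def frame_bounds(n_samples, rate_hz, frame_s=1.0, overlap=0.5):
    """Return (start, end) sample indices of overlapping fixed-size frames.

    `end` is exclusive. Samples after the last complete frame are dropped.
    """
    frame_len = int(round(frame_s * rate_hz))
    hop = int(round(frame_len * (1.0 - overlap)))
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame length and hop must be positive")
    return [(start, start + frame_len)
            for start in range(0, n_samples - frame_len + 1, hop)]


# Usage: 3 s of skeleton data at 30 Hz gives five 1 s frames, 0.5 s apart.
bounds = frame_bounds(n_samples=90, rate_hz=30)
```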

Table 1 The information of experimental datasets

Motion has been recommended as the best feature for gesture detection (Ren et al. 2013; Nguyen and Le 2015), and several related works have obtained good performance with motion features. EEG features, on the other hand, reflect the mental activity of the children during training. The confusion matrix is shown in Table 2.

Table 2 The results using the framework I

Furthermore, the integrated features were applied to the movement and engagement SVMs for training and classification according to framework II. The confusion matrix is shown in Table 3.

Table 3 The results using the framework II

For both frameworks, most errors occurred between NG-positive and good-negative. The integrated features achieved 96.2% and 93.3% accuracy for the good-positive and NG-positive classifications, while the accuracies for good-negative and NG-negative were 94.06% and 97.40%, respectively. Moreover, Tables 2 and 3 show that the integrated features performed better in the movement and engagement classification tasks. Specifically, the motion- and EEG-based classification rates were 89.2%, 85.3%, 79.06%, and 90.4% for good-positive, NG-positive, good-negative, and NG-negative, respectively.

The classification results for the three feature sets are shown in Table 4. Comparing the classifier performance for the motion-based and EEG-based features, it is evident that framework II with the integrated features is superior in all experiments.

Table 4 The classification results of 3 features

4 Conclusion

In this study, a simple and efficient movement and mental classification system is presented. Based on motion and EEG features and two SVM classifiers, the proposed system outperforms the motion-based system. Because it uses a low-cost hardware platform that can be installed in a child’s home, a classroom, or a club, the proposed system is convenient to implement and provides a natural interface for the rehabilitation training of children with autism. Future work includes post-processing for movement episode detection, identification of the key steps of the rehabilitation training process for individuals, and the use of motion features from children with autism to guide automatic analysis and training plan building. Moreover, other kinds of interactive technologies, such as speech recognition, will also be explored in the training system.