
1 Introduction

Recently, wearable devices such as the Apple Watch [19], smart glasses, and many others have entered many people's lives. In particular, since the release of the Explorer Edition of Google Glass, people have paid more attention to 'head-mounted', versatile, augmented-reality devices such as those from Epson [20] and Sony [21], and other manufacturers continue to work on new and innovative devices.

Smart glasses, as wearable computers, are able to detect voice commands, identify objects, take pictures, and record video clips. While the smart glasses are worn, sensors such as the Accelerometer, Gravity, Linear Acceleration, Gyroscope, and Rotation Vector sensors produce logging data during the user's various activities. Wearable services, in turn, rely on such reliable and insightful information for immediate interaction, Machine Learning analysis, personalized context-aware recommendations, and activity-based recognition. Thus, wearable computers are gaining attention as a possibly more natural, intuitive, hands-free, and user-friendly option for our mobile experiences.

Our daily lives consist of many activities, such as working, exercising, socializing, shopping, and sleeping. Contextual information about our behavior can enable more intelligent and suitable services in the future. Smart glasses give users a unique 'head-mounted', 'eye-wearing' experience with a fixed 'on-body' position on the face while exploring the external world, in contrast to smartphones, which are often put in a pocket or backpack or held in the user's hands. We therefore focus on a user's personal sports and entertainment activities sensed through an eye-wearing wearable computer, on the accuracy of real-time personal activity recognition via Machine Learning approaches, and on how 'head-tracking' might matter in activity recognition; these aspects are designed, experimented on, evaluated, and discussed in the following sections.

Most early works either lack further discussion of the 'on-body' characteristic or ignore how smartphones are realistically carried (e.g., is the user jogging or walking with the smartphone in the hand or strapped to the arm? It might instead be in a pocket or backpack, depending on personal preference and circumstances). By taking the user's head motions into account, our experiments achieve accurate, real-time activity recognition. Based on these motivations we present the proposed system, shown in Fig. 1, and we anticipate that more context-aware wearable services will be demanded in the near future.

Fig. 1. The proposed system aims to classify the user's head motions to recognize activities according to wearable sensor data from the smart glasses.

2 Related Works

Earlier works have discussed context awareness and mobile activity recognition of users on smartphones. More recent works also use the Accelerometer sensor and wearable computers to achieve activity recognition results.

2.1 Context Awareness

Over the past decades, significant research has been conducted to understand human context, leading us to explore ways of communication between humans and machines [3]. Context-aware applications in mobile computing make people's lives more intelligent and convenient, and a context-aware system can react to human activities more dynamically and flexibly.

2.2 Mobile Phone Activity Logging

Some earlier research works examine the user's characteristics and behavior when a mobile device is either in use or on standby. Especially for smartphones, sensor, processing, and communication data can be analyzed, identified, and used for further activity-recognition applications [1, 4]. Context gathering, learning, predicting, and monitoring analytically present solutions for integrating context-aware mobile computing, cloud services, and the user's physical surroundings. Other works address smartphone applications that analyze the user's Accelerometer data [8–10, 12]. However, the data received by a smartphone depends on its position in a pocket, bag, or hand. Smartphones are often rotated while lying in different places and directions; Fig. 2 shows a typical alpha rotation of a smartphone.

Fig. 2. An example of alpha rotation showing how a smartphone might rotate with respect to its coordinate frame.

2.3 Wearable Computer and Sensors

Research works on Google Glass [2, 7] have shown the possibility of assistive use for patients. Other projects [11, 13, 14] study human motion through a single 'body-worn' Accelerometer, electrode sensors, or 'armband' and 'backpack' wearable sensor arrays. The work in [15] presents experiments in recognizing human activities such as blinking and reading with recently designed smart glasses. Related projects also analyze user behavior via Google Glass [5] and perform 'lifelogging' processes [6] that upload captured images and text. Another work [16] uses a 'textile-integrated' wearable sensor array to classify human motions.

3 System Evaluation

To learn how smartphones might sense differently from smart glasses during activity recognition, we prepare two hardware devices, each running its own software application, to evaluate our assumption. The evaluated sensors on both devices are the Accelerometer, Gravity, Linear Acceleration, Gyroscope, and Rotation Vector sensors.

3.1 System and Software

We start by looking into how smart devices such as smartphones and smart glasses perform activity recognition in the experiments. Our Android application 'Mobile4You' is installed on the smartphone (Samsung S3 mini, Android 4.1) and logs each sensor at 20 Hz; each logging session lasts 20 s and yields 400 entries of recorded data. Another Android application, 'Glass4You', is installed on the smart glasses (Google Glass, Explorer Edition XE 22); it is similar to 'Mobile4You' but with slight differences in UI design and SDK implementation. On the smart glasses, the logging frequency is 5 Hz per sensor, each logging session lasts 20 s, and 100 entries of data are recorded in total.

3.2 Experimented Sensor Values

Each sensor offers three values (x, y, and z axes), so each entry of data contains a total of 15 values from the 5 sensors. Since we target a near real-time response, only the first 5 s of data, called the first 'data block', is considered. Because the first and second entries miss some sensor values, possibly due to hardware warm-up behavior, we collect entries from the 3rd entry onward until a full 5 s period is covered on both devices.

In our design, each 'data block' contains 5 s of entries and yields a series of mathematical values, namely sum (summation), mean (mean value), var (variance), max (maximum), and min (minimum), for each sensor axis, as shown in Table 1, where the abbreviations are A (Accelerometer), G (Gravity), LA (Linear Acceleration), GS (Gyroscope), and R (Rotation Vector); 'all' means all five of these mathematical values.
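As an illustration of this computation, the following minimal Python sketch derives the five math values for every sensor axis of one 'data block'; the data layout and the field names such as "A_x" are our assumptions for illustration, not the authors' code.

import statistics

# Sensor abbreviations follow Table 1; the "<sensor>_<axis>" naming is assumed.
SENSORS = ["A", "G", "LA", "GS", "R"]
AXES = ["x", "y", "z"]

def block_features(block):
    """Compute sum, mean, var, max, min for every sensor axis of one 5 s 'data block'.

    `block` is assumed to be a list of entries, each a dict mapping
    "<sensor>_<axis>" to a float reading; the result has 5 x 3 x 5 = 75 values.
    """
    features = {}
    for sensor in SENSORS:
        for axis in AXES:
            values = [entry[f"{sensor}_{axis}"] for entry in block]
            features[f"{sensor}_{axis}"] = {
                "sum": sum(values),
                "mean": statistics.mean(values),
                "var": statistics.variance(values),
                "max": max(values),
                "min": min(values),
            }
    return features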

Table 1. Data of the 5 sensors over a 5 s 'data block' are analyzed for smartphones (SP) and smart glasses (SG); C denotes the coordinate axes x, y, and z, and MV the math values.

3.3 Activities and Experiments

We experiment with four types of activity, Biking, Jogging, Movie Watching, and Video Gaming, while carrying each device separately. When carrying the smartphone during the four activities, we measure data for two subsets of behavior: Fixed Location (the smartphone is put in the pocket) and Random Location (the smartphone is put in the hand, in a backpack, on a table, etc.). The user also performs all four activities while wearing the smart glasses, without specific subsets for any activity, as shown in Table 2.

Table 2. Definitions of each activity and the subsets performed in the experiments while carrying the smartphone and the smart glasses.

For each activity or subset, data is collected 60 times (20 s each time) to form the training dataset. In addition, the users collect another 20 recordings as the testing dataset. In the end, the data collected by the smartphone covers 8 subsets across the four activities, while the data from the smart glasses covers only the four activities.

In the experiments, a Support Vector Machine framework, libsvm (release 3.20) [17], is applied to our system for both the training and the prediction process. According to our system design, libsvm is not only deployed on our server for training, testing, and modeling, but also integrated into our Android program for testing and prediction.
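A minimal sketch of this flow with libsvm's Python bindings (svmutil) is shown below; the file names, label encoding, and SVM parameters are illustrative assumptions rather than the settings used in the paper.

from svmutil import svm_read_problem, svm_train, svm_predict, svm_save_model

# Training data in LIBSVM format: "<activity_label> 1:<feat_1> ... 75:<feat_75>".
y_train, x_train = svm_read_problem("glasses_train.libsvm")
model = svm_train(y_train, x_train, "-s 0 -t 2 -c 1")  # C-SVC with an RBF kernel

# The trained model can be saved and shipped to the Android client for on-device prediction.
svm_save_model("activity.model", model)

# Prediction on the held-out recordings (20 per activity or subset).
y_test, x_test = svm_read_problem("glasses_test.libsvm")
predicted_labels, (accuracy, _, _), _ = svm_predict(y_test, x_test, model)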

4 System Analysis

Our proposed system is designed to provide both real-time services for data processing, classification, and activity recognition, and batch-processing services running on the backend servers, which handle data computation and the computed results of Feature Selection and F-Score in the training process. Classification and recognition accuracy, as well as time consumption in terms of execution and responsive time, are also measured in our work.

4.1 Feature Extraction

Feature extraction is an important stage of our activity-recognition experiments. To learn how the selected features may affect accuracy, we study our feature strategy, which is based on the feature dimensions of our presented work and on Feature Selection and the F-Score of those dimensions in the system.

4.1.1 Feature Dimension

The current approach focuses on the dimensions of the mathematical values generated from a 'data block', which contains 5 s of continuous data from the 5 sensors on the smart glasses. Thus a total of 75 dimensions, or vectors (5 sensors × 3 axes × 5 math values), are collected, as shown in Eq. (1).

$$ fs_{t} = \left\{ v_{0}, v_{1}, \ldots, v_{n-2}, v_{n-1} \right\} $$
(1)

where fs_t denotes the set of all feature dimensions, or vectors, containing the math values of t seconds of data, and n is the number of vectors.

4.1.2 Feature Selection and F-Score

Support Vector Machine (SVM) methods (Boser et al. 1992; Cortes and Vapnik 1995) are effective at classifying data, but they do not automatically select the important features needed to complete the classification task. In the experiments, the questions of which features are important and how many features should be selected for good activity-recognition performance are raised. Therefore, filters and thresholds for each sensor are considered in order to identify and answer those questions.

Since some data dimensions may be less effective or non-discriminative for classifying the datasets, we exploit the F-Score technique [18] to measure how important individual features are, so that the important ones can be used to improve or maintain recognition accuracy while unnecessary dimensions are eliminated from the computation under certain thresholds, as shown in Eq. (2).

$$ F(i) \equiv \frac{\left( \bar{x}_{i}^{(+)} - \bar{x}_{i} \right)^{2} + \left( \bar{x}_{i}^{(-)} - \bar{x}_{i} \right)^{2}}{\frac{1}{n_{+} - 1}\sum\limits_{k = 1}^{n_{+}} \left( x_{k,i}^{(+)} - \bar{x}_{i}^{(+)} \right)^{2} + \frac{1}{n_{-} - 1}\sum\limits_{k = 1}^{n_{-}} \left( x_{k,i}^{(-)} - \bar{x}_{i}^{(-)} \right)^{2}} $$
(2)
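Here, following the F-Score definition in [18], the three means in the numerator are the average of the i-th feature over the whole dataset and over the positive and negative classes, while n_+ and n_- in the denominator are the numbers of positive and negative instances. A minimal Python sketch of this computation for one feature dimension (our own illustrative helper, not the authors' code) is:

import numpy as np

def f_score(pos, neg):
    """F-Score of one feature dimension for a two-class split, as in Eq. (2).

    `pos` and `neg` are assumed to be 1-D arrays holding the values of that
    feature for the positive and negative class, respectively.
    """
    pos, neg = np.asarray(pos, dtype=float), np.asarray(neg, dtype=float)
    overall_mean = np.concatenate([pos, neg]).mean()
    numerator = (pos.mean() - overall_mean) ** 2 + (neg.mean() - overall_mean) ** 2
    denominator = pos.var(ddof=1) + neg.var(ddof=1)  # sample variances of the two classes
    return numerator / denominator

The 75 dimensions can then be ranked by this score and only the top-scoring ones kept.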

The calculated F-Scores show how effective particular features are, and the ranking of features by F-Score tells us which features should be given more weight and which can be filtered out by defined thresholds. Here we show the top 10 feature dimensions by F-Score ranking for activity recognition on the smart glasses and on the smartphone in Fig. 3(a, b), where S denotes the sensor, C the coordinate axis, and MV the mathematical value.

Fig. 3. (a) Top 10 features by F-Score ranking on the smart glasses, where GY denotes GS. (b) Top 10 features by F-Score ranking on the smartphone.

In Fig. 3(a), for the smart glasses, we observe that the top 3 features score much higher than the rest: var of 'A y-axis', var of 'LA y-axis', and var of 'LA z-axis'. In contrast, the scores for the smartphone in Fig. 3(b) are more evenly distributed among the top 10 selected features, which may hint that a fixed 'on-body' location/position of the device is important for mobile activity recognition, as opposed to a non-fixed or random placement.

Furthermore, for the smart glasses we pay attention not only to the important features that should be watched, but also to the vector-space dimensions that could be eliminated and to how the effective features can be interpreted as head motions in the proposed activity experiments. This raises the question of how and why these features help in classifying the proposed activities.

4.2 Classification

In the experiment, the training model is used to build the classifier and predict all four test activities. To look into how the activities affect these sensor features while the smart glasses are worn, we evaluate the experiments in two distinct categories, which helps us learn more about head motions. Each category, 'Sports' (Biking vs. Jogging) and 'Entertainment' (Movie Watching vs. Video Gaming), is discussed in terms of the specific head-motion features that separate the activities within it. Here we take 10 s of sensor data from each activity as an example. From this analysis of the ranked sensor data, we are able to quickly identify the most effective features and head motions for classification.

4.2.1 Biking Vs. Jogging

Our studies show that the top-scoring features are the most effective for classifying these two activities. The activities show their specific characteristics in the var of the A and LA data distributions, shown in Fig. 4. Our data suggests that the glass wearer's head shakes more frequently while Jogging, which appears as larger var in the 'A y-axis' movements (Fig. 4(a)). The LA y- and z-axes in Fig. 4(b, c) show different velocities in the two activities. Meanwhile, smaller x-axis spins of 'GS x-axis' appear in Fig. 4(d) while Jogging, compared to the bigger x-axis spins of Biking.

Fig. 4. Sensor data of the top features on the smart glasses for Biking and Jogging (x-axis: 10 s, y-axis: value of entry data).

4.2.2 Movie Watching Vs. Video Gaming

When glass wearers perform Movie Watching and Video Gaming, they are more static than when exercising. The common movements in our measurements are 'head-up' (looking up) and 'head-down' (looking down) motions. Our data shows that the important feature is 'A z-axis' (Fig. 5(a)), which helps to classify these two activities quickly. For instance, the glass wearer often holds the head up in Movie Watching (blue line of Fig. 5(a)) compared to 'head-down' in Video Gaming (green line of Fig. 5(a)).

Fig. 5. Sensor data of the top features on the smart glasses for Movie Watching and Video Gaming. (Color figure online)

We conclude that a nodding head motion corresponds to a 'head-up' followed immediately by a 'head-down', which is mostly reflected in variations of 'A z-axis'. The direction of the user's gaze is closely related to this feature.

Although we find that the glass wearer sometimes moves his/her head randomly while Movie Watching (possibly from laughing at funny movies), this feature can still separate the two activities in most cases. In addition, some head motions in the 'A x-axis' feature that appear during the Video Gaming activity are possibly caused by our car racing game, which makes the user occasionally turn his/her head to look at the left or right corner of the TV screen (green line of Fig. 5(b)).

4.2.3 Sports Vs. Entertainment

When glass wearers perform either Sports or Entertainment activities, the characteristic difference in speed is obvious; it can be measured by the var of LA, especially in the forward direction while the wearer moves ahead. Compared to Sports activities, Entertainment activities are more static, with no large var observed in the LA feature dimensions. Furthermore, the 'head-up' and 'head-down' head motions in 'A z-axis' help to classify the two static activities in the experiments.

In Fig. 6, both Biking and Jogging cause the user's head to shake vertically with different var of 'A y-axis'; a smoother LA is measured in Biking, and GS rotates with bigger spins in Biking than in Jogging. In Fig. 7, the stretch (red) of 'A z-axis' tells how high or low the user is facing and/or looking. When the user turns to look left or right, the stretch of the A x-axis tells the angle of that turn.

Fig. 6. Sports head motions on smart glasses.

Fig. 7. Entertainment head motions on smart glasses.

4.3 Recognition Accuracy

Our goal is high accuracy of activity recognition on both smart glasses (SG) and smartphones (SP). Experiments are conducted to compare three settings while the proposed activities are performed: the smartphone at a fixed position, the smartphone at random locations, and the smart glasses as worn.

In the experiments, we fix the location and direction of the smartphone in the pocket, and compare this to random locations where the smartphone is put in the pocket, held in the hand, or placed on a table. The activity-recognition results for the three settings show that wearing the smart glasses performs better than both the fixed and the random smartphone locations, as shown in Table 3. Therefore, the further experiments focus on the smart glasses.

Table 3. Activity recognition on the SP at fixed locations (SP-fixed-loc) and at random locations (SP-random), compared to the SG as worn.

In the experiments, we vary the number of selected features to study how accurate our activity recognition can be while the smart glasses are worn. We evaluate activity recognition in two groups, A and B, based on the top 5 and top 10 scoring features in the F-Score rankings, respectively. In addition, to look into how accurate activity recognition is for different 'data blocks', we take various portions of data as test cases: Window 1 (W1) is the first 'data block', Window 2 (W2) is the second 'data block', and Window 1–2 (W12) is the second half of W1 together with the first half of W2. The results are shown in Table 4.
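At the 5 Hz logging rate of the smart glasses, a 5 s 'data block' corresponds to 25 entries, so the three windows can be sliced as in the following sketch (an assumed helper for illustration; the paper does not give this code):

def evaluation_windows(entries, per_window=25):
    """Slice one recording into W1, W2, and the overlapping W12.

    `entries` is assumed to be the chronologically ordered list of sensor
    entries of one recording; 25 entries cover 5 s at 5 Hz.
    """
    half = per_window // 2
    w1 = entries[:per_window]                   # first 'data block'
    w2 = entries[per_window:2 * per_window]     # second 'data block'
    w12 = entries[half:per_window + half]       # 2nd half of W1 + 1st half of W2
    return w1, w2, w12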

Table 4. W1/W2/W12 on the smart glasses evaluated for activity recognition with different numbers of selected features (A and B are the chosen groups, G).

Comparing the two groups, we observe that group B is slightly better than A in some cases. From the averages of recognition accuracy over W1/W12/W2 for the four activities in the two groups, the overall average of the two groups is around 87%, as shown in Table 5.

Table 5. Averages of recognition accuracy for the 4 activities on the smart glasses, evaluated with W1/W2/W12.

4.4 Time Consumption

The total execution time consists of two major parts: the time spent on Data Pre-Processing and Classification, and the time spent on the remaining computation, such as the sensor warm-up time and waiting for the first 'data block'. In our analysis we assume that the time is counted from the beginning of warm-up until Classification completes the recognition task, either with or without the Batch-Processing time (sending data to the backend server). In addition, testing the responsive time while considering the 'window size' (the size of a 'data block') and Overlapping shows that the proposed system can be made more reactive.

4.4.1 Execution Time

The data in our experiments shows that some sensor values are omitted in the first 2 entries. For that reason, we skip a 0.4 s warm-up and then record for 5 s. The Data Pre-Processing step that generates the math values takes around 0.2 s. The following Classification process in the Android program takes around 0.45 s on average to complete the activity-recognition task with the training model on the device. Thus, activity recognition on the smart glasses is expected to complete within about 6.05 s from the beginning of the task.

When the training model is not available, or a newer one is needed, Batch-Processing is required to transmit a 'data block' to the backend server for Classification; this takes another 0.96 s to upload the data over WiFi (average upload speed of 5.56 Mbps).

Therefore, the total execution time accumulates the data-processing and Classification computation time, plus possible network communication time: either 6.05 s when finished on the smart glasses, or 7.01 s when the additional Batch-Processing is included (Fig. 8).
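The two totals follow directly from summing the stage timings reported above:

$$ 0.4\,\text{s} + 5\,\text{s} + 0.2\,\text{s} + 0.45\,\text{s} = 6.05\,\text{s}, \qquad 6.05\,\text{s} + 0.96\,\text{s} = 7.01\,\text{s} $$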

Fig. 8. Classification of activity recognition is completed by the smartphone, or more Batch-Processing is required.

4.4.2 Responsive Time

The experiments on the 'window size' and on applying the Overlapping technique show the potential for being more reactive. We test different data sizes to see how activity recognition performs. Considering all activities, our experiments show that the accuracy of separating the two activities Movie Watching and Video Gaming on the smart glasses drops when the data size is smaller than the 25 entries of 5 s; with 15 or 20 collected entries the failure rate increases by 7.05% on average.

While performing the experiments, we verify the accuracy metrics for W1, W12, and W2. The Overlapping technique helps the system react more quickly to produce a result. Thus, the minimal responsive time for activity recognition on the smart glasses achieved in the proposed system can be less than 3 s (2.5 s), as shown in Fig. 9.
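A sketch of this overlapping evaluation is given below; the stride value is our assumption of how the half-window overlap would be set, `classify` stands for the trained SVM prediction step, and `block_features` is the helper sketched in Sect. 3.2.

def sliding_predictions(entries, classify, window=25, stride=12):
    """Classify overlapping 5 s windows of a recording.

    A stride of 12 entries is roughly 2.5 s at 5 Hz, so a new recognition
    result becomes available about every half window instead of every window.
    """
    results = []
    for start in range(0, len(entries) - window + 1, stride):
        block = entries[start:start + window]
        results.append(classify(block_features(block)))
    return results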

Fig. 9. Responsive time of activity recognition achieved by more frequent data computation with overlapping.

5 Discussion

Our contribution is a user-centric, real-time system that recognizes the user's activities by sensing his/her head motions. By selecting the top features from our ranking mechanism, the system can quickly construct behavioral patterns and the learning model, and reduce the computation cost. We also demonstrate how the characteristic 'head-tracking' patterns of each user in the experimented activities help to achieve these results effectively.

The current work covers four example activities performed by test users during their casual sports and entertainment time, although there are of course many other possible activities in daily life. We believe that our work is a first step toward studying and understanding how a user's activity should be predicted and recognized in a continuous learning and analysis environment while eye-wearing wearable computers are used.

Due to the limited sampling frequency of the hardware sensors at present, our smartphone provides more data (20 Hz) than the smart glasses (5 Hz). We assumed that the larger amount of smartphone data might give better results in our feature selection and ranking process, but the outcomes suggest the opposite. We anticipate that future smart glasses will come with more accurate and refined sensors than the smartphone, which will support new findings once efficient power consumption is also considered and resolved.

We also find that the most useful sensor data for the currently proposed activities come from A (Accelerometer), LA (Linear Acceleration), and GS (Gyroscope). However, there may be more possibilities and combinations of sensor data beyond those three in future experiments on newly targeted activities. Moreover, we expect eye-wearing wearable computers to understand and help users more, through regular daily logging and lifelogging in our future experiments, and their practical use by humans will bring more research possibilities and convenience to daily life.

6 Conclusion and Future Work

We study and analyze the sensor data of both wearable computers and smartphones, focusing on how 'head-tracking' might provide more useful information for user activity recognition. By annotating and extracting features of the user's 'head-mounted' behavior in the proposed activities, the system achieves high accuracy of activity recognition and user-centric, real-time classification of human head motions.

Our future work will involve designing more user-friendly and scalable wearable applications, recognizing more complex activities, feeding diverse types of input data to the Machine Learning analysis, and combining services for ongoing observation in further contexts of the glass-wearing experience. Applications ranging from personalized analyses to interesting wearable services are examples of our upcoming work.