Introduction

Pervasive healthcare is considered a promising technology to meet the challenge of demographic change [1]. The implementation of pervasive healthcare is becoming more achievable with recent advances in information and communication technology (ICT) [2]. More efficient platforms, in terms of both software and hardware, are now available. With wearable or embedded sensors, both physical and environmental parameters can be measured in real time. For patients who do not need special care in hospital but are still at risk of relapse or complications, recovery progress can be observed under out-of-hospital conditions, and in cases of emergency or critical change in health status the necessary notification can be sent to the responsible personnel. Studies have shown that older people have a positive attitude towards devices and sensors that can be installed in their homes to enhance their lives [3–5].

Changes in health status inevitably result in changes in the pattern of daily living. For instance, a study has confirmed that older people gradually lose the ability to perform activities of daily living as they age [6]. With pervasive healthcare, a resident’s daily movements and activities can be detected by installing sensors in the living environment. Behavior-related data are thereby collected, from which living patterns can be extracted. Because acceptance in real-life conditions is an issue, unobtrusive detection techniques are necessary. Unobtrusive detection is defined as methods that are acceptable to the user and do not cause any inconvenience in the user’s normal life. Several research studies have explored the application of unobtrusive detection. Ongoing research usually makes use of, for example, motion sensors, contact sensors and appliance sensors for human activity detection [7–11].

For data processing and information extraction, supervised methods are often used. A supervised method first models the target using training data and then recognizes new states or predicts future trends. However, some issues remain open for real-life applications. With supervised methods, data labeling is likely to be inaccurate, and it is impractical to label complex and concurrent activities in real-life situations. There are primarily three ways to label behavior data [12]: (1) manual labeling by the investigator, (2) asking the subject to label each activity during daily living and (3) letting the subject perform certain activities in a predefined order. However, all of these are time-consuming and unacceptable for users in real-life situations. The performance of a trained model varies considerably, as human performance varies between and within individuals [13, 14]. Additionally, the subject probably cannot perform activities naturally, so the measured data may not reflect reality. Thus, unsupervised methods are necessary to extract human behavior patterns for practical applications. In contrast to supervised methods, unsupervised methods need neither data labeling nor model training, which improves their applicability to real-life situations. Because they do not depend on training data, unsupervised methods are also more adaptable and better suited to different individuals.

To explore the feasibility of pervasive healthcare, a variety of studies have been carried out for health assessment. Based on room occupation, Virone proposed the circadian activity rhythm (CAR) model to assess the behavior of older people [7]. The ORCATECH project has measured a few parameters, such as activity level, walking speed and room transitions, for different purposes, such as gait assessment and mental health assessment [10, 15, 16]. It has been shown that lifestyle regularity is associated with health status. For instance, higher lifestyle regularity may prevent long-lasting bereavement-related depression [17]. The regularity of daily activities is related to sleep quality, and lifestyle regularity is impaired in patients with Parkinson’s disease [18]. To our knowledge, however, no research has yet studied lifestyle regularity using pervasive healthcare. Lifestyle regularity is expected to be measurable via behavior patterns extracted from a sensor-enhanced living environment. Assuming that a regular lifestyle indicates a healthy status, this study proposes a systematic unsupervised approach to discover human behavior patterns from datasets obtained from sensor-enhanced living environments. This makes it possible to investigate how health status is reflected by lifestyle regularity. The contributions of this study are:

  • proposing unsupervised methods for information extraction from datasets measured in real life situations

  • exploring the relationship between lifestyle regularity and health status in the setting of pervasive healthcare.

Methods

Review of the GAL-NATARS study

The GAL project aims to identify, enhance and evaluate new ICT techniques for the design of environments for the elderly [19, 20]. The GAL-NATARS study was a field study within GAL [21, 22]. Technology for sensor-enhanced living environments developed in the GAL project was installed in the subjects’ homes. The study collected two types of data: behavior-related data, collected via the sensors installed in the subjects’ homes, and assessment data, collected during weekly home visits by nurses. On the one hand, behavior patterns can be extracted from the sensor data; on the other hand, the assessment results represent health status. Therefore, the correlation between behavior patterns and health status can be investigated.

In the GAL-NATARS study, all subjects are recruited from patients who have been treated in one of three hospital geriatric departments for acute femoral fractures or for rehabilitative completion of a femoral fracture treatment. The main inclusion criteria are:

  • being treated with osteosynthesis or an endoprosthesis and participating in geriatric rehabilitation after hip fracture

  • full weight bearing

  • age ≥ 70 years

  • living alone (including assisted living) and within a 60 km radius of one of the geriatric hospitals

  • no pets living with the resident

  • Mini-mental-state test score ≥ 20.

Each subject is observed for approximately 3 months. During the observation period, the subjects are visited weekly by a nurse, and geriatric assessment batteries are conducted on each subject at four different times. Firstly, complete geriatric assessments are carried out after the subject is discharged from hospital ($t_0$), and these assessments are repeated at the end of the 3-month observation period ($t_3$). A selected subset of these assessments is performed at monthly intervals ($t_1$ and $t_2$).

Unobtrusive sensors are installed in the subjects’ homes to detect their movements within specific zones and their interactions with key objects. Presence sensors (motion sensors) detect zone occupation, contact sensors detect actions on the exits, vibration sensors are installed in some furniture and current sensors detect the use of some electrical appliances. An example of an apartment layout is presented in Fig. 1. The sensors are denoted by boxes of different colors: a green box indicates a presence sensor, yellow a contact sensor and red a vibration sensor.

Fig. 1 Example of an apartment layout with sensors used in this study

Processing of assessment results

The study used five assessments to assess the subjects’ health status: the Barthel index (BI), Tinetti I and II, timed up and go (TUG), the short physical performance battery (SPPB) and the visual analogue scale (VAS) for pain. BI, Tinetti I and II, TUG and SPPB are mobility-related, whereas the VAS is designed to assess the subject’s feeling of pain. As these assessments use different scales, they must be normalized to a uniform scale; in the current work, they are normalized to a scale of 0–10. Additionally, the four mobility-related assessments are integrated into a single parameter, m.val, representing the subject’s mobility. For the initial processing, equal weights are assigned to the four assessments.

$$ \mathit{m.val} = \frac{1}{4}\mathrm{TUG} + \frac{1}{4}\mathrm{SPPB} + \frac{1}{4}\mathrm{Tinetti} + \frac{1}{4}\mathrm{BI} \tag{1} $$

A higher m.val implies better mobility. Therefore, two values from the assessment results indicate the subject’s health condition: m.val, indicating mobility, and the VAS, indicating the perception of pain. Based on the visiting nurses’ observations and the assessment results, each subject’s health status was rated (Table 1).
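The normalization and Eq. (1) can be sketched in Python as follows. This is a minimal sketch, not the authors’ R implementation: the raw score ranges (BI 0–100, SPPB 0–12, Tinetti I and II combined 0–28) and the 60 s cap used to map TUG times onto 0–10 are assumptions, because the text does not state how each scale was normalized. Since a shorter TUG time means better mobility, the sketch inverts TUG so that a higher m.val still means better mobility. The VAS would be rescaled in the same way but kept as a separate value. The scores in the usage line are made up.

```python
# Assumed raw ranges (min, max); the paper does not state them.
BI_RANGE = (0.0, 100.0)       # Barthel index
SPPB_RANGE = (0.0, 12.0)      # short physical performance battery
TINETTI_RANGE = (0.0, 28.0)   # Tinetti I + II combined
TUG_CAP_S = 60.0              # hypothetical: TUG times >= 60 s score 0


def to_0_10(value: float, lo: float, hi: float) -> float:
    """Linearly rescale value from [lo, hi] to [0, 10], clipping out-of-range input."""
    clipped = min(max(value, lo), hi)
    return 10.0 * (clipped - lo) / (hi - lo)


def tug_to_0_10(seconds: float) -> float:
    """Invert TUG: a faster test (fewer seconds) yields a higher score."""
    return 10.0 - to_0_10(seconds, 0.0, TUG_CAP_S)


def m_val(bi: float, sppb: float, tinetti: float, tug_seconds: float) -> float:
    """Eq. (1): equally weighted sum of the four normalized mobility scores."""
    return (
        0.25 * tug_to_0_10(tug_seconds)
        + 0.25 * to_0_10(sppb, *SPPB_RANGE)
        + 0.25 * to_0_10(tinetti, *TINETTI_RANGE)
        + 0.25 * to_0_10(bi, *BI_RANGE)
    )


print(m_val(bi=80, sppb=6, tinetti=21, tug_seconds=15))  # 7.0
```

Changing the weights in `m_val` is the natural place to experiment once the equal weighting of the initial processing is revisited.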

Table 1 Subject health status

Data preprocessing

Constructing hierarchical time-lines

People usually follow a regular lifestyle; in other words, some activities are carried out periodically. Based on this assumption, the current work adopts lifestyle regularity as the indicator to assess deviation in behavior. For data preprocessing and the measurement of lifestyle regularity, the time-line is regarded as consisting of periods. To construct a hierarchical time-line structure, a number of consecutive basic periods form a multi-period. As some assessments are carried out every week, the basic period is set to 1 day and the multi-period to 1 week. Daily human behavior is distributed over the time-line, and certain sets of activities are more likely to occur in particular time intervals. Taking usual habits into account, the basic period (24 h) can be further split into a number of segments. As illustrated in Fig. 2, 1 day is split into late night, morning, noon, afternoon, and evening and night. To represent the behavior distribution more precisely, some segments are further divided into subsegments.

Fig. 2 Segmentation of 1 day (24 h) time-line

Following the method described above, a hierarchical time-line has been established. As shown in Fig. 3, $n$ basic periods $P_b$ form a multi-period $P_{mul}$, and each $P_b$ consists of $m$ segments. If all basic periods are placed in parallel, as in Fig. 3, the change of event distribution over a specific interval across all basic periods can be shown clearly by comparing the segments with the same index. If every basic period of a multi-period is split into $m$ segments in the same way, i.e. with the same splitting points, the multi-period also has $m$ segments.
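One way to locate a sensor timestamp on this hierarchical time-line is sketched below; it returns the multi-period, basic-period and segment indices. The sketch assumes a basic period of 1 day and a multi-period of 7 days, as in the text, but the segment start hours in `SEGMENT_STARTS` are hypothetical placeholders for the splitting points of Fig. 2.

```python
from bisect import bisect_right
from datetime import date, datetime

# Hypothetical segment boundaries (start hour of each segment); the real
# splitting points are those shown in Fig. 2.
SEGMENT_STARTS = [0, 6, 11, 14, 18]
SEGMENT_NAMES = ["late night", "morning", "noon", "afternoon", "evening and night"]
DAYS_PER_MULTI_PERIOD = 7  # basic period = 1 day, multi-period = 1 week


def locate(ts: datetime, start: date) -> tuple[int, int, int]:
    """Return (multi-period index, basic-period index within it, segment index)."""
    day_offset = (ts.date() - start).days
    multi_period, basic_period = divmod(day_offset, DAYS_PER_MULTI_PERIOD)
    hour = ts.hour + ts.minute / 60 + ts.second / 3600
    # index of the last segment whose start hour is <= hour
    segment = bisect_right(SEGMENT_STARTS, hour) - 1
    return multi_period, basic_period, segment


mp, bp, seg = locate(datetime(2024, 1, 10, 7, 30), start=date(2024, 1, 1))
print(mp, bp, SEGMENT_NAMES[seg])  # 1 2 morning
```

Subsegments could be handled the same way with a finer list of start hours inside a segment; because every basic period uses the same boundaries, the same segment index also identifies the segment of the multi-period.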

Fig. 3 The hierarchical time-line

Processing events

People usually perform activities with a certain rhythm, and certain sets of activities tend to be performed together. For instance, if the resident enters the living room at night, the resident tends to turn on the lamp and the TV set, whereas the oven and the fridge are likely to be used when the resident is in the kitchen.

An activity can be broken down into a number of low-level events corresponding to simple sensor records. Each event has two attributes: its event identifier (ID) and the time at which it occurs. Considering the character of human activities and the sensors used in the sensor-enhanced environment, two kinds of events are defined based on the time attribute: time-point-based events and time-interval-based events. A time-point-based event occurs at a point in time; its duration is not important for information extraction and can be ignored. A time-interval-based event occurs over a certain duration that reflects the variance of the event and cannot be ignored. Accordingly, both types of events are extracted from the sensor datasets.
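A possible representation of the two event kinds is sketched below. It assumes, following the naming rule used in the Results (e.g. “KS” for kitchen, starting), that a time-interval-based record is turned into a start event and an end event so that its duration is preserved, whereas a time-point-based record yields a single event. The class, the function names and the door code “D” are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    event_id: str   # e.g. "KS" = kitchen, starting (naming rule of Table 3)
    time: datetime  # when the event occurs


def point_event(code: str, t: datetime) -> list[Event]:
    """Time-point-based event: only the moment of occurrence is kept."""
    return [Event(code, t)]


def interval_events(code: str, start: datetime, end: datetime) -> list[Event]:
    """Time-interval-based event: start ('S') and end ('E') events keep the duration."""
    if end < start:
        raise ValueError("interval ends before it starts")
    return [Event(code + "S", start), Event(code + "E", end)]


# A presence-sensor interval in the kitchen and a hypothetical door-contact event "D"
events = interval_events("K", datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 8, 20))
events += point_event("D", datetime(2024, 1, 1, 9, 5))
print([e.event_id for e in events])  # ['KS', 'KE', 'D']
```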

Once events are extracted, the sequence of sensor records is converted into an event sequence. As low-level representations of activities, events also tend to occur in groups, so a more structured dataset needs to be formed. Events with higher correlation are separated by much shorter time distances. To highlight the rhythm reflected by human behavior, the event sequence is therefore clustered into episodes based on a time gap constraint, where the time gap is the time distance between two consecutive events.

Note that the time order of events is not taken into account in the current work, for two reasons: firstly, the density of deployed sensors is not high enough to reliably detect the order of events, which might reflect the resident’s mental status; secondly, given the clustering procedure, the time gap constraint already represents the correlation between the events within one episode, and variation in their time order cannot reveal differences in health status. As time order is ignored, the events of one episode are loaded into a bag. Consequently, the event sequence of each period is converted into a number of bags of events.
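The clustering and bagging steps can be sketched as follows, as a minimal Python illustration rather than the authors’ R code. The 5-minute maximum gap in the example is a hypothetical value, since the text does not report the time gap constraint. Each bag is represented as a frozenset of event IDs, which discards both time order and repeated occurrences; that is all apriori needs, while a `collections.Counter` would be the choice if repetition counts mattered.

```python
from datetime import datetime, timedelta


def cluster_into_bags(
    events: list[tuple[str, datetime]], max_gap: timedelta
) -> list[tuple[datetime, frozenset[str]]]:
    """Split an event sequence into episodes and return each as a bag of event IDs.

    A new episode starts whenever the time gap between two consecutive events
    exceeds max_gap. Each bag is tagged with the start time of its episode so it
    can later be assigned to a period and a segment.
    """
    bags: list[tuple[datetime, frozenset[str]]] = []
    episode: list[tuple[str, datetime]] = []
    for event_id, t in sorted(events, key=lambda e: e[1]):
        if episode and t - episode[-1][1] > max_gap:
            bags.append((episode[0][1], frozenset(e for e, _ in episode)))
            episode = []
        episode.append((event_id, t))
    if episode:
        bags.append((episode[0][1], frozenset(e for e, _ in episode)))
    return bags


day = datetime(2024, 1, 1)
events = [
    ("LE", day.replace(hour=8, minute=0)),
    ("KS", day.replace(hour=8, minute=1)),
    ("KE", day.replace(hour=8, minute=20)),
    ("BS", day.replace(hour=9, minute=0)),
]
for start, bag in cluster_into_bags(events, max_gap=timedelta(minutes=5)):
    print(start.time(), sorted(bag))
```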

In summary, data preprocessing consists of first extracting events from raw sensor records and then clustering the event sequences into episodes, which are further converted into bags. Based on the time-line splitting, each bag is assigned to a period and a segment. The whole procedure is illustrated in Fig. 4.

Fig. 4 The procedure of data preprocessing

Applying the apriori algorithm

If the subject performs a certain activity regularly, the corresponding events are likely to occur in the same time interval, say TI. In other words, a frequently occurring event set implies that the resident performs the corresponding activity in TI, so such event sets can reasonably be used to indicate typical activities within TI. Over a long-term duration, these evident event sets can indicate human behavior, i.e. lifestyle. In the current work, a systematic method based on the apriori algorithm, which discovers the frequent item sets of a dataset, is proposed to discover evident event sets (Fig. 5). Through event clustering, the raw dataset has been converted into bags of events, which match the standard input form of apriori.
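For readers unfamiliar with apriori, its level-wise search can be sketched in a few lines of Python. This is a generic textbook version, not the authors’ R implementation: candidates of length k are built by joining frequent sets of length k − 1, pruned if any of their (k − 1)-subsets is infrequent, and kept if their support (the fraction of bags containing them) reaches `min_support`. The bag contents in the example are made up.

```python
from itertools import combinations


def apriori(bags: list[frozenset[str]], min_support: float) -> dict[frozenset[str], float]:
    """Return every item set whose support is at least min_support."""
    n = len(bags)
    if n == 0:
        return {}

    def support(itemset: frozenset[str]) -> float:
        # fraction of bags that contain every item of itemset
        return sum(1 for bag in bags if itemset <= bag) / n

    level = {frozenset([item]) for bag in bags for item in bag}
    frequent: dict[frozenset[str], float] = {}
    k = 1
    while level:
        scored = {c: support(c) for c in level}
        current = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # join: merge two frequent (k-1)-sets whose union has exactly k items
        joined = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # prune: every (k-1)-subset of a candidate must itself be frequent
        level = {
            c for c in joined
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
    return frequent


bags = [
    frozenset({"LE", "KS"}),
    frozenset({"LE", "KS", "HS"}),
    frozenset({"HS", "KS"}),
    frozenset({"BS", "TS"}),
    frozenset({"LE", "BS", "TS"}),
]
for itemset, s in sorted(apriori(bags, 0.4).items(), key=lambda x: (-len(x[0]), sorted(x[0]))):
    print(sorted(itemset), s)
```

With a threshold of 0.4, the pair {LE, KS} is frequent but {LE, KS, HS} is pruned, because its subset {LE, HS} occurs in only one of the five bags.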

Fig. 5 Applying an apriori algorithm (dataset: a collection of item sets; support: the proportion of item sets in the dataset that contain a given item set; len: the length of an item set)

To highlight the frequent event sets of a certain interval, the input data are reconstructed. Normally, there are many more events during the daytime and far fewer at night. If the events of a whole day were used directly as the input, the events occurring at night could not be revealed: because of their lower density, their support, computed relative to all bags of the day, would fall below the threshold. Meanwhile, some typical events that occur in a certain time interval with a lower incidence rate would be hidden by the others and could not be discovered. Even if the events with a higher incidence rate on a given day are known, more detailed information, such as the time interval in which they tend to occur, remains undiscovered. Therefore, an input construction mechanism is necessary.

As described, the basic period includes several segments, and a number of basic periods form a multi-period. For a multi-period, the input event data are selected from the time intervals with the same segment ID. To account for shifts in the regularity of some events and to reduce the effect of uninteresting events in a segment, each selection is limited to a shorter interval. In this work, only some segments are taken into account; they are named interesting time intervals (iti) and comprise morning, evening and night, and late night. To extract interesting information, a collection of target event sets is defined, called interesting event sets (IES). If an event set discovered via the apriori algorithm belongs to the IES, it is considered an evident IES for that segment. Once human behavior has been represented in this way, the difference between two consecutive multi-periods is calculated; this difference is the change of regularity (Fig. 6).
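The input construction for one multi-period and the selection of evident IES can be sketched as below. The data layout (one dictionary per basic period, mapping a segment name to its bags) is an assumption made for illustration, and the two-day “week” in the example is a toy. Because apriori returns every item set whose support reaches the threshold, checking the support of each IES directly yields the same evident IES as running apriori and then filtering its output by the IES; the sketch takes that shortcut. The default threshold of 0.2 is the support value reported in the Results.

```python
Bag = frozenset[str]


def build_segment_input(week: list[dict[str, list[Bag]]], segment: str) -> list[Bag]:
    """Pool the bags of one segment across all basic periods (days) of a multi-period."""
    return [bag for day in week for bag in day.get(segment, [])]


def evident_ies(bags: list[Bag], ies: list[Bag], min_support: float = 0.2) -> dict[Bag, float]:
    """Return the IES members whose support in bags reaches min_support."""
    if not bags:
        return {}
    result: dict[Bag, float] = {}
    for target in ies:
        s = sum(1 for bag in bags if target <= bag) / len(bags)
        if s >= min_support:
            result[target] = s
    return result


week = [
    {"late night": [frozenset({"BS", "TS"}), frozenset({"LE"})]},
    {"late night": [frozenset({"SE", "BS", "TS"})], "morning": [frozenset({"KS"})]},
]
ies = [frozenset({"SE", "BS", "TS"}), frozenset({"HS", "KS"})]
late_night = build_segment_input(week, "late night")
print(len(late_night), evident_ies(late_night, ies))
```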

Fig. 6 Constructing the input data for apriori ($P_i$: a basic period; $P_1$ to $P_n$ form a multi-period)

Results

In the current work, the datasets of four subjects are analyzed. In what follows, the dataset of subject i is referred to as Subject-i. The corresponding subject IDs in the GAL-NATARS study are listed in Table 2. All methods are implemented in the open-source programming language R, version 3.0.0 [23].

Table 2 The corresponding subject IDs in GAL-NATARS

Representing lifestyle

In this work, events that may reflect basic self-care activities are selected to compose the IES. For Subject-1, the IES is listed in detail below, together with its interpretation. The abbreviations are defined according to the following rule: the first letter corresponds to a specific part of the living environment, e.g. ‘L’ for living room, ‘K’ for kitchen and so on; the last letter, ‘S’ or ‘E’, stands for starting or ending, respectively. The meanings of the abbreviations are given in Table 3. Similar work was also carried out for the other subjects. The minimum support for the apriori algorithm is set to 0.2.

Table 3 The meanings of the abbreviations

The IES of Subject-1 and the activities they are intended to reflect are:

  • {“LE”, “BS”, “TS”}: to reflect hygiene, and bowel and bladder control and management

  • {“LE”, “KS”}: to reflect the completion of household tasks

  • {“LE”, “BE”, “KS”}: to reflect eating and hygiene

  • {“HS”, “KS”}: to reflect eating

  • {“LE”, “HS”, “KS”}: to reflect eating

  • {“SE”, “BS”, “TS”}: to reflect bowel and bladder control, and sleeping.

To illustrate the subject’s behavior patterns, part of the results for Subject-4 is presented as a bar plot (Fig. 7). Figure 7 shows the event set distribution across four multi-periods (weeks), with each bar plot denoting one multi-period; the threshold is shown as a red line. The figure shows that: (1) the subject uses the bathroom and toilet late at night (from 00:00 to 03:00), (2) for this subject, entering the living room and kitchen also occurs late at night (from 00:00 to 06:00), (3) cooking usually happens in the daytime from 08:00 to 18:00 and (4) the patterns are not very stable and change somewhat across the observation period.

Fig. 7 Patterns of a number of consecutive multi-periods of Subject-4. Each sub-figure shows the results calculated from one multi-period. The horizontal axis indicates the segments (time intervals) into which each multi-period is split; for instance, 03 denotes the segment around 03:00 (e.g. 02:00–04:00) and 06–08 denotes the segment from 06:00 to 08:00. LE KS indicates the route living room–kitchen, LE BUS living room–bathroom downstairs, FZE BS TS TV set room–bathroom–toilet, BS TS KS bathroom–toilet–kitchen, BS TS bathroom–toilet, and BE KS bathroom–kitchen

Change in lifestyle

To obtain an overview of the subjects’ activity, the raw sensor records of the four subjects are visualized as spiral plots (Fig. 8). Each clockwise turn of the spiral represents 1 day, from 00:00 to 00:00 of the following day.
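The mapping behind such a spiral plot can be written as a small coordinate transform. In this sketch, which is an assumption about how Fig. 8 could be drawn rather than the authors’ plotting code, the angle is the fraction of the day elapsed, measured clockwise from 00:00 at the top, and the radius grows by one unit per day so that consecutive days do not overlap. The resulting points could be passed to any scatter-plot routine, colored by sensor.

```python
import math
from datetime import date, datetime


def spiral_xy(ts: datetime, start: date) -> tuple[float, float]:
    """Place a sensor record on a clockwise daily spiral.

    One full turn equals one day, 00:00 points straight up, and the radius is
    1 + (days since start) + (fraction of the current day), so it grows steadily.
    """
    day = (ts.date() - start).days
    frac = (ts.hour * 3600 + ts.minute * 60 + ts.second) / 86400
    angle = 2 * math.pi * frac  # clockwise from 12 o'clock
    radius = 1 + day + frac
    # x = r*sin, y = r*cos moves clockwise as the angle grows
    return radius * math.sin(angle), radius * math.cos(angle)


x, y = spiral_xy(datetime(2024, 1, 1, 6, 0), start=date(2024, 1, 1))
print(round(x, 3), round(y, 3))  # 1.25 0.0
```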

Fig. 8 Four subjects’ sensor data visualized by means of a spiral plot. The different colored dots represent single sensor events

The subjects clearly exhibit different patterns. Subject-1 has an irregular lifestyle: the sensor records are evenly distributed, and there is no evident border between daytime and nighttime. The spiral plots of Subject-2, Subject-3 and Subject-4 look markedly different between daytime and nighttime, suggesting a more regular lifestyle. For these subjects, sensor triggering is dense during the daytime, while at night the density is low and consistent. Additionally, some sensors are triggered intensively in the same time interval; however, Subject-4 triggers relatively more sensors than Subject-2 and Subject-3.

The change is calculated for each iti, i.e. morning, evening and night, and late night. These per-iti results are summed, yielding a single change value for each pair of consecutive multi-periods. To make the results easier to compare, they are normalized to a scale from 0 to 10. The change of all subjects is presented in Fig. 9, with each subject denoted by a different symbol and color. The tick marks on the horizontal axis denote the beginning of each multi-period (i.e. week); for example, 2 on the x-axis marks the start of the second week. The change between two weeks, say Week (i) and Week (i + 1), is plotted at the beginning of the later week, i.e. Week (i + 1).
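The text does not spell out how the difference between two consecutive multi-periods is measured. One plausible reading, given here purely as a hypothetical sketch, sums the absolute differences in support of every IES over the iti, treating an IES that is not evident in a week as having support 0, and then rescales all values so that the largest becomes 10. Other distance measures would fit the description equally well, and the support values in the example are made up.

```python
Supports = dict[str, dict[frozenset[str], float]]  # iti -> {IES: support}


def change_between(prev: Supports, curr: Supports) -> float:
    """Hypothetical change metric between two consecutive multi-periods:
    sum over all iti and IES of the absolute difference in support."""
    total = 0.0
    for iti in prev.keys() | curr.keys():
        p, c = prev.get(iti, {}), curr.get(iti, {})
        for ies in p.keys() | c.keys():
            total += abs(c.get(ies, 0.0) - p.get(ies, 0.0))
    return total


def normalize_0_10(values: list[float]) -> list[float]:
    """Rescale non-negative change values so that the largest one becomes 10."""
    peak = max(values, default=0.0)
    return [10.0 * v / peak if peak > 0 else 0.0 for v in values]


week1 = {"late night": {frozenset({"BS", "TS"}): 0.5}}
week2 = {"late night": {frozenset({"BS", "TS"}): 0.25},
         "morning": {frozenset({"HS", "KS"}): 0.5}}
raw = [change_between(week1, week2), change_between(week2, week2)]
print(raw, normalize_0_10(raw))  # [0.75, 0.0] [10.0, 0.0]
```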

The change value of Subject-1 declined significantly in the first 5 weeks. During weeks 9, 10 and 11, the subject was not at home because of Christmas, so the results show unusual change values caused by data suddenly disappearing and reappearing. For Subject-2, data are missing in the first 2 weeks, and no pattern was discovered in the first 3 weeks. When patterns become evident in Week 4, a high change is found between Week 3 and Week 4. Because of the missing data, no change is identified in the first 2 weeks, and a dramatic change appears once data are measured. The first three values should therefore be excluded when relating the results to health status. Apart from the first three points, Subject-2 generally shows a decreasing trend of change, which means that the subject’s daily living was becoming more regular. Subject-3 initially showed a decreasing trend, after which the change gradually increased. Subject-4 performed best in keeping a regular lifestyle: the change is always below 2, although there is a slight increase in the last few weeks.

To analyze the differences in change between subjects, the results are also presented as box plots (Fig. 10). As Figs. 9 and 10 show, the change of Subject-1 is generally the highest, except for an outlier, which is consistent with the raw data visualization in Fig. 8. Subject-2 and Subject-3 exhibit similar mean values of change, lower than that of Subject-1. Subject-4 clearly shows the least change; moreover, Subject-4’s values are not only low but also fall within a narrow interval, as shown in Fig. 10.

Fig. 9 Change of all subjects over the observation period

Fig. 10 All subjects’ change in behavior patterns during the interesting time intervals (iti), which are morning, evening and night, and late night. Subjects 1, 2 and 4 exhibit improved health status, while Subject-3 does not show significant improvement

Correlation with health assessment results

To show at a glance the correlation between the assessment results and the change values obtained with the proposed methods, they are presented in a single chart with three curves (Fig. 11). The results for each subject are described in the following paragraphs.

Fig. 11 The correlation between the change of behavior patterns and medical assessments

Subject-1

The change values at weeks 9, 10 and 11 are outliers. In the first 5 weeks, the behavior pattern change decreased from 6.61 to 2.52, while mobility, represented by m.val (Eq. 1), increased by approximately 1 point. In the following 4 weeks, the change increased, and a shallow valley appeared at the third assessment. Note that the VAS generally showed an increasing trend, especially during the last 5 weeks, when the pain was much stronger than in earlier weeks.

Subject-2

The change values at weeks 2, 3 and 4 are outliers. Apart from these, the change generally showed a decreasing trend, while mobility gradually increased. Note that this subject reported very low VAS scores compared with the other subjects.

Subject-3

This subject’s mobility is relatively low compared with the other three. Mobility showed almost no improvement and even decreased slightly in the last few weeks. The subject’s behavior change showed an increasing trend in the last few weeks, and the VAS plot shows that the subject also felt significantly increasing pain during this time.

Subject-4

Throughout the observation, this subject exhibited very little change, never more than 2, and m.val remained at a high level and progressively increased. The VAS fluctuated considerably during the observation.

Based on the results for these three parameters, some conclusions can be drawn. Smaller change is usually associated with better mobility. For instance, the change of Subject-2 generally declined while m.val progressively improved, and a similar situation occurred during the first 5 weeks of Subject-1. However, the change of behavior pattern is not the dominant factor in representing the subject’s rehabilitation. For Subject-1, the change in the second half of the observation is high, but the subject could still perform well. This mainly depends on the individual’s health condition. On the one hand, Subject-1 showed high mobility, which means the subject was able to move and perform more activities; on the other hand, the subject suffered more pain, which may have led to an irregular and restless lifestyle. In another case, Subject-4 showed the least change but also some higher VAS values, and the m.val of Subject-4 is not as high as that of Subject-1. This suggests that pain did not bother Subject-4 too much, so the subject could maintain a stable lifestyle and make some physical improvement.

Discussion

In the setting of a sensor-enhanced living environment, a paradigm has been proposed that uses unsupervised methods to extract behavior patterns from sensor datasets by discovering evident event sets. The methods have been applied to the datasets of the GAL-NATARS study, in which the data were collected in real-life situations.

Throughout the whole procedure, only unsupervised methods are used; neither data labeling nor model training is necessary. It is assumed that the triggering of sensors is closely related to certain activities, so the resident’s behavior can be assessed indirectly. The primary information is identified through the sensor ID, which indicates which event is detected. To determine the event location, the layout of the deployed sensors is necessary. The subjects benefit from this approach, as they can maintain their normal living without additional burdens.

To obtain useful results, an input data construction scheme has been designed. Based on the time-line splitting, all possible combinations of subsegments in each segment are traversed, and the combinations with a support value above the threshold are regarded as evident. Through this scheme, a more precise distribution of the evident IES can be discovered, and the influence of irrelevant events is reduced.
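The traversal of subsegment combinations can be sketched as follows, as a hypothetical illustration with made-up subsegment names and bags that reads “all possible combinations” literally as all non-empty subsets. For k subsegments there are 2^k − 1 such combinations, which is why computation time grows quickly as segments are split more finely; restricting the search to contiguous runs of subsegments would reduce the count to k(k + 1)/2.

```python
from itertools import combinations


def subsegment_combinations(subsegments: list[str]):
    """Yield every non-empty combination of subsegments (2**k - 1 for k subsegments)."""
    for r in range(1, len(subsegments) + 1):
        yield from combinations(subsegments, r)


def evident_per_combination(
    bags_by_sub: dict[str, list[frozenset[str]]],
    ies: list[frozenset[str]],
    min_support: float = 0.2,
) -> list[tuple[tuple[str, ...], frozenset[str], float]]:
    """Pool the bags of each subsegment combination and keep every IES whose
    support in the pooled bags reaches min_support."""
    found = []
    for combo in subsegment_combinations(list(bags_by_sub)):
        pooled = [bag for sub in combo for bag in bags_by_sub[sub]]
        if not pooled:
            continue
        for target in ies:
            s = sum(1 for bag in pooled if target <= bag) / len(pooled)
            if s >= min_support:
                found.append((combo, target, s))
    return found


# Hypothetical morning subsegments, with bags pooled over one multi-period
morning = {
    "06-07": [frozenset({"KS", "HS"})],
    "07-08": [frozenset({"BS"}), frozenset({"LE"})],
}
for combo, target, s in evident_per_combination(morning, [frozenset({"HS", "KS"})], 0.5):
    print(combo, sorted(target), s)  # ('06-07',) ['HS', 'KS'] 1.0
```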

The hierarchical time-line needs to be improved. Because the time-line is handled hierarchically, the resident’s behavior is assumed to be periodic. A limitation of this approach is that it cannot discover events whose period exceeds the basic period. If the period of an activity equals the multi-period, for instance, the activity will be regarded as rare and cannot be discovered via the approach proposed in this work. Laundry is an example: it is an important indicator of the resident’s self-care and can be detected by monitoring the use of the washing machine. However, people living alone usually use the washing machine weekly, a period much longer than the basic period set in this work, i.e. 1 day, so such activities cannot be discovered merely through the fixed time-line. In future work, the time-line is expected to be split automatically according to the resident’s actual routine.

The IES (interesting event sets) are defined for each subject. The evident IES, a subset of the IES, in a chosen time interval is adopted as the pattern of that interval. The event sets need to be tailored to specific users, which is critical for optimally assessing the resident’s health status. At present, they are determined based on the medical assessment items that are closely related to physical ability and behavior; however, the IES could be refined by clinicians. For a specific purpose, the sensor selection can also be refined, leading to different event definitions.

In a simplified view, human health status can be regarded as consisting of at least two aspects: physical health and mental health. Physical health determines the ability to perform movements, i.e. mobility, while mental health determines the way activities are performed. Human behavior depends on both aspects, as illustrated in Fig. 12; therefore, behavior is expected to reflect health status. The correlation has been analyzed by comparing the discovered information with the results of the medical assessments (Fig. 11). Some individuals, such as Subject-2 and Subject-4, exhibited a strong correlation between change in behavior pattern and health status, while the others exhibited nearly no correlation. Therefore, a determinate relation between behavior and rehabilitation has not been shown in the results.

Fig. 12 Relationship between three aspects

Considering the three aspects of the results: m.val represents the subject’s ability to carry out activities, the VAS probably has a greater influence on the subject’s mental health, and both mobility and mental health can influence the behavior pattern, whose variation is expressed by the change in this work. Conversely, behavior can also reflect the resident’s health to some degree. For Subject-1, for example, m.val indicates greater mobility, but the VAS indicates more pain. The change values suggest that this subject’s ability to maintain a regular lifestyle is relatively low. A possible reason is that the subject regained more physical ability but the pain necessitated frequent phases of rest.

Limitations

This work uses the apriori algorithm to discover frequent event sets. Although this approach is able to discover behavior patterns, its computation time is not taken into account. As a segment is split into more subsegments, much more time will be needed to traverse all possible combinations.

Only four subjects’ datasets are analyzed in this work. For the sake of finding convincing evidence to show the effect of pervasive healthcare in assessing human lifestyle regularity and its relation with health status, more real-world data sets from more subjects are necessary to investigate this association.

Only environmental data are used; no wearable sensor data for activity detection or vital parameter measurement are taken into account. Therefore, the results may not comprehensively reflect reality.

Conclusion

In the context of pervasive healthcare, this work has proposed an unsupervised paradigm for extracting behavior patterns from a sensor-enhanced living environment. The analysis of the correlation between the change of lifestyle and the results of medical assessments suggests that this approach may provide complementary information for health assessment. Some subjects in this study exhibited a strong association between change of lifestyle and health status, while others exhibited hardly any. A dominant relationship between the change of lifestyle, as assessed with the presented method, and health status cannot yet be established. As preliminary research, this work will serve as a basis for future research on information discovery in sensor-enhanced living environments. More real-world data sets from more subjects are necessary to investigate this association.