1 Introduction

Use of information systems to influence behavior has been increasing in popularity in recent years. These systems are available in various forms both on desktops and in mobile devices and range from applications in the realm of the quantified self movement to applications and systems aimed at helping with lifestyle changes such as weight loss or even clinical or therapeutic systems that deal with conditions such as depression [1, 2] or diabetes management [3]. The behavioral problem domains addressed by the many systems are various, but in persuasive technology research the concept of Behavior Change Support System (BCSS) [4, 5] draws together the key elements of the systems themselves as information systems that are designed to form, alter and/or reinforce compliance, attitudes, and/or behaviors, and do so without using coercion or deception [5].

In the present experiment the research focuses on one important element of persuasive systems: the timing of persuasive communications. Such timing of persuasive messages has been suggested to have an effect on behavior [6, 7]. How the timing of persuasive messages is managed can also affect an end-user’s experience of obtrusiveness, since system-originated messages can produce interruptions and demand attention at moments that an end-user deems irritating [8].

The aim of the present study, therefore, is to shift the focus away slightly from trying to accurately spearfish for those opportune moments, and instead try and trawl for a coarser understanding of how persuasive message timing strategies affect behavior, behavior change and the user experience as regards the system. The key research question in the present pilot study, then, is: How do timing strategies affect behavior change and perceived unobtrusiveness when using BCSSs?

2 Background

There are essentially two perspectives on persuasive message timing that are present in this paper: the perceived interruptions from a system demanding attention, and the search for the opportune moment for demanding that attention. The objective here is not to pinpoint those opportune moments as such, but rather to take the need to find them into account when studying timing issues. The present study aims to support that quest by exploring the broader scope of timing strategies. First, there is the effect of increased cognitive load that results from random interruptions demanding active attention [e.g. 8–10]. The second element is the concept of “an opportune moment” (or ‘kairos’ from the Greek) in the design of persuasive systems: the problem of when best to communicate a persuasive message to the system user for maximum effect [6, 7].

Cognitive load theory [11, 12] proposes that high working memory effort will have a negative effect on problem solving and other cognitive processes. The implications are notable for, say, learning [11] and decision-making [13]. Research into attention and interruptions has shown that attention is a limited cognitive capacity and that switching attention from one task to another comes at a cost to task performance [14]. Randomly timed interruptions have been found to have a negative effect on the performance of a main task [8, 9, 15] and such interruptions also increase negative affect as regards the system [8, 9]. Even in a broader context than specific tasks, some research suggests that extended and continuous exposure to constant interruptions by messaging and social media can have a detrimental effect on learning [16].

Miyata and Norman [10] argue that notifications from a system would have a lower interruption cost if delivered at a moment of lower mental workload. In other words, they advocate identifying kairos for sending a notification to a system user. Drawing upon this, a host of further research aims at identifying such moments for various systems to make use of [8, 9, 15]. For persuasive systems, the other type of ‘opportune moment’ has been identified as a factor in increasing the persuasive power of a message [6, 7]. Using the definition by Kinneavy [17], ‘kairos’ indicates an opportune moment and also the right measure of action. How much, how often and how frequently would all seem reasonable measures when discussing elements affecting perceived unobtrusiveness.

Earlier studies into factors affecting use and intention to use persuasive systems indicate that perceived unobtrusiveness affects at least the intention to use, if not the actual use of a system [18], and that it has an effect also on perceived persuasiveness of a persuasive system [19]. The PSD model [20] proposes that unobtrusiveness should be a key consideration in the design of persuasive systems: these systems should not disturb or interrupt users when they are carrying out their main tasks, and poorly executed timing of persuasive communications can lead to less than ideal results. Irritation resulting from interruptions may lead to levels of negative affect that influence one’s intention to use a system or its actual use. The balance between too low involvement, where the necessary effort associated with learning and reflection (both in-action and on-action) [21] does not take place, and too active demand on attention through interruptions is of great interest in efforts to design better persuasive systems. A balance between reflection as a behavior change tool and the level of attention demand is necessary: after all, for self-reflection to take place at all, some attentional resources must be allocated to the process.

Context-aware systems are an absolute necessity in ensuring at least a minimum level of control over interruptions [22]. Irritation or frustration from interruptions is not conducive to successful persuasion, as we have been found to be more open to persuasion when we are in a good mood [6]. Furthermore, having control over when to take a break has been found to produce better problem-solving results than if break times are dictated by someone else [23].

3 Research Setting

3.1 Hypothesis

The research aimed at discovering whether the type of persuasive communication timing strategy had any effect on persuasion outcome and Perceived Unobtrusiveness (PU) of the system. Based on earlier research on the topic, as presented in the Background section, the following hypotheses were formed: (1) Random Timing (RT) is perceived to be more obtrusive than User-defined Timing (UT); (2) RT has a stronger positive influence than UT on behavior change in terms of self-assessed task success; and (3) UT results in a more positive response in terms of Perceived Unobtrusiveness than RT.

Should the presented hypotheses hold true, an interesting conflict emerges in persuasion outcome and perceived unobtrusiveness: a better result in terms of behavioral influence may lead to higher perceived irritation associated with the system in question. The discussion should then focus on how to address this imbalance in persuasive systems design.

3.2 Study Design

The experiment was a mixed design (N = 13) comparing two timing strategies, implemented in a purpose-built iPhone app. The experimentation period was five days, with additional days for pre- and post-test assessments. For the basic analysis of how the different types of interruption affected punctuality performance and perceived satisfaction over a period of time, a mixed ANOVA (analysis of variance) design was selected. The within-subjects factor was self-assessment ratings collected over the experimentation period, with the between-subject factor being the timing strategy. The dependent variables there were self-reported success ratings and perceived satisfaction ratings. Participants were divided into one of two groups as per timing strategy.

Five measures of both dependent variables were used. The participants reported on more than one punctuality task each day, and the average of the ratings for each day was used as the daily score. As the daily tasks were carried out independently by each participant without close supervision by researchers, there were individuals who did not always complete three tasks per day and individuals who may have reported more than three tasks on some days. A listwise deletion approach was deemed most suitable for handling missing values.
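The scoring rule described above can be illustrated with a minimal sketch. The participant identifiers and ratings below are hypothetical and are not data from the experiment; the sketch also assumes that a day without any reported task counts as a missing value, which the text does not state explicitly.

```python
from statistics import mean


def daily_scores(ratings_by_day):
    """Average each day's task ratings; a day with no ratings yields None."""
    return [mean(day) if day else None for day in ratings_by_day]


def listwise_delete(participants):
    """Keep only participants who have a score for every measurement day."""
    return {
        pid: scores
        for pid, scores in participants.items()
        if all(score is not None for score in scores)
    }


# Hypothetical ratings (not study data): one inner list per experiment day.
# The number of reported tasks per day may be fewer or more than three.
raw = {
    "p01": [[7, 8, 9], [6, 7], [8, 8, 8, 9], [7], [9, 9, 8]],
    "p02": [[5, 6, 7], [], [6, 6, 7], [7, 8, 7], [8, 8, 9]],  # day 2 missing
}

scores = {pid: daily_scores(days) for pid, days in raw.items()}
complete = listwise_delete(scores)  # p02 is dropped because of the missing day
```

Averaging per day keeps one score per participant per measurement regardless of how many tasks were reported, which is what allows five repeated measures to be compared across participants.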

3.3 Procedure (Data Collection)

Participants (N = 13) in the experiment represent a sample of opportunity. As an incentive to take part, completion of the experiment entered participants in a prize draw. At the time of recruitment participants were not told what the exact topic of the experiment was: the only information offered was a broad outline of the effort expected of them and the explanation that the experiment was to do with behavior change. As participants signed up, they were informed that they could withdraw from the study at any time if they so wished.

The participants were screened for punctuality schematicism. In short, self-schema refers to a person’s perception of his or her self in terms of his or her specific environment [24] and it is a construct linked with transition of intention into behavior [24–26]. Research into behavioral intention and self-schemas has established that a positive self-schema towards a specific behavior acts as a moderator in turning intention into behavior [25] and that schema-matched persuasion can result in more effective change [e.g. 27]. The purpose of using such an attitude scale was to ensure that there were participants who would fall within the target group for an app that addressed punctuality problems. Short of finding volunteers who freely admitted to having problems with time keeping, using a punctuality self-schema assessment meant that we were able to identify participants who did not, perhaps, have the need to be on time as a prominent part of their self-identity; we could then assign such participants to both test conditions (RT and UT).

While the sample size in the present experiment is not overly large (N = 13), the required sample size was estimated using the G*Power analysis tool [28]. Initially, 40 potential participants signed up, but for varying (personal) reasons many of them dropped out and did not complete the experiment. Aiming at an effect size of .4 and power of .95 for two groups and five measures, the estimated total sample size required was 14. We were unable to meet this target minimum sample size in the time available for the study. Latin squares were used for balancing punctuality schematics and non-schematics between the conditions (Group A and Group B) at the start of the experiment, but owing to participants’ freedom to drop out of the experiment at any time it was not possible to control the final outcome of schematics/non-schematics per condition or to keep the test groups exactly the same size. Overall, 14 participants completed the tasks in a satisfactory manner, but one participant failed to complete the required information regarding schematicism and therefore had to be dropped from the analysis (Table 1).

Table 1. Schematic and non-schematic test participants in the experiment’s timing conditions.

After the schematicism screening the participants also filled in a background information questionnaire and were instructed how to download the app and use it with the relevant settings. Finally, after completing the tasks on the app, the participants were sent links to the final questionnaires by e-mail.

The two groups of participants were each instructed as to the specific settings they had to use on the test app in the experiment. Group A were the ‘Random Timing’ group, where the system would send them tips and hints throughout the day and also query their responses regarding task success and satisfaction some time after the completion of each task. Group B were asked to set one time for daily interactions with the system; in this way all the communications and requests to fill in the self-assessments were sent to the user once a day at a time of the user’s own choosing. For both groups the number of tips and hints was limited to three so that frequency would remain the same between the groups.
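The difference between the two conditions can be sketched as a small scheduling function. This is an illustration only and not the RightOnTime implementation: the function name, the 08:00 to 20:00 delivery window, the uniform random draw and the example date are all assumptions made for the sketch. Only the limit of three tips per day and the single user-chosen time are taken from the description above.

```python
import random
from datetime import date, datetime, time, timedelta

TIPS_PER_DAY = 3  # from the text: tips and hints were limited to three per day


def tip_times(strategy, day, user_time=None, rng=None,
              window_start=time(8, 0), window_end=time(20, 0)):
    """Return the delivery times for one day's tips.

    "RT": random moments inside an assumed daytime window.
    "UT": all tips together at the single time chosen by the user.
    """
    rng = rng or random.Random()
    if strategy == "UT":
        if user_time is None:
            raise ValueError("UT requires a user-chosen time")
        return [datetime.combine(day, user_time)] * TIPS_PER_DAY
    if strategy == "RT":
        start = datetime.combine(day, window_start)
        span = (datetime.combine(day, window_end) - start).total_seconds()
        return sorted(start + timedelta(seconds=rng.uniform(0, span))
                      for _ in range(TIPS_PER_DAY))
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical usage: the date and the 18:30 slot are made-up values.
rt_times = tip_times("RT", date(2024, 1, 8), rng=random.Random(42))
ut_times = tip_times("UT", date(2024, 1, 8), user_time=time(18, 30))
```

Holding the number of messages constant in both branches mirrors the design choice described above: the conditions differ in when the messages arrive, not in how many there are.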

During the five experiment days all participants set themselves three timed tasks per day and one time-estimation practice task per day. The timed tasks could be anything that the participants determined they needed to be on time for or anything that they decided they wanted to do at some specific time. In the practice task the participants were asked to select some routine such as breakfast, journey to work, etc. and learned to estimate how long the activity actually took. The participants submitted the self-evaluations to the research database using a ‘send’ button in the test app.

3.4 Materials

Mobile Device Test App.

A purpose-made native iOS app called ‘RightOnTime’ was designed and developed by the research team and was made available in Apple’s App Store as a free download. The particular purpose of the app was to help in rehearsing time management skills that are associated with many cases of chronic tardiness; it also provided features for actively practicing time-awareness and for encouraging personal time-management related self-reflection. The app, then, was designed to support users in developing and/or maintaining punctual behavior patterns by making them see when their behavior was not based on realistic perceptions of time (or, indeed, when they were punctual and timely). The problem-domain content used in the app was a collection of existing knowledge and suggestions from the literature [29] and self-help websites (Footnote 1). In the experiment the app’s two settings configurations were used in studying the difference in response as regards perceived unobtrusiveness (Fig. 1).

Fig. 1. RightOnTime app: Home screen (left) for setting tasks, and an example of a self-evaluation view (completed after a task is done).

The app had two main parts: user-set punctuality targets/tasks and time-awareness practice. In the first category the user would set a number of timed events for each day. These could be any tasks or activities that the user wanted to be on time for. Each activity would be given an exact starting time and an estimation of duration. The main objective was to increase awareness of important meetings or tasks and to learn in that way to make the effort of getting there on time. Once the task was over, the system would prompt users to evaluate their performance in terms of timeliness. The prompt was set to appear within a 15-min frame after completion of the task so that it was not entirely predictable. Time-awareness assessments were not in the focus of the present pilot study.
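The timing of the self-evaluation prompt can be sketched as follows. The sketch assumes that task completion is approximated by the set starting time plus the estimated duration and that the delay is drawn uniformly from the 15-minute frame; neither detail is specified in the text, and the function name is hypothetical.

```python
import random
from datetime import datetime, timedelta

PROMPT_WINDOW_MIN = 15  # from the text: prompt appears within a 15-min frame


def evaluation_prompt_time(start, duration_min, rng=None):
    """Pick a prompt time between the task's estimated end and 15 min later."""
    rng = rng or random.Random()
    # Assumption: the task ends at its start time plus the estimated duration.
    task_end = start + timedelta(minutes=duration_min)
    # Assumption: the delay is uniform over the window, in seconds.
    delay = timedelta(seconds=rng.uniform(0, PROMPT_WINDOW_MIN * 60))
    return task_end + delay


# Hypothetical task: a 30-minute meeting starting at 09:00.
prompt_at = evaluation_prompt_time(datetime(2024, 1, 8, 9, 0), 30,
                                   rng=random.Random(1))
```

A randomized delay of this kind keeps the prompt close enough to the task for the evaluation to be fresh while preventing the user from anticipating the exact moment of the interruption.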

A selection of Persuasive Systems Design [20] features was used in the design of the app: reflection (in action as well as on action [21]) is very much in the foreground, not only in the form of Self-monitoring as such, but also by generally offering the user ways of keeping the behavior change target in mind through continuous interactions, practice, etc. Further PSD features [20] used in the app included reduction, reminders, suggestion and real-world feel (see Table 2).

The app itself was constructed using iOS components and style as much as possible in order to minimize the effect of design on UX while maintaining an acceptable level of interface familiarity, functionality and usability.

Table 2. Persuasive system design features used in the RightOnTime app.

Attitude Questionnaire Development.

A self-schema assessment questionnaire was devised based on an established model by, for example, [24]. The questionnaire gauged each participant’s self-image as regards punctuality by querying representative punctuality statements (measures). An 11-point scale on the two measures (“describes me/does not describe me” and “is important to me/is not important to me”) has been used in earlier research on the relationship between self-schemas and behavior by, for example, [24, 30]. As such, the questionnaire style both in the listed studies and in the present experiment is a direct measure, but filler items were used in order to disguise the focus of the questionnaire at the beginning of the experiment. Also, the participants were not told what behavior specifically was going to fall within the scope of the test and the filler items referred to other lifestyle topics such as healthy eating. As mentioned, attitude was queried.

Perceived Unobtrusiveness UX Questionnaire Development.

In order to gauge the final experience of unobtrusiveness after using the test app for five days, a questionnaire to that effect was developed. Candidate statements were compiled from established UX questionnaires (Footnotes 2 and 3), a perceived persuasiveness questionnaire [19, 31], the PSD model [20] and other related material [32, 33]. Five expert evaluators scored the statements, and their scores determined which statements were included in the final questionnaire. Again, when administering the questionnaire, filler items were added so as to diffuse the focus of the respondents.

4 Results

4.1 Participants and Sample Characteristics

A total of 40 people initially signed up to take part in the experiment. Of those, 31 completed the schematicism screener and were assigned to the test groups. Subsequently, more than half dropped out at various stages during the experiment period, and 13 participants completed the entire experiment. Of the total initial number of interested participants (40), nine were non-schematics as regards punctuality. In the end, five of these nine completed the experiment and were included in the analysis.

Sample Description.

The average age of the sample was 36 years, with the youngest participant being 23 and the oldest 72 years of age. The oldest participant was notably older than the rest, but the participant was an experienced iPhone user and had no characteristics in choice of daily tasks or otherwise that stood out from the rest of the sample, and so the participant was not considered an outlier. Nine participants out of the 13 were female, but the analysis does not consider gender a critical factor within the scope of the experiment. A clear majority (11) had used an iPhone for more than six months, making them familiar with the test device itself. As regards punctuality schematicism, the sample includes more schematics than non-schematics, meaning that the majority of the participants identify themselves as people who value punctuality and timeliness.

4.2 Data Analysis

The collected data was analyzed for variance both for task success (Tsucc) and task satisfaction (Tsat). These scores were also checked for correlation with Perceived Unobtrusiveness (PU). The PU scores were also analyzed for variance between the test groups.

Analysis of variance showed no significant effect of time of measurement in the Task success scores (time of measurement here referring to each of the five days of the experiment). The daily score used in the analysis is an averaged score from the three tasks participants reported each day in order to show a more consolidated day-by-day view of the development over time. The RT group’s task scores were measured at different times of the day, depending upon what time their set tasks took place, while the UT group reported all three scores at one self-set time. The extended delay between completing a task and reporting it in retrospect was part of the timing strategy difference between the conditions.

Fig. 2 shows the development of the means for each variable and factor over the five days. The interesting movements in the chart concern the beginning and the end of the period. At the beginning, the User-defined Timing group’s scores drop after the first day before they gradually climb back up to the starting levels and even beyond. The Random Timing group is seen to climb up to the same levels at the end as the User-defined Timing group, but task satisfaction in particular appears to climb in a steady and even manner.

Fig. 2. Self-evaluation scores: means for each group per day.

UT group participants’ Tsucc and Tsat scores compared with PU scores pointed to a very small linear relationship (r = −.31 for Tsucc, r = −.20 for Tsat). In the RT group the correlation coefficients indicated a stronger positive linear relationship, with r = .62 for Tsucc and r = .56 for Tsat. These indications of strength in the linear relationship are extremely interesting considering that the PU score is given on negative statements: the higher the score, the stronger the agreement with statements that indicate obtrusiveness. In other words, participants in Group A (Random Timing) tended to rate the system more obtrusive as their Task Success and Task Satisfaction increased. In the UT group, on the other hand, the system was rated less obtrusive as the Task Success and Task Satisfaction increased.
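For readers who wish to reproduce this type of analysis, the correlation coefficient can be computed as in the sketch below. It assumes that the reported r values are Pearson product-moment coefficients, which the text implies by referring to linear relationships but does not state outright. The example values are hypothetical and are not the study data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical per-participant values (not study data): mean task success
# and the PU score, where a higher PU score means more perceived obtrusiveness.
task_success = [6.0, 7.5, 8.0, 8.5, 9.0]
pu_score = [2.0, 3.5, 3.0, 4.5, 4.0]
r = pearson_r(task_success, pu_score)
```

Because a higher PU score indicates stronger agreement with statements about obtrusiveness, a positive r from such a computation corresponds to the system being rated as more obtrusive when task scores are higher.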

One of the initial hypotheses was that Random Timing of persuasive messaging may be experienced as being more obtrusive than timing set by the system user. We do not have exact information as to the reasons why participants dropped out from the experiment, as it was their right to do so without explanation, but it is worth noting that of the 31 initial participants assigned to test groups 17 were assigned to Random Timing and 15 to User-defined Timing groups. For these groups, the completion rate was only 5 participants (approx. 30 %) in the RT group and 8 (approx. 50 %) in the UT group. One possible avenue of thought for explaining the difference in the drop-out rate could be the difference in the obtrusiveness between the two experiment conditions.

A one-way ANOVA of the PU scores does not show statistically significant variance between the groups either, but as predicted, it is the RT group whose scores point towards a more pronounced experience of obtrusiveness. Therefore, while performance related scores improved over the experiment period, the participants in the RT group perceived the system to be more obtrusive than the participants in the UT group. In this questionnaire a higher score (the ‘agree’ end of the scale) indicated agreement on negative statements. In other words, here, too, there is some slight indication that Random timing might be experienced as more obtrusive than User-defined timing of persuasive messaging.
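The F statistic behind a one-way ANOVA of this kind can be sketched as below. This is a generic textbook computation and not the analysis script used in the study. The PU scores in the example are hypothetical; only the group sizes of five and eight follow the completion figures reported above.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical PU scores (not study data) for the RT and UT groups.
pu_rt = [4.2, 3.8, 4.5, 3.9, 4.1]
pu_ut = [3.6, 4.0, 3.2, 3.9, 3.5, 4.1, 3.3, 3.7]
f_value, df_b, df_w = one_way_anova_f(pu_rt, pu_ut)
```

The sketch stops at the F statistic; judging significance additionally requires the F distribution with the returned degrees of freedom, which a statistics package such as SciPy (`scipy.stats.f_oneway`) provides.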

5 Discussion and Conclusion

The primary purpose of the experiment was to explore the problem field around the concept of unobtrusive user experience with persuasive systems so that a more robust exploration could then be carried out by means of a well-informed model of influencing factors.

The difference in the correlation coefficients between the test groups is a particularly interesting avenue of analysis. When using a system that sends messages, reminders and evaluation requests at random times, better perceived performance (Tsucc) and higher satisfaction in one’s own achievement (Tsat) do not seem to translate into an unobtrusive experience of the system. By contrast, there was an indication of the opposite effect when system users dealt with the system only at a time they had chosen themselves. Admittedly, the linearity in this case was weak, but it was observably in a different direction than with the Random Timing group. The difference prompts various questions as regards the role of cognitive load in perceived unobtrusiveness. How strongly does the awareness and sense of being disturbed, interrupted or otherwise reminded of an on-going task at random moments result in negative affect towards the system? In this small sample and over this short period of time performance and satisfaction did not differ significantly, and neither did the final experience scores; but does the correlation result suggest potential problems in terms of perceived unobtrusiveness in a longer time frame? Or does this suggest that it might be possible to achieve acceptable behavior change results even when minimizing interruption-based cognitive load?

As it was, statistically we ended up with an inconclusive set of results: the differences between the groups in terms of Task Success and Task Satisfaction ratings are not significant, and while there is an observable difference in the Perceived Unobtrusiveness ratings between the conditions, that difference is not statistically significant either. The difference in drop-out rate between the two groups calls for further exploration in order to determine whether perceived obtrusiveness was even more pronounced in the Random Timing group than presently reported. As it was, the freedom to drop out of the experiment does not allow for more than speculative conclusions based on the fact that more participants dropped out from the Random Timing group than from the User-defined Timing group.

Further research is certainly required, but the findings from this study can be used in identifying meaningful factors that are at play in defining appropriate timing strategies. One way of looking at the relationship between ‘timing strategies’ and an ‘opportune moment’ is to regard ‘opportune moment’ as one end of a continuum and an abysmally inappropriate moment as the other. Where a persuasive system falls on that continuum can depend on the timing strategies, whether random vs. set, sensor-based or context-based, etc. The ideal is to hit upon the kairos moment and achieve behavior change with minimal cognitive load, but such rarities apart, we must search for ways of providing an unobtrusive user experience without compromising the behavior change target.