1 Introduction

The unceasing aging of the population in developed countries is confronting advanced societies with new problems that have not been solved yet. It is expected that, in the near future, health care systems will struggle to provide proper services to the growing population of seniors, mainly due to limited economic resources and a shortage of qualified workers [20, 24, 39].

In addition, physicians and caregivers have known for decades that the elderly prefer to live independently in their homes for as long as possible [42, 43]. However, this independent living often comes to an end earlier than expected, either because the elderly need to be monitored by physicians or because caregivers do not consider that an elderly person is capable of living independently.

New technologies can help to extend the period of independent living of older adults, but Knowles and Hanson [26] recently pointed to three important clusters of factors that limit the adoption of new technology: the risk of using the technology improperly, the perception of the technology as replacing something valuable to them, and the generally accepted excuse that they are too old to learn or use a new tool.

The rise of social robots represents an interesting opportunity to ameliorate the economic burden on health care systems and to extend the period of independent living of older adults. In addition, social robots are a new technology that can overcome the factors pointed out by Knowles and Hanson. Regarding the first factor, and considering that social robots are intended to interact naturally with humans, the risk of inappropriate use might be reduced because older adults can naturally communicate with and understand these robots, which might ease their acceptance by seniors. The emotional bonds created between humans and robots in long-term interactions [12, 40] can make the elderly perceive a social robot as a valuable partner or even a friend, instead of perceiving it as a replacement (second factor).

Along this line, robots have been used to reduce the number of falls and improve the feeling of safety [13], to practice physical exercises with the elderly [29], and to carry out music therapy [46]. In these works, researchers adapted general-purpose robots to assist the elderly. This can lead either to complex robots whose capabilities remain largely untapped (with the consequent waste of money) or to simple robots with very basic possibilities. Depending on the aspects researchers want to tackle with social robots, different technical approaches might be needed. In this work, we aim at designing a new ad hoc social robot that supports and assists older adults in cognitive and mental tasks. In particular, we focus on seniors suffering from Alzheimer’s Disease (AD) and their caregivers.

Based on a previous study [38], we identified four application areas where social robots can benefit the elderly and their caregivers through cognitive and mental tasks: safety, entertainment, personal assistance, and stimulation. We conducted several meetings with three types of participants: subject matter experts in fields related to AD (cognitive psychologists, clinical psychologists, therapists, and professional caregivers), roboticists, and relatives (some of whom were seniors and family caregivers themselves). Among these participants, most of the potential end users of our social robot were represented, and their opinions were considered from the very beginning of the design process. As pointed out by Bradwell et al. [7], we have followed a user-centred design where the end users, that is, older people and caregivers in our case, are involved in the decisions.

After the meetings, all participants approved several scenarios related to the four application areas, which led us to the definition of the technical requirements of a new robotic platform. For the sake of concreteness, we summarize here the requirements that served as the basis for the new robot:

  • A stationary, desktop robot with a friendly look, which is easy to move from one room to another.

  • The robot needs to be endowed with expressive capabilities to ease communication.

  • Sensors for monitoring the elderly and identifying them during the interaction (e.g. 3D and RGB cameras).

  • Most of the above mentioned scenarios require the robot to perform a natural interaction with the elderly or their caregivers. To do so, it is important to allow verbal communication as well as tactile or visual communication. To this end, the robot needs the proper hardware.

  • A visual interface to display multimedia content.

  • An Internet connection to allow video conference capabilities, remote operation, and retrieving Web-based information.

  • A knowledge base where the individual’s information (for example, preferences, important dates such as birthdays, pictures of relatives, or favorite stories) is stored and can be retrieved for customizing the robot’s behavior.

Taking these technical requirements into account, in this paper we present the robot Mini (Fig. 1). This robot has been specially designed and developed to support the elderly and caregivers in their daily life, either at home or in a nursing facility. It is important to note that Mini has been designed as a tool for physicians, caregivers and relatives; it is not intended to replace any of these groups.

Fig. 1 The robot Mini during an interaction with an elderly person

The rest of this paper is structured as follows. In Sect. 2 we review the most relevant robots that support the elderly and discuss the differences from our approach. Next, we present the design and the hardware elements of the new robotic platform Mini (Sect. 3). Section 4 presents the software architecture and describes its different modules. Next, in Sect. 5, we present a use case to illustrate the operation of the robot and how the human–robot interaction (HRI) is conducted. Section 6 describes the evaluation of the robot Mini and presents the preliminary results. Finally, the paper is concluded in Sect. 7.

2 State of the Art

As already stated, there is a growing interest in the application of social robots to improve the quality of life of the elderly. In this section we present a brief review of the literature on social robots and the elderly, distinguishing between robotic platforms that have been specifically designed for the elderly and those that have not.

2.1 Social Robots Specifically Designed for the Elderly

In recent years, several social robots have been developed bearing in mind the special needs and characteristics of old people. The elderly commonly have problems related to mobility, loneliness, memory loss or cognitive impairment, etc., and these robots try to help them with these issues. In 2009, a literature review on robots and the elderly [8] proposed an initial classification, considering the main functionalities of these robots: companion type robots and service type robots. However, that same study concluded that not all robots can be categorized strictly in either one of these two groups. In fact, as will be presented in this section, the most popular approach is a combination of both: a personal assistant robot which offers different assistive functionalities and also companionship.

Nevertheless, there are still some robots that only offer company to their users: the companion type robots. Studies have shown that they are able to enhance the health and psychological wellbeing of the elderly by providing companionship. For example, Huggable, a robot inspired by the Teddy Bear [44], and the social robot Paro, a baby seal-like robot, were developed based on the success of animal therapy with the elderly. Paro has been successfully used to facilitate therapeutic work with people with dementia, to enhance social interactions, and to reduce social isolation [11, 25, 45]. This zoomorphic robot has been the most widely studied and adopted in practice. Nevertheless, some users simply do not like animals, so an alternative to these pet-like robots is presented in [17]: Babyloid, a therapeutic baby robot for the elderly. These companion robots are autonomous and provide continuous companionship, but they lack the ability to sustain a robust social interaction, such as spoken dialog and an expressive face.

On the other hand, as already presented, there are robots that are used not only as companions, but also as assistive robots. Their main functionalities are related to supporting independent living by supporting basic activities and mobility, monitoring those who need continuous attention, and maintaining safety [8]. For example, one of the first robots developed to assist the elderly was Pearl [34], a nursebot. It is a mobile robot that can help the elderly to navigate through a nursing facility. It does have a user-friendly interface with a face, and can also provide advice and cognitive support for the elderly [33], so it is able to provide companionship to the user.

The Care-O-botFootnote 1 is a mobile robot assistant designed with the abilities to speak, to communicate with an elderly person, to carry and lift things (it can have one, two, or no arms at all), and to act as an audio–visual portal connecting the outside world to the elderly person’s home. This platform was used in the European Union-funded FP7 ACCOMPANY project (2011–2014), which adopted the Care-O-bot as a home companion for older people as part of an intelligent environment [5].

There are other recent EU projects in which robots have been designed for seniors, such as MARIO (2015–2018) [9]. That project addressed the difficult challenges of loneliness, isolation, and dementia in older persons through innovative and multi-faceted inventions delivered by service robots. Another EU project is ENRICHME (2015–2018), whose main purpose was to develop a socially assistive robot that can help the elderly, adapt to their needs, and behave naturally [2]. Both projects adopt the Kompai platform,Footnote 2 a mobile robot with a large touch screen on its chest and a head. This robot was designed for the elderly and offers different applications, such as health monitoring, entertainment activities, and social connectivity.

Along the same line, the main aim of the EU project GrowMeUp (2015–2018) was to improve the quality of life of older persons and to increase the years of their independent and active living. GrowMeUp provides an affordable service robotic system: GrowMu. This robot is able to learn the user’s needs and habits over time and to compensate for the degradation of the capabilities of the elderly. GrowMu is also a mobile platform, with a touch screen on its chest and a head with eyes and mouth made of LED lights [18].

All these platforms, Care-O-bot, Kompai, and GrowMu, share very similar physical features: they are mobile platforms about 1.3 m high, with a touch screen and a friendly interface. Nevertheless, none of them offers manipulation functionality.

The EU project HOBBIT focusses on how robots can be used to promote independent living among seniors. In the context of this project, a mobile robot with an arm and a gripper was designed [16]. The Hobbit robot also has a touch screen on its chest and a friendly interface with a head showing expressive eyes, and it is small enough to navigate in a senior’s home [13].

Another approach is the one followed in the EU project MoveCare (2017–2019).Footnote 3 Although the main goal is also to let the elderly stay at home longer, the project focusses on creating a virtual community that provides assistance and monitoring services to the elderly in their independent living. In this context, the robotic assistant GiraffFootnote 4 is a mobile telepresence robot that will be adapted to the features of this project. This robot has a screen on top, where the remote user is displayed, and a camera.

Apart from these projects, other robots have been used for the elderly. At the University of Southern California, Bandit, a biomimetic anthropomorphic robot platform consisting of a humanoid torso mounted on a mobile base, was developed. The torso has two arms with hands, a head with expressive eyebrows, and an expressive mouth. This robot has been used as a socially assistive robot designed to engage elderly users in physical exercise [14]. Moreover, in [47] it is used for cognitive stimulation therapy for individuals suffering from Mild Cognitive Impairment and/or Alzheimer’s disease.

More recently, in [1], the robot Ryan Companionbot is presented as an intelligent, emotive, and perceptive social robot developed to improve the quality of life of elderly people with dementia and/or depression. This autonomous robot offers uninterrupted companionship using spoken dialog combined with a rich set of other stimuli, such as eye gaze, head movement, and facial expressions. Similarly to Bandit, this robot has a torso with two arms and a neck, and an emotive and expressive face that is projected onto the head.

2.2 General Purpose Social Robots Used for the Elderly

As already mentioned, not all research into social robots and the elderly uses a robot specifically designed for these users.

In the previous section we presented some robots whose main functionality is companionship, such as Paro and Babyloid. Following the same idea of animal therapy with the elderly, some works have used zoomorphic robots. This is the case of the commercial robotic dog Aibo,Footnote 5 developed by Sony, which has been used to stimulate social interaction in residents with dementia [27] and to provide further empirical evaluation of the effectiveness of animal-assisted therapy for the same type of users [31].

Other relevant works are those which use the social robot Nao,Footnote 6 the popular humanoid robot developed by Aldebaran. This robot has been used as an autonomous exercise instructor at a senior living community [28] and also as a cognitive stimulation tool in therapy for dementia patients [30].

Recently, the social robot Pepper,Footnote 7 another humanoid robot also developed by Aldebaran, has been used to adapt its behavior to serve and meet the requirements of the elderly, while simultaneously maintaining its own system [50].

2.3 Our Approach

In this paper we present Mini, a social robot designed to coexist with the elderly in their homes, offering services for entertainment, assistance, and stimulation. It aims at extending the period of their independent living while maintaining a high quality of life. In contrast to other works, Mini combines applications for entertainment, assistance, and cognitive stimulation in a desktop platform. Therefore, this robot can be considered a combination of a companion and an assistant robot to be used at home.

As we have shown, the service robots described are specifically designed for the elderly. Nevertheless, the majority of them are mobile platforms of considerable size (1.3 m high or more), and others are robots with a head, torso and hands of human size. They have very interesting applications and are designed to work in a nursing home or in large spaces, but not in private homes. According to [38], elders do not want big, mobile robots at home. In that study, carried out by the authors, elders and caregivers reported that the idea of having a big robot following them was quite intrusive and threatening. For this reason, the design presented in this paper is a desktop robot with a very pleasant appearance, as will be shown in the next section.

3 The Robot Mini

In this section, we present the new robotic platform from the hardware point of view, considering seniors’ homes as one of the possible application scenarios. First, we present the external design; then, we describe the inner hardware components.

3.1 External Aspect

The external aspect of a social robot is a key element to consider in the design process since it affects how people perceive the robot [22]. As already mentioned, social robots are intended to coexist with humans and interact with them. Therefore, the appearance of a social robot is crucial for its acceptance by the persons living side by side with it.

The external design of the robot Mini was made considering that it has to be perceived as a living entity rather than a mere machine. To this end, it is important to endow the robot with expressive capabilities and behaviors that make it be perceived as alive.

In this case, the anthropomorphic cartoon-like shape undoubtedly helps. In a previous study [38], experts in the field expressed their preference for an external appearance similar to that of the social robot Maggie [37]. Following these opinions, Mini was inspired by its older, bigger sister, Maggie. Mini is a 50 cm tall desktop robot that is able to transmit its emotional state through its expressive eyes, an LED-based beating heart, cheeks, and the motion of different body parts.

The materials of the external parts also help to foster a positive attitude towards the robot. Harlow stressed as early as 1958 the importance of contact comfort in the development of affectional responses [23]. With this in mind, we decided to wrap Mini’s body with foam and fabric, giving it the appearance of a stuffed toy. Mini’s cover can be changed or washed but, in this first version of Mini, further hygienic aspects were not considered in the design process.

Several studies have shown that customization of the robot’s external aspect can ease its acceptance and foster engagement [6, 21]. For the robot Mini, different shells have been developed, using different materials (soft and hard), colours (red, blue, and gray, among others), and accessories (hair style, scarf, etc.). See Fig. 2 for an idea of the different looks of Mini.

Fig. 2 Different external appearances of Mini

3.2 The Hardware

Mini is a self-contained social robot, which means that it is a stand-alone device containing all the elements needed to work properly without any external equipment.

3.2.1 Mechanical Design

Mini is composed of two parts: the lower base, where most of the electronic components are placed, and the upper body. The structural components of these two parts were made with a 3D printer, using acrylonitrile butadiene styrene (ABS) and polylactic acid (PLA) as the principal materials. ABS supports higher temperatures and, consequently, has been used for the most demanding parts, mainly inside the base and the torso of the robot. The other mechanical parts, located in the robot’s head, where the thermal and mechanical stresses are lower, were made of PLA.

Mini’s body is shaped as an anthropomorphic robot that resembles the upper body of a human or a biped animal. It consists of a waist, a torso, two arms and the head (see Fig. 3). The robot has 5 degrees of freedom: one in the waist, one in each arm, and two in the neck (pan and tilt). The waist contains a bearing that allows the body to rotate. The waist also serves to pass all the cables from the base to the electronic devices in the body.

The base is a square box used to store the principal electronic components of the robot, such as the processing units and the power supply system.

Fig. 3 CAD view of the internal structure of the robot Mini

3.2.2 Processing Units

Given that all the computation has to be conducted on-board in real time, the robot has been endowed with high computational capabilities so as to manage all the necessary information. The robot has two processing units: a computer, which is in charge of the logical and arithmetical operations of the robot, and a data acquisition board (DAQ), which controls most of the sensors and actuators. Both of them support the ROS framework [36], which is the middleware on which the software architecture is built.

The computer of the robot was selected so as to fulfill two important requirements of the system. First, since Mini is a desktop robot, the size of the motherboard has to be as small as possible. Second, due to the large amount of information that the computer has to manage, a powerful microprocessor and high-performance RAM are necessary to carry out all the processes of the system. For these reasons, we selected the Asus Z170iFootnote 8 motherboard and mounted an Intel Core i7-7700KFootnote 9 processor with 8 GB of high-performance DDR4 RAM.

The selected DAQ is an Arduino MEGA 2560Footnote 10 board with an expansion board to easily connect the different devices. This DAQ was selected because of the large number of analog and digital signals it can handle, the multiple protocols included, the ease of programming it using the Arduino IDE, and its compatibility with ROS.

3.2.3 Power System

Since Mini is a desktop robot, it is designed to be plugged in. However, considering the requirements presented in Sect. 1, the robot should be easy to move from one point to another, so it will be unplugged on a relatively regular basis. To allow safe transportation and to avoid electrical damage when Mini is suddenly unplugged, we equipped it with a battery that powers the robot while it shuts down in a controlled manner.

3.2.4 Sensors

The onboard sensors used in the robot allow the system to obtain information about the environment and the user with whom Mini interacts. The list of sensors is as follows.

  • RGB-D camera: the Realsense SR300 RGB-DFootnote 11 camera is intended for detecting people while conducting short-distance human–robot interaction. The infrared camera’s range is from 0.2 to 1.5 m. Moreover, it is compatible with ROS and Ubuntu.Footnote 12

  • Microphone: the robot has a unidirectional mono microphone to detect the sounds of the environment. It is placed on the robot’s chest and is used to capture the voice of the user who interacts with Mini. The microphone includes noise-cancellation hardware to reduce the ambient noise, facilitating automatic speech recognition and, consequently, improving the human–robot interaction.

  • Touch sensors: the robot is endowed with four capacitive sensors placed in the belly, the shoulders, and the head. They perceive when and where the user interacts with the robot by touching it.

  • Electronic Beacons: based on Bluetooth Low-Energy technology, these devices communicate with a Bluetooth master connected to the robot. Each eBeacon has a unique identifier and a signal power level. These eBeacons can be attached to objects or carried by users to identify them and determine their positions around the robot.

3.2.5 Expressive Devices

Mini has been equipped with several actuators that allow the movement of its different parts and endow the robot with expressive capabilities. The on-board actuators are:

  • Motion: Mini has five AX-12A servomotors located in the base (waist), arms (right and left shoulders), and neck (pan and tilt). They give the robot the ability to move and perform different actions by combining the movements of several of them. These motors were chosen because of their low weight (54 g), high stall torque (1.5 Nm), and reduced size. The AX-12A can be controlled in position or velocity mode, using both absolute and relative commands.

  • LEDs: Mini has several coloured LEDs to express emotions and different gestures combined with other actuators [32]. The coloured LEDs are placed in three body parts:

    • Heart: it is placed on the chest of the robot and is intended to simulate the beating of Mini’s heart. Several parameters of the heartbeat can be tuned to increase its expressiveness, such as the colour, rhythm of beating, and minimum and maximum intensity.

    • Mouth: this actuator is formed by an 8 LED array that is synchronized with the sounds emitted by the robot. The 8 LEDs turn on and off in a VU meter style.

    • Cheeks: two LEDs are located inside the robot’s head to emulate its cheeks. Similarly to the heart, their colour, rhythm and intensity can be tuned.

  • Eyes: Mini incorporates two 1.2″ screen modules which are used to depict the eyes of the robot. They are placed inside its head and are capable of displaying animated images and drawings to simulate Mini’s eyes.

  • Voice: a stereo speaker is placed in the chest of the robot to reproduce sounds, either verbal or non-verbal.

3.2.6 The Tablet

The robot is equipped with a tablet that is used to improve and ease the interaction with the users. The tablet is controlled by the robot and can be used to either display menus and collect the users’ answers, or to extend the interaction capabilities by showing different multimedia content (for example, images, videos, or even web sites).

4 The Software Architecture

The software of Mini is organized into 5 blocks that are shown in Fig. 4. Starting from the bottom, there are two blocks that communicate with the hardware: the Perception and the Actuation blocks, connecting with the sensors and the actuators respectively.

Fig. 4 Black-box view of the 5 software blocks that form the software architecture of Mini

The different functionalities that Mini offers to the users are called applications. Depending on the scenario where the robot is being used, the existing applications can vary. Each application uses perceptual information and the actuators to achieve its goal. In addition, if an application needs to establish a dialog with a user, the HRI System is in charge of handling the communication.

The applications are initiated and interrupted according to the state machine that controls the operation of the robot. The transitions between states in this high-level state machine can be triggered by interaction with users (e.g. a user requests a particular functionality) or by other external events (e.g. the robot is unplugged).

In the following, each of these four blocks is described (the applications themselves are illustrated in Sect. 5).

4.1 Operation of the Robot

This subsection describes the high-level control of the robot. Mini’s behaviour is modelled as a finite state machine (SM) where each state corresponds to a functionality. The transitions can be triggered by a user request, automatically after a certain amount of time, or by an external event. A complete schematic of the SM is shown in Fig. 5. The states and the transitions are explained below, followed by a code sketch of this structure.

Fig. 5 State machine controlling the robot’s behaviour

  • Sleeping: this is the initial state after the robot is turned on. Here, Mini remains idle, acting as if it is sleeping. It wakes up when someone touches it. If it is in another state and is unplugged, a transition to this state is triggered just before it transits to the Off state.

  • Home: this is the state from which the applications of the robot are initiated. In this state the robot exhibits behaviours aimed at engaging the user’s attention (for example, greeting or tracking a person). When a user requests a functionality of the robot, a new transition is triggered. If no request is received, Mini asks the user from time to time what to do. After a predefined period of time, if the user does not request any action, the robot transits to the Sleeping state.

  • Applications: Mini is endowed with a repertoire of functionalities called applications. Each application is executed independently in a different state (see Fig. 5 for an example with two apps), so new applications can be added easily due to this modular structure. These applications are executed upon request by a user. When the execution of the application is completed or a user requests interrupting the current application, the control returns to the Home state. Depending on the scenario where the robot is employed, the repertoire of applications might be different.

  • Off: this state is devoted to performing a controlled robot shutdown. During this process, important information is saved and all processes are switched off in an orderly manner. This state is activated after the robot has been sleeping for a while and it ends by turning the robot off.
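To make the structure above concrete, the following is a minimal sketch of how such a state machine could be assembled with SMACH, the Python library that we also use for the Communicative Acts (Sect. 4.4). The class bodies and the transition triggers are simplified placeholders and do not reflect the actual implementation of Mini.

```python
import smach

class Sleeping(smach.State):
    """Idle state: Mini acts as if asleep until someone touches it."""
    def __init__(self):
        smach.State.__init__(self, outcomes=['touched', 'power_off'])
    def execute(self, userdata):
        # Placeholder: block until the touch sensors fire, or until the robot
        # has been sleeping for a while or is unplugged ('power_off').
        return 'touched'

class Home(smach.State):
    """Waiting state: engage the user and wait for an application request."""
    def __init__(self):
        smach.State.__init__(self, outcomes=['app_requested', 'timeout'])
    def execute(self, userdata):
        # Placeholder: ask the user what to do and wait for a request.
        return 'app_requested'

class Application(smach.State):
    """Wrapper around one of Mini's applications (e.g. cognitive exercises)."""
    def __init__(self):
        smach.State.__init__(self, outcomes=['finished'])
    def execute(self, userdata):
        return 'finished'

class Off(smach.State):
    """Controlled shutdown: save relevant data and stop all processes."""
    def __init__(self):
        smach.State.__init__(self, outcomes=['done'])
    def execute(self, userdata):
        return 'done'

sm = smach.StateMachine(outcomes=['shutdown'])
with sm:
    smach.StateMachine.add('SLEEPING', Sleeping(),
                           transitions={'touched': 'HOME', 'power_off': 'OFF'})
    smach.StateMachine.add('HOME', Home(),
                           transitions={'app_requested': 'APPLICATION',
                                        'timeout': 'SLEEPING'})
    smach.StateMachine.add('APPLICATION', Application(),
                           transitions={'finished': 'HOME'})
    smach.StateMachine.add('OFF', Off(), transitions={'done': 'shutdown'})

outcome = sm.execute()
```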

4.2 Perception in Mini

Perceiving the environment is crucial for a social robot since it allows a richer interaction. Therefore, we needed to endow Mini with multimodal perception capabilities that use the information coming from the sensors described in Sect. 3.2.4. The output of the different perception mechanisms is later used by the other elements of the software architecture.

4.2.1 User Information

Mini has been designed to interact with humans and, therefore, information about the people around the robot is of paramount importance. The robot has been endowed with user detection capabilities using RGB-D information. This detection component is able to detect users at different distances by taking advantage of the depth and colour information. For this purpose, the module implements two detectors: one uses colour information to detect and locate the user’s face, and the other retrieves upper-body information from depth data. The face detector relies on a well-known technique, the Viola–Jones algorithm [49]. For the upper-body detection, the volume that is closest to the camera is analysed to check whether or not it meets some constraints regarding its shape. Given the operating range of the camera, depth information can be used only over a short range (up to 1.5 m), so only the upper body is detected with it. Colour information is not as constrained by the distance to the camera; therefore, both detectors are fused to achieve a wider operating range.
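As an illustration of the colour-based part of this detector, the sketch below runs OpenCV’s Viola–Jones cascade on a colour frame. The specific cascade model and the parameter values are assumptions made for the example; the fusion with the depth-based detector is omitted.

```python
import cv2

# Pre-trained Viola-Jones face cascade shipped with OpenCV (assumed model).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def detect_faces(bgr_frame):
    """Return bounding boxes (x, y, w, h) of the faces found in a colour frame."""
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```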

Mini can also localize and identify the users in the environment using eBeacon technology [3]. This component uses the signal strength to calculate the distance between an eBeacon and the Bluetooth receiver placed in the robot. This allows establishing zones in which the users carrying the beacons are located. Additionally, by mapping each beacon ID to the information of each user, we can identify the users surrounding the robot. This detection is robust, as it is not affected by environmental conditions such as light or noise, nor does it need a direct line of sight as visual information does. Moreover, the eBeacons are low-cost and lightweight devices, which is an advantage when compared to other wearable beacon technologies.
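The paper does not detail how signal strength is converted into a distance; the sketch below uses a standard log-distance path-loss model as an assumed stand-in, with calibration values and zone thresholds that would have to be adjusted for the actual eBeacons.

```python
def rssi_to_distance(rssi_dbm, measured_power_dbm=-59.0, path_loss_exponent=2.0):
    """Estimate the distance (m) to an eBeacon with a log-distance path-loss model.

    measured_power_dbm: RSSI expected at 1 m (device-specific calibration value).
    path_loss_exponent: ~2 in free space, higher indoors. Both are assumptions.
    """
    return 10 ** ((measured_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))

def distance_to_zone(distance_m):
    """Map an estimated distance to a coarse interaction zone around the robot."""
    if distance_m < 1.0:
        return 'near'
    if distance_m < 3.0:
        return 'mid'
    return 'far'
```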

Apart from detecting users, Mini is able to detect the dynamic gestures of users in front of it [10]. This component discretizes the human body into a set of 15 joints defined by their position in 3D space. Speed information is added by calculating the change in position of each joint over a time interval (working at 10 frames per second). Finally, dynamic gestures are represented as a succession of position and speed information for all of the joints. This information is collected as 10 samples in the window of time in which each gesture takes place. The component uses a Random Forest classifier to identify and classify 14 dynamic gestures (e.g. pointing to the front, moving the hands to the head, greeting, crossing the arms, etc.). To enhance robustness against false positives, the system computes a weighted average over the last 10 recognised dynamic gestures, so that one classification result is output every second.
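A hedged sketch of the feature construction and classification steps is shown below, using scikit-learn’s Random Forest; the exact feature layout, the training data, and the classifier parameters are assumptions and do not reproduce the published implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_JOINTS, N_SAMPLES = 15, 10   # 15 body joints, 10 samples per gesture window

def gesture_features(joint_positions):
    """Build one feature vector from an array of shape (N_SAMPLES + 1, N_JOINTS, 3)
    holding 3D joint positions: per-sample positions plus per-sample speeds
    (finite differences between consecutive samples)."""
    positions = joint_positions[1:]              # last N_SAMPLES position samples
    speeds = np.diff(joint_positions, axis=0)    # sample-to-sample displacement
    return np.concatenate([positions.ravel(), speeds.ravel()])

# Hypothetical training data: X_train holds one feature vector per labelled window
# and y_train the corresponding label (one of the 14 dynamic gestures).
clf = RandomForestClassifier(n_estimators=100)
# clf.fit(X_train, y_train)
# predicted_gesture = clf.predict([gesture_features(window)])[0]
```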

4.2.2 Tactile Information

There is a perception component in charge of acquiring information from the touch sensors installed in the robot’s body. These data contain information about whether the robot has been touched or not, where the touch has occurred, and when the touch is over. The operation of this component is simple but crucial in order to deal with the issues raised by capacitive touch sensors such as false positives.

In order to minimize the noise in the signal (which mainly causes false positives), we have implemented a sliding window technique that computes the average value of the last 10 readings. This helps to provide reliable tactile information.
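A minimal sketch of this filter, assuming binary readings and a 50% decision threshold (both assumptions made for the example), is the following:

```python
from collections import deque

class TouchFilter:
    """Average the last N capacitive readings to suppress spurious spikes."""
    def __init__(self, window_size=10, threshold=0.5):
        self.readings = deque(maxlen=window_size)  # sliding window of raw values
        self.threshold = threshold                 # assumed decision threshold

    def update(self, raw_value):
        """Add a raw reading (0 or 1) and return the filtered touched/not-touched decision."""
        self.readings.append(raw_value)
        return (sum(self.readings) / len(self.readings)) > self.threshold
```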

4.2.3 Speech

When perceiving the environment, detecting sounds and voices is also important for a social robot. Our platform includes an Automatic Speech Recognition (ASR) module [4] to understand users’ utterances. The ASR is structured as a two-level pipeline in which the lower level performs the speech recognition itself, applying acoustic language models to extract lists of words from the user’s utterances. The second level then applies a grammar and extracts the lexical meaning from what the user has said.

4.2.4 Tablet Menu Information

Mini also exploits the capabilities provided by tablets for displaying menus and for providing touch information when the user interacts with them. Mini is able to display menus with text buttons and images for interaction on the tablet. The layout can be defined at runtime, ranging from lists of buttons arranged in a single column, to a grid-like distribution, or even moving buttons. This functionality allows implementing several interactive menus to be used, for instance, in stimulation exercises. From the perception perspective, the tablet acts as another input device, providing information about the selections made by the user in the different interactive menus.

4.3 Expressiveness in Mini

The actuators described in Sect. 3.2.5 (motors, lights, screen-based eyes, and voice) give Mini the ability to move, communicate, and alter the environment. We have developed 5 modules which are in charge of controlling Mini’s actuators.

4.3.1 Body Movements

As already mentioned, Mini has 5 joints located in the base, arms (right and left), and neck (vertical and horizontal). By combining the motion of these motors, the robot is able to adopt different poses that enrich its non-verbal communication. Each motor can be controlled individually using velocity and position commands. In addition, the status of the motors can be read (current position, target position, error, speed, etc.) to perform relative or absolute movements and to identify when a movement is completed. For the sake of safety, the temperature and the load of the motors are monitored and, in the case of overheating or overloading, the motors are disabled.

4.3.2 Gaze

To represent Mini’s gaze, two screens located in the head of the robot are used as eyes. Mini’s gaze is endowed with different expressions: angry, happy, neutral, sad, surprised, and bored (see Fig. 6). Each expression is composed of three different motions of the eyelids: (i) from the default position to the position with an open eye, (ii) blinking, and (iii) from the open eye position to the default position. The default position is shared by all gaze expressions and it facilitates smooth transitions between the expressions.

Fig. 6 Mini’s gaze with different expressions. Video cycling through all gaze expressions: https://vimeo.com/394921822

The possibility of changing the direction of the gaze is very important for behaving naturally. Each gaze expression, depending on the position of the pupil, has 9 possible orientations: central, central-right, central-left, top, top-right, top-left, bottom, bottom-right, and bottom-left. Thanks to these 9 orientations, Mini can look in any direction in the room or follow a user with its gaze, if necessary. In addition, Mini can vary the frequency of its blinking. This can help to express, for example, agitation (faster blinking) or calm (slower blinking).

4.3.3 The Tablet

The expressive capabilities of Mini are extended by using the tablet screen as another communicative channel. This device is controlled by the robot and different types of information can be shown through videos, images, GIFs, audio, texts, or web pages. The robot decides what information is shown and how it is displayed (layout, size, fonts, etc.). Mini can interact with the multimedia components by starting, stopping and resuming them.

4.3.4 Voice and Sounds

The robot is endowed with a Text to Speech module which is in charge of converting text strings into utterances. This is a crucial element for verbal communication.

Moreover, to increase the perception of liveliness and for the sake of naturalness during HRI, Mini can reproduce nonverbal sounds, such as laughs, whistles or yawns. These sounds are formatted as audio files located in the robot and are reproduced when required.

4.3.5 Lights

Mini has several RGB LEDs to express different emotions in combination with other actuators (such as the eyes). All these LEDs can be configured to use different colours and intensities, and to fade in and out. However, considering the functionality of each one, some of these capabilities have been limited for the sake of naturalness. For example, in the case of the cheeks, the robot has one LED per cheek that can fade in and out, but using only one colour, red. We believe that using another colour for the cheeks could be perceived as strange.

Placed on the chest of the robot, the LED-based heart simulates the heartbeat of Mini. The colour, the rhythm of beating, as well as the intensity, can be tuned to reflect the state of the robot. For example, when the robot is sleeping, the heart beats slowly and fades in and out using light blue. Once Mini wakes up, the heart beats faster and has a brighter colour.

As mentioned previously, the array of LEDs placed in the robot’s mouth is synchronized with the sounds emitted by Mini in a VU-meter fashion. The colour of these LEDs can be changed on demand.

4.4 The Multimodal HRI System

The multimodal HRI system is in charge of handling the interactions between the robot and people. This part of the architecture processes all the information collected by the perception modules and analyses whether that information is relevant to any active dialogue. If so, it generates the appropriate response.

In Mini, the approach followed to control the interaction is to divide the dialogue management into two levels: the Application Level, which is in charge of making all decisions that require task-related information and controlling the flow of the conversation; and the Manager Level, which is in charge of making those decisions that only involve interaction-related information.

When facing the task of modelling human–robot dialogues, we decided to look for the fundamental atomic elements present in every conversation. These elements are called Communicative Acts, or CAs [15]. The CAs are basic interaction units that can work independently or be combined to create complex interactions. Because Mini is a robot designed to interact with the elderly in (usually) one-to-one interactions, we have taken this into consideration when designing the CAs. We considered two variables when describing a dialogue as a combination of CAs: Intention and Initiative. The Initiative defines which of the peers starts the conversation and, in general, leads the interaction. The Intention is related to the goal of the leading peer: whether the aim is to obtain information from the other peer or to give information. Considering these two variables, we identified 4 fundamental CAs (an illustrative activation request is sketched after the list):

  • Robot Gives Information: this CA conveys a message to the user through the appropriate output channels. The message and how it is conveyed can be decided by the application requesting this CA.

  • Robot Asks for Information: asks the user a question and waits for an appropriate answer. This CA can request an open answer, where the user can respond freely, or a closed answer, where only a specific answer is accepted. The CA can also request a single value or a series of values, with or without an ordering. The CA can require that the answer come through a specific input channel, through several channels at the same time (for example, voice and touch simultaneously), or let the user choose from among a set of input channels.

  • User Gives Information: receives information from the user and sends it to the applications that are expecting it. In the case where multiple applications receive the same information, they decide whether this information is relevant or not.

  • User Asks for Information: after the user requests some information, the CA sends this request to the appropriate application, that is, the one that activated the CA. The application analyses the request and selects the appropriate answer, and the CA conveys it to the user.
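To make this taxonomy concrete, the dictionary below sketches what an application-side activation request for a Robot Asks for Information CA might look like; the field names and values are purely illustrative assumptions and do not correspond to the actual interface of the HRI Manager.

```python
# Hypothetical request an application might send to the HRI Manager to activate
# a 'Robot Asks for Information' CA; all field names are illustrative only.
ca_request = {
    'type': 'robot_asks_for_info',
    'utterance': 'Which exercise would you like to do?',
    'answer_type': 'closed',                      # only listed answers accepted
    'expected_answers': ['monument', 'memory', 'calculation'],
    'input_channels': ['voice', 'tablet_menu'],   # user may answer via either one
    'grammar': 'exercise_selection.gram',         # loaded only for voice input
    'max_attempts': 2,
    'timeout_s': 15,
    'priority': 'medium',
}
```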

Fig. 7 Modules of the HRI architecture

The CAs have been designed to manage the different low-level tasks present in any conversation, such as error handling (not receiving any answer from the user or receiving wrong ones, or perception errors such as speech recognition failures) and changes in the initiative. These tasks are carried out by the HRI System, so the applications do not have to take care of them.

In order to implement this system, we have designed an HRI architecture that is composed of three elements: (i) the Perception Manager, (ii) the HRI Manager, and (iii) the Expression Manager. These elements are shown in Fig. 7.

4.4.1 Perception Manager

The Perception Manager (PM) is the part of the HRI architecture that connects the input modules of the robot with the rest of the architecture. This system collects information from the sensors and aggregates it, providing a more complex meaning. It can also configure the input modules as requested by the robot’s applications and the rest of the HRI architecture (for example, loading and unloading grammars into a grammar-based Automatic Speech Recognition module). Fig. 7 depicts how the PM is connected to both the perception modules and the rest of the HRI architecture.

The PM processes and aggregates the information obtained from the sensors in three layers of abstraction. At the lowest level (Level 0), the information from the different perception modules (for example, a Natural Language Processing module) is received and translated into a standard format that is understood by the other elements of the HRI System. This standard format is an array of key-value pairs, where the keys indicate the input module that retrieved the data and the values contain the perceived data serialized into a string, for communication purposes. Level 1, called Aggregation, combines the perception data acquired from different input sources within a certain time window into a single package that is shared with other modules.
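The following sketch illustrates Levels 0 and 1 as described above: wrapping a module’s output as a key-value pair with a string-serialized value, and grouping the pairs that arrive within a time window. The window length and the class interface are assumptions made for the example.

```python
import json
import time

def to_standard_format(module_name, data):
    """Level 0: wrap a perception module's output as a key-value pair whose value
    is the perceived data serialized into a string."""
    return {module_name: json.dumps(data)}

class Aggregator:
    """Level 1: group the key-value pairs received within a fixed time window."""
    def __init__(self, window_s=0.5):          # window length is an assumption
        self.window_s = window_s
        self.buffer = []
        self.window_start = time.time()

    def add(self, pair):
        """Add a Level 0 pair; return the closed window's package, if any."""
        now = time.time()
        if now - self.window_start > self.window_s:
            package, self.buffer = self.buffer, [pair]
            self.window_start = now
            return package                     # forwarded to the HRI Manager
        self.buffer.append(pair)
        return None

# Example: a speech recognition result entering the pipeline.
pair = to_standard_format('asr', {'text': 'start the monument exercise'})
```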

At the highest level, Level 2, sensor data are fused to obtain more complex information. Sensor fusion is performed depending on the type of sensory information. In the case of spatial information (e.g. object positions or the detection of a face in an image or in RGB-D data), traditional techniques such as Kalman filters are proposed. For example, if a face recognition module detects a user’s face and the spatial user localization module detects that a user is approaching the robot, Level 2 tries to correlate both detections and, if they belong to the same person, merges this information and informs Mini, with increased robustness, that a user is approaching. Adding non-spatial information (e.g. merging spatial information with speech recognition) is more challenging. So far, Level 2 has been considered in the architecture but has not been developed yet. Currently, the Perception Manager outputs the time aggregation performed in Level 1.

Generally, the HRI Manager receives the information packaged by the PM, but this information can also be received by the applications, especially when the data retrieved from the sensors are not used for HRI purposes.

4.4.2 HRI Manager

As mentioned before, we decided to model interactions in Mini as a combination of basic interaction units called Communicative Acts. The HRI Manager is the module in charge of controlling the execution of the different CAs.

When one of the robot’s applications, or the high-level SM, needs to convey information to the user, or to retrieve some information from him/her, it requests the activation of a specific CA with particular parameters. These parameters can be the utterance that has to be conveyed, an image that has to be displayed on the tablet screen, or parameters that have to be relayed to the PM for configuring the appropriate input channels. The HRI Manager then loads the corresponding CA, configures it, and executes it. Once the interaction has been completed, the HRI Manager returns the result to the Application Level, along with any information provided by the user. Our CAs have been developed as state machines using SMACH, a Python library for developing hierarchical state machines.

In order to be able to respond to changes in the conversation topic, and to attend to unexpected user requests, the HRI Manager is able to keep several CAs active at the same time. This can lead to conflicts if multiple CAs need to use the same input/output channels at the same time. We solved this problem by allowing the applications to specify the priority level of the CA (low, medium, or high). CAs where the robot has the initiative are stored in priority queues and executed sequentially, from high to low priority. If, during the execution of a CA, a new one with a higher priority is received, the current CA is stopped, stored in the priority queue (at the top, so it is executed immediately after the new CA ends), and the new CA is executed.
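A minimal sketch of this queueing and preemption policy for robot-initiative CAs is given below; the data structures and method names are illustrative assumptions rather than the actual HRI Manager code.

```python
import heapq
import itertools

PRIORITY = {'high': 0, 'medium': 1, 'low': 2}

class RobotInitiativeScheduler:
    """CAs are kept in a priority queue and executed one at a time, from high to
    low priority; a higher-priority request preempts the running CA."""

    def __init__(self):
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority level
        self.queue = []                # heap of (priority, seq, ca)
        self.current = None

    def request(self, ca, priority='medium'):
        entry = (PRIORITY[priority], next(self._seq), ca)
        if self.current is None:
            self.current = entry                  # nothing running: start now
        elif entry[0] < self.current[0]:
            # Preemption: park the running CA at the very top of the queue so it
            # resumes immediately after the new, higher-priority CA finishes.
            heapq.heappush(self.queue, (-1, next(self._seq), self.current[2]))
            self.current = entry
        else:
            heapq.heappush(self.queue, entry)

    def on_finished(self):
        """Called when the current CA ends: pop the next pending CA, if any."""
        self.current = heapq.heappop(self.queue) if self.queue else None
        return self.current[2] if self.current else None
```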

For CAs where the user has the initiative, several CAs can be running at the same time, as long as they do not use the same input/output channels, to avoid conflicts. These CAs also use a priority system but, in this case, no priority queues are used. If a conflict arises between a new CA and a currently active one, the one with the higher priority stays active, while the other one is discarded and the Application Level is notified.

We decided to use different approaches depending on who has the initiative because, while it makes no sense to execute two robot-initiative CAs at the same time (the robot should never ask two different questions or convey two different messages simultaneously), we need to be able to handle all the possible actions that the user might perform, and this is done by keeping several CAs active at the same time.

4.4.3 Expression Manager

The last piece of the HRI architecture is the Expression Manager (EM). This module is in charge of controlling the communicative capabilities of the robot when interacting with a user. This manager is shown in Fig. 7.

The EM receives requests from the HRI Manager to execute expressions. An expression, or gesture, is a combination of different outputs with the intention of conveying a message. For example, the gesture for greeting could consist of raising the left arm, waving it, smiling, and then saying ‘Hello!’. The EM is in charge of controlling the proper execution of the different interfaces involved in a gesture.

Mini has a repertoire of predefined gestures with specific purposes, such as the greeting gesture just mentioned. These expressions can be modulated at runtime through two parameters: speed and amplitude. Each of them affects specific aspects of the output channels: speed controls the velocity of the movements, the fade frequency of the robot’s LEDs, and the blinking frequency of the eyes, among others; amplitude modifies the positions in the motors’ trajectories, the intensity of the LEDs, and the volume of the voice. In addition, the EM can alter an existing gesture to adapt it to external circumstances. For instance, continuing with the greeting example, the EM can change the utterance ‘Hello!’ depending on the time of day, so the greeting gesture can be executed saying ‘Good morning’ or ‘Good evening’.
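The sketch below illustrates how these two parameters could scale a gesture definition; the per-channel fields are hypothetical and were chosen only to mirror the mapping described above, not to reproduce the actual gesture format.

```python
def modulate_gesture(gesture, speed=1.0, amplitude=1.0):
    """Scale a gesture definition by the two modulation parameters described above.
    The gesture format (a dict of per-channel values) is hypothetical."""
    modulated = dict(gesture)
    # Speed affects movement velocity, LED fading, and blinking frequency.
    modulated['joint_speeds'] = [s * speed for s in gesture['joint_speeds']]
    modulated['led_fade_hz'] = gesture['led_fade_hz'] * speed
    modulated['blink_hz'] = gesture['blink_hz'] * speed
    # Amplitude affects motor trajectories, LED intensity, and voice volume.
    modulated['joint_targets'] = [p * amplitude for p in gesture['joint_targets']]
    modulated['led_intensity'] = gesture['led_intensity'] * amplitude
    modulated['voice_volume'] = gesture['voice_volume'] * amplitude
    return modulated
```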

All the gestures have been implemented as state machines using FlexBE, a high-level behaviour engine that allows creating SMACH-based state machines using a GUI [41]. Using a graphical tool allows people without programming knowledge to develop new gestures and easily improve the robot’s expressiveness. The expressions are then stored in a library of gestures, from which the EM loads the expressions requested by the HRI Manager.

In addition to the gesture library just mentioned, the EM can also generate gestures dynamically at execution time. Applications have the possibility of defining particular task-related gestures that are not present in the robot’s gesture library. In that case, the application specifies all the actions needed and the manager creates and executes the gesture.

The EM is composed of the following modules. The first one is the Expression Scheduler (ES). This module handles all the gesture activation requests and ensures that there are no conflicts between gestures. Every time a request is received, the ES checks which output interfaces the new gesture needs and then checks whether any other gestures are using, or about to use, those interfaces. In case of conflict, depending on the priority of the new gesture (only execute if the interfaces are available, or execute no matter what), the ES can either discard the new gesture, or stop and discard the gestures already being executed. The requests for gestures also specify whether the gesture has to be executed immediately or only after the gesture currently being executed has ended. Every time a gesture ends, the ES checks whether any scheduled gesture can now be executed. If so, the ES sends an execution request to the next module of the EM, the Expression Executor (EE).

The EE is the module in charge of loading the corresponding gesture from the gesture library and executing it. It can control several gestures running at the same time. Every time one of these gestures ends, the EE gets its outcome, notifies the ES, and waits for the next execution request. The last modules of the EM are the Players. These modules are the only ones that communicate directly with the actuators of the robot (as far as interaction is concerned). Each player receives actions from the gestures and sends the appropriate commands to the actuator controlled by that player. The players can be enabled or disabled, and can be interrupted at any time.
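To summarize the scheduling logic of the ES, the following sketch checks for interface conflicts and applies the two priority behaviours described above; a gesture is reduced here to the set of output interfaces it needs, which is a simplification of the real gesture representation, and the waiting queue and the call to the EE are omitted.

```python
class ExpressionScheduler:
    """Sketch of the ES conflict check over the robot's output interfaces."""

    def __init__(self):
        self.running = {}                     # gesture_id -> interfaces in use

    def request(self, gesture_id, interfaces, force=False):
        busy = set().union(*self.running.values()) if self.running else set()
        if not (interfaces & busy):           # no conflict: execute right away
            self.running[gesture_id] = interfaces
            return 'execute'
        if force:                             # 'execute no matter what'
            conflicting = [g for g, used in self.running.items() if used & interfaces]
            for g in conflicting:             # stop and discard the old gestures
                del self.running[g]
            self.running[gesture_id] = interfaces
            return 'execute_after_stopping_conflicts'
        return 'discarded'                    # only run if interfaces were free

    def on_gesture_end(self, gesture_id):
        """Free the interfaces when the Expression Executor reports completion."""
        self.running.pop(gesture_id, None)
```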

5 An Example of Mini’s Use

In this section, we describe how Mini operates in a real scenario and how its software modules are intertwined. In particular, we illustrate how our HRI architecture is applied to a particular scenario. Mini is endowed with several applications [19, 48] but, for the sake of simplicity, we focus on just two of them:

  1. Cognitive stimulation exercises. This application includes psycho-stimulation exercises concerning temporal orientation, attention, gnosis or perception, memory, executive functions, calculation, and language. During the execution of each exercise, Mini asks questions using the different communication channels (e.g. voice or the tablet screen) and the user answers them through different input channels (e.g. voice, a menu on the tablet, or the touch sensors).

  2. Dance. Mini is capable of dancing to rhythmic music. This application extracts the rhythm of an external audio signal and plays different randomized choreographies following that rhythm. The choreographies are combinations of several movements involving the head, the arms and the base of the robot, which the robot performs faster or slower depending on the rhythm. When this application is activated, the robot asks the user to play music or make a rhythmic noise (for example, clapping).

Considering these applications, in the following we describe how the different elements of the robot’s architecture (Sect. 4) operate. In particular, we focus on the exchange of information between the high-level State Machine (SM), the applications, the Human–Robot Interaction Manager (HRIM), the Perception Manager (PM), and the Expression Manager (EM).

Fig. 8 Sequence diagram of the connection between the different software modules of Mini during the execution of a cognitive stimulation exercise

Fig. 9 Sequence diagram of the connection between the different software modules of Mini during the execution of the dance skill

First, we consider the case where a user wants to run several cognitive stimulation exercises with the robot (Fig. 8). We assume that the robot is sleeping, so the user has to wake it up by touching it. In this situation, the SM is in the sleeping state and has requested the HRIM to activate a user gives info CA. Using this CA, the SM waits until a touch sensor is activated to wake up Mini. This external event (the user touching the robot) is received by the PM, which sends it to the HRI Manager. The HRI Manager communicates the result of the CA to the SM, informing it that a touch sensor has been activated. Then, the SM deactivates the CA and transits to the waiting state.

In the waiting state, Mini asks the user what she wants to do by activating a robot asks for info CA. This CA specifies the maximum time to respond, the possible answers, and the corresponding grammar needed to understand those answers (the grammar is needed only when the CA expects an answer by voice). The HRIM receives the request, asks the EM to pose the question, and waits for the answer. If the user does not respond within the predefined response time, the CA requests the EM to ask the question again. This process is repeated as long as the number of attempts defined in the activation of the CA has not been reached. In the second attempt (see Fig. 8), the user vocally requests an exercise (“Start the monument exercise”). The PM obtains the semantic value of the user’s utterance and passes it to the HRIM, which decides whether this response is appropriate for any of the activated CAs. Nevertheless, if the PM is unable to obtain the semantic value due to problems with the communication (e.g. environmental noise), it reports a communication problem and, depending on the number of attempts, the CA repeats the question. If no attempts remain (as in the waiting state in Fig. 8), the HRIM concludes the CA and the communication problem is reported to the SM, since it requested the activation of the CA. After receiving this response, the SM activates a new robot asks for info CA but, this time, the user has to answer using the menu on the tablet. In the case shown in Fig. 8, the user responds through the tablet and the result is sent to the SM which, based on the value of the result, transits to the exercise state. This is a clear example of how recovery mechanisms in human–robot communication can be easily achieved with our approach by combining and parameterizing several CAs.

In the exercise state, the cognitive stimulation exercises application is started. In addition, in order to be able to interrupt its execution, a user gives info CA is activated too. This CA stays activated until the exercises are completed. The exercises are composed of several robot asks for info CAs in a row. The questions are asked using the voice and the tablet screen, and the user can respond vocally, using the tablet menus, or using the touch sensors (depending on the exercise). When the CA is requested, the proper grammar for answering by voice and the available options in the tablet menu need to be defined. Similarly, the right answer needs to be specified too. The HRIM checks the user’s response and, using different expressions, congratulates the user or encourages the user to keep trying. In addition, the HRIM controls the number of attempts to answer correctly, the timeout for each response, and possible problems in the communication. When the exercises are finished, the application notifies the SM, which deactivates the CA and returns to the waiting state.

Now the robot is in the waiting state and again asks the user what to do. In this case, the user asks the robot to dance (Fig. 9). Thus, the SM transits to the dance state, where the Dance skill is activated and, similarly to the exercise state described above, a user gives information CA is requested to offer the possibility of aborting it. In this case, that CA is configured to receive information from the touch sensors so, in order to interrupt the Dance skill, the user has to touch Mini.

When the application detects sound, the robot starts dancing, following the rhythm by moving different parts of its body. When the user wants to stop the robot’s dancing, the user touches the robot and the PM communicates this fact to the HRIM, where there is a CA waiting for tactile information. The HRIM communicates it to the SM (which is the module that requested the activation of the CA waiting for tactile information) and the SM stops the Dance skill and deactivates the CA. Finally, the SM transits to the waiting state, where it will again ask for something to do and, after a certain time of inactivity, will transit to the sleeping state.

6 Preliminary Results

We ran a preliminary evaluation of our robot Mini in a nursing home in Cádiz, Spain. In this scenario, Mini was used mainly to perform cognitive stimulation exercises with the elders. Mini was placed in the nursing facilities for two months and the participants interacted with the robot freely. After the two-month period, we administered a questionnaire to assess how users perceived our robot in terms of usability, appearance, and satisfaction. The questionnaire was extracted from the work of Portugal et al. [35] and is composed of 25 items (see Table 1) that participants had to rate from 1 (strongly disagree) to 10 (strongly agree). These items are related to the ease of use of the robot, the motivation it brings, the happiness when using it, the animacy of the robot, its safety, and its performance during the demonstration, among others.

The answers were measured using a 10-point scale. Three kinds of users were involved in this evaluation: elders (11 participants), caregivers (8), and relatives (3). Participants used the robot by themselves freely, with an experimenter giving some directions about how to use the robot at the beginning of the experiment. The responses from two of them, an elder and a caregiver, were discarded since they provided invalid answers (e.g. text instead of numeric values), so we used the responses from 20 participants.

Table 1 Summary of the questionnaire questions

Fig. 10 Preliminary results considering the different user groups

Fig. 11 Average values and standard deviations grouping together the three user profiles

The results are summarized in Fig. 10, broken down by the three user groups. Since the number of participants was not balanced among roles, Fig. 11 offers the average values and standard deviations obtained by grouping together the three user profiles. The numerical values of the results can be seen in Appendix A. Regarding usability, there are some interesting insights from the answers.

Participants indicated that the robot was useful (UQ1) and easy to use (UQ2) and that such a platform could make them feel more motivated to carry out their daily activities (UQ9). In contrast, participants did not fully perceive how the robot could help to reduce the demand for care from caregivers (UQ12) or how it could help users to gain autonomy (UQ11).

The evaluation of the robot’s appearance showed that the robot was perceived slightly more as a machine than as a human (AQ1) and as reasonably lively (AQ2). Also, Mini was perceived as friendly (AQ3), smart (AQ4), and safe (AQ5). These results seem to indicate that the external aspect of the robot (detailed in Sect. 3.1) made a good impression on the users, but more effort has to be made to increase the liveliness of the robot, which in the end should lead to a better interaction.

In relation to user satisfaction, the participants’ answers showed promising results: all questions were rated above seven points. These ratings point to a positive attitude of the users towards Mini.

Since this is a preliminary study, these results should be considered carefully. In order to validate them and demonstrate the possibilities of the robot Mini, a more extensive evaluation is needed. With this in mind, the results seem to indicate that relatives tend to give the robot a lower score in terms of appearance and satisfaction (see Fig. 11). In contrast, caregivers rated the robot better in terms of usability, with scores similar to those given by the elders for appearance and satisfaction.

7 Conclusions and Future Works

In this paper we have introduced the robot Mini, a desktop social robot intended to assist and accompany the elderly, especially those with cognitive impairment or those who feel lonely. Considering the aim of the robot, the quality of the human–robot interaction is of paramount importance, and most of the elements of the robot have been designed with this in mind. The full stack of elements that Mini has been endowed with has been presented, from the hardware components to the high-level decision software module. Particular attention has been paid to how the robot architecture handles the interaction between the users and the robot. Mini combines the so-called Communicative Acts in order to achieve complex interactions. We have shown how these Communicative Acts are in charge of the low-level tasks related to the interaction, while higher-level tasks remain at the application level. The application of Mini and its HRI architecture to a real scenario has shown how we modelled human–robot interactions by combining and parameterizing different CAs. The proposed architecture facilitates the extension of the robot’s capabilities by including new applications that can be requested by the user.

The evaluation of Mini by a group of end users showed promising results in terms of appearance and satisfaction. However, in terms of usability, participants did not perceive it as a tool that can help them to extend their autonomy. Thus, more effort will be devoted to this aspect.

In this work, the robot Mini follows the requests made by the user. In future work, Mini will be endowed with proactive behavior and will be able to lead the interaction by suggesting activities. In addition, Mini offers the possibility of combining stimulation exercises with other entertainment activities, with the aim of reducing the burden of long sessions of exercises. This will be explored in the future in collaboration with physicians and experts in the field.

We believe that the customization of the robot is a crucial aspect for the successful adoption of Mini by the elderly. In this paper we have shown how the robot can be tailored to the users’ preferences in terms of different external aspects, but customizing the robot’s behavior should be considered too. Therefore, in the near future, Mini will consider the user’s profile and preferences in order to adapt its behavior.

This article reflects the research work of the past years and contains the lessons learnt during the development of Mini and other robots. It is our hope that it can serve other researchers facing the problem of designing and building a new robotic platform.