
1 Introduction

The emerging technology of Robotic Process Automation (RPA) is said to enable the automation of the most repetitive, tedious, and mundane digital tasks that people are used to performing [1, 25]. However, not every task is suitable for robotisation since, besides these characteristics, it should (1) have a low level of exceptions, (2) require a limited cognitive effort, and (3) be susceptible to human errors [8]. Therefore, successful RPA projects need to start with an analysis phase [7] that identifies those candidate processes, or parts of them, which stand the best chance of being robotised in a cost-effective way, i.e., those which guarantee the highest return on investment with the lowest risk. Although this analysis mostly relies on interpreting process documentation, the latter may be of poor quality and may require substantial effort to understand [11]. Consequently, there is an increasing trend to capture the actual behaviour of the people interacting with real information systems (ISs) to amend the documentation problems, i.e., by recording interaction events like mouse clicks or keystrokes.

Both academia and industry have acknowledged this issue and provide a variety of approaches. On the one hand, vendor-specific platforms (e.g., BluePrism [5], AutomationAnywhere [2], and UIPath [22]) offer tools to record macro-like scripts from the computer of a user executing the process tasks [23]. The obtained script can later be analysed within the vendor's own platform to discover the candidate process and even to support the development of the robot code. On the other hand, proposals can be found in the literature that suggest the creation of a standard log of events related to the interaction of the user with the graphical user interface, the so-called UI Log [6, 11, 12]. Obtaining this kind of log enables the use of the Process Mining paradigm [24] to disclose the knowledge that the log contains, among other things, the candidate processes to robotise. These proposals fall under the paradigm of Task Mining [16].

Although these solutions are reasonably mature, they lack support for real-world problems like those existing in the Business Process Outsourcing (BPO) industry, which presents one of the most suitable settings for conducting successful RPA projects [9]. The back-office of BPO departments is composed of large teams of workers performing digital processes through the ISs of external companies. Creating a UI Log that comprises the behaviour of the whole team is challenging for several reasons. Firstly, the distributed logging must be centralised in a common event log whose size grows with the size of the team. Secondly, the log of each member of the team may present differences, since not all team members share the same environment, e.g., screen resolution, text editor, web browser, etc. Finally, each member of the team may perform the same processes differently from their teammates. Nowadays, this kind of distributed logging is not supported, and solutions that could be extended in this direction are not intended for RPA, i.e., the generated log lacks the detailed information needed for a thorough RPA analysis.

This paper motivates the minimum requirements that a UI logger should fulfil in a distributed context, based on an industrial collaboration with a company belonging to the BPO sector. In addition, a software design is proposed to develop this logger as an Open Source project, aiming to enable researchers and practitioners to easily get into RPA.

The rest of the paper is organised as follows. Section 2 describes the BPO context and identifies the fundamental requirements of this proposal. Section 3 presents a classification scheme in which the most important features of these requirements, and the tools related to this proposal, are categorised. Section 4 describes the proposed solution. Finally, Sect. 5 summarises the work and outlines future work.

2 BPO Context

This section describes the knowledge flow which drives the interaction between all the participants in a BPO context that plans to acquire RPA capabilities.

In the context of the back-office employees, different training sessions provide them with the knowledge to perform an outsourced process on some ISs through their own computers. Different challenges are faced during the execution of such a process, since the real systems tend to present slight differences from those learnt during training, or even completely new processes whose behaviour is similar enough for employees to adapt and accomplish them. In addition, other issues may arise outside the processes themselves, like networking problems, operating system errors, etc. These challenges are typically addressed by employees using their common sense and sharing the newly acquired knowledge with the rest of the back-office team.

In the context of the RPA analyst, similar training lessons are provided along with detailed documentation, which is typically delivered by the company that hosts the process to outsource. This information is thoroughly analysed to (1) understand the workflow and depict the as-is process, (2) identify which parts of this process would be good candidates for robotisation, and (3) provide a design of the robotised process to continue the RPA development. Along the way, the RPA analyst recognises that there are chances that the prescribed process is not fully aligned with the real process. For this reason, the analyst usually holds periodic, informal interviews with some back-office employees to contrast and assimilate their on-the-field knowledge. Nonetheless, most of the details remain undisclosed within the back-office know-how.

For all these reasons, undesired effects occur from both perspectives, that of the RPA analyst and that of the back-office employees. RPA projects start with higher uncertainty after long analysis periods, and employees keep performing mundane tasks for longer and unpredictable periods. For this reason, RPA analysts must be supported with a formal way to capture the back-office knowledge which does not require intensive efforts from the back-office employees. Behavioural loggers are a suitable candidate that would be highly welcomed by both RPA analysts and back-office employees.

Table 1. Classification scheme

After analysing this context, the following advanced requirements have been identified, which are not typically offered by common loggers and which motivate this paper: (1) the scalability level, i.e., the number of computers that can be monitored simultaneously, depending on the execution context, without impacting the system; (2) the method of sending the captured information to the user; (3) the ease of data processing, either through a database or through some processable data structure (a log); and (4) the possibility of editing or complementing the features offered by the tool with new software.

3 Related Work

Nowadays, different proposals provide the possibility of creating logs that record the behaviour of a human interacting with a computer. In this sense, this section aims to describe the state of the art on this topic, listing and categorising the proposals found into a classification scheme.

In the context of parental control and the monitoring of company employees, many keylogger tools offer solutions for monitoring the activity of their users [4, 15, 18, 19]. In addition, platforms with broader objectives, such as the creation and management of RPA projects, also offer users the possibility of recording their activity [3, 20, 21]. However, the generated logs are frequently understandable only in the context of the platform itself. The closest solution to the proposal presented in this paper is the one by Volodymyr et al. [13]. In this work, the authors propose a logger that generates results ready to be processed by process mining techniques for RPA purposes. Considering the requirements listed in Sect. 2 and the related tools mentioned above, a mapping between them was carried out, resulting in Table 1.

In the classification scheme, for each tool or platform, each requirement receives a weight. If the tool provides full support for the requirement, it is weighted as 1. If the tool provides partial support for the requirement (e.g., limitations due to a payment licence), it is weighted as 0.5. If the tool does not support the requirement, it is weighted as 0.
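To make the scoring concrete, the following minimal sketch aggregates the per-requirement weights into a total score per tool. The tool and requirement names are hypothetical placeholders, not the actual entries of Table 1:

```python
# Weights of the classification scheme: full, partial, and no support.
FULL, PARTIAL, NONE = 1.0, 0.5, 0.0

# Hypothetical mapping of tools to per-requirement weights.
support = {
    "ToolA": {"keystrokes": FULL, "screenshots": PARTIAL, "open_source": NONE},
    "ToolB": {"keystrokes": FULL, "screenshots": FULL, "open_source": FULL},
}

# A tool's score is simply the sum of its requirement weights.
scores = {tool: sum(weights.values()) for tool, weights in support.items()}
print(scores)  # {'ToolA': 1.5, 'ToolB': 3.0}
```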

Fig. 1. Logger server

As can be seen in Table 1, all the analysed tools or platforms provide functionalities to record the keyboard strokes and the name of the application being executed at the moment of the capture. The vast majority of them allow the capture of clipboard content. Close behind are those platforms that allow the capture of mouse clicks and screenshots. Slightly above average are the tools or platforms that let the user record the moment of the capture and send all the collected information to a server. In addition, these tools can be executed on different operating systems. The mouse position or the characteristics of the monitored computer can only be registered by half of the tools. Less than half of the proposals allow remote control of the monitored computer. Finally, only three of the eight analysed tools have been classified as Open Source (Fig. 1).

Although some Open Source projects have been found, the results show that most of them are independent modules that can be incorporated into other, broader platforms. Thus, they only meet one or more of the defined requirements without covering the full set. The two best tools resulting from this classification are Spyrix and SpyAgent. However, they are not Open Source. In addition, they do not cover important requirements like the possibility of being executed on diverse operating systems or recording the mouse position, among others.

Since, to the best of our knowledge, none of the analysed tools or platforms satisfies all the requirements defined, this paper presents the foundations of an Open Source logger to cover this gap.

4 RPA Logger

4.1 Endpoint Logger

The endpoint logger is focused on gathering enough information for a future RPA analysis. It captures the position and button type of a mouse click, the keystrokes, and screen captures. It provides three mechanisms of extension (a minimal code sketch follows the list):

  • Capture extensions. For specific contexts, the logger can be extended with a scraper component which adds more information to each event, e.g., web page changes in the context of an RPA project where only web pages are used.

  • Capture policies. They are differentiated into two types. First, policies to capture mouse and keyboard events. The current mechanism captures one event per mouse click or keystroke. However, it would be interesting to define a policy where a set of keystrokes is grouped into a single event if they fall within a defined time window. Second, policies for screen captures. The current policy captures one image per click or keystroke. However, a scenario that cannot afford so many images may instead take captures on a frequency basis, e.g., one capture every 30 s.

  • Send policies. A large amount of data is sent to the central server and, in some contexts, strict policies must be defined. The current policy sends each event as soon as it occurs. However, in a context where network restrictions apply, a common policy would be to send all the events at the end of the day.
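The sketch below illustrates, under stated assumptions, how such policies could be plugged into the endpoint logger. It uses the pynput library for the capture layer; all class and parameter names are hypothetical and not part of the actual design:

```python
import time
from dataclasses import dataclass
from pynput import keyboard  # assumed capture library for keystrokes

@dataclass
class UIEvent:
    timestamp: float
    kind: str      # e.g., "keystroke_group"
    payload: dict

class TimeWindowKeyPolicy:
    """Capture policy: groups keystrokes within a time window into one event."""
    def __init__(self, window_s: float = 2.0):
        self.window_s, self.buffer, self.last_ts = window_s, [], None

    def on_key(self, key) -> UIEvent | None:
        now = time.time()
        closed = None
        if self.last_ts is not None and now - self.last_ts > self.window_s:
            closed = self.flush()  # time window expired: close the group
        self.buffer.append(str(key))
        self.last_ts = now
        return closed

    def flush(self) -> UIEvent | None:
        if not self.buffer:
            return None
        event = UIEvent(self.last_ts, "keystroke_group", {"keys": self.buffer})
        self.buffer = []
        return event

class EndOfDaySendPolicy:
    """Send policy: buffers events locally and ships them in a single batch."""
    def __init__(self, sender):
        self.sender, self.pending = sender, []  # sender uploads to the server

    def submit(self, event: UIEvent | None):
        if event is not None:
            self.pending.append(event)

    def flush(self):
        self.sender(self.pending)
        self.pending = []

# Wiring the policies together (print stands in for the real upload call).
key_policy = TimeWindowKeyPolicy(window_s=2.0)
send_policy = EndOfDaySendPolicy(sender=print)
listener = keyboard.Listener(
    on_press=lambda key: send_policy.submit(key_policy.on_key(key)))
listener.start()
```

Keeping the capture and send behaviours behind small policy objects like these is one way to realise the extension mechanisms above: a different context swaps in a different policy without touching the capture layer.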

4.2 Central Server

The central server is in charge of storing all the events associated with each monitored computer. In addition, it performs the heavy processing required to extract information from the events that is more useful for a future RPA analysis.

For example, the similarities between images can be compared to detect which ones correspond to the same activity. This comparison may be done using image-similarity techniques [11]. More precisely, an efficient bit-wise comparison [10] between image fingerprints (i.e., short hashes obtained from each image in a deterministic way) is used to state that two screen captures are related to the same activity according to some prefixed similarity threshold [26]. Another example is to extract patterns or texts from the images by applying image processing techniques like Optical Character Recognition (OCR) [17].
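As an illustration only, the following sketch shows how such a pipeline could look in Python using the imagehash and pytesseract libraries; the threshold value is an assumption, not one prescribed by the design:

```python
import imagehash            # perceptual hashing of images
import pytesseract          # OCR wrapper around the Tesseract engine
from PIL import Image

SIMILARITY_THRESHOLD = 10   # max Hamming distance; illustrative value only

def same_activity(capture_a: str, capture_b: str) -> bool:
    """Decide whether two screen captures belong to the same activity by
    bit-wise comparison of their perceptual-hash fingerprints."""
    fp_a = imagehash.phash(Image.open(capture_a))
    fp_b = imagehash.phash(Image.open(capture_b))
    return (fp_a - fp_b) <= SIMILARITY_THRESHOLD  # '-' is Hamming distance

def extract_text(capture: str) -> str:
    """Extract the visible text of a screen capture via OCR."""
    return pytesseract.image_to_string(Image.open(capture))
```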

Data processing in the central server will be performed on demand. At this point, the collected information does not have to be processed at runtime. Moreover, this way of processing the information can be beneficial to prevent database overloads by avoiding unnecessary iterations.

A simple process, e.g., a teacher who has to consolidate the results of the exams she marked on her institution's website, has been used as the reference to illustrate how the log should look. Figure 2 illustrates a simplified log with some of the most interesting fields to be considered, among them: a global identifier, an identifier of the computer the capture came from, the timestamp of the capture, the action, and the window where the action was executed.
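To complement Fig. 2, the entries below sketch how such log rows might look with the fields just discussed; all values are invented for illustration and are not taken from the actual figure:

```python
# Hypothetical UI Log entries for the exam-consolidation scenario.
ui_log = [
    {"event_id": 1, "computer_id": "PC-07",
     "timestamp": "2021-05-10T09:14:03", "action": "left_click(412, 270)",
     "window": "Exam results - Web Browser"},
    {"event_id": 2, "computer_id": "PC-07",
     "timestamp": "2021-05-10T09:14:08", "action": "keystrokes('8.5')",
     "window": "Exam results - Web Browser"},
]
```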

Fig. 2. Log example

5 Conclusions and Future Work

This paper presents the foundations of an Open Source project which aims to serve as a logger for the analysis phase of RPA projects. After introducing a motivating scenario where the critical requirements have been identified, the closest works related to this proposal have been presented. Although similar proposals have been found, it has been noticed that (1) they do not cover all the requirements or (2) they are proprietary. Thus, none of the proposals is suitable as a solution to the described scenario.

In this context, this paper presents a proposal covering all the aspects mentioned. The proposed solution consists of (1) a logger capable of collecting information about different events from different machines and sending it to a server, and (2) a central server responsible for processing this information and converting it into an enriched log so that the data can be processed later.

The immediate future work focuses on preparing the data for processing by data mining techniques. Moreover, an in-depth definition of the requirements will be studied to strengthen the connection between the requirements and the features used in the classification scheme. Finally, another important aspect is to manage the data processing itself on the central server.