1 Introduction

Although maintenance activities are critical in the manufacturing industry, few of them are fully automated, because maintenance is one of the last areas of manufacturing to be automated [1, 2]. A recent study also reports that over 30% of the total workforce contributes to maintenance activities [3]. Maintenance activities are often composed of technical and non-technical activities. Retrieving instructions or information from manuals, for instance, takes up about 45% of maintenance technicians’ time [4]. Therefore, if a technology such as AI can take over some of the technicians’ tasks by supporting their activities, diagnosis and repair time can be shortened. However, AI must be introduced cautiously into the maintenance process, because there have been reports of an AI that was meant to improve operators’ performance but instead acted as a barrier and created even more challenges [5, 6].

Since 1996, as AI has become more popular, the number of AI papers published annually in the field of computer science has soared, and annual venture capital investment in AI startups has increased sixfold since 2000 [7]. More and more people are paying attention to the potential benefits of AI.

In the field of manufacturing, numerous AI-related papers can be found. AI is often used to detect product quality problems [8]. For example, Nguyen et al. and Yang et al. used AI to detect defective wafers in the semiconductor industry [9, 10]. Similarly, Liu and Jin used AI to detect defective tail lights in the automobile industry [11]. Beyond detecting product quality problems, research has also investigated other applications of AI. Huang et al. used AI to diagnose vehicle faults. Hong et al. used AI to detect faults in semiconductor manufacturing equipment [12]. Similarly, Zhang et al. used AI to identify the degradation of machines and tools [13].

The use of AI has also been studied in the field of human factors. For example, Overmeyer et al. studied the cognitive load of an operator who commands autonomous vehicles through an AI agent [14]. Similarly, Strayer et al. studied the cognitive load of drivers who used an intelligent personal assistant [5].

Therefore, to explore how AI-based support affects technicians, we conducted a controlled pilot experiment to investigate the effect of an AI-based support system on the diagnosis task in the maintenance process.

The rest of this paper is structured as follows. In Sect. 2, we explain the experiment that we conducted to evaluate the effect of the AI on the diagnosis task. Next, in Sect. 3, we present the results of the experiment. Lastly, in Sect. 4, we present the discussion and conclusions.

2 Experimental Design and Setup

Proximity sensors are widely used to detect the presence of an object in many automated machines. However, proximity sensors frequently fail in CNC (Computer Numerical Control) machines. In addition, even when a technician identifies that the cause of a machine failure is related to the proximity sensor, the maintenance activity is not as simple as replacing the sensor. The technician must check the condition of all components, such as the cable, the power, the I/O board, and the sensor itself, in order to repair the machine. Therefore, in this experiment, a model operated by a proximity sensor was chosen to evaluate the effect of AI on diagnosis tasks in maintenance.

2.1 Experimental Task

Proximity Sensor Model.

Every component in the proximity sensor model represents a component in a real industrial machine, as shown in Table 1. The sensor in the experiment model detects whether the door in front of the sensor is closed. When the sensor detects the door, it cuts the power and the light turns off. On the other hand, when there is no object in front of the sensor and every component is in working order, the light bulb is illuminated (see Fig. 1).

Table 1. Proximity sensor model setting
Fig. 1. Proximity sensor model
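
The behavior of the model described above can be sketched as a small truth function. This is only an illustrative sketch: the component names are hypothetical stand-ins, since the actual mapping is the one given in Table 1.

```python
# Hypothetical component names; the real mapping is defined in Table 1.
COMPONENTS = ("battery", "switch", "light_bulb", "signal_cable", "sensor")


def light_is_on(door_detected: bool, component_ok: dict) -> bool:
    """Return True only when no door is detected and every component works."""
    if door_detected:
        # The sensor cuts the power when it detects the door.
        return False
    return all(component_ok[name] for name in COMPONENTS)


healthy = {name: True for name in COMPONENTS}
faulty = dict(healthy, battery=False)  # one component in bad condition
```

With the door open, a dark bulb therefore indicates that at least one component is faulty, which is the situation the participants had to diagnose.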

In the experiment, four components of the proximity sensor model were purposely put in bad condition: the battery, the switch, the light bulb, and the signal cable to the light bulb. The participants were then divided into two groups. The first group, referred to as the FT group, was asked to diagnose problems and fix the model according to a fault tree-based support system. The second group, referred to as the AI group, was asked to diagnose problems and fix the model according to an AI-based support system.

Support Systems.

Two support systems were provided to support participants’ diagnosis tasks. The FT-based support system, which reflects a common practice for repairing machines in many small and medium enterprises, helped participants locate problems through deductive failure analysis. The AI-based support system, on the other hand, helped participants locate problems based on probabilities pre-calculated with the Naïve Bayes classifier. The Naïve Bayes classifier was used in this experiment because it is known to require less input, to work well in practice even when its assumptions do not hold, and to be good at showing causal relationships [1] (see Fig. 2).

Fig. 2. AI-based support system interface
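
To illustrate how a Naïve Bayes classifier can rank fault locations, the following is a minimal sketch and not the system used in the experiment. The symptom names and all probabilities are hypothetical placeholders, and the sketch assumes a single faulty component for simplicity.

```python
from math import prod

# All probabilities are hypothetical placeholders, not experimental values.
PRIOR = {"battery": 0.4, "switch": 0.2, "light_bulb": 0.3, "signal_cable": 0.1}

# P(symptom observed | component is faulty) for two hypothetical symptoms.
LIKELIHOOD = {
    "battery":      {"no_voltage_at_source": 0.9, "bulb_dark": 0.9},
    "switch":       {"no_voltage_at_source": 0.1, "bulb_dark": 0.8},
    "light_bulb":   {"no_voltage_at_source": 0.1, "bulb_dark": 0.9},
    "signal_cable": {"no_voltage_at_source": 0.1, "bulb_dark": 0.7},
}


def rank_faults(observed: dict) -> list:
    """Return (component, posterior) pairs, most probable first."""
    scores = {}
    for comp, prior in PRIOR.items():
        # Naive Bayes assumption: symptoms are conditionally independent
        # given the faulty component, so their likelihoods multiply.
        terms = [
            LIKELIHOOD[comp][s] if present else 1.0 - LIKELIHOOD[comp][s]
            for s, present in observed.items()
        ]
        scores[comp] = prior * prod(terms)
    total = sum(scores.values())  # normalize so posteriors sum to one
    posteriors = [(c, v / total) for c, v in scores.items()]
    return sorted(posteriors, key=lambda cv: cv[1], reverse=True)


ranking = rank_faults({"no_voltage_at_source": True, "bulb_dark": True})
```

A support system built this way would suggest checking components in the order of `ranking`, so that the most probable fault location is inspected first.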

2.2 Participants

Five subjects participated in each group, for a total of 10 participants. The average age of participants in the FT and AI groups was 29.2 and 29.6, respectively. The youngest participant was 25 years old and the oldest was 32 years old. Of the 10 participants, 80% were male. An equal number of female participants was assigned to each group to minimize gender effects. Twenty percent of the participants had majored in neither engineering nor science; all other participants had majored in engineering or science.

2.3 Hypotheses

The following hypotheses were tested using the above experimental design and setup:

  • H1: The task completion time of the group which uses the AI-based support system will be shorter.

  • H2: The cognitive load of the group which uses the AI-based support system will be lower.

2.4 Experiment Procedures

Participants were divided into two groups and took part in the experiment according to their assigned group, as stated in Table 2.

Table 2. Experiment procedures

3 Experiment Results

Task Completion Time.

Task completion time is the time that a participant takes to diagnose components and fix them accordingly. It comprises diagnosis time, such as the time spent using a diagnosis support system and a multimeter, and the time to replace or fix components. By measuring the task completion time, the effect of the AI-based support system on diagnosis time can be identified.

The mean task completion time of the FT group was 372.4 s, with a standard deviation of 72.2 s. The mean task completion time of the AI group, on the other hand, was 176.4 s, with a standard deviation of 21.1 s. The coefficient of variation for the FT and AI groups was 22% and 12%, respectively. Based on the coefficient of variation, the AI group had less variation in task completion time. The difference in mean task completion time between the two groups was 196 s (see Fig. 3). A two-sample t-test was used to test the difference between the two groups. The calculated t-value was 5.83 and the p-value was 0.004. Therefore, at α = 0.05, we conclude that the mean task completion time differed between the two groups.

Fig. 3. Mean task completion time difference
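
The reported t-value can be checked from the summary statistics alone. The sketch below assumes the unequal-variance (Welch) form of the two-sample t-test; the text does not state which variant was used. With equal group sizes the t-statistic is the same for the pooled and Welch forms, and only the degrees of freedom differ.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return the t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first group mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second group mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics reported above: FT group first, AI group second.
t_value, df = welch_t(372.4, 72.2, 5, 176.4, 21.1, 5)
```

Rounded to two decimals, `t_value` reproduces the reported 5.83, and `df` falls between 4 and 5 under the Welch assumption.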

NASA Task Load.

The NASA Task Load Index (TLX) is a subjective assessment tool that rates the perceived workload of participants in order to assess a system. The TLX is divided into six subscales: mental demand, physical demand, temporal demand, performance, effort, and frustration. By measuring the TLX, the effect of the AI-based support system on operators’ cognitive load and workload can be identified.

The average overall task load of the FT group was 5.03. For this group, frustration was the highest of the six subscale loads. The other loads were around 5.00 or above, except the performance load, which averaged 2.40. Furthermore, on average, the mental load was lower than the physical load, as shown in Fig. 4. The average overall task load of the AI group, on the other hand, was about 4.2. Most of the subscale loads were close to the overall task load. However, the temporal demand load was about 1.5 times the overall task load. The second-highest load was the mental load, at 5.40. Among the six task loads, the performance load was the lowest (Fig. 5).

Fig. 4. The NASA task load of the two groups: FT group and AI group

Fig. 5. NASA task load differences

A two-sample t-test was used to assess the significance of the task load differences between the two groups. The test revealed that none of the task load differences were statistically significant at α = 0.05. Although there were some visible differences between the two groups, they were not large enough to be statistically significant.

Performance Accuracy.

Performance accuracy (PA) was defined as the number of parts replaced divided by the number of malfunctioning parts. A PA greater than one implies that the participant replaced unnecessary parts while diagnosing the model. None of the 10 participants replaced unnecessary parts.
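
As a small worked example of this definition, the sketch below computes PA for the four malfunctioning parts of the model. The function name is illustrative, and the second case is a hypothetical participant, not an observed result.

```python
def performance_accuracy(parts_replaced: int, malfunctioning_parts: int) -> float:
    """PA = number of parts replaced / number of malfunctioning parts."""
    if malfunctioning_parts <= 0:
        raise ValueError("malfunctioning_parts must be positive")
    return parts_replaced / malfunctioning_parts


# Four parts were purposely faulty in the model.
pa_exact = performance_accuracy(4, 4)  # only the faulty parts replaced
pa_over = performance_accuracy(5, 4)   # hypothetical: one unnecessary replacement
```

Here `pa_exact` equals one, while `pa_over` exceeds one and would flag an unnecessary replacement.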

4 Discussion and Conclusions

The experiment that we conducted to investigate effects of the AI-based support system on maintenance reveals several interesting points.

First, the experimental results show that the AI-based support system can reduce not only the diagnosis time but also its variation, compared with the FT-based support system. A possible explanation is that the AI-based support system allows participants to examine fewer parts than the FT-based support system, although this holds only if the reliability of the AI-based support system is high.

Second, the AI-based support system must be introduced carefully into the maintenance process, because the experimental results show that the mean mental load of the AI group was higher than that of the FT group, although the difference was not statistically significant in the two-sample t-test.

In sum, the experiment suggests that the AI-based support system can reduce the diagnosis time but may increase the mental load of technicians. However, these points must be interpreted carefully, since the results are based on a preliminary experiment in which only 10 subjects participated. Because a two-sample t-test presupposes normality, and normality could not be assumed with 10 participants, the results of the two-sample t-tests have to be interpreted cautiously. In addition, the power analysis requires at least 8 participants in each group, whereas each group here had 5. Therefore, the participants in this experiment may not truly represent the population. This pilot experiment was conducted as part of an exploratory study. In future work, several additional factors that might influence the cognitive load of a technician during the maintenance task will be included and investigated.