1 Introduction

The customized production model has gradually replaced the large-scale production model [1]. It requires that new manufacturing systems have the ability to meet individual needs. The cyber-physical system (CPS) [2], which is the next generation of intelligent systems that integrates and coordinates computing and physical resources through the organic and deep integration of computing, communication, and control technologies, effectively fills this gap with its high flexibility and adaptability. It helps people gain knowledge in the process of interacting with manufacturing systems and allows people’s intelligence and machine intelligence to motivate and grow together [3].

At present, CPS has begun to let manufacturing systems learn to discover, understand, and apply knowledge autonomously, as human beings do. Before the emergence of the concept of intelligent manufacturing, the carrier of manufacturing knowledge was the worker [4]. CPS shifts knowledge from human beings to new carriers such as machines and computers with more operability and imagination [5]; it improves the efficiency and operational capability of transforming knowledge into physical products.

Model-based definition (MBD) [6], also known as digital product definition, is the practice of using 3D models (such as solid models, 3D PMI, and related metadata) in 3D CAD software to provide specifications for individual components and product assemblies. At this stage, research on the visualization of AR instructions has shown that MBD-based fully digital models offer advantages beyond those of any previous technology. MBD eliminates the physical manual, making the 3D model the only data source in the assembly process.

Geometric information, originally marked in the physical manual, is expressed as product manufacturing information (PMI) [7], which is an organic combination of human intelligence and traditional manufacturing at the geometric level, including “precision” requirements, technical requirements, and annotations. “Precision” requirements include dimensional tolerance, geometric tolerance, and surface roughness. In AR assembly, precision can be visually superimposed on component objects to prompt operation tasks.

Augmented reality (AR) [8] is a technology that realizes the organic integration of computing resources (e.g., precision) and physical resources (e.g., assembly shop); it uses computer-generated visual information to convey knowledge to users and guide them in understanding current tasks [9]. AR can be applied to address a wide range of problems throughout the assembly phase in the lifecycle, e.g., planning [10], design [11], ergonomics assessment [12], operation guidance [13], and training [14]. AR assembly is guided by AR instructions. In the research of AR instructions, in addition to directly displaying PMI, visual information such as 2D images, 3D graphics, and animations can be used to guide users to complete specific tasks. Caudell et al. [15] rendered digital models of cable assemblies (such as cables, brackets, and fixtures) into real-world environments instead of using physical drawings. Neumann et al. [16] inserted explanatory text in the work scene to guide the user in maintaining the vehicle equipment. Wiedenmaier et al. [17] used explanatory pictures, assembly models, and other visual information to express the assembly process of the door panel. Schward et al. [18] added operation instructions, case pictures, teaching videos, hints, 3D animation, and CAD models to the AR assembly scene. Yuan et al. [19] used AR technology to insert text descriptions, two-dimensional markup, and other information directly into the assembly scene. Zhang et al. [20] established a knowledge-based AR assembly guidance system which inserted explanatory text, descriptive legends, CAD model files, video clips, and other knowledge into the working scene. Hou et al. [21] used visual cues such as specific colors and animation effects to guide the user’s assembly operations and found that this information relieved the user’s psychological burden. Funk et al. [22] evaluated visual information such as videos, photos, graphics, and symbols and concluded that “simple” information such as abstract graphics and special symbols also received good user feedback.

Actually, all of the above visual information is geometrical information and is visualized by engineering information (e.g., graphics, symbols, text, and numbers) following industry standards. This type of method is collectively referred to as geometry-level visualization (GLV). The user understands the task by looking at the geometric level information inserted. After input into the user’s brain, the brain converts geometric information into assembly relationships. Assembly relationship is an intelligent product composed of logic constraints, which is an organic combination of human intelligence and manufacturing technology at the information level. The formation of this product is accompanied by a series of complex logical operations, which results in heavy psychological burden and low cognitive efficiency of users.

At present, there is no method that directly inserts the logical constraints, which are initially stored in the user’s brain, into the actual scene to guide assembly operations. Therefore, our team proposes information-level visualization (ILV). Our team believes that ILV will reduce the burden on the human brain of understanding information, thereby improving user performance in terms of assembly efficiency and cognitive efficiency.

Inspired by references [23, 24], ILV uses an interface to represent assembly relationships as visual cues that are intuitive and easy to understand, in order to explain the implicit operational logic of tasks to users. To our knowledge, there is no similar research in the AR assembly field. Our research therefore contributes in the following aspects:

  • In the AR assembly field, our team proposes ILV, which can guide users to understand assembly content quickly and accurately.

  • ILV provides appropriate display information according to users’ cognitive needs, which is an important upgrade of GLV in meeting these needs.

  • Our case study is one of the first pilot user studies to evaluate AR instruction design rules at the geometric and information levels.

  • Our findings indicate that ILV will bring a better user experience and more efficient assembly than GLV.

The structure of this paper is as follows. Section 2 briefly introduces the research work on GLV and defines ILV on this basis; by constructing a data processing model for AR instructions, it discusses the advantages of ILV over GLV. Section 3 introduces the related items involved in our task and designs two GLV-based visual interfaces and two ILV-based visual interfaces using the data processing model. Section 4 describes a case study, its experimental software and hardware configuration, and the specific flow of the experiment used to evaluate the two rules. Section 5 discusses the experimental results. The implications and limitations of the case study are given in Section 6. Section 7 presents the conclusions of our study and looks forward to future work.

2 Related works

In this section, through the analysis of milestones over the past 15 years, our team proves the existence of GLV and ILV, and describes the relationship and difference between GLV and ILV through a new mathematical model.

In the research of AR instructions, visual information such as text, 2D graphics, 3D graphics, video, and animation is the common content of AR instruction. The existing AR instructions and their corresponding data processing models are shown in Table 1.

Table 1 The research progress of AR instructions

2.1 Data processing

In the above literature, the assembly system always uses geometric information as the display content of the AR instruction, which is called GLV. In fact, what users want to know is the intention of the task, and geometric information cannot directly reflect this intention. In order to reinterpret the descriptive process of task intention, we propose a data processing model that helps explain the advantages of ILV.

  1. Theoretical Formula

In an assembly task, each assembly process can be described by formula (1).

$$ \overrightarrow{Y_A}=\left\{\overrightarrow{Y_1},\overrightarrow{Y_2},\dots \overrightarrow{Y_i},\dots \overrightarrow{Y_{n-1}},\overrightarrow{Y_n}\right\} $$
(1)

In Formula (1), \( \overrightarrow{Y_A} \) represents the assembly task and \( \overrightarrow{Y_i} \) represents the assembly process.

$$ \overrightarrow{Y_i}=\left\{F\left(\overrightarrow{X_1}\right),F\left(\overrightarrow{X_2}\right),\dots F\left(\overrightarrow{X_k}\right),\dots F\left(\overrightarrow{X_n}\right)\right\} $$
(2)

In formula (2), \( F\left({\overrightarrow{X}}_k\right) \) denotes the visual guidance information of assembly step k, and F denotes the mapping relationship between assembly step and visual guidance information. \( {\overrightarrow{X}}_k \) refers to an assembly step consisting of several “precision” requirements, which can be expressed in formula (3).

$$ \overrightarrow{X_k}=\left\{{x}_k(1),{x}_k(2),\dots, {x}_k(i),\dots, {x}_k(n)\right\} $$
(3)

In formula (3), \( {x}_k(i) \) denotes “precision” requirement i of assembly step k.

$$ F\left(\overrightarrow{X_k}\right)=\left\{f\left({x}_k(1)\right)\cdot {x}_k(1),f\left({x}_k(2)\right)\cdot {x}_k(2),\dots, f\left({x}_k(n)\right)\cdot {x}_k(n)\right\} $$
(4)

In formula (4), \( f\left({x}_k(i)\right) \) is a logical rule of visual information. In fact, different logical rules use different data processing methods \( f\left({x}_k(1)\right), f\left({x}_k(2)\right), \dots, f\left({x}_k(i)\right), \dots, f\left({x}_k(n)\right) \).

Because ILV calculates logically according to “precision” requirements, all coefficients will be equal to a specific assembly relationship expression formula (e.g., formula 13). GLV does not perform logical operations on “precision” requirements, so all coefficients are 1 at this time.

$$ F\left(\overrightarrow{X_k}\right)=\left\{{x}_k(1),{x}_k(2),\dots, {x}_k(n)\right\} $$
(5)

By substituting formula (4) into formula (2), our team obtains:

$$ \overrightarrow{Y_i}=\left\{f\left({x}_1(1)\right)\cdot {x}_1(1),\dots, f\left({x}_1(n)\right)\cdot {x}_1(n),\dots, f\left({x}_n(1)\right)\cdot {x}_n(1),\dots, f\left({x}_n(n)\right)\cdot {x}_n(n)\right\} $$
(6)

This formula represents the data processing of the assembly information at the information level. GLV, for which all coefficients are 1, can be represented by formula (7).

$$ \overrightarrow{Y_i}=\left\{{x}_1(1),\dots, {x}_1(n),\dots, {x}_n(1),\dots, {x}_n(n)\right\} $$
(7)

  2. Explanation

Throughout the study of AR instructions (see Table 1), an assembly task is always composed of one or more assembly processes. These processes may be performed sequentially or in parallel. Our team interprets these two relationships as Formula (1).

A process is also composed of one or more steps. Similarly, steps may be executed sequentially [35] or in parallel [37]. They are expressed as Formula (2).

A step reflects the operation content of the assembly task. It conveys some details of the assembly operation to users through the AR instruction. These details are “precision” requirements. We formulate the relationship between steps and “precision” requirements as Formula (3).

Each “precision” requirement has a corresponding logical operation method. Based on the user’s cognitive needs, these methods re-parse the “precision” requirements into user-oriented AR instructions [23, 24, 38]. We describe this new data processing as Formula (4).

The logical rule is the core of this model. The main difference between the studies in Table 1 and our work lies in the logical rule used.

In Ref [25, 26], “precision” requirements are only text describing geometric size, which conforms to model-based definition (MBD), the practice of using 3D models (such as solid models, 3D PMI, and related metadata) in 3D CAD software to provide specifications for individual components and product assemblies. These “precision” requirements have not been interpreted as visual information that meets users’ cognitive needs. Ref [25, 26] only use annotation information such as geometric dimensions, guidelines, and explanatory text as visual information. This method is the most common information visualization method in AR assembly. Similarly, Ref [15, 17, 20, 29, 30, 31] process “precision” requirements into geometric features such as lines, curves, planes, surfaces, cylinders, and cuboids, which is another method that converts annotation information into 3D objects.

Some studies attempted to take users’ cognitive needs into greater account. Ref [27] uses an FEA simulation module to convert stress-strain data (mechanical data) into a 3D color cloud image, and the user can adjust the stress distribution in the color cloud image by modifying the part structure. In Ref [28], an FEA simulation analysis module processes stress-strain data into a 3D color cloud image, through which the deformed body is displayed to the user. These approaches develop the mechanical data further, but the data is only processed into another form of visualization, namely visual information based on voxel models.

Whether Ref [25, 26], Ref [15, 17, 20, 29, 30, 31], Ref [32, 33], Ref [21, 34, 35], Ref [22, 36, 37], or Ref [27, 28], the AR instructions designed in these cases describe geometric features. According to the model built above, they do not use the data processing methods \( f\left({x}_k(1)\right), f\left({x}_k(2)\right), \dots, f\left({x}_k(i)\right), \dots, f\left({x}_k(n)\right) \) to analyze “precision” requirements (see Formula (5)), so they are all based on GLV visual interfaces (see Formula (7)).

Some studies have tried to design visual interfaces different from GLV. In Ref [24], the AR instruction does not display geometric features but presents some key geometric parameters so as to directly reflect the assembly relationship. However, this kind of research has not explored the effect of the assembly relationship on users’ cognitive performance; instead, it explained the difference between these AR instructions and the previous ones.

Our team has found that the visualization of this information does not occur at the geometric level (see Formula (6)). We will use an example to illustrate the difference between GLV and ILV.

In Fig. 1a, the task is to insert part B into the square slot of part A. The placement of part B in the square slot must meet four technological requirements. These requirements are displayed to the user in the form of geometric features, such as points and lines, whose content is the gap between the surfaces of the two parts (A and B). These geometric features are an objective reflection of the geometric level, so this data processing method is GLV. In Fig. 1b, the four process requirements are interpreted as an assembly relationship. Establishing this relationship creates a constrained space, namely a tolerance zone that marks the maximum allowable range between the two pseudo-axes. The tolerance zone in Fig. 1b is an expression of the user’s cognitive needs, which objectively reflects the form of the “precision” requirements at the information level. Therefore, this data processing method is ILV, which is our research focus. To sum up, ILV developed from GLV. GLV describes the geometric features involved in the assembly task, while ILV describes the logic constraints behind these features.
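The contrast between formulas (4) and (5) can be sketched in a few lines of Python. This is only an illustration of the data processing model, not the system used in the study: the function names, the sample gaps, and in particular the ILV rule (scaling each requirement by a tolerance) are assumptions introduced here.

```python
from typing import Callable, List, Sequence

Requirement = float                     # one "precision" requirement x_k(i)
Step = Sequence[Requirement]            # one assembly step X_k, formula (3)
Rule = Callable[[Requirement], float]   # logical rule f, formula (4)


def guidance(step: Step, rule: Rule) -> List[float]:
    """F(X_k): weight every requirement by its logical rule, formula (4)."""
    return [rule(x) * x for x in step]


def process(steps: Sequence[Step], rule: Rule) -> List[List[float]]:
    """Y_i: visual guidance for every step of one process, formula (2)."""
    return [guidance(step, rule) for step in steps]


def glv_rule(x: Requirement) -> float:
    """GLV performs no logical operation: every coefficient is 1, formula (5)."""
    return 1.0


def make_ilv_rule(tolerance: float) -> Rule:
    """Hypothetical ILV rule: express each requirement relative to a tolerance.

    This stands in for the assembly relationship expression and is an
    assumption for illustration only.
    """
    return lambda x: 1.0 / tolerance


steps = [[0.25, 0.5], [0.125]]                  # made-up gaps in mm
glv_view = process(steps, glv_rule)             # geometric values, unchanged
ilv_view = process(steps, make_ilv_rule(0.5))   # gaps as fractions of the tolerance
```

With the GLV rule the user sees the raw gaps and has to relate them to the tolerance mentally; with the assumed ILV rule that relation is computed before display.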

Fig. 1 The visual difference between GLV and ILV. a Geometric features. b Assembly relationship

In addition, some AR-based assembly systems visualized geometric features as assembly relationships. In Ref [23], the docking process of the upper and lower cylinders of an engine is expressed as the space traction effect of three 3D straight lines that define the operation method of the cylinder docking. In this case, however, the user’s assembly efficiency and questionnaire feedback are not ideal, which means that the expression of the assembly relationship can maximize the user’s cognitive efficiency only when it follows certain rules. This is why this article tries to find these rules.

2.2 Summary

By analyzing and summarizing existing research with a new mathematical model, our team arrives at a theoretical optimization. Geometric visualization works through its geometric features. In order to meet “precision” requirements, we further extend the geometric features involved in GLV to the assembly relationship. The assembly relationship is the core of ILV. It is composed of geometric information and has great maneuverability. It is better than GLV because it conforms to users’ cognitive habits. In addition, the introduction of the assembly relationship makes the instruction content more concise and intuitive.

In Fig. 2, there is a clear boundary between GLV and ILV, which is derived from information processing level of assembly data. Whether the object described is a geometric feature or an augmented geometric feature, it belongs to application scope of GLV. Besides, assembly relationships evolve from geometric features/augmented geometric features, which are described by data processing models. Augmented assembly relationship refers to assembly relationship with more detailed features. Next, we will introduce related items to demonstrate the above four levels of visual information, and discuss interface guidance with specific operational steps.

Fig. 2 The visual information processing between AR system and human-beings

3 Our approaches

3.1 Relevant items of our task

In our task, the relevant items of assembly mainly contain engineering fit and machining error.

Engineering fit

The American Society of Mechanical Engineers (ASME) Y14.5 standard is considered the authoritative guideline for the design language of geometric dimensioning and tolerancing (GD&T). It establishes symbols, rules, definitions, requirements, defaults, and recommended practices for stating and interpreting GD&T and related requirements for use on engineering drawings, models defined in digital data files, and in related documents. GD&T is an essential tool for communicating design intent: that parts from technical drawings have the desired form, fit, function, and interchangeability. By providing uniformity in drawing specifications and interpretation, GD&T reduces guesswork throughout the manufacturing process, improving quality, lowering costs, and shortening deliveries. GD&T is therefore an important element of engineering fit. Under this system, a hole and a shaft that are combined with each other have the same basic dimension, and their combination is referred to as an engineering fit, which indicates the tightness of the connection between them.

Engineering fit is what GD&T needs to express. It shows the tightness between hole and shaft through three relationships: clearance fit, transition fit, and interference fit. If the size of the hole is larger than the size of the shaft, the difference is referred to as clearance (denoted as S); if the size of the shaft is larger than the size of the hole, the difference is referred to as interference (denoted as δ). The intermediate state between the two is called transition fit. In order to meet “precision” requirements, the allowable range of the actual clearance or interference shall be specified in the design, called the limit clearance or the limit interference. Engineers typically refer to the maximum and minimum limits between a hole and a shaft as the maximum and minimum clearance, or the maximum and minimum interference. They are the basis for determining the limit dimensions of the hole and shaft that are combined with each other. Assume D denotes the diameter of the hole and d denotes the diameter of the shaft.

The maximum clearance:

$$ {S}_{\mathrm{max}}={D}_{\mathrm{max}}-{d}_{\mathrm{min}} $$
(8)

The minimum clearance:

$$ {S}_{\mathrm{min}}={D}_{\mathrm{min}}-{d}_{\mathrm{max}} $$
(9)

The maximum interference:

$$ {\delta}_{\mathrm{max}}={d}_{\mathrm{max}}-{D}_{\mathrm{min}} $$
(10)

The minimum interference:

$$ {\delta}_{\mathrm{min}}={d}_{\mathrm{min}}-{D}_{\mathrm{max}} $$
(11)

The permissible change in S or δ is called the fit tolerance \( {T}_f \).

$$ {T}_f={S}_{\mathrm{max}}-{S}_{\mathrm{min}}={\delta}_{\mathrm{max}}-{\delta}_{\mathrm{min}} $$
(12)

The relationship between the limit clearance (or limit interference) and the fit tolerance can be represented by the tolerance zone. As shown in Fig. 3, the zero-line indicates that clearance or interference is equal to 0. Above the zero-line is the clearance fit; below the zero-line is the interference fit. The size of the tolerance zone depends on the value of the tolerance, and the position of the tolerance zone relative to the zero-line depends on the limit clearance or the limit interference; the former indicates the accuracy of the fit, and the latter indicates the tightness of the fit. The smaller the tolerance zone, the higher the matching accuracy and the more uniform the tightness of the fit.
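As a worked illustration of formulas (8), (9), and (12), the following Python sketch computes the limit clearances and the fit tolerance from the limit dimensions of a hole and a shaft and names the resulting fit. The function and class names are hypothetical, the limit sizes in the usage lines are illustrative values, and sizes are given as integers in micrometres to keep the arithmetic exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    s_max: int  # maximum clearance, formula (8); a negative value is an interference
    s_min: int  # minimum clearance, formula (9); a negative value is an interference
    t_f: int    # fit tolerance, formula (12)
    kind: str   # "clearance", "transition" or "interference"


def classify_fit(hole_max: int, hole_min: int, shaft_max: int, shaft_min: int) -> Fit:
    """Classify a hole/shaft pair from its limit dimensions (all in micrometres)."""
    s_max = hole_max - shaft_min   # S_max = D_max - d_min
    s_min = hole_min - shaft_max   # S_min = D_min - d_max
    if s_min >= 0:
        kind = "clearance"         # even the tightest pair still has a gap
    elif s_max <= 0:
        kind = "interference"      # even the loosest pair still overlaps
    else:
        kind = "transition"        # the zero-line lies inside the tolerance zone
    return Fit(s_max, s_min, s_max - s_min, kind)


# Illustrative limit sizes: hole 50.000 to 50.025 mm, shaft 49.992 to 50.008 mm.
fit = classify_fit(50025, 50000, 50008, 49992)
```

Here `fit.s_min` is negative, which corresponds to a maximum interference of 8 micrometres in formula (10), so the tolerance zone straddles the zero-line.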

Fig. 3 Limits and fits of the tolerance zone

Machining error

The engineering fit between a hole and a shaft depends on their machined dimensions. However, the geometric parameters (size, geometry, and mutual position) of a machined part deviate slightly from those of the ideal part. This deviation is called the machining error. Due to machining errors, the machined dimensions of the individual parts differ. This phenomenon is called size dispersion. When calculating part data, the machined dimensions of the parts are grouped according to specific data intervals. The number of parts in the same interval is called the frequency count, and the ratio of the frequency count to the total number of parts in the batch is called the frequency.

In mechanical engineering, a histogram is plotted with the machined dimension on the abscissa and the frequency on the ordinate. The curve formed by the histogram is called the machining error distribution curve. The data show that the curve follows a normal distribution when the number of workpieces taken is sufficient and no dominant error factor is present. The function expression for its probability density is:

$$ y=\frac{1}{\sigma \sqrt{2\pi }}{e}^{-\frac{1}{2}{\left(\frac{x-\mu }{\sigma}\right)}^2},\left(-\infty <x<+\infty, \sigma >0\right) $$
(13)

In formula 13, y is the probability density of the distribution. x is a random variable. μ is the population arithmetic mean of the normally distributed random variables. σ is the standard deviation of the population of normally distributed random variables. Then, the normal distribution function is:

$$ F(z)=\frac{1}{\sqrt{2\pi }}{\int}_0^z{e}^{-\frac{z^2}{2}} dz $$
(14)

In formula 14, \( z=\frac{x-\mu }{\sigma } \).

As shown in Fig. 4, when x = μ, then:

$$ {y}_{\mathrm{max}}=\frac{1}{\sigma \sqrt{2\pi }} $$
(15)

Fig. 4 The machining error distribution curve

Our team estimates μ and σ of the normally distributed population by the sample mean \( \overline{x} \) and the sample standard deviation s of the part sizes, which are calculated as follows:

$$ \overline{x}=\frac{1}{n}\sum \limits_{i=1}^n{x}_i $$
(16)

$$ s=\sqrt{\frac{1}{n-1}\sum \limits_{i=1}^n{\left({x}_i-\overline{x}\right)}^2} $$
(17)

From the point of view of part processing, a workpiece size close to μ has a higher probability of occurrence, while a workpiece size far from μ has a lower probability of occurrence. Because the curve is a normal distribution, positive and negative deviations are equally likely. The area enclosed by the distribution curve and the abscissa represents the total number of parts (100%). When z = ± 3 (x − μ =  ± 3σ), the area in the range of μ − 3σ to μ + 3σ reaches 99.73%. If ±3σ represents the tolerance of the part, almost all of the parts have an acceptable size. In fact, even if the holes and shafts are qualified, different assembly relationships may be formed between them.
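The estimates in formulas (16) and (17) and the 99.73% figure can be checked with the Python standard library. The measurements below are made-up values used only to exercise the formulas, and `within_k_sigma` is a hypothetical helper name.

```python
import math
import statistics


def within_k_sigma(k: float) -> float:
    """Share of a normal population inside [mu - k*sigma, mu + k*sigma]."""
    return math.erf(k / math.sqrt(2.0))


sizes_mm = [49.998, 50.002, 50.000, 50.004, 49.996]  # made-up measurements

x_bar = statistics.mean(sizes_mm)   # sample mean, formula (16)
s = statistics.stdev(sizes_mm)      # sample standard deviation, formula (17), divides by n - 1
coverage = within_k_sigma(3.0)      # area between mu - 3*sigma and mu + 3*sigma
```

`statistics.stdev` is used rather than `statistics.pstdev` because formula (17) divides by n − 1.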

Taking the transition fit ϕ50H7/js6 as an example, when the hole and shaft dimensions follow a normal distribution, their average clearance is Sav = +12.5 μm. Around this value, either an interference state or a clearance state may occur. As shown in Fig. 5, our team draws the hole size as a purple line and three different shaft sizes as three red lines. The purple line can only move horizontally along the green strip, while the red lines can only move horizontally along the blue strip.

Fig. 5 The tolerance relationship between hole and shaft

When the size of the hole is at the position of hole 1, all of the shafts (shaft 1, shaft 2, and shaft 3) can only form a clearance fit with the hole. When the hole size is at the position of hole 2, shaft 1 and shaft 2 appear in the overlapping area of the two tolerance bands, and the relationship between the hole and shaft is a transition fit. More precisely, shaft 1 appears on the left side of hole 2, and the relationship between the hole and shaft is the transition fit–based interference state (TF-based interference state). Shaft 2 appears on the right side of hole 2, and the relationship between the hole and shaft is the transition fit–based clearance state (TF-based clearance state). In this paper, our assembly task involves only these two cases. It is worth emphasizing that we have designed a three-color band for the three regions of TF, and its practicability will be explained in the next section.
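How often each of the two TF-based states occurs can be estimated under explicit assumptions. The sketch below is not part of the study: it assumes the ISO 286 limit deviations for ϕ50H7/js6 (hole +25/0 μm, shaft ±8 μm), that each tolerance equals 6σ and is centred on the middle of its band, and that hole and shaft sizes are independent and normally distributed.

```python
import math
from statistics import NormalDist

# Assumed limit deviations in micrometres (ISO 286 values for a 50 mm nominal size).
HOLE_UPPER, HOLE_LOWER = 25.0, 0.0     # H7
SHAFT_UPPER, SHAFT_LOWER = 8.0, -8.0   # js6


def centred(upper: float, lower: float) -> NormalDist:
    """Normal distribution centred in the tolerance band, with the band equal to 6 sigma."""
    return NormalDist(mu=(upper + lower) / 2.0, sigma=(upper - lower) / 6.0)


hole = centred(HOLE_UPPER, HOLE_LOWER)
shaft = centred(SHAFT_UPPER, SHAFT_LOWER)

# Clearance = hole deviation - shaft deviation; for independent normal
# variables the means subtract and the standard deviations add in quadrature.
clearance = NormalDist(mu=hole.mean - shaft.mean,
                       sigma=math.hypot(hole.stdev, shaft.stdev))

p_interference = clearance.cdf(0.0)    # TF-based interference state
p_clearance = 1.0 - p_interference     # TF-based clearance state
```

Under these assumptions the mean clearance is 12.5 μm and the interference state is rare (below 1% of pairs); real batches that are off-centre or not normally distributed would give different proportions.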

3.2 Different level visual information

Machining errors lead to different assembly relationships. In our research, four kinds of interfaces are used to show the assembly relationship to users. In other articles, the visual information directly represents the geometric attributes of the hole and shaft rather than the assembly relationship between them. The user must then work out the assembly relationship mentally from these data, which wastes time. With our method, ILV allows users to directly observe assembly relationships that previously existed only in the human brain, without burdening the brain with data calculations. This is why we carry out this research.

In our task, “precision” requirements are specified as the tolerance relationship of the square tube in the X and Y directions (see Figs. 7, 8, 9, and 10). Transition fit is always satisfied between the selected square plugs and the square pipe (refer to Fig. 6). Our task is to use GLV and ILV to distinguish between TF-based interference state and TF-based clearance state, aiming at evaluating which way is more beneficial to users. According to the visual information output by GLV and ILV (see Fig. 2), our team already knows that the visual information of levels III and IV is different from that of levels I and II. In this section, our team will select a representative display style from each level to explain our task.

Fig. 6 The square pipe and pipe plugs

Level I visual information (I-vi) is a direct manifestation of PMI. Product manufacturing information (PMI) is an organic combination of human intelligence and traditional manufacturing at the geometric level, including technical requirements, “precision” requirements, and annotations. “Precision” includes dimensional tolerance, geometric tolerance, and surface roughness. As shown in Fig. 7, our team uses the dimension lines, arrows, standard values, and lower and upper deviations in the PMI to annotate the parts. The measurement data are taken from the AR assembly system.

Fig. 7 The level I visual information

Level II visual information (II-vi) is an augmented form of PMI. As shown in Fig. 8, our team uses color maps to mark the assembled objects: the yellow map represents the current state of the square plug, and the blue map represents the current state of the square pipe. The user compares the sizes of the two maps. In Fig. 8, the plug has four situations in the X and Y directions. When the yellow map is larger than the blue map in the X direction, the plug is larger than the pipe in this direction; when the yellow map is smaller than the blue map in the X direction, the plug is smaller than the pipe in this direction. The same holds for the Y direction.

Fig. 8 The level II visual information

Level III visual information (III-vi) is an expression of the assembly relationship. As shown in Fig. 9, the plug has four situations in the X and Y directions. Our team uses these situations to annotate the parts. When the difference (plug size minus pipe size) is positive in the X direction, the square plug is larger than the square pipe in this direction; when the difference is negative in the X direction, the square plug is smaller than the square pipe in this direction. The same holds for the Y direction.
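The sign rule behind the level III information can be sketched as follows. The function names and the sample sizes are hypothetical, and mapping "plug larger than pipe" to the TF-based interference state is our reading of Section 3.1 (the plug plays the role of the shaft and the pipe that of the hole), not a statement taken from the interface itself. Sizes are integers in micrometres.

```python
from typing import Dict, Tuple


def tf_state(plug_size: int, pipe_size: int) -> str:
    """Name the state in one direction from the sign of plug size minus pipe size."""
    difference = plug_size - pipe_size
    if difference > 0:
        return "TF-based interference state"   # plug larger than pipe
    if difference < 0:
        return "TF-based clearance state"      # plug smaller than pipe
    return "zero-line"                         # sizes are equal


def plug_states(plug_xy: Tuple[int, int], pipe_xy: Tuple[int, int]) -> Dict[str, str]:
    """Evaluate the X and Y directions independently."""
    return {axis: tf_state(plug, pipe)
            for axis, plug, pipe in zip("XY", plug_xy, pipe_xy)}


# Made-up measurements in micrometres: (X size, Y size).
states = plug_states((100005, 99990), (100000, 100000))
```

Because each direction is evaluated on its own, the two directions together give the four situations described above.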

Fig. 9 The level III visual information

Level IV visual information (IV-vi) is an augmented expression of the assembly relationship. As shown in Fig. 10, the green rectangle represents the size of the plug in the X and Y directions. Both the bright blue line (BBL) and the rectangle have the same meaning: they show the size of the pipe in the X and Y directions. Our team uses the knowledge of engineering fit to convert the mutual dimensional relationships into three-color tolerance bands (see Fig. 5). Each direction has two situations: when the green rectangle enters the yellow area but does not cross the BBL, the state is the TF-based clearance state; when the green rectangle enters the yellow area and crosses the BBL, the state is the TF-based interference state.

Fig. 10 The level IV visual information

It is worth emphasizing that the first two kinds of visual information require the human brain to process it into information that the user can understand. The latter two types of visual information allow the user to intuitively understand the current state of the component without the brain having to calculate the data again.

3.3 Summary

The above describes the related items of assembly: engineering fit and machining error. GD&T is a system for defining and conveying engineering tolerances. It uses a symbolic language on engineering drawings and computer-generated 3D solid models to clearly describe nominal geometry and its allowable variation. It can define the form of a single feature and the allowable variation of its size, and explain the allowable variation between features. Engineering fit shows the tightness between holes and shafts through three relationships: clearance fit, transition fit, and interference fit. In order to meet “precision” requirements, the allowable range of the actual clearance or interference should be specified in the design, which is called the limit clearance or limit interference.

The interfaces, with their new interpretations and progressive relationships, achieve optimal processing of assembly relationships and bring the interpretation of information to a new level of visualization. The data are no longer just the data carried by the geometric features; they are processed and analyzed into data that express assembly relationships. The optimal processing of the data is reflected in how the streamlined and analyzed results are presented to the user. The interface presents the assembly relationship between the assembly objects. Our experiment clearly shows the working principle of the interface. We introduced four interfaces and used a simple assembly experiment for each interface. The comparative experiments show that the seemingly complex interface concept is simple and clear.

4 User study

An interesting question is: if the system knows all the measurements and can calculate them quickly, why not simply point out the workpiece to be selected? This can be attributed to two reasons: (1) When the user’s cognitive efficiency is the highest, the assembly efficiency may not be the highest. Although users quickly complete the task, this does not mean that they have a comprehensive and in-depth understanding of what they are doing. (2) Human participation is essential in AR-based assembly. It is of great practical significance to ensure that an inexperienced user completes the task accurately and skillfully.

In this section, we present the user study and raise six hypotheses. Our team describes the test setup, the study structure, and the hardware and software settings used to validate the hypotheses, outlines the experimental process, and reports the test results. For the 4 interfaces, we designed a dedicated experiment to further explain GLV and ILV (see Fig. 11).

Fig. 11

The work system of the user study

4.1 Test setup and hypotheses

The test setup is designed to support assembly tasks in a controlled environment. Our team placed a test rig in one room (see Fig. 12) with 9 square pipe plugs on one side and a square pipe on the other. The square pipe plug is about 100 mm long, 100 mm wide, and 25 mm high. The square tube is about 120 mm long, 120 mm wide, and 100 mm high. The user will complete the assembly task based on the visual information provided by the projector.

Fig. 12

The test setup for AR manual assembly

In this task, the user is required to select, from all of the plugs, a square pipe plug that meets the matching requirements. In fact, the 9 plugs are almost identical in terms of size, material, and the like. In order to optimize the GLV-based AR instructions, AR instructions oriented to the assembly relationship are proposed in the experiment. The 4 kinds of visual information designed under these two types of instruction are randomly assigned to 4 tasks.

In order to validate the overall hypothesis, it is necessary to compare the performance of the four interfaces in the user study, namely I-vi, II-vi, III-vi, and IV-vi. Our team designed an experiment comparing these four interfaces with respect to the time taken to complete a particular task and the users’ subjective feedback. Based on this, the hypotheses of the experiment were:

  • Hypothesis 1: IV-vi is faster than I-vi.

  • Hypothesis 2: IV-vi is faster than II-vi.

  • Hypothesis 3: IV-vi is faster than III-vi.

  • Hypothesis 4: In terms of user experience, users prefer IV-vi compared with the other three interfaces.

  • Hypothesis 5: In terms of cognitive efficiency, users prefer IV-vi compared with the other three interfaces.

  • Hypothesis 6: IV-vi deepens the user’s understanding of visual information compared with the other interfaces.

4.2 User study structure

Our team conducted a within-subjects study with 25 participants, who were students from Northwestern Polytechnical University, mostly with an engineering background. At the beginning, each participant was asked to read and sign an informed consent form. Next, each participant completed a short pre-experiment questionnaire asking for demographic information, including age, gender, educational background, previous VR/AR experience, and disassembly experience. The participants were informed of the main objectives of the study, and the experimenter explained the prompts provided by each interface. Subsequently, each participant was told that the projector would display four interfaces on the surface. They were also told that each piece of information corresponds to the tolerance state of the current plug and that their goal was to complete the entire assembly process as efficiently as possible while taking as much time as they needed to feel that the presented information was fully understood.

Our team used a timer to record the time each participant took to complete each task. When a task started, the participant pressed the timer at hand, at which point the industrial camera was turned on. The camera recorded video until the user pressed the timer again. Participants were also asked to complete a post-questionnaire in which they were asked how confident they were about the tasks before and after they completed the study. A 7-point Likert scale was used to record the confidence level. Our team evaluated the above four interfaces by testing the performance of each participant in the experiment. Figure 2 summarizes the interfaces used under the 4 conditions. These interfaces are:

PMI: As for this interface, I-vi contains visual information such as dimension lines, tolerance data, and explanatory text (see Fig. 13).

Fig. 13

PMI-based visual information

Color map: In this case, in addition to information such as dimension lines, tolerance data, and explanatory text, II-vi also includes a 2D color map (see Fig. 14).

Fig. 14

Color map-based visual information

PMI+: III-vi still contains information such as dimension lines, tolerance data, and description text. The difference is that the tolerance data represents the difference between the two corresponding dimensions (see Fig. 15).

Fig. 15

PMI+-based visual scheme

Augmented graphics (AG): IV-vi contains tolerance data, geometric 3D models with geometric meaning, text descriptions, etc. This information can directly indicate the mating state between the square pipe plug and the square pipe (see Fig. 16).

Fig. 16

AG-based visual scheme

The order in which participants tried the conditions was arranged in a balanced Latin square design in order to counterbalance carryover effects between conditions. Before the experiment started, our team asked each participant to complete the assembly of one square plug, identical to the square plugs used in the actual test. Through this exercise, the participant learned how to perform the tasks our team requested. After the exercise was complete, the participant used the interface provided to perform the task. After each interface, each participant was asked to complete a questionnaire with Likert scale rating items (see Table 2) ranging from 1 (completely disagree) to 7 (completely agree). The timer recorded the time at which the task was completed. After all interfaces, each participant was asked to rank the four interfaces on various aspects of their experience and was interviewed.
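
A balanced Latin square for four conditions can be generated as in the following sketch. It uses the standard construction for an even number of conditions (first row 0, 1, n−1, 2, ..., later rows shifted by one). The function names and the cyclic assignment of participants to rows are assumptions made for illustration; the exact assignment procedure used in the study is not described here.

```python
def balanced_latin_square(conditions):
    """Build a balanced Latin square for an even number of conditions.

    Every condition appears once per row and once per column, and every
    ordered pair of adjacent conditions appears exactly once overall.
    """
    n = len(conditions)
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # First row follows the index pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every index of the first row by one (mod n).
    return [[conditions[(idx + shift) % n] for idx in first] for shift in range(n)]


def order_for_participant(square, participant_index):
    """Assumed scheme: participants cycle through the rows of the square."""
    return square[participant_index % len(square)]


square = balanced_latin_square(["PMI", "Color map", "PMI+", "AG"])
for row in square:
    print(row)
```

With 4 interfaces the square has 4 rows, so a participant count that is not a multiple of 4 leaves the rows used unequally often.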

Table 2 Likert scale rating questions (evaluation item in bold)

Our team invited 25 participants to take part in the experiment, but data from only 23 participants were used. One participant failed to complete the experiment, and one was removed because his or her questionnaire answers were contradictory. The average age of the remaining participants was 24.5 years; 82.61% were male and 17.39% were female, and none had experience in using AR for part assembly.

4.3 Hardware and software setup

The prototype system our team developed combines the following elements: (1) a server, (2) a client, and (3) a client-server information communication platform. Fig. 17 shows our prototype system.

Fig. 17

The prototype system for AR assembly

Server: It is the core of resources such as visual data and assembly instructions. The client can access the server and obtain the corresponding resource data. Our team uses the Intel NUC7i7BNH microcomputer as the hardware platform for assembly resources. It uses an Intel Core i7 7567U 3.5 GHz, 6 G RAM, Intel GMA HD 650 graphics card, 32G DDR4 2133 MHz, and Windows 10 Professional 64-bit operating system.

Client: It connects the projector and industrial camera to the PC and uses projected content to guide the user’s operational behavior. Our team chose the Dell Alienware 17 (ALW17C-D2758) laptop as the client. It uses an Intel Core i7 7700HQ 2.8 GHz, 8 G RAM, NVIDIA GeForce GTX 1070 graphics card, 16G DDR4 2667 MHz, and Windows 10 Professional 64-bit operating system. Our team chose an industrial camera with an 8 to 50 mm zoom lens. The resolution is 5 million pixels, the maximum frame rate is 60 fps, the video interface is USB 3.0, and the horizontal angle of view is 6.3 to 37.5°. Our team chose a projector, model VPL-DX271. The projection screen size is 40 to 300 inches, the luminance is 3600 lumens, the standard resolution is 1600 × 1200 pixels, and the display technology is 3LCD.

Platform: It is used to implement resource transfer between the server and the client. Based on this platform, the server sends visual resources and assembly instructions to the client through Wi-Fi, and the client integrates the received resources to guide the user’s operations. The user’s operational behavior is recorded by an industrial camera and fed back to the server, which performs statistical analysis on the data.

4.4 Results

In this section, our team reports the obtained results. First, our team reports the task completion time and then the questionnaire data collected from the participants. Finally, our team summarizes the qualitative feedback collected by asking participants open-ended questions. Some key results are as follows:

  • In terms of task completion time, PMI+ helps participants complete the task fastest, while color map is the slowest.

  • AG has been rated best in most aspects of user experience, but the difference between AG and PMI+ is not as large as expected.

  • In terms of cognitive efficiency, AG is better than PMI and color map, but it is no better than PMI+ in most aspects.

  • Compared with other interfaces, AG significantly deepens participants’ understanding of assembly tasks.

4.4.1 Task completion time

There is a significant difference in performance time between these interfaces. Figure 18 shows the average performance time for different interface conditions. Among the four interfaces, PMI+ has the shortest time, while color map has the longest time. The time required for the AG is slightly longer than PMI+ but much shorter than PMI.

Fig. 18

Task completion time for 4 interfaces

Statistics show that compared with PMI (M = 103.52, SD = 23.71, SE = 5.05), the task completion time of AG (M = 86.13, SD = 19.46, SE = 4.05) is shortened by 16.8%. The paired t test (α = 0.05) showed a statistically significant difference in mean time between the two conditions (t(x) = 2.823, p = .010). Besides, statistics also show that compared with color map (M = 112.22, SD = 35.07, SE = 7.31), the task completion time of AG (M = 86.13, SD = 19.46, SE = 4.05) is shortened by 23.25%. The paired t test (α = 0.05) showed a statistically significant difference in mean time between the two conditions (t(x) = 3.879, p = .001).

It is worth noting that compared with PMI+ (M = 76.65, SD = 8.65, SE = 1.80), the task completion time of AG (M = 86.13, SD = 19.46, SE = 4.05) is lengthened by 12.37%. The paired t test (α = 0.05) showed a statistically significant difference in mean time between the two conditions (t(x) = − 2.205, p = .038).
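
The percentage changes quoted above follow directly from the reported mean times. The short check below recomputes them; the helper name `percent_change` is ours, and only the four reported means are used as input.

```python
# Mean task completion times as reported in the text.
REPORTED_MEANS = {
    "PMI": 103.52,
    "Color map": 112.22,
    "PMI+": 76.65,
    "AG": 86.13,
}


def percent_change(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


ag = REPORTED_MEANS["AG"]
for name in ("PMI", "Color map", "PMI+"):
    change = percent_change(REPORTED_MEANS[name], ag)
    # Negative values mean AG is faster than the baseline interface.
    print(f"AG vs {name}: {change:+.2f}%")
```

A negative value means that AG shortens the task time relative to the baseline interface, and a positive value means that it lengthens it.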

In summary, all three pairwise differences in task completion time reported above are statistically significant at α = 0.05.

4.4.2 Questionnaire: Likert scale rating

After completing the task under each interface, the subjects were asked to answer a questionnaire containing 8 questions (see Table 2).

Cronbach’s alpha indicated that the internal consistency between the Likert items is good (α = .739), and that deleting any single item did not substantially change reliability (α ranging from .702 to .739). Figure 19 shows the results with all Likert items aggregated into a single index of overall experience (from 0 to 100). The questions were answered by all participants, and our team reports the results for each group separately.
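
For readers unfamiliar with the reliability measure, the sketch below computes Cronbach’s alpha and the alpha-if-item-deleted values from a respondents × items matrix using the standard formula. The 0 to 100 rescaling in `overall_index` is an assumption: it linearly maps the mean 7-point rating onto 0 to 100, which may differ from the aggregation actually used for Fig. 19. The example ratings are made up.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item ratings per respondent)."""
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(scores):
    """Alpha recomputed with each item left out in turn (needs at least 3 items)."""
    k = len(scores[0])
    return [cronbach_alpha([row[:i] + row[i + 1:] for row in scores]) for i in range(k)]


def overall_index(row, low=1, high=7):
    """Assumed aggregation: rescale the mean rating from [low, high] to [0, 100]."""
    mean_rating = sum(row) / len(row)
    return (mean_rating - low) / (high - low) * 100.0


# Made-up ratings for three items from four respondents, for illustration only.
ratings = [[5, 6, 5], [3, 4, 4], [6, 7, 6], [4, 4, 5]]
print(cronbach_alpha(ratings))
print(alpha_if_item_deleted(ratings))
print(overall_index(ratings[0]))
```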

Fig. 19

Likert scale rating on guiding experience

To analyze the Likert scale results, our team used Friedman tests (α = .05) to see whether these interfaces were rated significantly differently. The results showed significant differences among the interfaces on all items (Q1: χ2(3) = 25.037, p < .001; Q2: χ2(3) = 10.261, p = .016; Q3: χ2(3) = 45.309, p < .001; Q4: χ2(3) = 39.311, p < .001; Q5: χ2(3) = 34.168, p < .001; Q6: χ2(3) = 48.297, p < .001; Q7: χ2(3) = 22.107, p = .002; Q8: χ2(3) = 30.711, p < .001). This suggests that the four interfaces affect participants’ ratings in all aspects; that is, they affect the participants’ perceived quality of the visual information (Q1: enjoyment, Q2: focus, Q3: feeling confident, Q4: feeling natural and intuitive, Q5: feasibility, Q6: feeling efficient, Q7: availability, Q8: understandability).
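
The Friedman statistic reported above can be sketched as follows. Each participant’s ratings of the four interfaces are ranked within that participant, and the rank sums per interface are compared; with k = 4 interfaces the statistic has k − 1 = 3 degrees of freedom, which matches the χ2(3) notation. This minimal version omits the tie correction that statistical packages usually apply, so its values can differ from packaged results when ratings are tied. The function names are ours.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(ratings):
    """Friedman chi-square without tie correction.

    `ratings` holds one row per participant and one column per interface.
    """
    n = len(ratings)
    k = len(ratings[0])
    rank_sums = [0.0] * k
    for row in ratings:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Made-up ratings (PMI, color map, PMI+, AG) from three participants.
example = [[2, 3, 5, 6], [1, 2, 4, 7], [3, 4, 6, 7]]
print(friedman_statistic(example))
```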

In addition, our team also analyzed the rating results for each item. Figure 20 summarizes the corresponding results. In most cases, PMI received the lowest rating, while AG received the highest. The scores for the color map and PMI+ lie mostly between those of PMI and AG. For the pairwise comparisons between the interfaces our team wants to study, our team uses the Wilcoxon signed-rank test (see Table 3) with the Bonferroni correction (α = .0167).

Fig. 20

Results of Likert scale rating (1: strongly disagree~7: strongly agree, “×”: mean, significant main effects are marked as a superscript to the questions)

Table 3 Results of Wilcoxon signed-rank test on rating questions (significant result in Normals)

From Table 3, AG was ranked significantly higher than PMI (Q1: Z = − 3.041, p = .002; Q2: Z = − 2.569, p = .010; Q3: Z = − 3.784, p = .001; Q4: Z = − 3.798, p = .001; Q5: Z = − 3.440, p = .003; Q6: Z = − 4.083, p = .001; Q7: Z = − 3.722, p = .002; Q8: Z = 4.022, p = .001). This suggests that AG significantly affected participants’ enjoyment (Q1), focus (Q2), confident (Q3), N&I (Q4), feasibility (Q5), efficient (Q6), availability (Q7), and understandability (Q8) compared with PMI.

In all cases except for Q2, AG was also ranked significantly higher than color map (Q1: Z = − 3.178, p = .001; Q3: Z = − 4.181, p = .001; Q4: Z = − 3.675, p = .001; Q5: Z = − 3.683, p = .002; Q6: Z = − 3.743, p = .004; Q7: Z = − 2.430, p = .015; Q8: Z = 3.405, p = .001). It suggests that AG had a significant main effect for participants on enjoyment (Q1), confident (Q3), N&I (Q4), feasibility (Q5), efficient (Q6), availability (Q7), and understandability (Q8) compared with color map, but no significant main effect on focus (Q2).

Surprisingly, AG and PMI+ do not show significant differences in six cases. AG was ranked significantly higher than PMI+ only in efficient (Q6: Z = − 2.425, p = .015) and understandability (Q8: Z = − 2.612, p = .009). This suggests that AG had no significant effect for participants on enjoyment (Q1: Z = − .243, p = .808), focus (Q2: Z = − .001, p = 1.000), confident (Q3: Z = − 2.000, p = .046), N&I (Q4: Z = − .440, p = .660), feasibility (Q5: Z = − 2.065, p = .039), and availability (Q7: Z = − 2.612, p = .038) compared to PMI+.
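
The role of the Bonferroni correction in the AG versus PMI+ comparison can be made explicit with a short check. We assume here that the corrected level .0167 comes from dividing α = .05 by three pairwise comparisons (AG against PMI, color map, and PMI+); the p-values are those reported in the text, and the helper name is ours.

```python
ALPHA = 0.05
COMPARISONS = 3  # assumed: AG vs PMI, AG vs color map, AG vs PMI+
THRESHOLD = ALPHA / COMPARISONS

# p-values of the AG vs PMI+ Wilcoxon signed-rank tests, as reported above.
AG_VS_PMI_PLUS = {
    "Q1": 0.808, "Q2": 1.000, "Q3": 0.046, "Q4": 0.660,
    "Q5": 0.039, "Q6": 0.015, "Q7": 0.038, "Q8": 0.009,
}


def significant_items(p_values, threshold):
    """Return the items whose p-value is below the corrected threshold."""
    return [item for item, p in p_values.items() if p < threshold]


print(round(THRESHOLD, 4))
print(significant_items(AG_VS_PMI_PLUS, THRESHOLD))
print(significant_items(AG_VS_PMI_PLUS, ALPHA))
```

Q3, Q5, and Q7 fall below the uncorrected level of .05 but not below the corrected level, which is why only Q6 and Q8 are counted as significant.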

4.4.3 Questionnaire: Ranking

The participants were given questionnaires to rank the above interfaces by user experience according to the criteria described (1 = best, 4 = worst). Table 4 lists the ranking criteria.

Table 4 Ranking criteria

Figure 21 shows the average results of the ranking questionnaire. In all cases except for C3, PMI+ was ranked first, while AG was ranked second. This suggests that participants tend to choose PMI+ in terms of enjoyment, focus, N&I, and confidence, and prefer AG in terms of understanding. In all cases, ILV was superior to GLV, and PMI was ranked worst.

Fig. 21

The average ranking results (*: significant difference)

To test whether the interfaces were ranked significantly differently, our team used Friedman tests (α = .05). The results showed significant differences among the interfaces on all criteria (C1: χ2(3) = 37.696, p = .002; C2: χ2(3) = 36.391, p < .001; C3: χ2(3) = 31.122, p < .001; C4: χ2(3) = 25.748, p < .001; C5: χ2(3) = 36.235, p < .001). Thus, these interfaces affected the participants in terms of enjoyment, focus, understanding, N&I, and confidence.

In cases where a significant difference was found in the ranking, our team performed post hoc analysis using the Wilcoxon signed-rank test with Bonferroni correction (α = .0167) to investigate whether certain interfaces were ranked significantly differently from others. The statistical results show that AG was ranked significantly higher than PMI in all the cases (C1: Z = − 3.105, p = .002; C2: Z = − 3.381, p = .001; C3: Z = − 3.692, p < .001; C4: Z = − 2.466, p = .014; C5: Z = − 2.609, p = .009). AG was also ranked significantly higher than color map in all the cases (C1: Z = − 3.387, p = .001; C2: Z = − 2.872, p = .004; C3: Z = − 4.320, p < .001; C4: Z = − 2.414, p = .016; C5: Z = − 3.042, p = .002). Except for C3, AG was ranked lower than PMI+ (C1: Z = − 2.482, p = .013; C2: Z = − 2.399, p = .016; C3: Z = − 2.587, p = .010; C4: Z = − 2.496, p = .013; C5: Z = − 2.974, p = .003).

4.4.4 Questionnaire: Preference and qualitative feedback

To further investigate how the 25 participants felt about the operation, our team asked the following 3 questions.

  • Question 1: How do you think AR affects the four visual interfaces presented in this article, compared with the interactive methods you use, such as desktop applications and paper manuals?

Nearly 80% of participants admitted that AR gave them a better sense of participation and a real and detailed understanding of the assembly process, rather than simple information exchange. In addition, more than 90% of participants said that AR guidance seamlessly linked process information to physical space, which increased their attention to the task itself and helped them not to overlook assembly details while performing tasks. Nearly 50% of participants stated that AR instructions improved the efficiency and quality of information transmission. Good information transmission is concise, efficient, visual, low in error rate, and easy to understand. The information quality of AR instructions enhances participants’ ability to process local information.

  • Question 2: Which one interface helps you to be more effective and faster?

56.52% (13/23) of all participants preferred the PMI+ interface, as it allowed them to easily identify the rule underlying each interface in the assembly task. One participant said: “The PMI+ interface allows me to ignore the task itself and pay attention only to the sign of the number. I found that when the sign is ‘+,’ the size of the plug is larger than the size of the pipe, while when the sign is ‘−,’ the size of the pipe is larger than the size of the plug. Therefore, when the sign is ‘+,’ the relationship between pipe and plug is the TF-based interference state, and when the sign is ‘−,’ the relationship is the TF-based clearance state.” The assessment of AG is lower than that of PMI+: only 21.74% (5/23) of all participants think the AG interface is the best of these interfaces. One participant said that AG gives him too much detail, which does not contribute to his assembly efficiency. 13.04% (3/23) of all participants prefer to use PMI. They believe that standardized graphics can help them quickly become familiar with operational tasks. Another 8.70% (2/23) of all users thought that color map was a good interface. One participant said: “Color map makes me feel that the assembly task is interesting, which makes me feel happy.”

  • Question 3: Which interface can better help you understand the operational details?

69.57% (16/23) of participants prefer AG, which helps them to understand operational details during the task. One participant said: “As for AG, I feel that what I am thinking is more closely related to the operational task. It is clear to see the current size of the square pipe plug and the current size of the square pipe in the 3D tolerance zone. The content of these expressions is consistent with what I imagined in my mind.” 17.39% (4/23) of all participants think the PMI+ interface is the best of these interfaces. They claimed that the visual content of PMI+ was intuitive and easy to understand. In addition, participants who support PMI and color map account for 4.35% (1/23) and 8.70% (2/23), respectively.
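
The preference shares for Questions 2 and 3 can be recomputed from the reported counts, which also confirms that each question accounts for all 23 retained participants. The dictionary and function names below are ours; the counts are those given in the text.

```python
PARTICIPANTS = 23

# Number of participants preferring each interface, as reported above.
QUESTION_2 = {"PMI+": 13, "AG": 5, "PMI": 3, "Color map": 2}
QUESTION_3 = {"AG": 16, "PMI+": 4, "PMI": 1, "Color map": 2}


def shares(counts, total):
    """Convert raw counts into percentages rounded to two decimals."""
    return {name: round(100.0 * count / total, 2) for name, count in counts.items()}


for label, counts in (("Question 2", QUESTION_2), ("Question 3", QUESTION_3)):
    # Every retained participant should be counted exactly once per question.
    assert sum(counts.values()) == PARTICIPANTS
    print(label, shares(counts, PARTICIPANTS))
```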

5 Discussion

In this part, our team further discusses and analyzes this case study. The validity of the 6 hypotheses is checked against the results for task completion time, the rating questionnaire, the ranking questionnaire, and the interviews.

  (1) User acceptance of AR instructions

Our most noteworthy concern is the impact of AR on the four visualization techniques: could the same techniques be used in desktop applications, or even on paper? In fact, all 25 participants had used paper drawings, electronic handbooks, and 3D design software. We asked them what different experiences AR instructions brought them compared with previous desktop applications. Firstly, participants affirmed that AR instructions improved their sense of integration into the assembly task. To some extent, they admitted that even if the same visual information (the four kinds above) appeared on engineering drawings, they would still need to spend time mentally connecting the drawing information with the physical scene, whereas AR instructions greatly shorten this process because they are directly superimposed on physical space. Secondly, participants believed that the tight coupling of information space and physical space improves users’ attention to operational tasks. The reasons can be summarized in two points. On the one hand, AR instructions provide strong visual stimulation, which can attract users’ attention at any time. On the other hand, AR instructions are closely related to the task content, which enables users to pay long-term attention to the task content, reduces the loss of effective information during transmission, and improves the quality of information transmission. Thirdly, participants believed that the emergence of AR instructions changed the previous training model. In fact, this change has brought greater help to novices who lack operational experience. In the interview, more than half of the participants admitted that even novices who lacked practical assembly experience could quickly combine digital parameters with practical tasks through AR instructions and solve assembly problems as skillfully as experienced users. This change allows users to keep learning, during the operation itself, operational experience that they have not yet understood, greatly reducing the time required for training.

  (2) The influence of GLV and ILV on assembly efficiency

According to the task completion time, ILV (PMI+, AG) improves the assembly efficiency of users more than GLV (PMI, color map). We found two surprising regularities. First, GLV design follows implicit conventions. Although GLV is not superior to ILV, this does not mean that GLV has no advantage. Comparing the two GLVs, PMI takes less time. Actually, in addition to the two GLVs reported in this paper, we have designed other GLV-based visualization schemes, such as color point clouds and color voxel maps. Without exception, their usage time is 10–20% longer than that of PMI. This can be attributed to one reason: users are accustomed to using MBD-defined graphics, symbols, and other elements. In other words, the design of these elements has already simplified complex geometric features into abstract graphics. Therefore, we believe that these graphics should be preserved completely rather than redesigned. Secondly, an ILV that retains MBD graphic elements effectively improves the assembly efficiency of users. PMI+ is such an ILV. Compared with PMI, it does not use MBD graphic elements to describe geometric features but uses them to describe users’ cognitive needs. According to the data, PMI+ further shortens the user’s assembly time. The comparison between AG and PMI+ also demonstrates this advantage of MBD graphic elements. We thought that the rich operation details contained in AG would further improve assembly efficiency, but the opposite is true. Two reasons contribute to this result: (1) inheriting MBD graphic elements caters to users’ working habits to the greatest extent; (2) simple graphics that can describe complex problems are extremely well suited to users’ cognitive needs. Therefore, we believe that ILV should use MBD graphic elements to describe the cognitive needs of users.

  (3) The impact of GLV and ILV on user experience

Questionnaire data show that AG and PMI differ significantly in the answers to the user experience-related questions, including Q1 (Enjoyment), Q2 (Focus), Q3 (Confident), and Q4 (N&I) in the rating questions, and C1 (Enjoyed), C2 (Focused), C4 (N&I), and C5 (Confident) in the ranking questions. This shows that ILV (AG) does provide a better user experience than GLV (PMI). However, the comparison between AG and color map shows an anomalous result: there is no significant difference between the two on Q2. We believe that redundant information is the key to this problem. Color map itself is a reprocessing of PMI. Due to the introduction of redundant information such as color and wireframe, users’ attention is distracted to a certain extent. Similarly, AG is a reprocessing of PMI+ and also mixes in redundant information. When too much redundant information is introduced into an AR instruction, users become confused about the real purpose of the operation task. Therefore, a large amount of redundant information reduces users’ attention to task content. Another unexpected phenomenon is that there is no significant difference between AG and PMI+ on Q1 through Q4. Participants also ranked AG slightly lower than PMI+ in C1, C2, C4, and C5. The reasons are as follows: (1) redundant information hinders users’ perception; (2) users only want to complete tasks as soon as possible, rather than learn more about the details of the tasks. Therefore, although assembly details are helpful to users’ cognition, providing operators with too many details is not important. However, this does not mean that the details of the operation are useless. When the user in an AR task needs to be taught, the details of the operation directly affect the quality of learning.

  (4) The influence of GLV and ILV on users’ cognitive efficacy

According to the questionnaire data, AG and PMI differ significantly in the answers to the questions related to cognitive efficacy, including Q5 (Feasibility) and Q6 (Efficient). Similarly, AG and color map differ significantly on Q5 and Q6. In fact, GLV’s description of geometric features is beneficial to users’ cognition, but the effect is small. ILV likewise promotes user cognition by describing geometric features, but the difference is that user-centered ILV (AG) tries to reasonably reproduce the user’s inner thoughts, so that users perform better in mastering the core content of the task and understanding the progress of the work. Therefore, how well ILV reproduces the results of human thinking is one of the factors that affect users’ cognitive efficacy. According to the questionnaire data, AG and PMI+ show a significant difference only on Q6, not on Q5. This means that although any form of ILV can improve users’ cognitive efficiency, the forms differ significantly in effectiveness. The reason is that the instructions involved in AG are not closely related to the task objectives, so users need longer to understand the task intentions. Therefore, the correlation between ILV’s instruction content and task intention is another factor that affects users’ cognitive efficacy.

  (5) The influence of GLV and ILV on users’ information understanding ability

The results for Q8 (understandability) and C3 (understanding) show significant differences between AG and PMI and between AG and color map. This shows that, compared with GLV (PMI, color map), ILV (AG) helps deepen the user’s understanding of instruction content. The reason is that ILV can directly present task intentions to users, while GLV can only indirectly reveal task intentions by relying on geometric features. There was also a significant difference between AG and PMI+ on Q8. According to the ranking questionnaire, AG was ranked better than PMI+ for understanding assembly details. In fact, question 3 of the interview yielded the same result. However, this result holds only under one premise: when assembly efficiency is neglected, AG can give users enough cognitive clues to help them. Therefore, we conclude that the level of information understanding does not affect the assembly efficiency of users. In other words, effective assembly does not mean that the user knows the specific intent of the operation.

  (6) Cognitive rules of GLV and ILV

The cognitive rule is an implicit rule reflecting assembly intention. It has been hidden behind AR instructions (i.e., GLV and ILV) since the beginning of AR instruction design. In fact, both GLV and ILV have their own cognitive rules. For example, the cognitive rule of PMI is that when the square plug is larger than the square tube, the pair is in the TF-based interference state, and when the square plug is smaller than the square tube, the pair is in the TF-based clearance state. Obviously, the cognitive rules given by GLV are too complex. Because ILV considers users’ cognition, the cognitive rules given by ILV are more popular with users, which can be seen from the results of Q7 (availability). Question 2 in the interview shows that by focusing on positive and negative signs, users can quickly get the desired results. The same happens in AG: users only need to observe the relative position of the red rectangle and the purple line to get the corresponding results. Therefore, compared with GLV, ILV simplifies the cognitive rules and greatly reduces the cognitive difficulty for users.
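
The difference between the two cognitive rules can be sketched in a few lines. With PMI, the user has to compare the two sizes; with PMI+, the system shows the signed difference and the user only reads its sign. The function name and the numeric sizes below are hypothetical, and the handling of a zero difference is our assumption, since the text describes only the “+” and “−” cases.

```python
def pmi_plus_state(plug_size_mm, pipe_size_mm):
    """Return the sign shown to the user and the TF-based state it implies."""
    difference = plug_size_mm - pipe_size_mm
    if difference > 0:
        return "+", "interference"   # plug larger than pipe
    if difference < 0:
        return "-", "clearance"      # pipe larger than plug
    return "0", "undetermined"       # assumed: the zero case is not covered in the text


# Hypothetical measured sizes in millimetres, for illustration only.
print(pmi_plus_state(100.02, 100.00))
print(pmi_plus_state(99.97, 100.00))
```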

6 Implication and limitation

In this section, 3 implications are given based on the results of hypotheses and interviews. Besides, our team acknowledges that our assembly experiments have many drawbacks compared with actual assembly and that there may be better display schemes under ILV.

  (1) The design of AR instructions must follow the cognitive rules contained in assembly tasks.

In fact, whether AR instructions are designed with GLV or ILV, the cognitive rules are implicit. This paper uses only engineering fit to demonstrate two points: (1) ILV satisfies users’ cognitive needs better than GLV; (2) ILV promotes the expression of cognitive rules better than GLV. Our results support both points, and we believe they apply broadly to other assembly tasks.

  (2) ILV only focuses on improving users’ cognitive level and does not deliberately emphasize users’ visual experience.

Our research shows that visual information should be used to support the user’s understanding of operational tasks, rather than to enhance the user’s visual experience of information. Therefore, our team focuses not on describing geometric features but on expressing the operational logic of the task. This is the essential difference between ILV and GLV. This does not mean that GLV has nothing to recommend it. Obviously, the construction of ILV requires the geometric characteristics of GLV. However, ILV retains only those parts that are beneficial to user cognition.

  (3) AR instructions for teaching and training should adopt ILV with sufficient details, while AR instructions for guiding operation should adopt ILV based on MBD.

The user study shows that the two types of ILV solutions suit different situations. In real assembly operations, users expect concise instructions and easy-to-understand cognitive rules. Therefore, the original ILV (e.g., PMI+) based on MBD graphics enables users to achieve the best assembly efficiency, but it cannot guarantee a high level of cognitive efficiency. In demonstrative teaching and training, users want novel instructions and detailed cognitive rules. Therefore, ILV with visualized graphics enables users to reach the best cognitive level, without regard to assembly efficiency.

Although the results of the user study are very interesting, there are still many limitations. Because the projected content is only a two-dimensional image, users cannot see the three-dimensional effect of the visual information, which makes it difficult for them to fully understand its advantages. Compared with an actual assembly task, the content of this task is relatively limited, which may limit the applicability of the results, although the task does include some key elements, such as object recognition and assembly. Furthermore, this case study is not very detailed. We only analyzed the users' task time and subjective feedback and did not evaluate their behavior. We believe that assessing user behavior can help us understand the specific impact of information-level visualization.

7 Conclusion and future work

This work is one of the first pilot user studies to evaluate AR instructions at the information level. In this article, the case study provides an example demonstrating that ILV achieves higher user performance than GLV. The three implications apply not only to engineering fit but also to other types of operational details in AR assembly. The purpose of the case study is to identify the ILV design factors that affect user performance. In terms of assembly efficiency, the experimental results support only H1 and H2, not H3. In terms of user experience (Q1–Q4), our team found significant differences between PMI and AG and between color map and AG, but no significant differences between PMI+ and AG. The ranking questionnaire, however, still showed statistical differences between PMI+ and AG, so H4 was eventually accepted. In terms of cognitive efficiency (Q5, Q6), our team also found significant differences between PMI and AG and between color map and AG; between PMI+ and AG, the difference was significant on Q6 but not on Q5. In terms of information understanding (Q8), the differences between PMI and AG, color map and AG, and PMI+ and AG are all significant. Therefore, the above results support H6, indicating that ILV is the logical process from visual information to assembly content and is an important upgrade of GLV to meet users' cognitive needs. The results of Q7 show that cognitive rules exist, and it can be predicted that ILV is more conducive to the expression of cognitive rules than GLV. In the future, we will further study how ILV changes users' operation behavior. We expect that exploring user behavior in depth under ILV based on dynamic feedback will be very interesting.