1 Introduction

A cyber-physical system (CPS) consists of physical assets and computational capabilities that are integrated and interconnected. The availability and affordability of field devices, information systems, and computer networks drive industries to implement new methodologies, resulting in more intelligent, resilient, and self-adaptable systems. These types of systems, known as Industry 4.0 systems, offer significant economic potential by acquiring accurate and reliable data from production plants, converting the data into useful information, analyzing that information, optimizing operational decisions, and applying corrective and preventive actions. The challenge, as well as the key to the success, of Industry 4.0 is highly modular structures that facilitate multivendor interoperability [1]. In Industry 4.0, the traditional production hierarchy is replaced by a decentralized self-organization scheme [2]. Components and procedures with local control intelligence communicate with other components and procedures through the system's network to self-organize within the production network. In this way, production lines become flexible and modular, allowing easy plug-and-play integration or replacement of entities (components and/or procedures) [3]. The increased demand for personalized products, reduced life cycle times, and increasingly competitive markets results in more complex organizational structures and procedures, including assembly processes. However, the complexity introduced by Industry 4.0 technologies demands new approaches to manage modular design and design for assembly techniques.

Industry 4.0 is heralding an era of personalized production and assembly [4], which both demands and is enabled by the integration of online/real-time sequence optimization and optimal assembly formation tools. This information becomes readily available to assembly workers, for example through augmented reality headsets [4], and to assembly machines for evolvable readjustment [5]. The optimization problem applies both to assembly time, through the line balancing problem [6, 7], and to assembly process cost. In this paper, we introduce a probabilistic formulation for the assembly optimization problem and explore the sensitivity of an assembly process to the constraints imposed by modular design. This method can support online optimization of assembly processes through simultaneous sequencing and module formation.

1.1 Problem definition

In the assembly literature, the cost or the number of defects is typically taken as the measure of assembly process efficiency. In this paper, we turn our attention to the cost (or, equivalently, the time) of the assembly process as a result of imperfect process efficiencies. The paper investigates the effect of modularity in assembly sequence planning (ASP) for heterogeneous complex assembly processes where each operation (task) has a unique probability of success and a unique cost, as shown in Fig. 1a. Here, \( c_{i,i+1} \) and \( p_{i,i+1} \) are the cost and the probability of successful completion of the assembly task between elements i and i + 1, respectively. We analyze the effect of buffering the assembly process by introducing break points, as shown in Fig. 1b, given the heterogeneous probabilities and costs and the mutual dependencies between all tasks, and in view of a principal assumption about the complexity of assembly: a failure in any task requires restarting the assembly process from the first task. The goal of these break points is to interrupt the mutual dependencies in case of failures, such that a failure of a specific assembly task affects previous operations only back to the last break point. These break points can be realized as operational buffers (e.g., mechanical couplers, electronic diodes), as assembly stabilizers (e.g., additional fixtures and connectors), or as modules and subassemblies that can be executed independently. According to this structure, all operations between two break points constitute a cluster, and any pair of clusters is connected at a break point. Each break point is defined by two values (\( c_{M_i,M_j}^{BP} \) and \( p_{M_i,M_j}^{BP} \)), representing the cost and probability of assembling the two modules (i and j), respectively. Although each break point entails additional cost to the entire assembly process, we anticipate that the expected process cost and duration are improved by both optimal modularization and optimal sequencing of tasks within each module. Figure 2 shows a scenario in which three workstations assemble the three modules and a fourth workstation assembles the modules together. The total assembly time in this case is the sum of the longest module assembly time (in this example, workstation 2) and the module integration time at workstation 4.

Fig. 1

a Serial assembly sequence of eight components with seven fully mutually dependent assembly tasks, each with a different cost and reliability. b Two break points replace two initial assembly tasks to interrupt the mutual dependencies between the tasks

Fig. 2

A hypothetical assembly time scenario for modularization in Fig. 1b

In this work, we consider optimal task sequencing and optimal modularization simultaneously. Figure 3 shows three modular architectures of the same system: in Fig. 3a, b, the sequences are identical and the modularizations differ, while in Fig. 3c, both the sequence and the modularization are altered. We define a modular structure by a modularization vector (MV) that characterizes a modular architecture by the number of modules in the entire process and the number of elements within each module, for a given sequence of the tasks (\(H_k\)). For example, the MV in Fig. 3a is [2, 3, 3] (two elements in \(M_1\), three elements in \(M_2\), and three elements in \(M_3\)) for the sequence [3 → 1 → 2 → 5 → 4 → 6 → 8 → 7], the MV in Fig. 3b is [2, 3] for the same sequence, and the MV in Fig. 3c is [2, 3] for the sequence [2 → 1 → 5 → 8 → 4 → 6 → 7 → 3]. The aim here is to find the optimal modularization and sequencing for minimum time (an optimally balanced line [6]) and cost; a short sketch of the MV representation follows Fig. 3.

Fig. 3

a–c Different modular structures with different sequences of the same system, resulting in different expected costs
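To make the MV notation concrete, a minimal Python sketch follows; the helper name `split_into_modules` is ours, for illustration only, and is not part of the original study:

```python
def split_into_modules(sequence, mv):
    """Split a task sequence into modules according to a modularization vector.

    sequence -- ordered list of elements, e.g. [3, 1, 2, 5, 4, 6, 8, 7]
    mv       -- number of elements per module, e.g. [2, 3, 3]
    """
    assert sum(mv) == len(sequence), "MV must account for every element"
    modules, start = [], 0
    for size in mv:
        modules.append(sequence[start:start + size])
        start += size
    return modules

# Example: the architecture of Fig. 3a
print(split_into_modules([3, 1, 2, 5, 4, 6, 8, 7], [2, 3, 3]))
# [[3, 1], [2, 5, 4], [6, 8, 7]]
```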

More specifically, the paper seeks to answer several questions regarding the implementation of the break points:

  1. How many break points (and hence modules) yield the optimal cost and time values?

  2. Where should the break points be inserted?

  3. Should the assembly sequence change within each module? Is there an optimal assembly sequence for the new modular structure with respect to both assembly cost and time?

  4. What trade-offs exist between assembly time and cost for each modularization and sequence?

We study the above problems in a general form under the following assumptions:

  1. There are enough workstations to accommodate any modularization.

  2. There are no precedence constraints on the assembly tasks. This is not a limiting assumption; in fact, it increases the computational complexity of the optimization relative to cases in which precedence constraints are present.

  3. The probability of successfully completing an assembly task and its cost are constant, regardless of the number of attempts and repeats. In reality, repeating a task might increase its probability of success because of learning.

In Section 2, we provide a literature review, background on assembly complexity, and some initial development of the assembly model. We show how task sequencing affects complex assemblies with heterogeneous success probabilities. In Section 3, we examine the effects of employing break points in a given assembly sequence. In Section 4, we examine the simultaneous effect of employing break points and changing the assembly sequence. The paper concludes that, for cost optimality, task sequencing must be performed after task decomposition; otherwise, the cost-saving effects of task sequencing can be negated by inappropriate decomposition.

2 Background

Traditionally, an assembly line is a sequential manufacturing setup in which components are assembled in several workstations (local or external) connected by a transportation network [8]. The transportation network can be rigid (e.g., conveyors) or flexible (e.g., autonomous guided vehicles (AGVs)). Each station in the line performs one or several assembly operations using humans, machines, tools, or robots. The output of the assembly line is a complete unit assembled according to the product structure, or "bill of materials" (BOM), which lists the components, subassemblies, and parts, and their relationships. The assembly process has a major impact on product cost and quality. As assembly is often the last process in the manufacturing chain, detecting and correcting failures at this stage is an important factor in product quality. Design for assembly (DFA), originally proposed by Boothroyd [9], is a product design methodology that aims at reducing costs and the probability of assembly failures. Reducing the number of components (and therefore the number of assembly tasks), adding grasping and orientation tools, and considering assembly directionality all help reduce assembly cost and increase reliability. Numerous enterprises use the DFA methodology and report large savings [10,11,12]. Many models, technologies, and methodologies have been developed since Boothroyd's introduction of DFA for improving the assembly process and reducing costs [13,14,15,16,17].

Statistical quality control (SQC) is a common industrial tool for process monitoring and improvement, based on sampling products from the manufacturing line, observing variations, and applying corrections [18]. The Motorola 6σ (Six Sigma) method [19] is another common tool for process improvement. It identifies and removes the causes of failures during the manufacturing process using a clear sequence of actions with specific value targets. SQC and 6σ methods in assembly processes assume that all defects are the result of normally distributed variations that can be observed and corrected. However, many failures during the assembly process can be described only in terms of probabilistic occurrence, arising from different types of errors [20] such as missed assembly operations, processing errors, tool and machine setup errors, missing parts, wrong parts, and tool and jig errors. Hinckley [21] presents the types of defects, the sources of these defects, and metrics for assessing the relative importance of each source. He introduces the concept of an assembly complexity factor and states that "complexity is the least understood source of defects in assembly processes because of the difficulty of defining relative measures of complexity".

The complexity of the assembly process can be expressed by quantitative measures (e.g., the number of components and required assembly operations) and by qualitative measures (e.g., the level of component and assembly difficulty). While quantitative measures are explicit and unambiguous, qualitative measures are subjective and varied. The expected time to complete a specific assembly operation is often used as an approximation of assembly complexity. Hinckley [21] formulates complexity through the probability that an assembly process is successful, according to

$$ {P}_Y=\prod \limits_{i=1}^n\left\{{C}_k{\left({t}_i-{t}_0\right)}^k\right\}\left(1-{d}_i\right) $$
(1)

where the notation is as follows:

\(P_Y\): the probability that the entire assembly process is successful

\(C_k\): the level of quality control of the assembly operations (\(C_k > 0\))

\(t_i\): the expected time of assembly operation i

\(t_0\): the expected time of the benchmark assembly operation

\(k\): the sensitivity of assembly complexity to defects (\(k > 1\))

\(n\): the number of assembly operations

\(d_i\): the probability that the ith operation is defective (\(0 \le d_i \le 1\))

The expected times in Eq. (1) can be determined from direct measurements or by using one of the DFA methodologies. The probability of successful assembly given by Eq. (1) expresses the effect of the relevant factors on assembly complexity and provides a tool for comparing design alternatives. However, this complexity definition does not take into consideration the effect of task sequencing on \(P_Y\), nor does it help in identifying and forming the assembly batches.
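As an illustration, Eq. (1) can be evaluated directly. The Python sketch below transcribes the formula as written; the parameter values are assumptions for demonstration, not values from the paper:

```python
def hinckley_success_probability(times, defects, t0=1.0, Ck=0.5, k=1.5):
    """Direct transcription of Eq. (1):
    P_Y = prod over i of {Ck * (t_i - t0)^k} * (1 - d_i).

    times   -- expected time t_i of each assembly operation
    defects -- defect probability d_i of each operation (0 <= d_i <= 1)
    """
    p = 1.0
    for t_i, d_i in zip(times, defects):
        p *= (Ck * (t_i - t0) ** k) * (1 - d_i)
    return p

# Illustrative values only (not taken from the paper)
print(hinckley_success_probability(times=[2.0, 3.0, 2.5],
                                   defects=[0.01, 0.02, 0.015]))
```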

Here, we propose a new definition of assembly complexity that is based on a network of assembly operations and their level of interdependency, which determines the amount of rework. According to this definition, assembly processes are classified by the effect that a failure of one assembly task may have on previously completed tasks. In simple assemblies, a failure of one assembly task requires rework of that task alone and does not necessitate rework of prior tasks. A failure of a task in a complex assembly process, on the other hand, requires either scrapping the semi-finished assembly or disassembling some of the components and reassembling them. Under this definition, assembly complexity is determined by the mutual dependencies between the assembly tasks. Given the availability and affordability of a wide range of sensors and quality assurance techniques, each task in the assembly process can be accurately monitored and registered independently. By analyzing the quality assurance outcomes of each task, the assembly process can be adjusted by either repeating the failed tasks or scrapping the entire process and restarting it from the first task.

Figure 4 illustrates these dependencies as a state diagram in which the states represent the assembly tasks and the edges represent the transitions between these tasks. Each transition has an associated probability \(p_{i,j}\). The probability \(p_{i,i+1}\) reflects the probability of successfully completing task i and continuing to the next task in the sequence (i + 1); \(p_{i,i}\) is the probability of repeating the same task after a failure; and \(p_{i,j}\ \forall j < i\) is the probability of returning to task j after a failure in task i. In this case, we define a complexity index (\(k_i = i - j\)) that indicates the number of steps the assembly process must repeat in case of a failure in task i. For example, in Fig. 4, tasks 1 and 2 are simple, as a failure requires repetition of the failed task only, whereas tasks 3 and 4 are complex, as a failure in either requires repetition of previously completed tasks. Notice that a task may have several complexity indices (e.g., task 4), as there may be probabilities of returning to different previous tasks according to the specific type of failure. According to this definition, the complexity of the entire assembly sequence (\(C_A\)) is given by

$$ {C}_{\mathrm{A}}={\sum}_{i=1}^{n-1}{k}_i\left(\max \right) $$
(2)

where \(k_i(\max)\) is the largest complexity index of task i.
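A minimal sketch of this complexity measure follows, assuming the failure-return targets of each task are supplied as a mapping; the data structure and names are illustrative:

```python
def assembly_complexity(return_targets, n):
    """Compute C_A of Eq. (2): the sum over tasks of the largest complexity index.

    return_targets -- dict mapping task i to the list of tasks j (j <= i) the
                      process may fall back to when task i fails
    n              -- number of assembly tasks
    """
    c_a = 0
    for i in range(1, n):  # tasks 1 .. n-1, matching the summation bound in Eq. (2)
        targets = return_targets.get(i, [i])   # default: repeat the failed task itself
        c_a += max(i - j for j in targets)     # k_i(max) = largest i - j
    return c_a

# Example: tasks 1 and 2 are simple; task 3 may fall back to task 1 (k_3 = 2)
print(assembly_complexity({1: [1], 2: [2], 3: [1, 3]}, n=4))  # 0 + 0 + 2 = 2
```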

Fig. 4

State diagram of an assembly process

Although most assembly processes are a mixture of simple and complex tasks, in this paper, we consider the worst-case scenario in which all tasks are mutually dependent and are therefore sensitive to failures in subsequent steps; each task has the highest possible complexity index, such that a failure in one task results in repeating all previous tasks. In particular, we consider the cost associated with the assembly process and the expected assembly time, given the probability of success of each assembly operation.

In the study of Efatmaneshnik and Ryan [22], we introduced a general formulation for tackling complex assemblies by modularizing the entire assembly process into subassemblies, for homogeneous operations characterized by identical costs and probabilities of success across all assembly operations. In the study of Shoval et al. [23], we considered the effect of assembly sequencing on the expected assembly cost for simple and complex heterogeneous assembly processes. For simple heterogeneous serial assembly processes, we used a traveling salesperson problem (TSP) solver to determine the optimal assembly sequence, which provides a lower bound for the assembly process. For complex heterogeneous assembly processes, we presented heuristics suggesting that sequencing the more complex operations at the beginning of the process decreases the expected assembly cost. We also showed that when the probabilities of success of the assembly operations are similar, the more expensive operations should be deployed as late as possible (subject to the operational constraints) in order to reduce the total expected assembly cost.

A summary of the results of Shoval et al. [23] relevant to the modularization of the assembly process is presented here for clarity. As mentioned, we consider a complex assembly system in which tasks are fully mutually dependent; therefore, a failure in the current assembly task requires rework of all previous tasks. Efatmaneshnik and Ryan [22] found that the expected cost of n assembly tasks for a homogeneous system (identical task reliabilities and costs) is

$$ {\widehat{C}}_n={\sum}_{i=1}^n\frac{C}{P^i}=\frac{C}{1-P}\left(\frac{1}{P^n}-1\right) $$
(3)

where P and C are the probability of successful completion and the cost of each of the homogeneous assembly tasks, respectively. An important assumption in the model of Efatmaneshnik and Ryan [22] is that the probability of success and the cost are constant, regardless of the number of attempts; that is, the probability and cost of the first attempt are identical to those of all additional attempts following failures. Note that a similar formulation can be derived for assembly time, in which case the cost is replaced by the homogeneous time of completing the individual tasks. Because of this similarity, we pay closer attention to assembly time in Section 3, where the assembly time of modular structures is considered. Since the probabilities and costs of all tasks are homogeneous, the expected cost of the entire process, given by Eq. (3), is not affected by the assembly sequence.
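The closed form in Eq. (3) can be checked against the direct sum; a minimal sketch:

```python
def expected_cost_homogeneous(n, p, c):
    """Closed form of Eq. (3) for n fully mutually dependent homogeneous tasks."""
    return (c / (1 - p)) * (1 / p ** n - 1)

def expected_cost_homogeneous_sum(n, p, c):
    """The same quantity as the direct sum of c / p^i over i = 1 .. n."""
    return sum(c / p ** i for i in range(1, n + 1))

# The two forms agree, e.g. for 7 tasks with p = 0.95 and c = 1
print(expected_cost_homogeneous(7, 0.95, 1.0))       # ~8.64
print(expected_cost_homogeneous_sum(7, 0.95, 1.0))   # ~8.64
```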

Next, we examine systems with heterogeneous probabilities and costs. Consider an assembly system that consists of n + 1 elements assembled in n serial heterogeneous tasks, where the cost and reliability of task i (connecting elements i and i + 1) are \(c_{i,i+1}\) and \(p_{i,i+1}\), respectively (i = 1,..., n). Assuming a system with no precedence constraints, let the matrix H hold all possible element sequences, one sequence per row. Let \(H_k\) (k = 1,…, (n + 1)!) be the kth row of H, representing a specific sequence that induces n tasks, each with a cost \( {C}_{H_k(i)} \) and probability of success \( {P}_{H_k(i)} \) (i = 1,…, n). Notice that \( {C}_{H_k(i)}={c}_{H_k(i),{H}_k\left(i+1\right)} \) and \( {P}_{H_k(i)}={p}_{H_k(i),{H}_k\left(i+1\right)} \). The expected cost of the entire system is given by

$$ \widehat{\mathrm{C}}\left({H}_k\right)=\sum \limits_{i=1}^{n}\frac{c_{H_k(i),{H}_k\left(i+1\right)}}{\prod_{j=i}^{n}\ {p}_{H_k(j),{H}_k\left(j+1\right)}} $$
(4)

For the proof of Eq. (4), see Shoval et al. [23]. The complexity of determining the optimal assembly sequence using an exhaustive search algorithm is \(O(n^2 \times n!)\). The problem is NP-complete and therefore intractable for assemblies larger than about 20 tasks. However, given precedence constraints, the number of possible valid sequences is lower, and the problem can often be solved using conventional optimization techniques (e.g., linear programming, graph optimization). Two useful heuristics based on Eq. (4) are those of Shoval et al. [23] (a brute-force sketch of Eq. (4) follows the list):

  1. Given equal costs for all assembly tasks, executing the less reliable tasks (those with lower probabilities of success) earlier in the process lowers the expected cost of the whole assembly.

  2. Given that all assembly tasks are mutually dependent and have equal probabilities of success, the minimum expected cost of a complex assembly process is obtained by a sequence that sorts the task costs in ascending order.
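As referenced above, a brute-force sketch of Eq. (4) follows. It enumerates all element permutations, which is feasible only for small systems, consistent with the factorial complexity noted earlier; the matrices are made-up illustrations:

```python
from itertools import permutations

def expected_cost(seq, cost, prob):
    """Expected cost of a fully mutually dependent serial assembly, Eq. (4).

    seq  -- element sequence, e.g. (0, 1, 2, 3) (0-indexed elements)
    cost -- cost[a][b] is the cost of joining elements a and b
    prob -- prob[a][b] is the probability that joining a and b succeeds
    """
    n = len(seq) - 1  # number of tasks
    total = 0.0
    for i in range(n):
        denom = 1.0
        for j in range(i, n):          # all tasks from i to the end must succeed
            denom *= prob[seq[j]][seq[j + 1]]
        total += cost[seq[i]][seq[i + 1]] / denom
    return total

def best_sequence(cost, prob):
    """Exhaustive search over all element permutations (small systems only)."""
    elements = range(len(cost))
    return min(permutations(elements), key=lambda s: expected_cost(s, cost, prob))

# Illustrative 4-element symmetric matrices (values are made up)
C = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
P = [[1.00, 0.91, 0.95, 0.93],
     [0.91, 1.00, 0.97, 0.92],
     [0.95, 0.97, 1.00, 0.94],
     [0.93, 0.92, 0.94, 1.00]]
print(best_sequence(C, P))
```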

3 Modularity in assembly planning

In this section, we examine the effect of modular boundaries on assembly cost and time; in the next section, we examine the effect of modular interfaces. Both effects are studied in combination with task sequencing. Complex assembly systems in which tasks are mutually dependent were initially investigated by Simon [24]: each time the process is disturbed, it has to be repeated from the first task. According to Simon [24], the solution to this complexity is dividing the process into several subassemblies such that a failure in one task affects only the tasks within its subassembly. Dividing assembly processes into subassemblies (modules) may require additional resources (machines, tools, space, etc.) and possibly new product architectures, as the connectivity between components may need to be altered. These features should be considered in terms of their overall contribution to the system.

Two principal purposes can be envisaged for modularity in assembly [25]:

  1. Reduce assembly cost:

    a. Stabilize the number of components in a module by creating boundaries for subassemblies that protect them from being harmed in the assembly rework process.

    b. Reduce the assembly complexity level by creating interfaces (standard or otherwise) between modules (subassemblies). These interfaces reduce the complexity of the process, in the sense of the measure introduced in Eq. (2), by easing the plug-in process: they decouple the subassembly processes and thus remove the dependency between their failures and successes. This is explained in Section 4.

  2. Reduce assembly time through parallelism (explained further in this section).

Let us assume that a serial assembly process consisting of n + 1 elements and n tasks is redesigned such that m break points are added to the original assembly sequence, creating m + 1 subassemblies. In addition, the assembly sequence is replanned, subject to the assembly sequence constraints. The redesigned system is similar to the one shown in Fig. 2, but now the assembly costs and probabilities of successful completion within each subassembly are denoted \( {c}_{i,i+1}^{M_k} \) and \( {p}_{i,i+1}^{M_k} \), respectively, representing the cost and probability of assembling elements i and i + 1 within module \(M_k\). These terms account for possible changes in the assembly sequences, as well as special features related to the modular structure. Furthermore, the modular structure includes different types of costs and probabilities: \( {c}_{M_i,{M}_{i+1}}^{BP} \) is the cost of assembling modules i and i + 1, and similarly, \( {p}_{M_i,{M}_{i+1}}^{BP} \) is the probability of successfully assembling modules i and i + 1. Notice that assembling two components in a separate module may require additional fixtures, tools, or machines and may also change the probability of successfully completing that task. Therefore, in Fig. 1b, \( {c}_{M_i,{M}_{i+1}}^{BP}\ne {c}_{2,3}^{M_1}\ne {c}_{2,3}^{M_2} \) and, similarly, \( {p}_{M_i,{M}_{i+1}}^{BP}\ne {p}_{2,3}^{M_1}\ne {p}_{2,3}^{M_2} \), as assembling modules \(M_1\) and \(M_2\) differs from assembling elements 2 and 3. In most cases, we anticipate that \( {c}_{M_i,{M}_{i+1}}^{BP}\ge {c}_{j,j+1}^{M_i} \) and \( {p}_{M_i,{M}_{i+1}}^{BP}\le {p}_{j,j+1}^{M_i} \), as connecting two modules is typically more complex than connecting two elements.

Based on the previous notations, the total expected assembly cost in a modular system is given by

$$ {\widehat{C}}^M=\sum \limits_{i=1}^{m-1}\frac{c_{M_i,{M}_{i+1}}^{BP}}{\prod_{j=i}^{m-1}{p}_{M_j,{M}_{j+1}}^{BP}}+\sum \limits_{i=1}^m\sum \limits_{j=1}^{n_i-1}\frac{c_{j,j+1}^{M_i}}{\prod_{k=j}^{n_i-1}{p}_{k,k+1}^{M_i}} $$
(5)

where m is the number of modules, and \( {c}_{M_i,{M}_{i+1}}^{BP} \) and \( {p}_{M_i,{M}_{i+1}}^{BP} \) are the cost and reliability of the ith break point, located between the ith and (i + 1)th modules (i.e., assembling modules \(M_i\) and \(M_{i+1}\)). In Eq. (5), \( {c}_{j,j+1}^{M_i} \) and \( {p}_{j,j+1}^{M_i} \) are, respectively, the cost and reliability of the jth assembly task inside module i, and \(n_i\) is the number of elements in the ith module. The first term in Eq. (5) is the expected cost of connecting all the modules, and the second term is the expected cost of assembling the elements within each module.

Now, assume \( {t}_{j,j+1}^{M_i} \) is the time associated with joining components j and j + 1 in the ith module, and \( {t}_{M_i,{M}_{i+1}}^{BP} \) is the time associated with assembling modules \(M_i\) and \(M_{i+1}\). In a modular setting, the modules can be assembled in parallel. To fully utilize this parallelism, at least m workstations are required. The expected minimal total assembly time is then

$$ {\widehat{T}}^M=\sum \limits_{i=1}^{m-1}\frac{t_{M_i,{M}_{i+1}}^{BP}}{\prod_{j=i}^{m-1}{p}_{M_j,{M}_{j+1}}^{BP}}+\underset{i=1\dots m}{\max}\sum \limits_{j=1}^{n_i-1}\frac{t_{j,j+1}^{M_i}}{\prod_{k=j}^{n_i-1}{p}_{k,k+1}^{M_i}} $$
(6)

where \( {t}_{M_i,{M}_{i+1}}^{BP} \) is the assembly time of modules \(M_i\) and \(M_{i+1}\). Again, we assume there are enough workstations to accommodate any modularization.
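Both Eq. (5) and Eq. (6) share the same serial-chain pattern, which the following sketch factors out; the function and parameter names are illustrative:

```python
def chain_expectation(values, probs):
    """Expected cost (or time) of a fully dependent serial chain:
    sum over i of values[i] / prod(probs[i:]), the pattern shared by
    both terms of Eq. (5) and Eq. (6)."""
    total = 0.0
    for i, v in enumerate(values):
        denom = 1.0
        for p in probs[i:]:
            denom *= p
        total += v / denom
    return total

def expected_modular_cost(module_costs, module_probs, bp_costs, bp_probs):
    """Eq. (5): the break-point chain plus the sum of module-internal chains.

    module_costs[i], module_probs[i] -- per-task costs/probabilities in module i
    bp_costs, bp_probs               -- per-break-point costs/probabilities
    """
    internal = sum(chain_expectation(c, p)
                   for c, p in zip(module_costs, module_probs))
    return chain_expectation(bp_costs, bp_probs) + internal

def expected_modular_time(module_times, module_probs, bp_times, bp_probs):
    """Eq. (6): modules are assembled in parallel, so only the slowest counts."""
    slowest = max(chain_expectation(t, p)
                  for t, p in zip(module_times, module_probs))
    return chain_expectation(bp_times, bp_probs) + slowest

# Example with made-up numbers: two modules joined by one break point
print(expected_modular_cost(module_costs=[[1, 1], [1, 1, 1]],
                            module_probs=[[0.95, 0.96], [0.92, 0.93, 0.94]],
                            bp_costs=[1], bp_probs=[0.9]))
```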

In general, the number of possible partitions of a set of n elements into m subsets is given by the Stirling number of the second kind, also known as the Stirling partition number (SPN):

$$ S\left(n,m\right)=\frac{1}{m!}\sum \limits_{i=0}^m{\left(-1\right)}^i\frac{m!}{i!\left(m-i\right)!}{\left(m-i\right)}^n $$
(7)

The SPN counts the distinct partitions but not the order of the elements within each subset. Therefore, the number of permutations when dividing n elements into ordered clusters is

$$ N=\sum \limits_{m=1}^nS\left(n,m\right)\prod \limits_{i=1}^m{k}_i!=S\left(n,m\right)\times n! $$
(8)

where \(k_i\) is the number of elements in the ith module.
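Eq. (7) can be computed directly; a minimal sketch:

```python
from math import comb, factorial

def stirling2(n, m):
    """Stirling number of the second kind, Eq. (7): the number of ways to
    partition n elements into m non-empty unordered subsets."""
    total = sum((-1) ** i * comb(m, i) * (m - i) ** n for i in range(m + 1))
    return total // factorial(m)

# e.g. partitioning 8 elements into 3 subsets
print(stirling2(8, 3))  # 966
```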

The combined modularization and sequencing problem is NP-hard and therefore can be solved exactly only for simplified cases. Simplification can be achieved by limiting the total number of elements or by introducing sequential constraints. For example, maintaining a constant sequence reduces the number of possible partitions to \(2^{n-1}\) for an n-element system (one binary choice per potential break point). To illustrate this argument, consider a system of eight elements assembled in seven sequential tasks, and assume the probabilities of successfully completing the assembly tasks are given by the matrix in Fig. 5, where the probability of joining components i and j is the element in the ith column and jth row. Given this, the probabilities for the sequence 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8 are, in ascending order, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, where task 1 is the assembly of elements 1 and 2 and task 7 is the assembly of elements 7 and 8. For the sequence 8 → 7 → 6 → 5 → 4 → 3 → 2 → 1, the probabilities are in descending order: 0.97, 0.96, 0.95, …, 0.91. For clarity, assume identical costs and times of 1 (\(c_{i,j} = t_{i,j} = 1\ \forall i, j\)) and \( {c}_{i,j}^{M_k}={c}_{i,j} \), \( {t}_{i,j}^{M_k}={t}_{i,j} \), and \( {p}_{i,j}^{M_k}={p}_{i,j} \), such that modularization does not affect the internal costs, times, and reliabilities within a module. Also assume that \( {c}_{M_i,{M}_{i+1}}^{BP}=1 \) (the cost of assembling any two modules is 1) and that \( {p}_{M_i,{M}_{i+1}}^{BP}={p}_{m,n} \), where m is the last element of module i and n is the first element of module i + 1. The final assumption is that the modules are mutually independent, such that a failure in assembling two modules does not affect previously assembled modules. There are \(2^{8-1} = 128\) possible modular structures for the given sequence, ranging from a single-module structure ([8]) to a structure of eight single-element modules ([1, 1, 1, 1, 1, 1, 1, 1]). For example, the eight elements can be divided into two modules in seven different configurations ([1, 7], [2, 6], [3, 5], [4, 4], [5, 3], [6, 2], [7, 1]), into three subassemblies in 21 ways, into four subassemblies in 35 ways, into five modules in 35 ways, and so on; a sketch that reproduces these counts follows. All 128 MVs for an eight-component system are presented in Table 1 in the Appendix.
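As noted, for a fixed sequence the modular structures are exactly the compositions of n, one per choice of break-point positions among the n − 1 task gaps. The sketch below reproduces the counts quoted above:

```python
from itertools import combinations

def compositions(n):
    """All ordered modularization vectors (compositions) of n elements,
    one per choice of break-point positions among the n - 1 task gaps."""
    for cuts in range(n):  # number of break points
        for positions in combinations(range(1, n), cuts):
            bounds = (0,) + positions + (n,)
            yield [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]

mvs = list(compositions(8))
print(len(mvs))                               # 128 = 2**7
print(sum(1 for mv in mvs if len(mv) == 2))   # 7 two-module structures
print(sum(1 for mv in mvs if len(mv) == 3))   # 21
print(sum(1 for mv in mvs if len(mv) == 4))   # 35
```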

Fig. 5

The assembly success probability matrix used for the design of optimum assembly

We can analyze the best modularization for particular sequences. For example, consider two extreme sequences: the ascending and descending probability sequences. In the ascending sequence, the elements are assembled in the order 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8, whereas in the descending sequence, they are assembled in the order 8 → 7 → 6 → 5 → 4 → 3 → 2 → 1. The minimum expected cost and time for the ascending order occur at MV = [2, 3], and those for the descending order occur at MV = [2, 3]. However, these two configurations are not optimal when all other sequences are considered, and despite their symmetry, they do not lead to similar assembly costs and times.

Figure 6 shows the expected cost versus the expected time for all possible scenarios, which consist of 128 MVs multiplied by 40,320 sequences, a total of 5,160,960 points. Since minimizing cost and minimizing time may require different sequences, we use Pareto optimization to find the non-dominated solutions (sequences and MVs) for the minimum expected cost and time problem; a sketch of the non-dominated filtering follows the list of Pareto points below. Figure 6b shows the region marked in Fig. 6a, which is a Pareto region, together with the (global) Pareto solutions within it. Five MVs constitute all the points in this region: [4], [2, 3], [2, 3], [2, 3], and [2]. The Pareto point characteristics are as follows:

  • \( {\widehat{C}}^M=7.58 \) and \( {\widehat{T}}^M=4.38 \) for MV = [4] and sequences 2 → 4 → 3 → 5 → 6 → 8 → 7 → 1 and 2 → 4 → 3 → 5 → 6 → 7 → 8 → 1.

  • \( {\widehat{C}}^M=7.57 \) and \( {\widehat{T}}^M=4.38 \) for MV = [4] and sequences 2 → 3 → 4 → 5 → 6 → 8 → 7 → 1 and 2 → 3 → 4 → 5 → 6 → 7 → 8 → 1.

  • \( {\widehat{C}}^M=7.61 \) and \( {\widehat{T}}^M=4.33 \) for MV = [2, 3] and sequences 1 → 3 → 2 → 6 → 8 → 7 → 5 → 4 and 1 → 3 → 2 → 6 → 7 → 8 → 5 → 4.

  • \( {\widehat{C}}^M=7.59 \) and \( {\widehat{T}}^M=4.33 \) for MV = [4] and sequences 1 → 2 → 3 → 6 → 8 → 7 → 5 → 4 and 1 → 2 → 3 → 6 → 7 → 8 → 5 → 4.
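The non-dominated filtering used to obtain such Pareto points can be sketched as follows, with each candidate reduced to a (cost, time) pair; the sample values are generic, not the ones listed above:

```python
def pareto_front(points):
    """Return the non-dominated (cost, time) pairs: points for which no
    other point is at least as good in both objectives."""
    front = []
    for p in points:
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return front

# Generic illustrative candidates (cost, time)
candidates = [(3, 9), (4, 7), (5, 8), (6, 5), (7, 6)]
print(pareto_front(candidates))
# [(3, 9), (4, 7), (6, 5)]
```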

Fig. 6

a Assembly time versus cost for each configuration, determined by an MV and a sequence. b A close-up of the region circled in a, showing the MV numbers that form that region; the Pareto solutions are also shown

From Fig. 6b, an important observation can be made: the MVs in the Pareto region have particular characteristics that lead to a useful heuristic for modular assembly formation. First, the size of each module in these MVs is at most n/2 (= 4). Second, these MVs are the most balanced ones, meaning they have the minimum standard deviation (SD) among the categories of similarly sized MVs:

$$ {\displaystyle \begin{array}{l}C1:\ m\le \frac{n}{2}\\ {}C2:\ \underset{\mathrm{MV}_m}{\min}\ \mathrm{SD}\left[\mathrm{MV}\right]\end{array}} $$
(9)

Note that the two constraints must be observed together and that this heuristic does not prescribe a particular sequence. The standard deviations of the MVs can be found in Table 1 in the Appendix. These two rules can be examined further in Fig. 7a, b, where the minimum expected cost and the minimum expected time for each MV (over all sequences) are respectively plotted against the standard deviation of the corresponding MVs. The figure classifies the solutions according to the size of the MVs (m). To verify the heuristic, a further example with different P and C matrices is shown in Fig. 12 in the Appendix. A sketch of the heuristic filter follows.
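The two-step heuristic translates into a simple filter over candidate MVs. The sketch below reads C1, following the textual description above, as a bound on the largest module size (an interpretation on our part, since Eq. (9) writes the bound in terms of m); `mvs` can be the composition list from the earlier sketch:

```python
from statistics import pstdev

def heuristic_candidates(mvs, n):
    """Two-step filter of Eq. (9): C1 keeps MVs whose largest module is at
    most n/2 (per the textual reading); C2 keeps, within each category of
    MVs with the same number of modules, the most balanced ones (min SD)."""
    feasible = [mv for mv in mvs if max(mv) <= n / 2]           # C1
    best = []
    for m in sorted({len(mv) for mv in feasible}):
        group = [mv for mv in feasible if len(mv) == m]
        min_sd = min(pstdev(mv) for mv in group)
        best += [mv for mv in group if pstdev(mv) == min_sd]    # C2
    return best

# e.g. with the 128 compositions of n = 8 from the earlier sketch:
# print(heuristic_candidates(list(compositions(8)), 8))
```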

Fig. 7

The minimum expected assembly cost (a) and time (b) versus the standard deviation of their MVs. The encircled regions show the validity of the two-step heuristic

4 Modularization costs

Let us now relax some of the simplifying assumptions of the previous section by considering the effect of modularization and modular interface creation on the expected cost. In simple mechanical systems, the additional connection costs of the modules are due mainly to the supplementary manufacturing cost of the mechanical components; a simple illustration is given in Fig. 8. In more complex systems, however, the connection costs between modules, as well as the probabilities of success, depend on the relations between the elements of the modules as captured by design structure matrices (DSMs) or connectivity graphs. The connection costs and probabilities between modules are determined by the internal costs of connecting the elements within each module and by "bottleneck" or "external" connections, which represent interactions and dependencies between elements that belong to different modules [26, 27]. Figure 9 shows the DSMs of two products [28]: the PW4098 jet engine (Fig. 9a) and the Kodak single-use camera (Fig. 9b). Although the two systems consist of a similar number of modules (nine for the jet engine and eight for the camera), there are significant differences between their modular structures, as well as between modules within the same product. For example, the "waterproof" module of the camera has a single external interface with the "shutter" module (element 4 is the "rubber cover" and element 28 is the "shutter release"), while module 8 of the jet engine has more than ten external connections with each of the other modules in the engine.

Fig. 8

Assembling four blocks using simple mutually dependent connections (a), modified modular assembly interfaces with mutually dependent connections (b), and a fully modular assembly configuration with modular interfaces (c)

Fig. 9

Design structure matrices (DSMs) for the PW4098 jet engine system (a) and the Kodak single-use camera (b) [28]

In general, the intermodular connection costs and probabilities are given by

$$ {\displaystyle \begin{array}{c}{c}_{M_k,{M}_{k+1}}^{BP}=\mathcal{F}\left(E\left({n}_k\right),E\left({n}_{k+1}\right)\right)\ \\ {}{p}_{M_k,{M}_{k+1}}^{BP}=\mathcal{M}\left(E\left({n}_k\right),E\left({n}_{k+1}\right)\right)\end{array}} $$
(10)

where \(E(n_k)\) and \(E(n_{k+1})\) are the subsets of elements in modules k and k + 1, respectively. Similarly, the costs and probabilities of the internal connections between elements within a module are given by

$$ {\displaystyle \begin{array}{c}{c}_{i,j}^{M_k}=\mathcal{H}\left(E\left({n}_k\right)\right)\\ {}{p}_{i,j}^{M_k}=\mathcal{L}\left(E\left({n}_k\right)\right)\end{array}} $$
(11)

where \(E(n_k)\) is the subset of elements in module k. As mentioned in Section 3, \( {c}_{i,j}^{M_k} \) and \( {p}_{i,j}^{M_k} \) may differ from \(c_{i,j}\) and \(p_{i,j}\) due to special features within the module. The goal is to determine an optimal modular structure that minimizes the total expected assembly cost and time as given by Eqs. (5) and (6).

A simple example demonstrates the above. Consider the binary DSM of a system consisting of the eight elements shown in Fig. 10a. The optimal clustering of this system results in three subassemblies without any external (bottleneck) connections (three independent subsystems); therefore, the only costs are those of the internal connections of the elements within each subassembly (Fig. 10b). Furthermore, a failure in any assembly task can affect only the tasks in the same cluster. Assume the nominal connection costs between the elements are \(c_{i,j} = t_{i,j} = 1\) and the probabilities \(p_{i,j}\) are those shown in Fig. 5, with no assembly sequence constraints. If we assume that the assembly cost between any two modules is \( {c}_{M_k,{M}_{k+1}}^{BP}={t}_{M_k,{M}_{k+1}}^{BP}=\left(\sum {c}_{i,j}^b\right) \), where \( {c}_{i,j}^b \) is the cost of a bottleneck connection between elements i and j (case 1), the optimal modularization and sequence are those depicted in Fig. 10c. However, if we raise the assumed cost to \( {c}_{M_k,{M}_{k+1}}^{BP}={t}_{M_k,{M}_{k+1}}^{BP}=\left(\sum {c}_{i,j}^b\right)\left({n}_k+{n}_{k+1}\right) \), where \(n_k\) and \(n_{k+1}\) are the numbers of elements in modules k and k + 1, respectively (case 2), the optimal modularization is no modularization at all (Fig. 10a). Figure 11a, b shows the general landscape of cost and time for different modularizations under cases 1 and 2, respectively. For case 1, the following sequences lead to the minimum cost and time of 7.8692 and 4.4795, respectively, when used with MV = [4]:

$$ 1\to 3\to 8\to 6\to 2\to 4\to 7\to 5 $$
$$ 1\to 3\to 8\to 6\to 2\to 4\to 5\to 7 $$
$$ 1\to 3\to 6\to 8\to 2\to 4\to 7\to 5 $$
$$ 1\to 3\to 6\to 8\to 2\to 4\to 5\to 7 $$
Fig. 10

DSM of the eight-element system (c = t = 9.4250) (a), the optimal clustering into three subassemblies (c = 8.7321, t = 5.4715) (b), and the optimal clustering for minimum assembly cost and time (c)

Fig. 11

The assembly time versus cost for case 1 (a) and for case 2 (b)

All four sequences lead to the same probability profile (0.9100, 0.9300, 0.9600, 0.9200, 0.9200, 0.9400, 0.9500). This example shows the importance of interface cost modeling for the sequencing and modularization problem.
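The two interface-cost models of case 1 and case 2 can be sketched as follows; the DSM is a 0/1 adjacency matrix, and the helper names are illustrative:

```python
def bp_cost_case1(dsm, cost, module_a, module_b):
    """Case 1: break-point cost equals the sum of the bottleneck connection
    costs between the two modules."""
    return sum(cost[i][j] for i in module_a for j in module_b if dsm[i][j])

def bp_cost_case2(dsm, cost, module_a, module_b):
    """Case 2: the same sum scaled by the total number of elements in the
    two modules, which penalizes interfaces between large modules."""
    return bp_cost_case1(dsm, cost, module_a, module_b) * (len(module_a) + len(module_b))
```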

5 Conclusions

Modularity is an important feature of any complex system. It is particularly important in cyber-physical systems (CPSs), where many physical and computational resources are integrated and interconnected. Modularity is one of the major principles of Industry 4.0, as systems become more complex and the traditional production hierarchy is replaced by alternative dynamic structures.

This paper considered the design of complex assembly processes, given that each assembly task has associated cost and reliability values. The reliability value denotes the probability that the assembly task is completed successfully in a single attempt. Given the availability and affordability of a wide range of sensors and quality assurance techniques, each task in the assembly process can be monitored and, if required, the process adjusted according to the updated information. In particular, the paper focuses on assembly processes in which the tasks are mutually dependent, such that a failure of a specific assembly task propagates back to previous tasks. In such a case, the assembly must restart from previous tasks and, in the extreme case, even from the first task. Our previous work focused on the effect of task sequencing on the expected cost and duration of assembly processes with probabilistic success rates. This paper illustrates the importance of considering task modularization prior to task sequencing in complex assemblies in order to reduce the effect of mutual failure dependency. Simulation results indicate that balanced modularization, in terms of the number of assembly tasks within each module, together with sequencing the tasks in descending order of failure probability, improves process performance. The cost savings that might result from this simple heuristic are calculable and depend on the nature of the assembly system and the assembled product. For simple assemblies, the presented heuristic does not hold, as the only factors affecting their sequencing are the assembly liaison matrices and precedence relations [29]. However, as the assembly process becomes more complex, the advantages of modular assembly become more apparent. The paper unveils several hidden complexities even in presumably simple assembly processes.

Implementing the proposed method requires consideration at both the product design and the assembly process planning stages. At the product design stage, the interactions between the components, mainly in terms of the product's liaison graph and the assembly sequence, must be considered; in particular, a failure in the assembly of a particular component may affect other components and subassemblies. At the process planning stage, planners should consider the effect of failures on the logistics of the assembly process, in terms of the cost of each assembly operation as well as the resources required for the assembly operations (e.g., manpower, special machinery and tools, transportation, assembly scheduling).

In practice, designers and planners can use standard risk management tools such as probabilistic risk assessment (PRA), event tree analysis (ETA), and fault tree analysis (FTA) to address the uncertainties with metrics, parameterization, and prioritization. PRA tools, also known as likelihood-consequence or probability-impact tools, use the magnitude and likelihood of potential failures to express the expected loss. They can provide designers and planners with insights into the sensitive components and tasks that can initiate significant failures, the detriments of these failures, and their probabilities or frequencies. ETA and FTA tools can guide designers and planners in optimizing the use of resources, assist in planning quality assurance and monitoring procedures, and support the development of diagnostic manuals and processes.

One limitation of this work is that it uses fairly simple examples to illustrate the utility of the presented model. The number of components in these examples was limited to no more than eight. This is because we could not efficiently run the model in MATLAB with a greater number of components, primarily due to the large number of sequences that must be considered; MATLAB could not run the code for examples with more than 18 components, which is small compared to the size of most complex assemblies. There are two possible ways to address this issue. One is to use precedence graphs, as shown in Shoval et al. [23], to limit the number of possible sequences. The second is to use genetic search algorithms for an effective search for optimal modularization and sequencing. We will take on the challenge of system size in future work.