
1 Introduction

In 2014, the United States manufacturing industry produced $2.1 trillion worth of goods and supported 12.3 million jobs [1]. While these figures are impressive, there has been a declining trend in manufacturing’s share of the Gross Domestic Product (GDP). The reasons for this decline include increasing global competition, sustainability concerns, and uncertainties in the cost and supply of materials [2].

Traditionally, cost, quality, productivity, and throughput are the major considerations when selecting materials and manufacturing processes or when developing production plans. However, environmental sustainability is now considered to be the fourth such consideration. Even though it may negatively impact the other three, sustainability is deemed critical for an organization to succeed in today's markets.

To better understand and predict those impacts, a new type of manufacturing system, the smart manufacturing system (SMS), has been proposed. SMS are characterized by the wide availability of data that can shed light on those impacts and predictions. These data are expected to improve real-time system planning and operational decision-making, but only if context and meaning can be deduced from them. Data collected by smart sensors, radio frequency identification (RFID), and wireless communications are described by volume, velocity, variety, veracity, validity, volatility, and value, the so-called 7 Vs of big data [3]. Figure 1, from the United Nations Economic Commission for Europe (UNECE) [4], shows the recent past, current, and projected "explosion" of business data.

Fig. 1 Growth in big data (Source UNECE)

Gröger et al. [5] identify two types of collected data: process data and operational data. Process data consists of execution data, i.e., flow-oriented machine and production-event data recorded by the Manufacturing Execution System (MES). Operational data mainly encompasses Computer-Aided Design (CAD), Computer-Aided Process Planning (CAPP), and Enterprise Resource Planning (ERP) data. These data are interrelated and shaped by many factors, including hidden patterns, correlations, associations, and trends. It is our view that uncovering these factors can contribute considerably to the decision-making process. Conventional approaches have inherent limitations for deriving actionable recommendations from process and operational data [6]. Thus, a new methodology is needed.

In this chapter, we describe such a methodology, one that combines simulation, data mining, and optimization to exploit large amounts of process and production data. The methodology is demonstrated with a case study of a machining process in which environmental impact and production time are the performance measures. The objective is to choose the process sequence, production plan, manufacturing resources, and parameter settings that optimize both of these measures during production.

Traditionally, simulation has been used to investigate the performance of manufacturing systems under a predefined set of scenarios. Better et al. [7] observe, however, that in a system with a high degree of complexity and uncertainty, (1) it is not always obvious which parameters and variables to focus on to improve performance, nor (2) is it evident to what extent these variables should be changed for optimal operation. Brady and Yellig [8] also observe that in complex simulations of real-world problems, it is challenging to determine a set of input variables and values that yield optimal output. One major reason is that the data contain intricate dependencies, which must be determined before generating the input to simulation models. Techniques and methods are needed to discover such dependencies before performing simulation analysis. In summary, the main contribution of this chapter is a novel methodology that integrates data mining, simulation, and optimization techniques for more effective model parameter identification, simulation input preparation, and derivation of actionable recommendations.

In our proposed methodology, data mining is used to develop high-level association rules among various kinds of data, including performance data. The outputs from applying those rules are used as inputs to the simulation model. Simulation optimization then determines the best process and operational parameter settings to yield actionable recommendations for decision makers and operators. We believe that the combined effect of data mining, simulation, and optimization can improve manufacturing decision making in the face of big data and system complexity.

We use the case of a small machine shop with two performance objectives: minimize production time and minimize the usage of resources (material, energy, and water) during the machining processes. Each part design has a different process plan. Some machines can perform more than one process. However, the sequencing of parts through the shop depends on the users' objectives. The choice of a machine for a given process affects both performance objectives differently.

The chapter is structured as follows: Section 2 provides a background to data mining with a focus on the unsupervised learning techniques of association and clustering. Section 3 overviews simulation modeling for manufacturing applications. Section 4 reviews simulation optimization methods and techniques as they are currently applied to decision-making in manufacturing. Section 5 illustrates the integration of data mining, simulation, and optimization. Section 6 presents the proposed methodology and the strengths of a combined-methods approach. Section 7 presents a case study demonstrating how energy and production time can be optimized in a machine shop based on the methodology. Section 8 presents a summary and a discussion of how the methodology can be implemented, highlighting integration needs.

2 Background to Data Mining

This section provides background on data mining techniques in manufacturing, particularly the association and clustering techniques relevant to the work of this chapter.

2.1 Data Mining Techniques

Data mining is the process of discovering knowledge hidden in large amounts of data [9]. The data being mined is typically observed data, as opposed to experimental data, so the data mining techniques employed have no influence on the data-collection methods. In Agard and Kusiak [10], for example, the authors show how to mine data stored in ERP systems, previous schedules, and the MES to gain knowledge about the best choice of manufacturing processes for defined design characteristics.

Data mining techniques draw from several disciplines including statistics, visualization, information retrieval, neural networks, pattern recognition, spatial data analysis, image databases, signal processing, probabilistic graph theory, and inductive logic programming. Data mining can in general be divided into two groups: descriptive and predictive. Descriptive techniques describe events in the data and the factors responsible for them. Predictive techniques attempt to predict the behavior of new data sets. Both groups use the same general approach, which is to (1) identify data fields and types and (2) specify the data as discrete or continuous.

Our current focus is on predictive data mining. Predictive data mining types include supervised learning, unsupervised learning, and semi-supervised learning [11]. With supervised learning, output variables are known or predetermined, and the purpose of a learning algorithm is to develop a function that maps inputs to the output variables. Output variables corresponding to any given inputs can then be predicted using the learned function. Semi-supervised learning problems have output variables associated with only some of the data.

Unsupervised learning is where none of the input data is associated with predefined responses, called data labels. The objective of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data or discover patterns in it. In other words, the intent of unsupervised learning is to uncover hidden data concepts when the data labels are not known beforehand.

The case study described in this chapter involves predicting the best operational performance of a manufacturing system from collected data, from which the best parameters are determined. Therefore, we investigate a case of unsupervised learning. Unsupervised learning techniques include clustering and association rule mining [11]. We discuss association and clustering next.

2.1.1 Association

Association techniques (1) discover relationships among large volumes of data and (2) represent those relationships as rules that "describe" the data. Discovery is based on the probability of co-occurrence of items in a large data set. The relationships between co-occurring items are expressed as association rules. Conceptually, an association rule indicates that the occurrence of certain items in a transaction implies the occurrence of other specific items in the same transaction [12]. In other words, some phenomenon within the system makes these items occur together.

The aim of association data mining is not to understand the underlying phenomenon. Rather, the association learning process attempts to determine the governing association rules. The idea of mining association rules originates from market basket analysis, where rules such as "a customer who buys products A and B also buys product C with probability p" are derived. In theory, given enough manufacturing data, similar rules could be derived that help explain the relationship between the values of the input data and the values of the output data representing system performance. These rules could be of the type "if parts are sequenced such that process A is performed before process B, then there is an increase in the total energy consumed per part with probability p." Our focus then is on understanding the relationship between input and output (performance) variables, that is, the rules, as well as the ranges that these variables can take. Algorithms for association rule learning concentrate on obtaining statistically significant patterns and deriving rules from those patterns [13]. This is done by finding the frequency of co-occurrence of items in a transaction dataset and generating association rules that meet a user-specified minimum confidence.

For example, one rule might say that "if we pick any product at random and find that it was processed according to a given sequence through the factory floor, we can be confident, quantified by a percentage, that its production time is larger than average."

Association rules derived from data should be reliable. Two common measures of a rule are its support and its confidence. Support is the proportion of transactions in the data that contain a given set of items occurring together. Confidence indicates how often the rule has been found to be true.

The main challenge of association rule induction is the sheer number of possible rules. For example, the large product range of a typical job shop results in several classes of product designs, materials, and processing requirements. The rules cannot be found by inspecting each candidate in turn. Therefore, efficient algorithms are needed that restrict the search space and check only a subset of all rules, if possible without missing important ones. One such algorithm is the Apriori algorithm [14], which is the algorithm used in the work of this chapter.
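To make the support and confidence measures and the Apriori pruning idea concrete, the following minimal Python sketch computes them over a handful of hypothetical transactions. The item encoding, data, and threshold are invented for illustration and are not taken from the chapter's case study.

from itertools import combinations

# Illustrative transactions; each is a set of attribute=value items.
transactions = [
    {"plan=predefined", "machine=M2", "energy=high"},
    {"plan=undefined", "machine=M1", "energy=medium"},
    {"plan=predefined", "machine=M2", "energy=high"},
    {"plan=partial", "machine=M3", "energy=low"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # How often the consequent holds among transactions matching the antecedent.
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

min_support = 0.25
items = sorted(set().union(*transactions))
frequent = [frozenset([i]) for i in items
            if support(frozenset([i]), transactions) >= min_support]

# Apriori pruning step: only frequent itemsets are extended, because any
# superset of an infrequent itemset must itself be infrequent.
pairs = [a | b for a, b in combinations(frequent, 2)
         if support(a | b, transactions) >= min_support]

# Evaluate one candidate rule: machine=M2 => energy=high (confidence 1.0 here).
conf = confidence(frozenset({"machine=M2"}), frozenset({"energy=high"}), transactions)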

2.1.2 Clustering

Clustering is the process of identifying a finite set of categories, called clusters, that "describe" the data. Clustering techniques segment large data sets into smaller, homogeneous subsets that can be more easily managed, separately modeled, and analyzed [15]. Clusters are formed such that objects in the same cluster are more similar to each other than to objects in different clusters.

Clusters correspond to hidden patterns in the data. Clusters can overlap, and in multi-dimensional data a point can belong to more than one cluster. A typical clustering algorithm forms a cluster by identifying points close to its center and expanding outwards until a threshold is reached, at which point a new cluster is formed. The process continues until all data points are assigned to a cluster [11].
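The description above is algorithm-agnostic; as one concrete and widely used alternative, the scikit-learn sketch below clusters hypothetical part-feature vectors with k-means. The feature names and values are invented for illustration.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical part features: [diameter_mm, number_of_features, tolerance_um]
parts = np.array([
    [40.0, 3, 25.0],
    [42.0, 3, 20.0],
    [120.0, 8, 5.0],
    [118.0, 7, 8.0],
])

# Group similar parts; n_clusters is a user choice, typically tuned by
# inspecting within-cluster distances across several candidate values.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(parts)
print(km.labels_)           # cluster assignment per part
print(km.cluster_centers_)  # centroid of each cluster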

In manufacturing applications, Kerdprasop [16] used clustering techniques to find patterns and relationships in multidimensional data that indicate potential poor yield in high-volume production environments. Another potential application is finding relationships that differentiate categories of parts that can be processed by similar machines. The features of such parts are used to form a cluster and are useful for developing cellular manufacturing systems through "group technology."

3 Modeling and Simulation for Manufacturing Applications

A simulation is a computerized model of a real, or a proposed, system. Users can conduct experiments with such a model to better understand the likely behavior of that system for a given set of conditions and scenarios [17]. Because of the dynamic nature of manufacturing operations, most simulation models are stochastic.

Law and Kelton [18] summarize the benefits of modeling and simulating manufacturing systems. First, they help identify and quantify the equipment and personnel needed. Second, they can predict the probability distributions associated with performance. Third, they can be used to evaluate operational procedures. Fourth, they can take into consideration the stochastic behavior of the system.

The objectives and the types of simulation models needed may differ across the life cycle phases of a manufacturing system. Table 1 shows typical objectives and simulation types in various phases of a system life cycle. A review of 317 simulation papers by Negahban et al. [19] identifies three major research areas: simulation language development, manufacturing system design, and manufacturing system operation. For manufacturing system operations, simulation helps users understand, assess, and evaluate the operation so that the 'best' configurations that yield 'optimum' performance can be determined.

Table 1 Simulation application in different stages of manufacturing system life cycle

Manufacturing simulation has been widely used for determining policies or rules to be employed in specific operational situations [20]. But, for a long time, its application to real-time control was limited by computational capacity, system-reconfiguration time, data availability, and optimization issues. Today, a number of technologies are helping to overcome those limitations, including high-speed computation, communication and integration technologies, standards, and automated data collection and processing [21, 22]. Simulation models can now be updated with data to provide the capacity to foresee the impact of new orders, equipment failures, and changes in operations.

Simulation can also be used to generate data and fill gaps in missing data for analysis by other methods. Shao et al. [23] demonstrate how simulation could be used to generate data to help evaluate the performance of data-analytics applications. For this approach to be effective in real applications, however, data-generating models require improved verification and validation methods.

One of the major activities of simulation projects is input-data preparation. Previous research efforts have attempted to address this issue. For example, Skoogh et al. [22] demonstrate a Generic Data Management Tool (GDM-Tool) for data extraction, conversion, cleaning, and distribution fitting. The GDM-Tool enables data reuse, thereby reducing the time needed to carry out simulation projects. However, finding optimal parameters and settings from a large volume and variety of streaming data, as addressed in this chapter, cannot be carried out with the GDM-Tool.

The common approach for simulation-based decision making is to prepare and run a number of scenarios and select the one with the best outcome, as shown in Fig. 2. However, this approach is very difficult for a complex system with many inputs, particularly if model execution time is long. In addition, the quality of the answer depends largely on the skill of the analyst who selects and defines the scenarios.

Fig. 2 The conceptual relationship between inputs and outputs of a simulation model

Identifying the "best solution" requires an optimization process, usually the maximization or minimization of the expected value of an objective function [24]. Brady and Bowden [25] proposed two approaches for integrating simulation and optimization. The first is to construct an external optimization framework around the simulation model. The second, internal, approach is to investigate the relationships among input variables based on the dynamics of their interactions within the simulation model. This chapter uses the first approach. The importance of optimization has led simulation vendors to include optimization modules as part of their tools.

4 Simulation Optimization

Simulation optimization is the search for specific values or settings of the controllable input parameters of a simulation such that a target objective is achieved [26]. This objective is a function of the simulation inputs. The procedure is to define a set of decision variables and optimize (i.e., maximize or minimize) a designated performance measure subject to constraints and bounds on the ranges of the decision variables. Azadivar [24] formulated one form of the simulation optimization problem as:

$$ \begin{aligned} \text{Maximize (or minimize)} \quad & f(X) = \mathrm{E}\left[ z(X) \right] \\ \text{subject to} \quad & g(X) = \mathrm{E}\left[ r(X) \right] \le 0 \\ \text{and} \quad & h(X) \le \mathbf{0}, \end{aligned} $$

where z and r are random vectors representing the responses of the simulation model for a given X, a multi-dimensional vector of decision variables. The functions f and g are the unknown expected values of these vectors, which can only be estimated from observations of z and r. That means the objective function (or functions, in a multi-criteria problem) and/or constraints are responses that can only be evaluated by simulation. The vector h contains deterministic constraints on the decision variables.
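To make the formulation concrete, the short Python sketch below estimates f(X) = E[z(X)] by averaging independent simulation replications. The simulate function is a stand-in for a real simulation model; all names and values are illustrative assumptions, not part of the chapter's case study.

import random
import statistics

def simulate(x, seed):
    # Stand-in for one stochastic simulation replication: returns one
    # observation of the random response z(X) for decision vector x.
    rng = random.Random(seed)
    return sum(x) + rng.gauss(0.0, 1.0)

def estimate_objective(x, replications=30):
    # f(X) = E[z(X)] can only be estimated by sampling: average the
    # responses over independent replications and report the spread.
    obs = [simulate(x, seed) for seed in range(replications)]
    return statistics.mean(obs), statistics.stdev(obs)

f_hat, spread = estimate_objective([1.0, 2.0])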

The specific optimization algorithm used often depends on the type of simulation method. Because running a simulation model requires significant computation, efficient optimization algorithms are crucial. Optimization methods applicable to different simulation types are surveyed by Amaran [26]. Carson and Maria [27] categorize simulation optimization methods into gradient-based search methods, stochastic optimization, response surface methodology, heuristic methods, A-teams, and statistical methods. Fu et al. [28] review the state of practice in simulation optimization.

Table 2 shows a sample of commonly used simulation-based optimization tools. Researchers also often develop custom optimization tools on top of simulation software for particular situations. Phatak et al. [29] present an example of an in-house optimization tool for manufacturing problems based on the particle swarm optimization algorithm.

Table 2 Optimization search strategies for selected simulation tools

Unlike mathematical-programming formulations of optimization problems, simulation-based formulations give no way of telling whether an optimum has been reached. Optimization packages, such as those shown in Table 2, seek improved system performance by changing the settings of system parameters. They develop a solution incrementally, building on earlier solutions to obtain better ones: they propose new simulation inputs, execute the simulation, and evaluate the performance iteratively [7]. Figure 3 illustrates this procedure.

Fig. 3 Process of getting a solution using simulation-based optimization
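The loop in Fig. 3 can be sketched as the naive random search below, which reuses the estimate_objective stand-in from the previous sketch. Real packages replace the propose step with tabu search, scatter search, or similar heuristics, so this illustrates only the iterative structure.

import random

def propose(x, rng, step=0.1, lo=0.0, hi=10.0):
    # Perturb the current inputs within their allowed ranges.
    return [min(hi, max(lo, xi + rng.uniform(-step, step))) for xi in x]

def optimize(x0, iterations=100, seed=1):
    # Propose new inputs, execute the simulation, evaluate performance,
    # and keep a candidate only when the estimate improves (minimization).
    rng = random.Random(seed)
    best_x, (best_f, _) = x0, estimate_objective(x0)
    for _ in range(iterations):
        cand = propose(best_x, rng)
        f, _ = estimate_objective(cand)
        if f < best_f:
            best_x, best_f = cand, f
    return best_x, best_f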

5 Integrating Data Mining with Simulation and Optimization

Embedding an optimization module into simulation tools, as described in Sect. 4, provides actionable solutions. However, determining the set of inputs that optimizes system performance is challenging because of the large volume of data and the number of possible input parameters and their interactions. Although input-data management tools for simulation have been developed in previous research [22], they do not address the data challenges discussed earlier. We propose using data mining to obtain simulation scenarios by associating collected data with system performance. Remondino et al. [30] describe two ways of combining data mining with simulation. The first, called micro-level modeling, applies data mining to historical data to (1) develop appropriate scenarios and (2) tune scenario-based simulation input parameters.

The second, called macro-level modeling, applies data mining to simulation output data to (1) reveal patterns describing system behavior and (2) develop ways to use those patterns to aid decision-making [31, 32].

Our proposed methodology is based on micro-level modeling, the first approach. Figure 4 shows the high-level components and their interactions. Two features are combined with classical simulation modeling and analysis: data mining and optimization. This approach is suitable for both static and dynamic data.

Fig. 4 Data mining integrated approach to simulation optimization

6 A Methodology for Manufacturing System Optimization

Based on the review and discussion in the preceding sections, we conclude that (1) modeling and simulation tools cannot directly use streaming data, and (2) further analysis is needed to turn the patterns and rules obtained by data mining into actionable recommendations. Therefore, a methodology combining different methods is needed. The operational steps of this methodology, illustrated in Fig. 5, are described next.

Fig. 5 Illustration of methodology steps

In summary, the user first formulates the problem by specifying the scope, high-level performance objectives, indicators, and metrics. Next, the user acquires domain knowledge and develops a conceptual model to understand model requirements, activities, and processes. The following step is to collect data and apply data mining techniques to it. The final steps are simulation modeling and optimization. These steps are described in detail in the next paragraphs.

Formulate the problem: This is the definition of the goal and scope of the project. The target plant, work cell, machine, manufacturing operations, or processes are specified at this step. The goal might be, for example, to minimize energy consumption for a foundry shop or to maximize throughput of a machine shop. Relevant resources, operational details, constraints, products, activities, and data collection points are also identified.

Acquire domain knowledge: This step gathers the domain knowledge needed to execute the project. Domain knowledge includes a thorough understanding of the manufacturing processes and system, indicators, metrics, and performance objectives and goals. In addition, knowledge about software (data mining, simulation, and optimization), data collection, communication, and storage is also required.

Design a conceptual model: This step constructs a high-level conceptualization of the problem so that the system can be better understood and modeled in detail. The model should provide the right level of abstraction to maintain focus on the objectives and to understand the problem before initiating the modeling and analysis. When designing a conceptual model, the following typical questions need to be answered to help modelers abstract the problem and plan the detailed modeling: (1) What components (systems/processes) need to be modeled? (2) What are the inputs and outputs of each component? (3) What are the relationships between components? (4) What are the indicators and metrics? and (5) What are the data requirements for the metrics? The conceptual model also helps identify requirements for data collection. A number of conceptual modeling methods and techniques are available, including workflow modeling, workforce modeling, object-role modeling, and system modeling. A systems modeling language such as SysML [33] could well be used for the conceptual model to represent requirements for analysis and decision making.

Collect data: Manufacturing data is mainly collected through sensors, bar codes, vision systems, meters, lasers, white-light scanners, and RFID. The data collected is mainly process-execution data, i.e., machine and production events recorded by the MES. From machine tools, for example, this data may include machine name and type, process, processing time, idle time, loading time, energy consumption, machine settings, tool, changeover time, and teardown time. MTConnect is a standard that can be used for data collection [34]. For data storage, Structured Query Language (SQL) databases are one means of managing the data, and a data model should be developed for efficient management.
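As a minimal illustration of SQL-based storage for MES-style execution records, the sketch below uses Python's built-in sqlite3 module. The schema, field names, and sample record are assumptions made for illustration, not a standard data model.

import sqlite3

# A minimal event table for process-execution records.
con = sqlite3.connect("shop_floor.db")
con.execute("""CREATE TABLE IF NOT EXISTS machine_events (
    ts TEXT, machine TEXT, process TEXT,
    processing_time_s REAL, idle_time_s REAL, energy_kwh REAL)""")
con.execute("INSERT INTO machine_events VALUES (?, ?, ?, ?, ?, ?)",
            ("2016-05-01T10:15:00", "CNC-lathe-1", "facing", 84.0, 12.0, 0.31))
con.commit()

# Downstream analytics can then pull per-machine aggregates.
for row in con.execute(
        "SELECT machine, SUM(energy_kwh) FROM machine_events GROUP BY machine"):
    print(row)
con.close()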

Perform data mining: A variety of data-mining techniques and tools are available, based on the methods reviewed in Sect. 2.1. The choice of technique depends on the particular problem. For association rule learning, applicable tools include Weka, R, Orange, KNIME, NLTK, ARMiner, arules, and Tanagra.

Mathematically, a performance indicator y, e.g., energy consumption, can be represented as a function:

$$ y = f\left( x, w \right), $$

where \( x = (x_{1}, x_{2}, x_{3}, \ldots, x_{d})^{T} \) denotes the set of system parameters associated with the amount of energy used, and w denotes the weights of the parameters. In the work presented in this chapter, y is known, and the task of data mining is to determine the system parameters x.

Perform simulation modeling and optimization: The system is represented by a simulation model. Many simulation tools come with optimization modules (as shown in Table 2). Typically, these tools automatically execute multiple runs and systematically compare the results of the current run with past runs to decide on a new set of input values until an optimum is gradually approached. The Core Manufacturing Simulation Data (CMSD) standard can be used to represent the input data for the simulation model [35].

Derive actionable recommendations: The final step is to derive actionable recommendations by interpreting and translating the output from the optimization process. The users also need to check if the recommended actions conflict with existing knowledge about the system and resolve this conflict if necessary. As Fig. 5 shows, the system performance can be monitored while data is continuously collected so that a new set of decisions can be made when needed.

7 Case Study: Minimizing Energy Consumption and Production Time in Machining Operations

Machining is one of the major manufacturing processes in the metal industry. The process inputs, removal processes, and waste byproducts have a large potential environmental impact. The relationships among them, and their impacts on the environment, have not yet been fully investigated. As a result, methods for determining control inputs that optimize production objectives have not been fully developed [36]. This section describes how the proposed methodology was applied to a case study that uses data from a machining process for decision making. The case is a first step toward understanding and implementing the proposed methodology.

Many machined parts are produced in job shops. The case under study is based on the machining job shop used in the research reported by Kibira et al. [36] and Hatim et al. [37] for simultaneously optimizing process plans and production plans. In this investigation, we use a different part design. The shop consists of the following machine tools: a turning lathe, a milling machine, a drill press, and a boring machine. When orders are received and batched, the user can decide to focus on any or all of these performance objectives: (1) minimize costs (e.g., labor, cutting tool, and energy), (2) minimize resource usage (e.g., material, energy, and water), and (3) maximize production. Figure 6 is a conceptual view of the work flow through the machine shop.

Fig. 6 Conceptual view of inputs and impacts of the machining shop

Each production batch, or each part in a batch, can potentially have its own process plan because the user can choose different sets of machines and tools to produce a given design feature on a part. We consider three approaches to sequencing part-feature production: a predefined, a partially defined, and an unspecified process plan. In the predefined case, each process has a predetermined machine and cutting tool, chosen to optimize a given performance objective such as minimum energy use. In the unspecified case, a machine is selected for each part according to a priority rule, such as "always choose the machine with the minimum number of parts waiting" (sketched below). The partially defined case is a combination of the two. Each choice results in a different process plan and hence different energy consumption, production time, and cost. Process and performance data are collected for each batch as it passes through the machine shop. Both types of data depend on the resources used for each process within the process plan.
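The minimum-queue priority rule for the unspecified plan is simple enough to state in a few lines of Python; the machine names and queue sizes below are illustrative only.

def pick_machine(capable_machines, queue_lengths):
    # Unspecified-plan dispatch rule from the text: among the machines able
    # to perform the next process, choose the one with the fewest parts
    # waiting. queue_lengths maps machine id -> current queue size.
    return min(capable_machines, key=lambda m: queue_lengths[m])

# Example: suppose the mill and the drill press can both cut this feature.
queues = {"mill": 4, "drill": 1, "lathe": 2}
next_machine = pick_machine(["mill", "drill"], queues)   # -> "drill"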

Formulate the problem: The scope is a machine shop, and the target product is a grinding head shell, shown in Fig. 7. The manufacturing processes for this part are facing, grooving, threading, spot drilling, and drilling. The objective is to select a sequencing plan, a machine tool, and cutting tools for each process so as to minimize energy consumption and production time.

Fig. 7 Grinding head shell

Acquire domain knowledge: The following expert knowledge was acquired before modeling the machine shop operations: production resources, machining processes, energy consumption in machining, machining time, production planning and sequencing in a job-shop environment, costs of manufacturing processes, performance indicators and metrics, and performance data. Take production resources as an example. Table 3 shows the manufacturing processes needed to produce the grinding head shell and the machine tools available in the shop: a Computer Numerical Control (CNC) lathe, a three-axis vertical milling machine, an upright drill press, and a horizontal boring machine. For each machining operation, one or more cutting tools can be chosen to meet the required specification. Table 4 shows the tool types available for each machine: turning (single-point tipped tool, form-turning tool, drill), milling (slot-milling cutter, mill cutter, form-milling cutter), drilling (center drill, reamer), and boring (boring tool). Where a machine can perform a particular operation, each type of tool performs it differently, potentially resulting in different production times and energy use. A hypothetical encoding of this structure is sketched after Table 4.

Table 3 Resource information for manufacturing the grinding head shell
Table 4 Tool types for use by each machine tool
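To make the resource structure concrete, the following Python encoding mirrors the shape of Tables 3 and 4. The process-machine-tool assignments shown are hypothetical placeholders; the actual assignments in the tables may differ.

# Hypothetical process-resource matrix: process -> machine -> usable tools.
RESOURCES = {
    "facing":     {"cnc_lathe": ["single_point_tip", "form_turning_tool"]},
    "grooving":   {"cnc_lathe": ["form_turning_tool"]},
    "threading":  {"cnc_lathe": ["single_point_tip"]},
    "spot_drill": {"drill_press": ["center_drill"], "vertical_mill": ["center_drill"]},
    "drilling":   {"drill_press": ["drill", "reamer"], "boring_mill": ["boring_tool"]},
}

def options(process):
    # Enumerate the feasible (machine, tool) pairs for one process; each
    # pair can yield a different production time and energy use.
    return [(m, t) for m, tools in RESOURCES[process].items() for t in tools]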

Design conceptual model: The conceptual model shown in Fig. 8 is a schematic representation of the problem, activity sequences, and information flow. It includes product design, feature sequence, process selection, machine and tool requirements, and the performance indicators that drive these selections. The part design describes design information, including the features' forms, shape complexities, dimensions, tolerances, and surface conditions. Alternative networks describing the features' processing precedence during fabrication are specified. Next, a set of processes to manufacture the part is determined according to the part's functionality and design requirements. The combinations of machines and tools that satisfy the design and process requirements are then designated. The performance indicators determine the actions that give the machine shop the best chance of meeting its objectives.

Fig. 8 A conceptual model representing the production process and information flow for a part design

Model data collection: Based on the domain knowledge acquired and the conceptual model developed, mathematical expressions from the published literature are used to calculate the energy consumption and processing time of the processes [38,39,40,41,42,43,44]. The processes in question are turning, milling, and drilling. A matrix relating each process to its prospective machine and cutting tool is used to determine the production time and the energy consumed. Three examples of the expressions employed are given below.

The time to perform a turning operation is given by

$$ T_{m} = \frac{\pi D L}{v f}, $$

the time for a drilling process is given by

$$ T_{m} = \frac{\pi D_{c} \left( d + 0.5\,D_{c} \tan \left( 90 - \frac{\theta}{2} \right) \right)}{v f}, $$

and the energy consumed by a plain milling operation is given by

$$ E = C_{z}\, a\, z\, D_{c}^{b} f^{u} d^{e} v\, T_{m}, $$

which, for one specific case used in the model, becomes

$$ E = \frac{68.2\, a\, z\, D_{c}^{-0.86} f^{0.72} d^{0.86} v}{6120}\, T_{m}, $$

where

D, L: workpiece diameter and length,
v, f, d: cutting speed, feed rate, and depth of cut,
\( T_{m} \): machining time,
\( D_{c} \): diameter of the milling cutter or the drill,
\( z \): number of teeth in a milling cutter or number of flutes in a tap,
\( \theta \): drill point angle,
\( C_{z} \): constant of the milling operation,
a: milling width,
b, e, u: constants that are determined empirically; these are tabulated for different types of machines and tools.
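Transcribed directly into Python, the three expressions above read as follows. The defaults in milling_energy reproduce the specific plain-milling case quoted in the text, with the 6120 divisor folded into the constant; consistent units are assumed throughout.

import math

def turning_time(D, L, v, f):
    # T_m = pi * D * L / (v * f), with workpiece diameter D and length L,
    # cutting speed v, and feed rate f.
    return math.pi * D * L / (v * f)

def drilling_time(Dc, d, theta_deg, v, f):
    # T_m = pi * Dc * (d + 0.5 * Dc * tan(90 - theta/2)) / (v * f),
    # with drill diameter Dc, depth d, and point angle theta in degrees.
    approach = 0.5 * Dc * math.tan(math.radians(90.0 - theta_deg / 2.0))
    return math.pi * Dc * (d + approach) / (v * f)

def milling_energy(a, z, Dc, f, d, v, Tm,
                   Cz=68.2 / 6120.0, b=-0.86, u=0.72, e=0.86):
    # E = Cz * a * z * Dc**b * f**u * d**e * v * Tm; the default constants
    # correspond to the specific case quoted above.
    return Cz * a * z * Dc**b * f**u * d**e * v * Tm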

The values of some of these parameters are known constants. The others are random variables, which have been fitted with probability distributions to simulate their real-world variability. Only the energy consumed during processing can be generated using such mathematical expressions; the same applies to production time. The data set thus becomes (1) the machine that performs a process, (2) the tool type, (3) the sequencing plan, (4) the energy consumption, and (5) the production time. The machine cutting parameters (cutting speed, feed rate, and depth of cut) are held at constant values. The energy consumed differs across the sequencing plans because the volume of material removed by a process differs for each plan.

Data analytics: Each line in the data obtained forms a transaction, where transaction = {sequencing plan, operation, machine tool, tool, energy consumed, production time}. Data mining is performed to determine the relationships among the ranges of the parameters and how those relationships affect system performance. Those relationships are represented as association rules, which were derived using Tanagra [45], open-source software developed for academic and research purposes.

Tanagra performs exploratory data analysis, statistical learning, and machine learning, and is suitable for both supervised and unsupervised learning. It implements a number of algorithms and approaches, including clustering, factorial analysis, parametric and nonparametric statistics, association rules, and feature selection and construction. We use the Apriori algorithm, which takes a "bottom up" approach: frequent itemsets are extended one item at a time and tested against the data. The inputs to Apriori are the sequencing plan, machine tool, cutting tool type, energy consumption, and production time.

The outputs are the relationships between the various factors, expressed as rules. Each rule has antecedents and consequents. The antecedents, on the left-hand side of a rule, are the factors and values held responsible for the results on the right-hand side, the consequents. We are therefore interested in rules whose consequents are production time and energy consumption.

While energy consumption is generated as quantitative data, it was transformed into discrete variables using simple thresholds as the basis for the discrete classifications ("low," "medium," and "high"). Within the Apriori algorithm, the user can specify a minimum support, pruning candidate rules by placing a lower bound on the support of the resulting association rules. Likewise, a minimum confidence (described in Sect. 2.1.1) is set. The cardinality is the number of co-occurring items (the itemset size) used in the computation. The values chosen here are: minimum support 0.16, minimum confidence 0.6, and minimum itemset cardinality 4.
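The chapter uses Tanagra for this step; an equivalent discretize-then-mine pipeline in Python, using pandas and the mlxtend library as assumed substitutes (records and thresholds invented for illustration), might look like the following.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

records = pd.DataFrame({
    "plan":       ["predefined", "undefined", "partial", "predefined"],
    "machine":    ["M2", "M1", "M3", "M2"],
    "energy_kwh": [2.9, 1.4, 0.6, 3.1],
})

# Discretize the continuous indicator with simple thresholds, as in the text.
records["energy"] = pd.cut(records["energy_kwh"],
                           bins=[0, 1, 2, float("inf")],
                           labels=["low", "medium", "high"])

# Encode each row as a transaction of attribute=value items.
transactions = [[f"{c}={r[c]}" for c in ("plan", "machine", "energy")]
                for _, r in records.iterrows()]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Chapter settings: minimum support 0.16 and minimum confidence 0.6.
freq = apriori(onehot, min_support=0.16, use_colnames=True)
rules = association_rules(freq, metric="confidence", min_threshold=0.6)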

A sample of the rules derived for the demonstration is shown below:

Feature sequence = undefined => Energy = High
Feature sequence = predefined => Energy = High
Operation = Spot Drill => Energy = Low
Operation = Facing => Energy = High
Machine = M2 => Energy = High
Machine = M1 => Energy = Medium
Operation = Drill => Energy = High
Operation = Threading => Energy = Medium
Operation = Grinding && Tool = T2 => Energy = High
Machine = M3 && Tool = T7 => Energy = Low

The rules show that feature sequencing, operation, machine, and tool are all relevant to energy consumption. These factors are included in the simulation model, which generates the performance data. The association rules, which relate input factors to performance data, are incorporated into the discrete event simulation (DES) model described next.

Simulation and simulation-based optimization: The layout of machines and other details of the system operation were used to develop the DES model in the Arena simulation software [17]. The model incorporates intermediate products, work in progress, raw materials, lubrication, energy, and operational disturbances. The main Arena modules in the simulation model handle part arrival, process data requirements, part routing to the various machines, part exit, and statistics generation. Manufacturing processes are represented as events, parts as entities, buffers as queues, part and process specification data as attributes, and collected data as variables.

The first section of the simulation model deals with part arrival and process data assignment. Each part is assigned information such as its design features' dimensions, operation list, and operation order. The process sequence for a part can be any of the three options described above. The part is then sent to the second section of the model, where its operations are decided from the operation matrix developed according to Table 3. A number of feature-process-machine-tool combinations are implemented in the model. Once an operation is completed, the part is routed according to its assigned feature sequencing plan.
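The case-study model was built in Arena, a commercial tool. As a rough open-source analogue of the flow just described, the SimPy sketch below moves batches of 15 parts, arriving every 120 min, through seize-process-release steps; the plans and processing times are placeholders, not the case-study data.

import random
import simpy

def part(env, plan, machines, rng):
    # A part visits the machines its plan requires, in order, seizing
    # each machine and holding it for a sampled processing time.
    for process, machine, t_mean in plan:
        with machines[machine].request() as req:
            yield req
            yield env.timeout(rng.expovariate(1.0 / t_mean))

def source(env, machines, rng):
    # Batches of 15 parts arrive every 120 min, as in the case study.
    while True:
        for _ in range(15):
            plan = [("facing", "lathe", 5.0), ("drilling", "drill", 3.0)]
            env.process(part(env, plan, machines, rng))
        yield env.timeout(120.0)

env = simpy.Environment()
machines = {m: simpy.Resource(env, capacity=1)
            for m in ("lathe", "mill", "drill", "borer")}
rng = random.Random(42)
env.process(source(env, machines, rng))
env.run(until=480.0)   # one 8-hour shift, in minutes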

Optimization is performed using the OptQuest package from OptTek [46], provided as an optional add-on to the Arena simulation tool. In OptQuest, the resources (such as machines and material), control variables, attributes, constraints, and objective are specified. The user also sets the allowed ranges of the input variables. OptQuest uses heuristics, including tabu search, integer programming, neural networks, and scatter search, to explore the control (input) space and converge toward an optimal solution. The results for different scenarios are shown in Table 5, which also displays the resulting impacts of the various system inputs.

Table 5 Resulting shop performance due to selected resource combinations

Note that the energy consumption and production time differ for the same resource set in each plan because of the different sequences in which the design features of the part are produced. The inter-arrival time between successive batches is set at a constant 120 min, and each batch consists of 15 parts. Data is collected and stored in a database.

Determine actionable recommendations: This section discusses the results of the simulation runs from which actionable recommendations are made. Table 5 shows the resources available for each operation. The users can identify the process plan, or plans, that minimize energy consumption and production time (see Table 6). The resource column shows the available machine tools for a process, while the indicator columns show the resulting impacts. The table reports the tool-tip energy, and the production time covers only the processing time on the machines. The minimum energy consumption is obtained by selecting resources R1R3R4R6R6.

Table 6 Summary of process plans for different feature sequence when minimizing energy consumption

System users would probably select the partially defined or unspecified feature sequencing plans, since they have lower energy consumption than the fully predefined case. At the same time, this sequencing plan also results in the minimum production time under the energy-minimization objective. We note, however, that if minimum time had been the objective originally set before the table was derived, the production sequence and resource set would probably have been different.

8 Summary, Discussion, and Future Work

Manufacturing industries today collect large volumes of data. Conventional data analysis methods cannot effectively transform this data into knowledge for decision support, nor can simulation models use the data directly. New approaches are therefore needed. This chapter presents a methodology that integrates data mining, simulation, and optimization for decision making. The methodology gives analysts and decision makers the ability to pinpoint crucial data and to prepare model parameters and input data that more effectively support performance analysis through simulation optimization. Data mining is first applied to the system data, simulation performs "what-if" analysis for the candidate scenarios, and optimization determines the resource sets, production plans, and process plans that optimize a given performance objective. The principal advantage of this methodology over existing approaches is that it identifies, and focuses only on, the relevant or crucial parameters within the collected data. It also reduces the search space for simulation inputs and optimization by identifying the ranges of data that significantly affect user-defined system performance.

A case study of a machine shop demonstrated the methodology. In the case study, we showed how to determine the set of resources and the feature sequencing plan that result in minimum tool-tip energy during processing. The required prior knowledge can be made available to guide product specification at the design stage. Similar approaches can be followed for a different objective, such as minimum processing time or cost. Data mining to optimize system performance, as demonstrated here, is a first step toward models that eventually predict system performance for any part design, machine shop resources, and desired production time.

The methodology involves data collection, model composition, model execution, and result analysis. In practice, these activities would be carried out using different tools and models that need to be integrated through standardized interfaces. Therefore, standards are required for (1) data collection, (2) data representation, (3) model composition, and (4) system integration. Candidate standards include MTConnect [47] for data collection, CMSD [48] for data representation, the Unified Modeling Language (UML) for model composition, and ISA-95 [49] or the Open Applications Group Integration Specification (OAGIS) [50] for system integration. These are described briefly next.

The MTConnect standard facilitates the organized retrieval of process information from numerically controlled machine tools through continuous data logging. It provides a mechanism for system monitoring and for process optimization with respect to energy and resources. The standard needs to be extended to collect data beyond CNC machine tools. CMSD is a standard for integrating simulation applications with other manufacturing applications. It uses a neutral data format to facilitate the exchange of both simulation input and output data across supply chain partners. Among CMSD's goals are supporting the construction of manufacturing simulators and the testing and evaluation of manufacturing software. More standardization efforts are needed, especially for data collection, which is currently still limited to machine tool data.

For conceptual model design and composition, UML is a standard language for specifying, visualizing, constructing, and documenting the artifacts of software systems. An example of a diagramming method based on UML is SysML, which supports the management of system requirements throughout system development and operation.

The ISA-95 standard defines interfaces between enterprise and shop-floor activities, while OAGIS establishes integration scenarios for a set of applications including ERP, production scheduling, MES, and capacity analysis. However, OAGIS and ISA-95 were not intended to provide interfaces with simulation systems or with each other.

Future work includes defining and describing a framework for data collection and its interfaces to data mining and simulation tools; investigating data mining standards for the methodology; analyzing the requirements for extending existing standards to interface tools for data mining, simulation, optimization, and manufacturing system monitoring; and conducting industrial-size case studies.