Abstract
Data mining tools can be very beneficial for discovering interesting and useful patterns in complicated manufacturing processes. These patterns can be used, for example, to improve manufacturing quality. However, data accumulated in manufacturing plants have unique characteristics, such as an unbalanced distribution of the target attribute and a small training set relative to the number of input features. As a result, conventional methods tend to be inaccurate in quality improvement cases. Recent research suggests that a decomposition approach may be appropriate here, and this paper presents a new feature set decomposition methodology that is capable of dealing with the data characteristics associated with quality improvement. To examine the idea, a new algorithm called BOW (Breadth-Oblivious-Wrapper) has been developed. This algorithm performs a breadth-first search while using a new F-measure splitting criterion for multiple oblivious trees. The new algorithm was tested on various real-world manufacturing datasets, specifically from the food processing industry and integrated circuit fabrication. Compared with other methods, the obtained results indicate the superiority of the proposed methodology.
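To illustrate why an F-measure-based criterion suits an unbalanced target attribute, the following minimal sketch compares plain accuracy with the standard F-measure on a hypothetical batch of 1000 items, 20 of which are defective. The counts, the function names, and the two classifiers are assumptions made up for illustration; this is the textbook F-measure, not the BOW splitting criterion itself, which the abstract does not specify.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Standard F-measure from confusion counts.

    beta > 1 weights recall more heavily, beta < 1 weights precision.
    Returns 0.0 when there are no true positives.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def accuracy(tp, fp, fn, tn):
    """Fraction of all items classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical counts (assumed for illustration): 1000 items, 20 defective.
# Classifier A labels everything "good" and never finds a defect.
always_good = dict(tp=0, fp=0, fn=20, tn=980)
# Classifier B finds 15 of the 20 defects at the cost of 30 false alarms.
detector = dict(tp=15, fp=30, fn=5, tn=950)

for name, c in [("always_good", always_good), ("detector", detector)]:
    acc = accuracy(c["tp"], c["fp"], c["fn"], c["tn"])
    f1 = f_measure(c["tp"], c["fp"], c["fn"])
    print(f"{name}: accuracy={acc:.3f}, F1={f1:.3f}")
```

Under these assumed counts, the classifier that never detects a defect has the higher accuracy, while the F-measure ranks the classifier that actually finds defects higher. This is the kind of behavior that makes accuracy-driven criteria a poor fit for rare-defect data.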
Additional information
Received: September 2004 / Accepted: September 2005
About this article
Cite this article
Rokach, L., Maimon, O. Data Mining for Improving the Quality of Manufacturing: A Feature Set Decomposition Approach. J Intell Manuf 17, 285–299 (2006). https://doi.org/10.1007/s10845-005-0005-x
DOI: https://doi.org/10.1007/s10845-005-0005-x