Abstract
Statistical data analysis has a long history, going back to the 1920s. Many fundamental concepts of multivariate statistical data analysis, especially the purely theoretical ones, had been established by the 1950s. Since the 1960s, practical applications of multivariate statistical data analysis have become feasible through the progress of computers, and these applications have in turn affected theoretical considerations.
The basic process of data analysis is as follows:
- p1) An objective of the data analysis is given.
- p2) Data that seem closely connected with the objective are observed (sampling of data).
- p3) A model (or a set of models) is constructed to explain the variation of the data.
- p4) The original data are preprocessed (or transformed) so that the input data are consistent with the model.
- p5) The model is identified on the basis of the observed (input) data.
- p6) The goodness of fit is evaluated. If it is insufficient, return to p2) or p3); otherwise go to the next step.
- p7) The result is interpreted and its validity is investigated.
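The loop formed by steps p3) to p6) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the candidate models are polynomials of increasing degree, the preprocessing is standardization, the goodness of fit is R², and the threshold 0.95, the synthetic data and all function names are hypothetical.

```python
import numpy as np

def standardize(x):
    """p4) Preprocess: centre and scale the input (illustrative choice)."""
    return (x - x.mean()) / x.std()

def fit_polynomial(x, y, degree):
    """p5) Identify the model: least-squares polynomial coefficients."""
    design = np.vander(x, degree + 1)  # columns x^degree, ..., x^0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef

def r_squared(y, fitted):
    """p6) Goodness of fit: proportion of the variance of y explained."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def analyse(x, y, threshold=0.95, max_degree=5):
    """Try candidate models in turn until one fits well enough."""
    z = standardize(x)
    for degree in range(1, max_degree + 1):  # p3) next candidate model
        coef, fitted = fit_polynomial(z, y, degree)
        score = r_squared(y, fitted)
        if score >= threshold:  # p6) fit is sufficient, go on to p7)
            return degree, coef, score
    return None  # no candidate fits; revisit p2) or p3)

# p2) Observed data: a synthetic quadratic signal with small noise (assumption).
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)

result = analyse(x, y)
```

Here `result` holds the first degree whose fit passes the threshold, together with its coefficients and score. Interpreting that result and checking its validity, step p7), remains a task for the analyst.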
The greatest difference between “data mining” and statistical data analysis seems to lie in the concept of “data”. In data mining, the data are given in advance as a database, whereas in statistical data analysis the data are observed according to the objective of the analysis.
On the other hand, the objective of “data mining” is to find effective (or valuable) information in the data. In terms of the framework of statistical data analysis above, the main processes of data mining are p3), p4) and p5). However, the concept of “effective information” in data mining differs from the main part of the data variation in statistical data analysis. For instance, in principal component analysis the main part of the data variation is obtained as the first principal component, which has the largest proportion of variance. In data mining, however, the major variation of the data is of no interest, because the knowledge obtained from it is trivial. Data mining therefore seems to be interested in the principal components with small proportions, in order to obtain unusual but valuable information. Hence, statistical analysis of the residual data, obtained by removing the main part of the data variation from the original data, will be useful for data mining.
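The idea of analysing residual data can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: the principal components are computed by a singular value decomposition of the centred data, only the first component is treated as the "main part", and the synthetic data, the planted unusual row 42 and the function name `pca_residual` are hypothetical.

```python
import numpy as np

def pca_residual(data, n_main=1):
    """Remove the leading principal components and return what is left.

    Returns the residual data and the proportion of variance
    carried by each principal component.
    """
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions; s**2 is proportional
    # to the variance along each direction.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    proportion = s**2 / np.sum(s**2)
    main = (u[:, :n_main] * s[:n_main]) @ vt[:n_main]
    return centered - main, proportion

# Synthetic data (assumption): three variables driven by one strong
# common trend, plus small noise.
rng = np.random.default_rng(0)
n = 500
trend = rng.normal(scale=5.0, size=n)
data = np.column_stack([trend, trend, trend]) + rng.normal(scale=0.1, size=(n, 3))

# Plant one unusual observation that deviates across the trend direction.
data[42] += np.array([2.0, -2.0, 0.0])

residual, proportion = pca_residual(data)
score = np.linalg.norm(residual, axis=1)  # size of each row's residual
flagged = int(np.argmax(score))           # row with the largest residual
```

In this setup the first component carries almost all of the variance, so the planted deviation is hidden in the raw data and stands out only once that component has been removed. How many components count as the "main part" is a modelling choice that the sketch fixes at one.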
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sato, Y. (2000). Perspective on Data Mining from Statistical Viewpoints. In: Terano, T., Liu, H., Chen, A.L.P. (eds) Knowledge Discovery and Data Mining. Current Issues and New Applications. PAKDD 2000. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45571-X_1
DOI: https://doi.org/10.1007/3-540-45571-X_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67382-8
Online ISBN: 978-3-540-45571-4
eBook Packages: Springer Book Archive