Abstract
Recent years have witnessed increasing interest in interval-valued data analysis. As one of its core topics, linear regression attracts particular attention. It models the relationship between one or more explanatory variables and a response variable by fitting a linear equation to the interval-valued observations. Despite well-known methods proposed in the literature, such as CM, CRM and CCRM, further study is still needed to build a regression model that can capture the complete information in interval-valued observations. To this end, we propose a novel Complete Information Method (CIM) for linear regression modeling. By dividing hypercubes into informative grid data, CIM defines the inner product of interval-valued variables and transforms regression modeling into the computation of a set of inner products. Experiments on both synthetic and real-world data sets demonstrate the merits of CIM in modeling interval-valued data and in avoiding the mathematical incoherence introduced by CM and CRM.
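The grid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' CIM: the abstract gives neither the grid construction nor the inner-product definition, so the sketch assumes each observation's hypercube is cut into m equal cells per axis, each cell is represented by its midpoint, and ordinary least squares is solved through the normal equations. The names `grid_points` and `fit_grid_regression` and the parameter `m` are hypothetical.

```python
import numpy as np

def grid_points(lower, upper, m):
    """Return the m**d cell midpoints of one hypercube (assumed grid scheme).

    lower, upper: arrays of shape (d,) holding the interval bounds.
    """
    # Relative positions of the midpoints of m equal-width cells on [0, 1].
    fracs = (np.arange(m) + 0.5) / m
    axes = [lo + (hi - lo) * fracs for lo, hi in zip(lower, upper)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([g.ravel() for g in mesh], axis=1)

def fit_grid_regression(lowers, uppers, m=10):
    """Fit y = b0 + b1*x1 + ... + bp*xp on the pooled grid points.

    lowers, uppers: arrays of shape (n, p + 1); the last column is the
    response interval. Returns [b0, b1, ..., bp].
    """
    pts = np.vstack([grid_points(lo, up, m) for lo, up in zip(lowers, uppers)])
    X = np.column_stack([np.ones(len(pts)), pts[:, :-1]])
    y = pts[:, -1]
    # The fit depends on the data only through inner products of columns.
    gram = X.T @ X      # inner products among explanatory columns
    moment = X.T @ y    # inner products with the response column
    return np.linalg.solve(gram, moment)

# Hypothetical data: three observations, one explanatory interval variable.
lowers = np.array([[0.0, 1.0], [2.0, 5.0], [4.0, 9.0]])
uppers = np.array([[2.0, 5.0], [4.0, 9.0], [6.0, 13.0]])
coef = fit_grid_regression(lowers, uppers, m=10)
```

In this sketch the interval widths enter the fit through the squared terms of the Gram matrix, so the result generally differs from a regression on interval midpoints alone. When every interval collapses to a single point, the fit reduces to classical least squares.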
Author information
Additional information
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 71031001, 70771004, 70901002 and 71171007, the Foundation for the Author of National Excellent Doctoral Dissertation of PR China under Grant 201189, and the Program for New Century Excellent Talents in University under Grant NCET-11-0778.
Huiwen Wang received her B.Sc. degree from Beihang University (BHU), China, in 1982, her DEA MASE from Paris XI, France, in 1989, and her Ph.D. degree in engineering systems from BHU in 1992. She is currently a professor in the Management Science and Engineering Department, School of Economics and Management (SEM), BHU. She is also dean of SEM, director of the SEM Academic Degrees Committee, and director of the Research Center of Complex Data Analysis at BHU. Prof. Wang is a recipient of the National Science Fund for Distinguished Young Scholars. Her general area of research is statistics and data analysis, with a recent focus on multivariate analysis for high-dimensional complex data. She is an IASC member, a member of the National Statistics Teaching Materials Review Committee, executive director of the China Marketing Association, and an editorial member of the Journal of Symbolic Data Analysis.
Rong Guan is a Ph.D. candidate in the School of Economics and Management at Beihang University, China. She received her B.S. degree in industrial engineering from the same university in 2008. Her research interests are in computational statistics and data analysis, with a current focus on multivariate analysis of interval-valued data.
Junjie Wu, the contact author of the paper, received his Ph.D. degree in management science and engineering from Tsinghua University, China, in 2008. He is currently an associate professor in the Information Systems Department, School of Economics and Management, Beihang University, China. He is also the director of the Social Computing and Sentiment Analysis Center, the vice director of the Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations, and an outside research fellow of the Research Center for Contemporary Management, Key Research Institute of Humanities and Social Sciences at Universities, Tsinghua University. His general area of research is data mining and complex networks, with a special interest in solving problems arising from emerging data-intensive applications. He is the recipient of the National Excellent Doctoral Dissertation award (2010) and the New Century Excellent Talents in University award (2011), and was selected for the Microsoft Star-Track program and the Springer Thesis Prize. He is a member of ACM, IEEE, INFORMS, AIS, and CCF.
About this article
Cite this article
Wang, H., Guan, R. & Wu, J. Linear regression of interval-valued data based on complete information in hypercubes. J. Syst. Sci. Syst. Eng. 21, 422–442 (2012). https://doi.org/10.1007/s11518-012-5203-4
DOI: https://doi.org/10.1007/s11518-012-5203-4