Abstract
Designing and optimizing classifiers for multidimensional data that mixes quantitative and categorical attributes is challenging. We present a workflow and associated toolset that assist with this task by giving the designer insight into how the multidimensional input data is structured and how this structure influences the classification results. Our approach relies heavily on visual analytics for detecting relevant patterns in the input data, observing the distribution of classification errors, detecting and controlling the effect of feature selection on the classification results, and comparing in detail the performance of different classification techniques. We demonstrate the value of our approach on the concrete problem of building a classifier that predicts, from clinical patient data, biochemical recurrence, an indicator of potential cancer relapse after prostate cancer treatment.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kustra, J., Telea, A. (2019). Visual Analytics for Classifier Construction and Evaluation for Medical Data. In: Consoli, S., Reforgiato Recupero, D., Petković, M. (eds) Data Science for Healthcare. Springer, Cham. https://doi.org/10.1007/978-3-030-05249-2_10
DOI: https://doi.org/10.1007/978-3-030-05249-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05248-5
Online ISBN: 978-3-030-05249-2
eBook Packages: Computer Science, Computer Science (R0)