Visual Analytics for Classifier Construction and Evaluation for Medical Data

Kustra, Jacek; Telea, Alexandru

doi:10.1007/978-3-030-05249-2_10



Abstract

Designing and optimizing classifiers for multidimensional data with mixed quantitative and categorical attributes is a challenging task. We present a workflow and an associated toolset that assist with this task by giving the designer insight into how the multidimensional input data is structured and how this structure influences the classification results. Our approach relies heavily on visual analytics for detecting relevant patterns in the input data, observing the distribution of classification errors, detecting and controlling the effect of feature selection on the classification results, and comparing the performance of different classification techniques in detail. We demonstrate the value of our approach on the concrete problem of building a classifier, from clinical patient data, that predicts biochemical recurrence, an indicator of potential cancer relapse after prostate cancer treatment.
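Two of the tasks listed above, observing the distribution of classification errors and comparing classifiers in detail, start from per-patient bookkeeping of which predictions are right or wrong. The sketch below is a minimal illustration, not the authors' toolset: it assumes a binary label (1 = biochemical recurrence, 0 = no recurrence) and two classifiers' predictions on the same patients, and the helper names `confusion_counts` and `compare_errors` are hypothetical.

```python
from typing import Dict, List, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Binary confusion counts; label 1 is assumed to mean 'recurrence'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            # Predicted recurrence: correct if the patient actually recurred.
            counts["TP" if t == 1 else "FP"] += 1
        else:
            # Predicted no recurrence: a missed recurrence is a false negative.
            counts["FN" if t == 1 else "TN"] += 1
    return counts


def compare_errors(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> Dict[str, List[int]]:
    """Group sample indices by which of two classifiers predicted them correctly."""
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all label sequences must have the same length")
    groups: Dict[str, List[int]] = {
        "both_correct": [], "only_a": [], "only_b": [], "both_wrong": []
    }
    for i, (t, a, b) in enumerate(zip(y_true, pred_a, pred_b)):
        ok_a, ok_b = a == t, b == t
        if ok_a and ok_b:
            groups["both_correct"].append(i)
        elif ok_a:
            groups["only_a"].append(i)
        elif ok_b:
            groups["only_b"].append(i)
        else:
            groups["both_wrong"].append(i)
    return groups


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    pred_a = [1, 0, 0, 1, 1, 0]
    pred_b = [1, 1, 1, 0, 0, 0]
    print(confusion_counts(y_true, pred_a))
    # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
    print(compare_errors(y_true, pred_a, pred_b))
    # {'both_correct': [0, 5], 'only_a': [1, 3], 'only_b': [2, 4], 'both_wrong': []}
```

In this toy example both classifiers have the same accuracy, yet they err on disjoint patients. Instance-level groups like these are the kind of information that aggregate scores hide and that a visual analytics view could expose, for instance by coloring each patient's point in a scatterplot by its group.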



Author information

Authors and Affiliations

Philips Research, Eindhoven, The Netherlands
Jacek Kustra
Institute Johann Bernoulli, University of Groningen, Groningen, The Netherlands
Alexandru Telea


Corresponding author

Correspondence to Jacek Kustra.

Editor information

Editors and Affiliations

Philips Research, Eindhoven, The Netherlands
Sergio Consoli
Dept of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
Diego Reforgiato Recupero
Data Science Department, Philips Research, Eindhoven, The Netherlands
Milan Petković


About this chapter

Cite this chapter

Kustra, J., Telea, A. (2019). Visual Analytics for Classifier Construction and Evaluation for Medical Data. In: Consoli, S., Reforgiato Recupero, D., Petković, M. (eds) Data Science for Healthcare. Springer, Cham. https://doi.org/10.1007/978-3-030-05249-2_10


DOI: https://doi.org/10.1007/978-3-030-05249-2_10
Published: 24 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05248-5
Online ISBN: 978-3-030-05249-2
eBook Packages: Computer Science, Computer Science (R0)
