Abstract
Mining information from textual data is particularly challenging when the text records are loosely structured and follow no fixed rules. Doctors often write medical records in natural language, so the records contain many ambiguities caused by non-standard abbreviations and synonyms. The medical environment is also very specific: the natural language used in a textual description varies with the person creating the record (there are many personalized approaches), yet it is constrained by terminology (i.e. medical terms, medical standards, etc.). Moreover, the typical patient record is filled with typographical errors, duplicates, ambiguities, syntax errors and many non-standard abbreviations.
This paper describes the process of mining information from loosely structured medical textual records with no a priori knowledge. The data form a large dataset of ~50,000–140,000 records × 20 attributes stored in relational database tables, originating from the hospital information system (thanks go to the University Hospital in Brno, Czech Republic) and recorded over 11 years. The paper concerns only the textual attributes with free-text input, that is, 650,000 text fields in 16 attributes. Each attribute item contains approximately 800–1,500 characters (diagnoses, medications, anamneses, etc.). The output of this task is a set of ordered/nominal attributes suitable for automated processing that can help in asphyxia prediction during delivery.
The proposed technique considerably reduces the time experts need to process loosely structured textual records.
Note that this project is ongoing research and new data are still being received from the medical facility, which justifies the need for robust and fool-proof algorithms.
In the preliminary analysis of the data, classical approaches such as basic statistical measures and word (and word-sequence) frequency analysis were used to simplify the textual data and provide a preliminary overview. Finally, an ant-inspired self-organizing approach was used to automatically extract a simplified dominant structure, presenting the structure of the records in a human-readable form that can be further utilized in the mining process, as it describes the vast majority of the records.
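The word and word-sequence frequency analysis mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `tokenize` and `ngram_counts`, the regular-expression tokenizer, and the three toy records are all assumptions made for illustration, and the records are invented rather than taken from the hospital data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters/digits.

    Assumption: a simple regex tokenizer is enough for a first overview;
    real records would likely need abbreviation and typo handling.
    """
    return re.findall(r"\w+", text.lower())


def ngram_counts(records, n=1):
    """Count word sequences of length n across all free-text records."""
    counts = Counter()
    for text in records:
        tokens = tokenize(text)
        # Slide a window of n tokens over the record.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical toy records (invented for illustration only).
records = [
    "Dg: pregnancy week 39, hypertension",
    "dg. pregnancy week 40",
    "DG pregnancy week 39, diabetes",
]

word_freq = ngram_counts(records, n=1)
pair_freq = ngram_counts(records, n=2)

print(word_freq.most_common(3))
print(pair_freq.most_common(3))
```

Frequent words and word pairs obtained this way give a quick overview of recurring phrases (for example, that differently written variants of an abbreviation collapse to one token after lowercasing), which is the kind of preliminary simplification described above.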
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bursa, M., Lhotska, L., Chudacek, V., Spilka, J., Janku, P., Huser, M. (2012). Practical Problems and Solutions in Hospital Information System Data Mining. In: Böhm, C., Khuri, S., Lhotská, L., Renda, M.E. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2012. Lecture Notes in Computer Science, vol 7451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32395-9_3
DOI: https://doi.org/10.1007/978-3-642-32395-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32394-2
Online ISBN: 978-3-642-32395-9
eBook Packages: Computer Science, Computer Science (R0)