Abstract
Processing large volumes of documents has become indispensable. At the same time, the abundance of information available today makes it harder for users to find relevant information in a large document collection. Exploiting such collections therefore requires an efficient tool that retrieves relevant results. In practice, the relevance of an Information Retrieval System depends on its document representation model.
In this paper, we review Information Retrieval approaches for XML documents. The primary objective of this study is to emphasize the importance of considering several aspects, including the semantic and structural aspects among others, to enhance the performance of an Information Retrieval System. By incorporating these aspects into the system, we aim to improve the precision and efficiency of retrieving relevant information from XML document collections, addressing the challenges posed by the ever-growing volume of available data.
Author information
Contributions
Mouad Mammass, Hasna Abioui, and Ali Idarrou contributed equally to this work.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Belahyane, I., Mammass, M., Abioui, H., Idarrou, A. (2024). Information Retrieval in XML Document: State of the Art. In: Ezziyyani, M., Kacprzyk, J., Balas, V.E. (eds) International Conference on Advanced Intelligent Systems for Sustainable Development (AI2SD'2023). AI2SD 2023. Lecture Notes in Networks and Systems, vol 930. Springer, Cham. https://doi.org/10.1007/978-3-031-54318-0_28
DOI: https://doi.org/10.1007/978-3-031-54318-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54317-3
Online ISBN: 978-3-031-54318-0
eBook Packages: Intelligent Technologies and Robotics (R0)