Information Retrieval in XML Document: State of the Art

  • Conference paper
International Conference on Advanced Intelligent Systems for Sustainable Development (AI2SD'2023) (AI2SD 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 930)

Abstract

Processing large volumes of documents has become indispensable. At the same time, the abundance of information available today makes it harder for users to find relevant information in a large document collection. Exploiting such collections therefore requires a tool that retrieves relevant results efficiently. In practice, the relevance achieved by an Information Retrieval System depends on its document representation model.

In this paper, we examine Information Retrieval approaches for XML documents. The primary objective of this study is to emphasize the importance of considering several aspects of a document, such as its semantic and structural aspects, among others, to enhance the performance of an Information Retrieval System. Incorporating these aspects into the system is intended to improve the precision and efficiency of retrieving relevant information from XML document collections, and to address the challenges posed by the ever-growing volume of available data.

Author information

Contributions

Mouad Mammass, Hasna Abioui, and Ali Idarrou contributed equally to this work.

Corresponding author

Correspondence to Imane Belahyane.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Belahyane, I., Mammass, M., Abioui, H., Idarrou, A. (2024). Information Retrieval in XML Document: State of the Art. In: Ezziyyani, M., Kacprzyk, J., Balas, V.E. (eds) International Conference on Advanced Intelligent Systems for Sustainable Development (AI2SD'2023). AI2SD 2023. Lecture Notes in Networks and Systems, vol 930. Springer, Cham. https://doi.org/10.1007/978-3-031-54318-0_28
