
Dynamic Document Localization for Efficient Mining

  • Conference paper
Sentimental Analysis and Deep Learning

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1408)


Abstract

Annotation is the process of adding notes to entities such as documents, attributes, data repositories and data spaces to make them more visible and expressive. In probabilistic data models, annotation is used to generate attribute values, which in turn can be used to generate a range of queries that are matched against a database schema. The proposed fuzzy document localization model (FDLM) lists the top-k attributes by deriving a monotone fuzzy rank function based on the query value \(Q_{\mathrm{val}}\) and the content value \(C_{\mathrm{val}}\). Newly arrived documents are processed against annotated documents, which are conditionally modeled with ground-truth attributes in a dynamic document categorization process. Semantic matches of attributes are identified by a pre-processed conceptualization framework, which in turn increases the cardinality of the result set. The system is biased with a parameter \(\beta\) that balances the workload-based query value against the database-oriented content value, setting a selection bound over the range of accurate and approximate matches.
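
The abstract does not state the form of the rank function, so the following is only an illustrative sketch of how a monotone score biased by \(\beta\) could select the top-k attributes. The convex combination, the function names fuzzy_rank and top_k_attributes, and the sample scores are all assumptions made for illustration, not the FDLM definition from the paper.

```python
import heapq
from typing import Dict, List, Tuple


def fuzzy_rank(q_val: float, c_val: float, beta: float) -> float:
    """Hypothetical rank function: a convex combination of Q_val and C_val.

    It is monotone (non-decreasing in both arguments) for beta in [0, 1].
    This particular form is an assumption, not taken from the paper.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * q_val + (1.0 - beta) * c_val


def top_k_attributes(
    scores: Dict[str, Tuple[float, float]], k: int, beta: float
) -> List[Tuple[str, float]]:
    """Return the k attributes with the highest combined score, best first."""
    ranked = ((name, fuzzy_rank(q, c, beta)) for name, (q, c) in scores.items())
    return heapq.nlargest(k, ranked, key=lambda item: item[1])


# Made-up (Q_val, C_val) pairs in [0, 1] for three candidate attributes.
candidates = {
    "author": (0.9, 0.2),
    "title": (0.4, 0.8),
    "year": (0.1, 0.3),
}

balanced = top_k_attributes(candidates, k=2, beta=0.5)
query_only = top_k_attributes(candidates, k=2, beta=1.0)
```

With these sample values, a balanced \(\beta = 0.5\) ranks "title" ahead of "author", whereas \(\beta = 1\) (query value only) puts "author" first, which illustrates how the bias shifts the selection between workload-driven and content-driven matches.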

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Sijin, P., Champa, H.N. (2022). Dynamic Document Localization for Efficient Mining. In: Shakya, S., Balas, V.E., Kamolphiwong, S., Du, KL. (eds) Sentimental Analysis and Deep Learning. Advances in Intelligent Systems and Computing, vol 1408. Springer, Singapore. https://doi.org/10.1007/978-981-16-5157-1_2
