Dynamic Document Localization for Efficient Mining

Sijin, P.; Champa, H. N.

doi:10.1007/978-981-16-5157-1_2

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1408)

Abstract

Annotation is the process of adding notes to entities such as documents, attributes, data repositories, and data spaces to make them more visible and expressive. In probabilistic data models, annotation is used to generate attribute values, which in turn can be used to generate a range of queries that are matched against a database schema. The proposed fuzzy document localization model (FDLM) lists the top-k attributes by deriving a monotone fuzzy rank function based on the query value $Q_\mathrm{val}$ and the content value $C_\mathrm{val}$. Newly arrived documents are processed together with the annotated documents, which are conditionally modeled with ground-truth attributes in a dynamic document categorization process. Semantic matches of attributes are identified by a pre-processed conceptualization framework, which in turn increases the cardinality of the result set. A biasing parameter $\beta$ balances the workload-based query value against the database-oriented content value and sets a selection bound over the range of accurate and approximate matches.
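The abstract does not state the form of the rank function, so the following is only an illustrative sketch of how a monotone rank over $Q_\mathrm{val}$ and $C_\mathrm{val}$ with a bias $\beta$ might select top-k attributes. It assumes a convex combination $\beta\, Q_\mathrm{val} + (1-\beta)\, C_\mathrm{val}$, which is monotone in both arguments. The function names, the direction in which $\beta$ weights the two values, and the sample scores are all hypothetical and are not taken from the chapter.

```python
import heapq
from typing import Dict, List, Tuple


def rank_score(q_val: float, c_val: float, beta: float) -> float:
    """Hypothetical monotone rank: convex combination of query and content value.

    Assumption: beta in [0, 1] weights the query value and (1 - beta) weights
    the content value. The chapter's actual function may differ.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * q_val + (1.0 - beta) * c_val


def top_k_attributes(
    scores: Dict[str, Tuple[float, float]], k: int, beta: float
) -> List[Tuple[str, float]]:
    """Return the k attributes with the highest combined score, best first."""
    ranked = ((name, rank_score(q, c, beta)) for name, (q, c) in scores.items())
    return heapq.nlargest(k, ranked, key=lambda pair: pair[1])


# Made-up (Q_val, C_val) pairs, for illustration only.
candidates = {
    "author": (0.9, 0.2),
    "title": (0.6, 0.7),
    "year": (0.1, 0.8),
}

for name, score in top_k_attributes(candidates, k=2, beta=0.5):
    print(f"{name}: {score:.2f}")
```

Under this assumed form, moving $\beta$ toward 1 favors attributes that are frequently queried, while moving it toward 0 favors attributes that are well supported by the document content.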

Author information

Authors and Affiliations

University Visvesvaraya College of Engineering, Bangalore University, Bengaluru, Karnataka, India
P. Sijin & H. N. Champa

Editor information

Editors and Affiliations

Institute of Engineering, Tribhuvan University, Pulchowk Campus, Lalitpur, Nepal
Subarna Shakya
Intelligent Systems Research Centre, Aurel Vlaicu University of Arad, Arad, Romania
Valentina Emilia Balas
Songkla University, Songkhla, Thailand
Sinchai Kamolphiwong
Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada
Ke-Lin Du

Cite this paper

Sijin, P., Champa, H.N. (2022). Dynamic Document Localization for Efficient Mining. In: Shakya, S., Balas, V.E., Kamolphiwong, S., Du, KL. (eds) Sentimental Analysis and Deep Learning. Advances in Intelligent Systems and Computing, vol 1408. Springer, Singapore. https://doi.org/10.1007/978-981-16-5157-1_2

DOI: https://doi.org/10.1007/978-981-16-5157-1_2
Published: 26 October 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5156-4
Online ISBN: 978-981-16-5157-1
eBook Packages: Intelligent Technologies and Robotics (R0)
