Abstract
Improving the relevance of results remains a persistent need in Web search systems. Most existing Web search systems are query-driven and give little weight to users’ needs. Mining images from the Web is particularly cumbersome because of the many homonyms and canonically synonymous terms. An ideal Web image recommendation system must understand the needs of the user. This paper proposes a system that models homonymous and synonymous ontologies in order to understand the user’s need for images. A Hybrid Semantic Algorithm that computes semantic similarity using Adaptive Pointwise Mutual Information (APMI) is proposed. The system also classifies the ontologies using an SVM and maintains a homonym lookup directory for classifying semantically related homonymous ontologies. Users’ intentions are captured dynamically by presenting images based on the initial OntoPath and recording the user’s click. Strategic expansion of the OntoPath based on the user’s choice increases recommendation relevance. The proposed system achieves an overall accuracy of 95.09%.
1 Introduction
Web search is an important technique for retrieving required information from the World Wide Web. The Web is a rich informational repository [1] with a very high information density. Web searches take place through search engines [2], which implement several Web mining techniques and algorithms that make mining the Web easy and efficient. When a Web page or text document needs to be retrieved, simple approaches such as computing the semantic similarity [3] between the query and Web documents, probabilistic matching [4], or keyword matching [5] are traditionally applied. When an image relevant to a query has to be retrieved, however, a problem arises: many terms are homonyms, spelt the same but carrying different meanings. Image search is an important application of Web mining in which the search engine must extract images that are highly relevant to the query entered by the user. The retrieved images must correlate strongly with the user’s intention. The search engine must also retrieve distinct images for each sense of a query involving homonyms or synonyms, so that the unique elements matching the search query are all displayed. The user’s intention must act as the driving force for displaying accurate images and satisfying the user’s need in image search. Search efficiency must still be maintained, i.e., the number of relevant images returned for a specific query must be maximized.
Motivation: Most existing systems for Web image retrieval take a query-oriented perspective [6] and are not user-oriented. The ultimate goal of any retrieval system should be directed by the user’s choice, thus satisfying the user’s need for images. Certain existing systems that capture users’ preferences still fall short because they neglect the user’s perspective when producing results. This increases the noise [7] in Web image search and the number of irrelevant images retrieved, a problem that needs to be overcome.
Contribution: An ontological approach that models appropriate ontologies for homonyms is proposed. Several semantically similar classes are aggregated, and a hierarchical pathway of ontologies is then established for classifying images based on the query input. The proposed system also captures the individual user’s choice and then retrieves all the possible images based on the user’s intention. The proposed system further incorporates an SVM for classifying the ontologies based on their query terms. An APMI strategy is incorporated for semantic similarity computation. In addition, a homonym lookup table reduces the overall response time of the recommendation process.
Organization: This paper is organized as follows. Section 2 provides an overview of the related research work. Section 3 presents the proposed system architecture. Implementation is discussed in Sect. 4. Performance evaluation and results are discussed in Sect. 5. The paper is concluded in Sect. 6.
2 Related Work
Kousalya and Thananmani [8] put forth a content-based image retrieval technique with multifeature extraction that extracts graphical features from the image; Euclidean distance is used for similarity computation in this approach. Dhonde and Raut [9] used a hierarchical k-means strategy to retrieve images from the Web and increased the overall Web search proficiency. Umaa Maheshvari and Thanushkodi [10] proposed a content-based image retrieval technique using hybrid methodologies that amalgamates several methods, including cosine transforms, wavelet transforms, feature point extraction, and Euclidean distance computation. Deepak and Andrade [11] proposed the OntoRec algorithm, which incorporates the NPMI technique with dynamic Ontology modeling for synonymous ontologies. Deng et al. [12] proposed Best Keyword Cover search, which searches using keywords with minimum inter-object distance. A nearest neighbor approach is adopted to increase the number of entities covered in Web search. However, the drawback of strictly query-specific Web search is not overcome, since no preference is given to the user’s choice. Ma et al. [13] measured ontologies by normalizing them; Ontology normalization refers to removing those ontologies that are not a good fit for a domain. Shekhar et al. [20] proposed a novel framework using multi-agents for Web image recommendation, with an object-centric approach for annotating and crawling images that overcomes several existing problems.
Bedi et al. [14] proposed a focused crawler that uses domain-specific concept type ontologies to compute semantic relevance. The ontological concepts are further used to expand a search topic. Sejal et al. [15] proposed a framework that recommends images based on relevance feedback and visual features. The historical data of clicked and unclicked images is used for relevance feedback, while the features are compared using the cosine similarity measure. Kalantidis et al. [16] proposed a paradigm of locally optimized hashing, justified by the fact that it requires only set intersections and summations. A clustering-based recommendation has also been built into the system using a graph-based strategy. Deepak and Priyadarshini [17] proposed an Ontology-driven framework for image tag recommendation that employs techniques such as Set Expansion, K-Means Clustering, and Deviation computation. The Modified Normalized Google Distance Measure is employed for computing the semantic deviations. Wang et al. [18] proposed a new methodology of multimodal re-ranking using a graph. This approach brings modal weights, distance metrics, and relevance scores together into a single platform. Chu and Tsai [19] considered visual features in a hybrid recommendation model for predicting favorite restaurants. Content-based filtering and collaborative filtering are combined to show the importance of the visual features considered.
3 Proposed System Architecture
The proposed system architecture of the Hybrid Semantic Algorithm is depicted in Fig. 1 and comprises two phases. Phase 1 concentrates on building ontologies for the homonymous search keywords. The ontologies developed need not always be for homonyms; semantically similar or even loosely related ontological terms with several ontological commitments can be included to support similarity searches. Phase 1 is concerned entirely with the ontologies, which impart semantic meaning to the Web search algorithm. An ontological strategy is proposed to obtain relevant images in a single step by mining the possible heterogeneous images based on the ontological commitments for a specific domain-relevant ontological search term. Several homonyms are first listed and defined together with their similar terms, and are then modeled as ontologies. The homonym lookup directory is a HashMap in which a single key maps to multiple values. The key is the homologous Ontology term and the values are the descriptions of the Ontologies, which improves the classification of Ontologies. The various search paths for an individual ontological term are noted, and their hierarchy is then expanded based on the similarity and relevance of search terms. Each individual pathway is modeled as OWL ontologies, which are used to conceptually establish the OntoPath for the proposed algorithm.
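The lookup directory described above can be pictured as a multimap. The following is only a minimal Java sketch under stated assumptions: the class name `HomonymLookupDirectory`, its methods, the case-insensitive key normalization, and the sample entries are all hypothetical and are not taken from the paper's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a homonym lookup directory: one key, many values.
class HomonymLookupDirectory {
    // Key: homonymous term; values: descriptions of the ontologies it may denote.
    private final Map<String, List<String>> directory = new HashMap<>();

    // Registers one more ontology description under a term.
    void add(String term, String ontologyDescription) {
        directory.computeIfAbsent(normalize(term), k -> new ArrayList<>())
                 .add(ontologyDescription);
    }

    // Returns all descriptions for a term, or an empty list if the term is unknown.
    List<String> lookup(String term) {
        return Collections.unmodifiableList(
                directory.getOrDefault(normalize(term), Collections.emptyList()));
    }

    // Assumption: keys are matched case-insensitively after trimming whitespace.
    private static String normalize(String term) {
        return term.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        HomonymLookupDirectory dir = new HomonymLookupDirectory();
        dir.add("Jaguar", "animal: big cat");       // illustrative entries only
        dir.add("Jaguar", "automobile: car brand");
        System.out.println(dir.lookup("jaguar"));
    }
}
```

A constant-time lookup of all senses of a query term is one plausible reason such a directory could shorten classification time, since the candidate ontologies are narrowed before any similarity computation.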
The actual Web image recommendation for a user query takes place in phase 2, where the system accepts a query from the user and processes it. After query preprocessing, the Ontologies in the OntoBase are classified by a hard margin SVM based on the input query words. The semantic similarity is computed between the homologous terms relevant to the query, obtained from the lookup directory, and the class labels of the Ontologies. Based on the Axiomating Agents and the Description Logics of OWL Semantics, an OntoPath is established by hierarchically arranging the relevant ontologies. Based on the Ontologies in the OntoPath, the semantically relevant images are presented to the user. The user’s click on an image then drives the expansion of the query: the query is expanded dynamically along the OntoPath that was formulated when recommending the images. Because the user’s preferences are captured dynamically through clicks, many irrelevant images are hidden from the user, which increases the overall recommendation relevance of the system. The semantic similarity is computed using the Adaptive Pointwise Mutual Information (APMI) measure, a modified version of the Pointwise Mutual Information (PMI) measure. The APMI depicted in Eq. (1) is reported to perform better than the other variants of PMI and is associated with an adaptivity coefficient y. The adaptivity coefficient y depicted in Eq. (2) has a logarithmic quotient in both its numerator and its denominator. The adaptivity coefficient, when coupled with the PMI value, enhances the overall performance of the system.
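Since APMI is described as a modification of PMI, it may help to see the base measure in code. The sketch below computes only the standard PMI from co-occurrence counts; it deliberately omits the adaptivity coefficient y, whose exact form is given by Eqs. (1) and (2) and is not assumed here. The class name `PmiSketch`, the count-based probability estimates, the base-2 logarithm, and the sample counts are all assumptions made for illustration.

```java
// Hypothetical sketch: plain PMI from co-occurrence counts (not the paper's APMI).
class PmiSketch {
    // PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with each probability
    // estimated as count / total. The adaptivity coefficient is NOT applied.
    static double pmi(long countX, long countY, long countXY, long total) {
        if (total <= 0 || countX <= 0 || countY <= 0 || countXY < 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        if (countXY == 0) {
            return Double.NEGATIVE_INFINITY; // the terms never co-occur
        }
        double pX = (double) countX / total;
        double pY = (double) countY / total;
        double pXY = (double) countXY / total;
        return Math.log(pXY / (pX * pY)) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Made-up counts: x in 100 documents, y in 50, both in 20, out of 1000.
        // The joint probability is 4 times the independence baseline,
        // so the result is approximately 2.
        System.out.println(pmi(100, 50, 20, 1000));
    }
}
```

A positive value indicates that two terms co-occur more often than independence would predict, which is the signal a PMI-based similarity between query terms and Ontology class labels relies on.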
4 Implementation
The front end is implemented in Java. The Ontology definitions, based on the ontological commitments of the homonymous or synonymous terms, are modeled using Protégé 3.4.8. The rendered Ontologies are in the OWL format and supply the intelligence for the proposed algorithm shown in Table 1. The unstructured image data is stored in an image repository built on MySQL. Once the Ontologies are modeled and integrated into the search environment by automatic Web crawling, the system is ready to capture user preferences for any search term.
5 Results and Performance Evaluation
The data sets for the experimentation were collected from the results of the Bing and Google Image Search engines. The experimentation was done on 1492 images, of which 1321 were automatically crawled using a customized image crawler and the remaining 171 were manually entered into the database. All the images were collected with their labels. Protégé was used for Ontology modeling. The results of various search queries are depicted in Table 2.
The performance of the proposed algorithm is evaluated using precision, recall, and accuracy as metrics and is depicted in Table 3. Standard formulae for precision, recall, and accuracy have been used for evaluating the system performance. The proposed Hybrid Semantic Algorithm yields an average precision of 94.42%, an average recall of 95.76%, and an average accuracy of 95.09%. One reason for the higher performance of the proposed Hybrid Semantic Algorithm is that it uses a homologous lookup directory, which reduces the average classification time of the homonymous ontologies. The hard margin SVM makes the initial classification of ontologies in the OntoBase feasible. The use of APMI for computing semantic similarity, together with the capture of user preferences through dynamic user clicks, further increases the precision, recall, and accuracy.
To compare the performance of the proposed Hybrid Semantic Algorithm, the MFE_CBIR, Hybrid Optimization Technique, and OntoRec approaches were re-evaluated in the environment of the proposed system. The average performance of these methodologies and of the proposed Hybrid Semantic Algorithm is documented in Table 4. The Hybrid Semantic Algorithm yields better performance than all the systems used for comparison. The justification for this is that it uses the APMI technique for semantic similarity computation and dynamically captures user intentions.
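The formulae themselves are not spelled out in the text. As a hedged reminder, the usual retrieval definitions of precision and recall are assumed below. The last line only checks arithmetic on the reported averages: their mean equals the reported accuracy, which would be consistent with accuracy being taken as the mean of precision and recall, although that reading is an assumption and is not stated in the paper.

```latex
\begin{align}
\text{Precision} &= \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|} \\
\text{Recall} &= \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|} \\
\frac{\text{Precision} + \text{Recall}}{2} &= \frac{94.42 + 95.76}{2} = 95.09
\end{align}
```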
6 Conclusions
Images have become an intrinsic part of Web pages in recent times [20]. Retrieving the most relevant image is a tedious task. A Hybrid Semantic Algorithm is proposed for Web image recommendation that incorporates Ontology modeling for homonyms and canonically synonymous ontologies. The proposed approach requires Ontology authoring and description for homonyms as well as synonymous ontologies. The semantic similarity is computed using the APMI strategy, and the approach also involves the construction of a dynamic OntoPath based on the homonym lookup directory as well as Ontology classification through an SVM. The proposed strategy further uses click feedback from the user on the various classes of images recommended. A strategic query expansion technique is implemented that follows the user’s choice of image class in the OntoPath, in line with the user’s intention. The proposed Hybrid Semantic Algorithm yields an average accuracy of 95.09%, which is better than that of the existing Web image recommendation systems used for comparison.
References
1. Gordon, M., Pathak, P.: Finding information on the World Wide Web: the retrieval effectiveness of search engines. Inf. Process. Manage. 35(2), 141–180 (1999)
2. Goodchild, M.F.: A spatial analytical perspective on geographical information systems. Int. J. Geogr. Inf. Syst. 1(4), 327–334 (1987)
3. Ferrando, S.E., Doolittle, E.J., Bernal, A.J., Bernal, L.J.: Probabilistic matching pursuit with Gabor dictionaries. Sig. Process. 80(10), 2099–2120 (2000)
4. Kanaegami, A., Koike, K., Taki, H., Ohgashi, H.: Text search system for locating on the basis of keyword matching and keyword relationship matching. US Patent 5,297,039 (1994)
5. Lang, K.: Newsweeder: learning to filter netnews. In: Proceedings of the 12th International Conference on Machine Learning, pp. 331–339 (1995)
6. Gong, Z., Cheang, C.W.: Multi-term web query expansion using WordNet. In: Database and Expert Systems Applications, pp. 379–388. Springer, Berlin (2006)
7. Turney, P.: Mining the web for synonyms: PMI-IR versus LSA on TOEFL (2001)
8. Kousalya, S., Thananmani, A.S.: Image mining-similar image retrieval using multi-feature extraction and content based image retrieval technique. Int. J. Adv. Res. Comput. Commun. Eng. 2(1), 4370–4372 (2013)
9. Dhonde, P., Raut, C.M.: Precise & proficient image mining using hierarchical K-means algorithm. Int. J. Sci. Res. Publ. 5(1), 1–4 (2015)
10. Umaa Maheshvari, A., Thanushkodi, K.: Content based fast image retrieval using hybrid optimization techniques. In: International Conference on Recent Advancements in Materials. J. Chem. Pharm. Sci. 102–107 (2015)
11. Deepak, G., Andrade: OntoRec: a semantic approach for ontology driven web image search. In: Proceedings of the International Conference on Big Data and Knowledge Discovery (ICBK), pp. 157–166 (2016)
12. Deng, K., Li, X., Lu, J., Zhou, X.: Best keyword cover search. IEEE Trans. Knowl. Data Eng. 27(1), 61–73 (2015)
13. Ma, Y., Wang, C., Jin, B.: A framework to normalize ontology representation for stable measurement. J. Comput. Inform. Sci. Eng. 15(4) (2015)
14. Bedi, P., Thukral, A., Banati, H.: Focused crawling of tagged web resources using ontology. Comput. Electr. Eng. 39(2), 613–628 (2013)
15. Sejal, D., Abhishek, D., Venugopal, K.R., Iyengar, S.S., Patnaik, L.M.: IR_URFS_VF: image recommendation with user relevance feedback session and visual features in vertical image search. Int. J. Multimed. Infor. Retr. 5(4), 255–264 (2016)
16. Kalantidis, Y., Kennedy, L., Nguyen, H., Mellina, C., Shamma, D.A.: LOH and behold: web-scale visual search, recommendation and clustering using Locally Optimized Hashing. In: European Conference on Computer Vision, pp. 702–718. Springer International Publishing (2016)
17. Deepak, G., Priyadarshini, S.J.: Onto tagger: ontology focused image tagging system incorporating semantic deviation computing and strategic set expansion. Int. J. Comput. Sci. Bus. Inform. 16(1) (2016)
18. Wang, M., Li, H., Tao, D., Lu, K., Wu, X.: Multimodal graph-based re-ranking for web image search. IEEE Trans. Image Process. 21(11), 4649–4661 (2012)
19. Chu, W.-T., Tsai, Y.-L.: A hybrid recommendation system considering visual information for predicting favorite restaurants. World Wide Web, pp. 1–19 (2017)
20. Shekhar, S., Singh, A., Agrawal, S.C.: An object centric image retrieval framework using multi-agent model for retrieving non-redundant web images. Int. J. Image Min. 1(1), 4–22 (2015)
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Deepak, G., Sheeba Priyadarshini, J. (2018). A Hybrid Semantic Algorithm for Web Image Retrieval Incorporating Ontology Classification and User-Driven Query Expansion. In: Rajsingh, E., Veerasamy, J., Alavi, A., Peter, J. (eds) Advances in Big Data and Cloud Computing. Advances in Intelligent Systems and Computing, vol 645. Springer, Singapore. https://doi.org/10.1007/978-981-10-7200-0_4
DOI: https://doi.org/10.1007/978-981-10-7200-0_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7199-7
Online ISBN: 978-981-10-7200-0
eBook Packages: Engineering, Engineering (R0)