1 Introduction

Web search is an important technique for retrieving required information from the World Wide Web. The Web is a rich information repository [1] with a very high information density. Web searches take place through search engines [2], which implement several Web mining techniques and algorithms that make mining the Web easy and efficient. When a Web page or a Web content document needs to be retrieved, the text content is traditionally retrieved using simple approaches such as computing the semantic similarity [3] between the query and Web documents, probabilistic matching [4], or keyword matching [5]. A problem arises, however, when an image relevant to a query has to be retrieved: several query terms are homonyms, which are spelt the same but have different meanings. Image search is an important application of Web mining in which the search engine must extract images that are highly relevant to the query input by the user. The user’s intention and the retrieved images must have a high degree of correlation. The search engine must also be able to retrieve all the distinct images for queries involving homonyms and synonyms, and it must display the several unique elements of the search query. The user’s intention must act as the driving force for displaying images of high correctness and satisfying the user’s need for image search. Search efficiency must still be maintained, i.e., the number of relevant images for a specific query must be maximized.

Motivation: Most of the existing systems for Web image retrieval have a query-oriented perspective [6] and are not user-oriented. The ultimate goal of any retrieval system must be directed by the users’ choice, thus satisfying the users’ need for images. Certain existing systems that capture users’ preferences still fall short, as they neglect the perspective of the user when selecting the best results. This increases the noise [7] in Web image search and the number of irrelevant images retrieved, a problem that needs to be overcome.

Contribution: An ontological approach that models appropriate ontologies for homonyms is proposed. Several semantically similar classes are aggregated, and a hierarchical pathway of ontologies is then established for classifying images based on the query input. The proposed system also captures the individual user’s choice and retrieves all the possible images based on the user’s intention. The proposed system incorporates an SVM for classifying the ontologies based on their query terms. An APMI strategy is incorporated for semantic similarity computation. In addition, the incorporation of a homonym lookup table reduces the overall response time of the recommendation process.

Organization: This paper is organized as follows. Section 2 provides an overview of the related research work. Section 3 presents the proposed system architecture. Implementation is discussed in Sect. 4. Performance evaluation and results are discussed in Sect. 5. The paper is concluded in Sect. 6.

2 Related Work

Kousalya and Thananmani [8] have put forth a content-based image retrieval technique with multifeature extraction that involves the extraction of graphical features from the image. Euclidean distance is used for similarity computation in this approach. Dhonde and Raut [9] used a hierarchical k-means strategy for the retrieval of images from the Web and increased the overall Web search proficiency. Umaa and Thanushkodi [10] proposed a content-based image retrieval technique using hybrid methodologies, which amalgamates several methods such as cosine transforms, wavelet transforms, feature point extraction, and Euclidean distance computation. Deepak and Andrade [11] have proposed the OntoRec algorithm, which incorporates the NPMI technique with dynamic Ontology modeling for synonymous ontologies. Deng et al. [12] have proposed the Best Keyword Cover for keyword-based searching with minimum inter-object distance. A nearest neighbor approach is incorporated for increasing the overall entities in Web search. The drawback of strictly query-specific Web search is not overcome here, as no preference is given to the users’ choice. Ma et al. [13] have measured ontologies by achieving normalization of ontologies. Ontology normalization refers to removing those ontologies that are not a good fit for a domain. Shashi S et al. have proposed a novel framework using multi-agents for Web image recommendation. An object-centric approach for annotation and crawling of images has been proposed to overcome several existing problems.

Bedi et al. [14] have proposed a focused crawler that uses domain-specific concept type ontologies to compute the semantic relevance. The ontological concepts are further used in the expansion of a search topic. Sejal et al. [15] have proposed a framework that recommends images based on relevance feedback and visual features. The historical data of clicked and unclicked images is used for relevance feedback, while the features are compared using the cosine similarity measure. Kalanditis et al. [16] have proposed a paradigm of locally optimized hashing, justified by the fact that it requires set intersections and summations alone. A clustering-based recommendation has also been incorporated into the system using a graph-based strategy. Gerard Deepak and Priyadarshini [17] have proposed an Ontology-driven framework for image tag recommendation by employing techniques such as Set Expansion, K-Means Clustering, and Deviation computation. The Modified Normalized Google Distance Measure is employed for computing the semantic deviations. Wang et al. [18] proposed a new methodology of multimodal re-ranking using a graph. This approach brings modal weights, distance metrics, and relevance scores together into a single platform. Chu and Tsai [19] have considered visual features in proposing a hybrid recommendation model to predict favorite restaurants. Content-based filtering and collaborative filtering are combined to show the importance of the visual features considered.

3 Proposed System Architecture

The proposed system architecture of the Hybrid Semantic Algorithm is depicted in Fig. 1 and comprises two individual phases. Phase 1 concentrates on building ontologies for the homonymous search keywords. The modeled terms need not always be homonyms: semantically similar or even slightly related ontological terms with several ontological commitments can be included to support similarity searches. Phase 1 is therefore concerned with the ontologies, through which a semantic meaning is imparted to the Web search algorithm. An ontological strategy is proposed to retrieve relevant images in a single step by mining the possible heterogeneous images based on the ontological commitments for a specific domain-relevant ontological search term. Several homonyms are initially listed, defined together with their similar terms, and modeled as ontologies. The homonym lookup directory is a HashMap with a single key but multiple values. The key is the homologous Ontology and the values are the descriptions of the Ontologies, which enhances the classification of Ontologies. The various search paths for an individual ontological term are noted, and their hierarchy is expanded based on the similarity and relevance of search terms. Each individual pathway is modeled as OWL ontologies that are used to conceptually establish the OntoPath for the proposed algorithm.
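The single-key, multiple-value lookup directory described above can be sketched in Java, the language used for the implementation. This is a minimal illustration under stated assumptions, not the authors' code: the class name HomonymLookup, its method names, the case-insensitive key normalization, and the sample term "jaguar" with its two descriptions are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a homonym lookup directory: one key, many descriptions. */
final class HomonymLookup {
    // Key: the homonymous term; values: descriptions of the ontologies it may refer to.
    private final Map<String, List<String>> directory = new HashMap<>();

    /** Records one more description (sense) for the given term. */
    void addSense(String term, String description) {
        directory.computeIfAbsent(normalize(term), k -> new ArrayList<>()).add(description);
    }

    /** Returns every description recorded for the term, or an empty list if none. */
    List<String> senses(String term) {
        List<String> found = directory.getOrDefault(normalize(term), Collections.emptyList());
        return Collections.unmodifiableList(found);
    }

    // Assumption of this sketch: keys are matched case-insensitively.
    private static String normalize(String term) {
        return term.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        HomonymLookup lookup = new HomonymLookup();
        lookup.addSense("Jaguar", "large wild cat");      // illustrative entry
        lookup.addSense("jaguar", "automobile brand");    // illustrative entry
        System.out.println(lookup.senses("JAGUAR"));
    }
}
```

Because a lookup on the key is a single hash access that returns all candidate descriptions at once, such a structure is consistent with the reduced response time attributed to the lookup table.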

Fig. 1 Proposed architecture of Hybrid Semantic Algorithm

The actual Web image recommendation for a user query takes place in phase 2, where the system accepts a query from the user and processes it. After query preprocessing, the Ontologies in the OntoBase are classified by a hard margin SVM based on the input query words. The semantic similarity is computed between the homologous terms relevant to the query, obtained from the lookup directory, and the class labels of the Ontologies. Based on the Axiomating Agents and the Description Logics of OWL Semantics, an OntoPath is established by hierarchically arranging the relevant ontologies. Based on the Ontologies in the OntoPath, the semantically relevant images are yielded to the user. The user’s click on an image is also a driving force for the query to be expanded: the query is expanded dynamically based on the OntoPath that was formulated when recommending the images to the user. User preferences are thus input dynamically through user clicks, so many irrelevant images are filtered out before they reach the user, increasing the overall recommendation relevance of the system. The semantic similarity is computed using the Adaptive Pointwise Mutual Information (APMI) measure, which is a modified version of the Pointwise Mutual Information (PMI) measure. The APMI, depicted in Eq. (1), is much better than the other variants of the PMI and is associated with an adaptivity coefficient y. The adaptivity coefficient y, depicted in Eq. (2), is a quotient with logarithmic terms in both its numerator and its denominator. The adaptivity coefficient, when coupled with the PMI value, enhances the overall performance of the system.

$$APMI\left( m;n \right) = \frac{pmi\left( m;n \right)}{p\left( m \right)p\left( n \right)} + y$$
(1)
$$y = \frac{1 + \log \left[ p\left( m,n \right) \right]}{p\left( n \right)\log \left[ p\left( m \right) \right] - p\left( m \right)\log \left[ p\left( n \right) \right]}$$
(2)
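Eqs. (1) and (2) can be evaluated with a few lines of Java. The sketch below is an illustration under stated assumptions, not the authors' implementation: the class name Apmi is hypothetical, the denominator of Eq. (1) is read as p(m)p(n), pmi is taken as the standard log[p(m,n)/(p(m)p(n))], the natural logarithm is assumed because the text does not state a base, and the probabilities are assumed to be supplied by the caller (for example, estimated from co-occurrence counts).

```java
/** Hypothetical sketch of the APMI measure from Eqs. (1) and (2). */
final class Apmi {
    /** Standard PMI, log[p(m,n) / (p(m) p(n))]; natural log is an assumption. */
    static double pmi(double pm, double pn, double pmn) {
        return Math.log(pmn / (pm * pn));
    }

    /** Adaptivity coefficient y of Eq. (2). */
    static double adaptivity(double pm, double pn, double pmn) {
        double denominator = pn * Math.log(pm) - pm * Math.log(pn);
        if (denominator == 0.0) {
            // Choice made by this sketch: Eq. (2) is undefined here, e.g. when p(m) = p(n).
            throw new ArithmeticException("adaptivity coefficient undefined: zero denominator");
        }
        return (1.0 + Math.log(pmn)) / denominator;
    }

    /** APMI of Eq. (1): pmi(m;n) / (p(m) p(n)) + y. */
    static double apmi(double pm, double pn, double pmn) {
        requireProbability(pm);
        requireProbability(pn);
        requireProbability(pmn);
        return pmi(pm, pn, pmn) / (pm * pn) + adaptivity(pm, pn, pmn);
    }

    private static void requireProbability(double p) {
        if (!(p > 0.0 && p <= 1.0)) {
            throw new IllegalArgumentException("probability must lie in (0, 1]: " + p);
        }
    }

    public static void main(String[] args) {
        // Illustrative probabilities only; they are not taken from the experiments.
        System.out.println(apmi(0.5, 0.25, 0.125));
    }
}
```

Note that the denominator of Eq. (2) vanishes when p(m) = p(n); the sketch raises an exception in that case, and how the original system treats it is not stated.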

4 Implementation

The implementation is accomplished using Java as the programming language for the front end. The Ontology definitions, based on the ontological commitments of the homonymous or synonymous terms, are modeled using Protégé 3.4.8. The rendered Ontologies are in the OWL format and incorporate the intelligence into the proposed algorithm shown in Table 1. The unstructured image data is stored in an image repository designed using MySQL. Once the Ontologies are modeled and integrated within the search environment by automatic Web crawling, the system is ready to query the user preferences for any search term.

Table 1 Proposed Hybrid Semantic Algorithm for Web page recommendation

5 Results and Performance Evaluation

The data sets for the experimentation are collected from the results of the Bing and Google Image Search engines. The experimentation was done on 1492 images, of which 1321 were automatically crawled using a customized image crawler and the remaining 171 were manually entered into the database. All the images were collected with their labels. Protégé was used for Ontology modeling. The results of various search queries are depicted in Table 2.

Table 2 Results yielded for various search queries

The performance of the proposed algorithm is evaluated using precision, recall, and accuracy as metrics and is depicted in Table 3. Standard formulae for precision, recall, and accuracy have been used for evaluating the system performance. The proposed Hybrid Semantic Algorithm yields an average precision of 94.42%, an average recall of 95.76%, and an average accuracy of 95.09%. One reason for the higher performance of the proposed Hybrid Semantic Algorithm is that it uses a homologous lookup directory, which reduces the average classification time of the homonymous ontologies. The incorporation of a hard margin SVM makes the initial classification of ontologies in the OntoBase feasible. The use of APMI for computing the semantic similarity and the capture of user preferences through dynamic user clicks increase the precision, recall, and accuracy to a large extent.
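For reference, the commonly used confusion-matrix definitions of the three metrics can be written as a short Java sketch. The paper does not spell out its formulae, so the definitions below, the class name RetrievalMetrics, and the counts in the example are assumptions made for illustration and are not the experimental values.

```java
/** Hypothetical sketch of precision, recall, and accuracy from confusion-matrix counts. */
final class RetrievalMetrics {
    /** Fraction of retrieved images that are relevant: TP / (TP + FP). */
    static double precision(int truePos, int falsePos) {
        return ratio(truePos, (long) truePos + falsePos);
    }

    /** Fraction of relevant images that are retrieved: TP / (TP + FN). */
    static double recall(int truePos, int falseNeg) {
        return ratio(truePos, (long) truePos + falseNeg);
    }

    /** Fraction of all decisions that are correct: (TP + TN) / (TP + TN + FP + FN). */
    static double accuracy(int truePos, int trueNeg, int falsePos, int falseNeg) {
        long correct = (long) truePos + trueNeg;
        return ratio(correct, correct + falsePos + falseNeg);
    }

    // Returns 0 for an empty denominator so that an empty result set does not divide by zero.
    private static double ratio(long numerator, long denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // Illustrative counts only.
        int tp = 90, tn = 50, fp = 10, fn = 30;
        System.out.printf("precision=%.4f recall=%.4f accuracy=%.4f%n",
                precision(tp, fp), recall(tp, fn), accuracy(tp, tn, fp, fn));
    }
}
```

Per-query values computed this way would be averaged over the query set to obtain figures comparable to the averages reported in Table 3.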

Table 3 Performance analysis of Hybrid Semantic Algorithm

To facilitate comparison with the proposed Hybrid Semantic Algorithm, the performances of MFE_CBIR, the Hybrid Optimization Technique, and OntoRec were re-evaluated in the environment of the proposed system. The average performance of the chosen methodologies as well as of the proposed Hybrid Semantic Algorithm is documented in Table 4. It can be inferred that the Hybrid Semantic Algorithm yields a better performance than all the systems used for comparison. The reason for the high performance of the Hybrid Semantic Algorithm is that it uses the APMI technique for semantic similarity computation and dynamically captures user intentions.

Table 4 Comparison of performance of Hybrid Semantic Algorithm with other systems

6 Conclusions

Images have become an intrinsic part of WWW pages in recent times [20]. Retrieving the most relevant image is a tedious task. A Hybridized Semantic Algorithm is proposed for Web image recommendation that incorporates Ontology modeling for homonyms and canonically synonymous ontologies. The proposed approach requires Ontology authoring and description for homonyms as well as synonymous ontologies. The semantic similarity is computed using the APMI strategy, and the approach also involves the construction of a dynamic OntoPath based on the homonym lookup directory as well as Ontology classification through SVM. The proposed strategy further involves user click feedback for the various classes of images recommended. A strategic query expansion technique, based on the user’s choice in the OntoPath for a class of image as per the user’s intention, is implemented. The proposed Hybrid Semantic Algorithm yields an average accuracy of 95.09%, which is much better than the existing Web page recommendation systems.