
1 Introduction

An ontology is a description (like a formal specification of a program) of the concepts and relationships that can formally exist for an agent or a community of agents [1]. A domain ontology (or concept ontology) represents the conceptual model underlying a certain domain, describing it in a declarative way [2]. At present, the majority of data exists in textual form, such as journals, documents, and websites. Converting such unstructured data into ontologies requires considerable time and manual work, so there is a strong need for automation to make the conversion easier. To fulfill this requirement, we developed a tool that converts text into a concept ontology with minimal human intervention. The tool relies on three main components: relation extraction, rule-based mapping, and a validator and storage module, which are described in later sections. This paper is organized as follows: Sect. 2 presents the related work on building ontologies. Details of the proposed system are provided in Sect. 3. Section 4 provides a performance evaluation of the system, followed by the conclusion in Sect. 5.

2 Related Work

A lot of attention has been given to the building of ontologies over the years, and substantial effort has gone into building them automatically, mainly focusing on domain (concept) ontologies. Maddi et al. [3] presented a method for extracting ontologies from text documents using a linear algebraic technique, singular value decomposition, to obtain concepts from terms, and used bipartite ontology graphs to represent the results. Yeh and Yang [4] proposed a method for automatic ontology creation for historical documents from a digital library using latent topic extraction and topic clustering. Bedini and Nguyen [5] presented a framework that evaluates the automation of ontology generation; their work also provides a comprehensive analysis of existing software. Moreno and Sanchez [6] provide a methodology to build ontologies automatically by extracting information from web documents. Gantayat and Iyer [7] build ontologies automatically from lecture notes available in several courseware repositories; their approach uses NLP for keyword extraction, term frequency-inverse document frequency (tf-idf) for extracting concepts from keywords, and the apriori algorithm to determine associations among concepts. Gillam and Ahmad [8] proposed statistical methods for extracting concepts from text. Most prior work on ontology building targets a specific domain, whereas the proposed tool is a generic system that can build any ontology in less time under the supervision of a domain expert (user), irrespective of the domain. Moreover, the final decision rests with the domain expert, which prevents the insertion of ambiguous and incorrect information into the ontology. The tool generates a standard OWL ontology which can be accessed and manipulated in ontology editors such as Protégé [9].

3 Proposed Approach

The proposed tool is a user-friendly, generic, semi-automated ontology builder which uses relation extraction and a rule-based mapping procedure to build concept ontologies from text (see Fig. 1). The ontology builder creates the concept ontology while also learning the domain during the building process. With time and enough learning, the system can produce a concept ontology in less time with minimal human intervention. It prunes the input text and converts it into its basic form, i.e., triples (subject, predicate, and object), using relation extraction, and adds them to the ontology after validation by the user. If the builder is not able to recognize the semantics of the information, the user provides the semantics, which in turn adds to the learning of the software, so that it becomes capable of recognizing similar semantics in the future and adding the correct information to the ontology.

Fig. 1 Proposed system flow

The proposed system contains three modules: the analyzer subsystem, the mapper subsystem, and the triple validator and storage. The home screen of the tool provides an interface for features such as starting a new ontology building session, viewing the current ontology file, and importing and exporting the OWL file, rule file, and performance CSV file, which are explained below.

3.1 Analyzer Subsystem

The analyzer subsystem performs the task of relation extraction. It processes the input text and presents all the possible relations that can be extracted from a given sentence by passing the text through the OpenIE component [10] of the Stanford CoreNLP API [11]. The user then selects the best relation among the system's suggestions, which is then passed to the mapper subsystem for further processing.
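For illustration, the following minimal sketch shows how candidate triples can be obtained from the OpenIE annotator of Stanford CoreNLP; the class name and the sample sentence are illustrative, not part of the tool's code base.

    import java.util.Properties;

    import edu.stanford.nlp.ie.util.RelationTriple;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    public class RelationExtractor {
        public static void main(String[] args) {
            // OpenIE needs the tokenize, ssplit, pos, lemma, depparse, and natlog annotators.
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation doc = new Annotation("Machine learning is a subfield of artificial intelligence.");
            pipeline.annotate(doc);

            // Each sentence yields zero or more candidate (subject, predicate, object)
            // triples, which the tool presents to the user for selection.
            for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
                for (RelationTriple triple :
                        sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class)) {
                    System.out.printf("(%s; %s; %s)%n",
                            triple.subjectGloss(), triple.relationGloss(), triple.objectGloss());
                }
            }
        }
    }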

3.2 Mapper Subsystem

The mapper uses the relation extracted by the analyzer, the schema of the concept ontology, and the rule file, which defines mapping rules, to map the predicate text to an appropriate OWL property. Mapping of the subject/object text is done using the label annotations of existing individuals in the ontology. If the predicate text is mapped to a data property, then the object and subject texts are mapped to a literal and an individual, respectively; if the predicate is mapped to an object property, both the subject and object texts are mapped to individuals.
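This branching on the resolved predicate type can be sketched with the OWL API as follows; resolvePredicate and resolveIndividual are hypothetical stand-ins for the rule-file lookup and the label matching described above, not the tool's actual methods.

    import org.semanticweb.owlapi.model.*;

    // A sketch of the mapper's branching on the resolved predicate type.
    public class TripleMapper {
        private final OWLDataFactory factory;

        public TripleMapper(OWLDataFactory factory) {
            this.factory = factory;
        }

        public OWLAxiom map(String subjText, String predText, String objText) {
            OWLNamedIndividual subject = resolveIndividual(subjText);
            OWLEntity property = resolvePredicate(predText);

            if (property.isOWLDataProperty()) {
                // Data property: the object text becomes a literal value.
                return factory.getOWLDataPropertyAssertionAxiom(
                        property.asOWLDataProperty(), subject, factory.getOWLLiteral(objText));
            }
            // Object property: the object text resolves to another individual.
            return factory.getOWLObjectPropertyAssertionAxiom(
                    property.asOWLObjectProperty(), subject, resolveIndividual(objText));
        }

        // Hypothetical placeholder for the rule-file lookup or explicit user choice.
        private OWLEntity resolvePredicate(String text) {
            throw new UnsupportedOperationException();
        }

        // Hypothetical placeholder for matching label annotations of existing individuals.
        private OWLNamedIndividual resolveIndividual(String text) {
            throw new UnsupportedOperationException();
        }
    }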

Rule File

The rule file contains rules of the format "predicate text" → "OWL property IRI". Rules are arranged in lexicographic order of the predicate text for efficient searching. During the mapping process, whenever the user explicitly maps a predicate text to an OWL property, a corresponding rule is added to the rule file.
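In memory, such a rule file can be held in a sorted map, as in the sketch below; the class and method names, and the example IRI in the comment, are our own assumptions.

    import java.util.Optional;
    import java.util.TreeMap;

    // A minimal sketch of the rule file held in memory. A TreeMap keeps rules
    // sorted lexicographically by predicate text, mirroring the on-disk
    // ordering and giving O(log n) lookup.
    public class RuleFile {
        private final TreeMap<String, String> rules = new TreeMap<>();

        // Look up the OWL property IRI previously recorded for a predicate text.
        public Optional<String> lookup(String predicateText) {
            return Optional.ofNullable(rules.get(predicateText));
        }

        // Record a new rule when the user explicitly maps a predicate,
        // e.g. add("is a subfield of", "http://example.org/tech#isSubfieldOf").
        public void add(String predicateText, String propertyIri) {
            rules.put(predicateText, propertyIri);
        }
    }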

Working of Mapper

The mapper first tries to map the subject, predicate, and object on its own (see Fig. 2). In case the mapper does not find an appropriate mapping, it recommends possible options to be selected by the user manually. After mapping all the components of the relation, it creates the appropriate OWL triples using the OWL API [12]. The user is then asked to give feedback on a scale of 0-5 indicating how helpful the tool was. User feedback is saved in a CSV file with the help of the Java CSV API. The resulting triples are then passed to the next module, i.e., the triple validator and storage system.
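A feedback record might be appended as in the sketch below; the tool itself uses the Java CSV API, but plain java.nio appending is shown here for brevity, and the column layout is an illustrative assumption (it mirrors the per-component scores described in Sect. 4.3).

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // A minimal sketch of appending one feedback row per mapped relation.
    public class FeedbackLog {
        public static void append(Path csv, int session, int subjScore,
                                  int predScore, int objScore, int rating) throws IOException {
            // Assumed columns: session, subject score, predicate score,
            // object score, overall 0-5 helpfulness rating.
            String row = String.format("%d,%d,%d,%d,%d%n",
                    session, subjScore, predScore, objScore, rating);
            Files.writeString(csv, row, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }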

Fig. 2 Mapper screen

3.3 Triple Validator and Storage

The validator module checks whether each triple received from the mapper already exists in the ontology, either explicitly or as an inference from the existing information, using the HermiT reasoner [13]. The "Already Exists" status indicates that the triple's information is already present in the ontology, and the "New" status indicates that the triple was added to the ontology because it was not there earlier. After all the triples are added, the ontology is checked for consistency using the HermiT reasoner [13]. In case of inconsistency, the changes are not reflected in the original ontology file.
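A sketch of this validate-and-store step with HermiT and the OWL API is shown below; the surrounding class, the console messages, and the output file handling are illustrative assumptions.

    import java.io.File;

    import org.semanticweb.HermiT.ReasonerFactory;
    import org.semanticweb.owlapi.model.*;
    import org.semanticweb.owlapi.reasoner.OWLReasoner;

    public class TripleValidator {
        public static void validateAndStore(OWLOntologyManager manager, OWLOntology ontology,
                                            Iterable<OWLAxiom> triples, File out)
                throws OWLOntologyStorageException {
            OWLReasoner reasoner = new ReasonerFactory().createReasoner(ontology);
            for (OWLAxiom axiom : triples) {
                // "Already Exists" if the triple is asserted or inferable; otherwise "New".
                if (reasoner.isEntailed(axiom)) {
                    System.out.println("Already Exists: " + axiom);
                } else {
                    System.out.println("New: " + axiom);
                    manager.addAxiom(ontology, axiom);
                    reasoner.flush(); // let the reasoner see the newly added axiom
                }
            }
            // Save only if the extended ontology is still consistent; otherwise
            // the changes are not written back to the original file.
            if (reasoner.isConsistent()) {
                manager.saveOntology(ontology, IRI.create(out.toURI()));
            }
            reasoner.dispose();
        }
    }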

4 Experimental Results

We created an ontology on "Technology Classification" for experimental purposes. The ontology classifies various fields and subfields of science and technology. The ontology and relevant text data (collected from various sources) were fed into the system. The experiment was performed over four sessions under the supervision of a domain expert to demonstrate the building process. The growth of the ontology, the generated rule file, and the performance of the system were captured. The ontology graph was created using the SOVA plug-in for the Protégé ontology editor [9].

4.1 Technology Classification Ontology

The ontology schema "Technology Classification", illustrated in Fig. 3, was created using the Protégé ontology editor [9] and fed to the system.

Fig. 3 Ontology schema "Technology Classification" fed to the application

Figure 4 illustrates the populated "Technology Classification" ontology, which was obtained after passing a significant amount of text data through the system.

Fig. 4 Ontology obtained after processing through the application

4.2 Rule File Generation

All rules formed during the mapping procedure for the predicate text part of a relation are stored in the rule file. The system refers to this rule file in subsequent mappings.

4.3 Performance

A user may follow the guidelines (tabulated in Fig. 5) while providing feedback regarding the mapping accuracy.

Fig. 5 Feedback guidelines regarding mapping

For the subject/object, scores of 0, 1, and 2 stand for a new resource, an existing resource not mapped by the system, and an existing resource mapped by the system, respectively. Similarly, for the predicate, 0 and 1 stand for a predicate not mapped to a property and one mapped to a property, respectively.

Figure 6 shows the performance of the tool over the four sessions. As ontology building proceeds, the system converts text into RDF triples more efficiently, making the ontology building process faster.

Fig. 6 The tool performance (rating based)

5 Conclusion

A tool for building ontologies from text was proposed in this paper. The design of the proposed tool is modular, with three components: the analyzer subsystem, the mapper subsystem, and the triple validator and storage. The experiment was performed on a technology classification ontology. It was noted that the system's performance and results depend greatly on the ontology schema and the input text fed to the tool. Ontology building is a learning process, and the perceived knowledge is a result of the user's interaction with the system. It was also observed that, regardless of the ontology, after a certain amount of text (covering a good portion of the domain's knowledge) was converted into triples, the system was trained well enough to make correct recommendations. As future work, the analyzer subsystem can be improved to handle complex sentences.