1 Introduction

The Web is a global information space. With the rapidly increasing amount of information available to users through the Web, there is a pressing need for efficient information retrieval systems such as search engines and question answering systems. Although these systems are evolving quickly, they still suffer from a lack of accuracy, and the relevance of the results they provide is often not up to the mark. To address this problem, ontologies and the Semantic Web have become a pivotal methodology for representing domain-specific conceptual knowledge in order to improve the semantic capability of an information retrieval system [1].

The Semantic Web aims to extend current web standards and technologies so that all Web content and information can be processed by machines. The use of ontologies in the search process provides this interaction between machine and human.

Traditional search is based on term matching techniques, which retrieve all resources containing the user's query terms. In a semantic search, by contrast, queries that can be expressed in several ways are mapped at the semantic level to identify the topics related to the user's information need that must be retrieved from the Web [1].

Question answering (QA) systems [2] aim at providing precise answers to the user's questions. For example, for a question such as "What is the capital of Morocco?", traditional term matching search systems might return a large number of web pages about Morocco, and the user would have to dig into these pages to find the answer, whereas an efficient QA system would directly answer the question with the name of the capital, "Rabat". To do so, a QA system needs an efficient natural language question processing mechanism to understand the user's question and a semantic data source from which to get the exact answer.

The QA system we propose in this paper transforms the user's natural language questions into a query language (SPARQL or MQL). The resulting query is then used to interrogate different online knowledge bases and return an exact answer to the user.

The paper is organized as follows: Section 2 reviews related work. Section 3 describes natural language question processing. Section 4 presents the semantic data sources we used (the knowledge bases). Section 5 explains the proposed QA system. Finally, we present the conclusion and directions for future work.

2 Related Work

Research in the field of question answering has advanced considerably in the past couple of years [2]. With Semantic Web technologies, a domain-specific QA system working on a particular technical domain can make use of a domain-dependent ontology to recognize the true meaning contained in a natural language text. Ontologies therefore play a pivotal role in technical domains.

One typical example of a QA system is Jeeves [3], which allows users to ask natural language questions and returns a list of matching questions to which it knows the answer. Another example [4] is research in information processing focused on health care consumers, users who often have a frustrating experience while seeking online information.

Other works, such as [5, 6], have proved the feasibility of implementing an ontology-based question answering system. However, these systems are considerably complex, and the relevance of their answers is not optimal. GINSENG [7], a guided-input natural language search engine, and Cuebee [8] progressively guide the scientists by suggesting concepts and relationships that decompose the question into an RDF triple, which is then internally translated into a SPARQL query. This process demands more effort from the user and is time consuming.

Most studies focusing on ontology-based QA systems use a local domain ontology, which cannot answer a wide range of questions. In our research we make use of global knowledge bases that offer a huge amount of data, are available for online access, and can be interrogated directly. These data sources offer a wider range of relevant answers. We also focus on the simplicity of the system, using a simple graphical user interface assisted by an autocomplete feature and an error handling component. Our system also offers the user the possibility to interrogate other knowledge bases by transforming the question into a SPARQL query that can be copied to other endpoints.

3 Natural Language Question Processing

Natural Language Processing (NLP) is a research field that explores how computers can be used to understand and process natural language text or even speech in order to do useful things [9]. Once the user enters a question in natural language, the system must process it and transform it into the query language. Questions can be classified according to their interrogative word as follows:

  • What—object specification or activity definition

  • Who—object or person specification

  • When—date

  • Where—geographical location; …

There are frameworks that can process a natural language question and transform it into a query language, such as NLTK [10] and Quepy [11].

In our system we use the latter, Quepy, a Python framework, because it can easily be adapted to different question types and query languages. Quepy relies on the NLTK tagger, a linguistic tool for analyzing natural language questions, which is composed of a tokenizer, a part-of-speech tagger and a lemmatizer.
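
To make this pipeline concrete, the short sketch below runs NLTK's tokenizer, part-of-speech tagger and WordNet lemmatizer directly on a sample question. It only illustrates the underlying NLTK tools, not Quepy's internal tagger code, and it assumes the usual NLTK data packages (punkt, the perceptron tagger, WordNet) have been downloaded.

    import nltk
    from nltk.stem import WordNetLemmatizer

    question = "Who is Bill Gates?"

    # Tokenizer and part-of-speech tagger
    tokens = nltk.word_tokenize(question)
    # -> ['Who', 'is', 'Bill', 'Gates', '?']
    tagged = nltk.pos_tag(tokens)
    # -> [('Who', 'WP'), ('is', 'VBZ'), ('Bill', 'NNP'), ('Gates', 'NNP'), ('?', '.')]

    # Lemmatizer: verbs are reduced to their base form, e.g. "is" -> "be"
    lemmatizer = WordNetLemmatizer()
    print(lemmatizer.lemmatize("is", pos="v"))   # prints: be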

Once the user enters a question, for example "Who is Bill Gates?", the framework runs the NLTK tagger on the string and returns a list of quepy.tagger.Word objects. The transformation from the natural language text to the SPARQL query is done by first using a special form of regular expressions to match and capture the relevant parts of the question.
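
As an illustration, a question template in the spirit of the example applications shipped with Quepy might look like the sketch below. The Person particle and the WhoIs class are our own illustrative names, the dsl module name is hypothetical, and IsPerson and DefinitionOf are the semantic-relation classes sketched in the next listing; this is not the exact code of our system.

    from refo import Plus, Question
    from quepy.dsl import HasKeyword
    from quepy.parsing import Lemma, Pos, QuestionTemplate, Particle

    # IsPerson and DefinitionOf are defined in the next listing
    from dsl import IsPerson, DefinitionOf   # hypothetical module name


    class Person(Particle):
        # One or more proper nouns, e.g. "Bill Gates"
        regex = Plus(Pos("NNP"))

        def interpret(self, match):
            return IsPerson() + HasKeyword(match.words.tokens)


    class WhoIs(QuestionTemplate):
        # Matches questions like "Who is Bill Gates?"
        regex = Lemma("who") + Lemma("be") + Person() + Question(Pos("."))

        def interpret(self, match):
            # match.person is the expression built by the Person particle
            definition = DefinitionOf(match.person)
            return definition, "define"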

Then, a convenient way of expressing semantic relations is used to turn the matched parts into a semantic representation.
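
The relations themselves are declared with Quepy's DSL classes. The sketch below mirrors the DBpedia-oriented declarations from Quepy's documentation (rdfs:label, foaf:Person, rdfs:comment) and is again illustrative rather than the exact code of our system.

    from quepy.dsl import FixedRelation, FixedType, HasKeyword

    # Keywords are matched against English rdfs:label values
    HasKeyword.relation = "rdfs:label"
    HasKeyword.language = "en"


    class IsPerson(FixedType):
        fixedtype = "foaf:Person"


    class DefinitionOf(FixedRelation):
        # The short abstract attached to a resource
        relation = "rdfs:comment"
        reverse = True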

The rest is handled automatically by the framework, which finally produces the SPARQL query.
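
For the example question, the generated query has roughly the following form (prefix declarations omitted); the exact output depends on the templates and relations sketched above.

    SELECT DISTINCT ?x1 WHERE {
      ?x0 rdf:type foaf:Person.
      ?x0 rdfs:label "Bill Gates"@en.
      ?x0 rdfs:comment ?x1.
    }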

The SPARQL query is then sent to the knowledge base server, which returns the answer. The system offers the user the possibility to choose among multiple knowledge bases to search. The next section describes those knowledge bases.

4 Semantic Data Sources

A semantic knowledge base is a machine-readable resource for the dissemination of information, generally online or with the capacity to be put online. A knowledge base is not a static collection of information but a dynamic resource that may have a learning capacity, as part of an artificial intelligence [12].

Knowledge bases play a major role in optimizing the intelligence of the Web and of search systems and in supporting information integration [12]. Today, most knowledge bases cover only specific domains, are created by relatively small groups of knowledge engineers and specialists, and are very costly to keep up to date as domains change.

In this system we use multiple knowledge bases in order to increase the chance of getting an answer to every user question. For that, the system interrogates three large-scale, publicly available knowledge bases that cover a wide range of domains.

The first one is DBpedia [13]. The English version of this knowledge base describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology, including 251,000 species, 1,445,000 persons, 735,000 places, 411,000 creative works, 241,000 organizations and 6,000 diseases.

Another important knowledge base used in our system is Freebase [14], a large collaborative knowledge base built mainly by its own community members. This online collection of structured data was gathered from many sources, including individual contributions and wikis. Freebase aimed to create a global resource that allows people and machines to access and process information more effectively [15]. We also make use of the Linked Open Data (LOD) knowledge base [16]. The knowledge bases we use are connected with other data sets to cover an even wider range of domains, and further knowledge bases can easily be added to the system if needed.

5 Proposed Semantic QA System

Our question answering system includes three components: question processing, based on the Quepy framework; the knowledge base interrogator; and answer processing.

The question processing component's job is to analyze a natural language question and transform it into a SPARQL query. SPARQL (SPARQL Protocol and RDF Query Language) [17] is an RDF query language, that is, a semantic query language for knowledge bases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.

The knowledge base interrogator is responsible for sending the query to the SPARQL endpoint of the knowledge base selected by the user. This module returns a set of RDF results. The answer processing component handles the returned results, filtering them and transforming them into a natural language answer.
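
As a rough illustration of these two components (not the exact implementation), the sketch below sends a SPARQL query to a public endpoint using the SPARQLWrapper Python library and then reduces the returned JSON bindings to a plain-text answer. The endpoint URL, the result variable x1 and the helper names are assumptions carried over from the earlier example.

    from SPARQLWrapper import SPARQLWrapper, JSON

    DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"   # public DBpedia endpoint

    def interrogate(sparql_query, endpoint_url=DBPEDIA_ENDPOINT):
        # Knowledge base interrogator: send the query, return the raw bindings
        endpoint = SPARQLWrapper(endpoint_url)
        endpoint.setQuery(sparql_query)
        endpoint.setReturnFormat(JSON)
        results = endpoint.query().convert()
        return results["results"]["bindings"]

    def process_answer(bindings, variable="x1"):
        # Answer processing: keep English literals and join them as plain text
        values = [b[variable]["value"] for b in bindings
                  if b[variable].get("xml:lang", "en") == "en"]
        return "\n".join(values) if values else "No answer found."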

5.1 System Architecture

In this section, we present a high-level overview of how the whole system works, as pictured in Fig. 1. The aim of QA systems is to find exact and correct answers to users' questions. In addition to the graphical user interface (GUI), our QA system contains three main components:

  1. Question processing

  2. Knowledge base interrogator

  3. Answer processing

Using the GUI, the user enters a question in natural language and hits the search button. The question processing component analyses the user's question to extract the keywords to be used in the query and prepares the corresponding SPARQL and MQL queries. The user is then asked to choose the knowledge base to interrogate. According to the user's choice, the question processing component sends either the SPARQL query (for DBpedia and Linked Open Data) or the MQL query (for Freebase) to the knowledge base interrogator.
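
The choice between the two query languages can be pictured as a simple dispatch, sketched below. The function and key names are ours, and the MQL query is only an illustrative Freebase-style JSON template (null/None marks the values the server should fill in); it is not the system's actual code.

    import json

    SPARQL_TEMPLATE = """
    SELECT DISTINCT ?x1 WHERE {
      ?x0 rdf:type foaf:Person.
      ?x0 rdfs:label "%s"@en.
      ?x0 rdfs:comment ?x1.
    }
    """

    def build_query(keywords, knowledge_base):
        # `keywords` holds the terms extracted by question processing,
        # e.g. {"person": "Bill Gates"} (hypothetical structure)
        if knowledge_base in ("dbpedia", "lod"):
            # SPARQL for DBpedia and Linked Open Data
            return SPARQL_TEMPLATE % keywords["person"]
        if knowledge_base == "freebase":
            # MQL for Freebase: a JSON template with None for the
            # values the server should fill in
            mql = [{"name": keywords["person"],
                    "type": "/people/person",
                    "id": None}]
            return json.dumps(mql)
        raise ValueError("Unknown knowledge base: %s" % knowledge_base)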

The knowledge base interrogator uses the SPARQL/MQL query to interrogate the selected knowledge base via its endpoint and then returns an RDF answer.

The answer processing component gets the RDF answer, processes it to extract the exact answer and any related information available, and transforms it into a natural language answer. The answer is finally returned to the user via the GUI (Fig. 1).

Fig. 1 Design architecture of the QA system

5.2 Simulation

By accessing the interface of the application, you will find, as in all search engines, a text field where you type your question (in English) and a small orange button to launch the search. You can also take advantage of the "autocomplete" feature, which helps you type the question (Fig. 2).

Fig. 2 User interface of the QA system (1): while typing the question

Once you click the search button, three other buttons appear, asking you to choose which knowledge base you want to interrogate: Freebase, DBpedia or Linked Open Data (LOD) (Fig. 3).

Fig. 3 User interface of the QA system (2): after hitting the search button and choosing the knowledge base to interrogate

The answer to a question is not necessarily the same in all three knowledge bases; it is also possible that you find the right answer in only one or two of them. This is why the system uses all three instead of a single one.

Sometimes the search engine cannot transform a natural language question into a SPARQL or MQL query, either because the question contains errors, there is a problem with capitalized names, or the question was not well formulated.

5.3 Evaluation

After the implementation phase, we conducted initial experiments on two versions of the system with the help of ten volunteers. The first version was based on a single ontology (DBpedia), while the second was based on the three knowledge bases combined (DBpedia, Freebase and LOD). To evaluate the precision and relevance of the returned results, we used the precision and recall measures [1]. For that, we asked the ten subjects (Si) to use the system and ask different questions. Once the experiment was done, we recorded the users' feedback in the following table (Table 1).

Table 1 Experimentation results

The goal of our ontology-based semantic search system is to maximize the precision and accuracy of the results by combining different knowledge bases. The experimentation results in the following figures show that the precision and recall values increase in the second version, where different knowledge bases are combined (Figs. 4 and 5).

Fig. 4 Precision, calculated for both versions based on each user's feedback

Fig. 5 Recall, calculated for both versions based on each user's feedback

The precision graph shows how useful the search results are, while the recall graph shows how complete they are. The experimentation results show that combining multiple ontologies increases the relevance rate of the system; however, a significant number of questions still had no answer in the used knowledge bases, so there is room for improvement.
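
For reference, the precision and recall reported in Figs. 4 and 5 follow their standard information retrieval definitions, computed over the answers returned for each subject's questions:

    \mathrm{Precision} = \frac{|\{\text{relevant answers}\} \cap \{\text{returned answers}\}|}{|\{\text{returned answers}\}|},
    \qquad
    \mathrm{Recall} = \frac{|\{\text{relevant answers}\} \cap \{\text{returned answers}\}|}{|\{\text{relevant answers}\}|}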

6 Conclusion and Future Work

The initial evaluation results show the feasibility and benefits of building an ontology-based semantic QA system, and experiments in related works confirm that the ontology-based approach is suitable for developing question answering systems. Compared with the other question answering systems discussed in the Related Work section, our system offers an easy-to-use interface that lets the user enter questions in natural language. It also combines multiple ontologies to increase the relevance and range of answers, so it can answer a wider range of user questions in multiple domains. Another advantage of the system is that it returns exact answers to the user's questions.

We have implemented a natural language question answering system based on multiple ontologies. However, there are still many features to add to the realized system, such as error handling or a component for rating the relevance of results.