
1 Introduction

In recent years, and mainly because of the arrival of the web, more and more collections of data are becoming available to everyone in fields ranging from biology to economics and geography. One consequence is that end users who are not experts in Computer Science demand easy ways to retrieve data from these collections.

Beginning in 1975 with Query By Example (QBE) [39], there have been many proposals in this direction, that is, to facilitate the work of the end user. In [8], the authors reviewed the so-called Visual Query Systems (VQS) from 1975 to 1996, defined as “systems for querying databases that use a visual representation to depict the domain of interest and express related requests”.

In this paper, we extend the review from 1997 to date, concentrating our efforts on visual queries to structured information, for example, queries to underlying relational or XML databases. We do not consider the typical search on semistructured documents such as web pages through search engines like Google. Although they are also a good solution for end-users, in this survey we do not take into account natural language interfaces for database query formulation.

The main goal of this survey is to answer the following question: To what extent have VQSs been the solution for novice users who need to query databases?

To answer this question, we have studied two features of the systems: their web availability and the validation they have undergone. The first feature indicates that the system was designed to be reached easily by novice users simply by means of a web browser, without the burden of installation and with universal availability. The second feature indicates how widely VQSs are used in practice. Thus, the more systems are commercially available, the wider the adoption of VQSs.

The short answer to the question is that, as far as we know, there is only one system commercially available and designed for the web: Polaris [34].

Moreover, we have included two basic features extracted from the paper [8]: the visual representation adopted to present the reality of interest and the visual representation adopted to express queries. With respect to web features, we have also considered relevant whether the prototype deals with data formatted for the web, that is, XML data or RDF data.

The rest of the paper is organized as follows. In Sect. 2 we state the method followed for elaborating the survey and we briefly describe the values of the relevant features included in the paper. Finally, in Sect. 3, we draw several conclusions about the VQSs.

2 Statement of the Method

A survey about a particular object must determine the relevant features of the object with respect to a particular purpose. Once the features have been determined, the next step is to find the possible values of these features. Finally, we have to determine the best combinations of the pairs (feature, value) for the particular purpose.

Usually, we can extract the relevant features and their possible values from published papers about the object, by assuming features in their entirety or by adapting them to new perspectives appearing after the papers have been published. Moreover, we can add features detected by ourselves which were not previously included in any paper.

The survey develops through several steps, which are usually interspersed. In the first step, a complete search of sources determines the candidate papers that deal with the object. In the second step, the relevant features of the object with respect to the particular purpose are determined.

Our object in this survey is the visual query system, and the purpose is to make querying databases easier for users who are not experts in Computer Science.

The survey [8] reviews up to 80 references from 1975 until 1996 used for querying traditional databases. For this survey, we have searched for papers related to VQS from 1997 to date and we have found 194 candidate papers. Next, we have discarded papers about query languages without a visual part (122) and papers about natural language query languages (8), because they deserve a separate survey. In the remaining 64 works, we have determined sets of ‘similar papers’ and we have discarded all but one paper in each set. A set of similar papers is composed of several papers built on different aspects of the same idea for a VQS. These sets also include preliminary versions of a VQS which were later subsumed by more complete journal publications. This step discards 30 similar papers. So, we have discarded 122 + 8 + 30 papers, that is, 160 papers. As a result, the number of papers reviewed in this survey is 34.
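The selection arithmetic above can be checked with a short sketch. The counts are the ones reported in the text; the variable names are only illustrative.

```python
# Counts reported in the text for the selection of papers from 1997 to date.
candidates = 194
discarded = {
    "query languages without a visual part": 122,
    "natural language query languages": 8,
    "similar papers (all but one kept per set)": 30,
}

# Works left after the two scope filters, before grouping similar papers.
remaining_after_scope_filter = (
    candidates
    - discarded["query languages without a visual part"]
    - discarded["natural language query languages"]
)

total_discarded = sum(discarded.values())
reviewed = candidates - total_discarded

print(remaining_after_scope_filter, total_discarded, reviewed)
```

The three printed values correspond to the 64 remaining works, the 160 discarded papers and the 34 reviewed papers.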

As for relevant features, we have extracted the following from the survey of Catarci [8]: Visual representation adopted to present the reality of interest and visual representation adopted to express queries. The values of these features have been determined from the work [8] and from other papers, such as [11], where the faceted option appeared. For answering the question of this paper, we have added the following features: Web orientation and validation.

Let us explain briefly each of the features as well as their values.

2.1 Visual Representation Adopted to Present the Reality of Interest

This feature has been borrowed from the work of Catarci [8]. The reality of interest is modeled by a designer by means of a data metamodel such as the entity/relationship metamodel or a graph data metamodel. The modeling process produces a data model, which is presented to the user so that (s)he can formulate queries on it.

The ways the data model is presented to the user are briefly described next and a more detailed explanation of some of the papers is given in [20].

Diagram-based. Data metamodels come with an associated typical representation for their elements. For example, in the entity/relationship metamodel, there are many representations available and one of them consists of drawing rectangles for the entity types, diamonds for the relationship types and ovals for the attributes. In the diagram-based option, the user has available a diagrammatical representation of the data model elaborated with the typical graphical representation for the elements of the metamodel.

Icon-based. Unlike the diagram-based approach, in this representation there are only iconic representations of some elements of the data model, but the user does not have available the complete data model. According to Catarci [8], ‘these VQS are mainly addressed to users who are not familiar with the concepts of data models and may find it difficult to interpret even an E-R diagram’. The aim of the icons is to represent a certain concept by means of its metaphorical power. The problem of these systems is how to construct them in such a way that they express a meaning which is understandable without ambiguity to the users.

Form-based. The typical forms of web pages serve for presenting the extensional database. This occurs in papers such as [34].

Faceted. The data are modeled as faceted classifications which organize a set of items into multiple, independent taxonomies. Each classification is known as a facet and the collection of classification data is faceted metadata. The specific category labels within a facet are facet values. For example, the set of items can be architectural works. For these items, the facets are the architect, the location or the materials. The facet values for materials are stone, steel, etc.
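As a minimal sketch of the faceted organization just described, the following code stores a few items with the facets of the architectural example (architect, location, materials) and narrows the set by choosing facet values. The item names, architects and locations are hypothetical placeholders, and the helper functions are assumptions of this sketch rather than part of any reviewed system.

```python
# Hypothetical architectural works; only the facet names and the material
# values "stone" and "steel" come from the text.
items = [
    {"name": "Work A", "architect": "Architect X", "location": "City P", "materials": {"stone"}},
    {"name": "Work B", "architect": "Architect X", "location": "City Q", "materials": {"stone", "steel"}},
    {"name": "Work C", "architect": "Architect Y", "location": "City P", "materials": {"steel"}},
]


def as_set(value):
    """Treat single-valued and multi-valued facets uniformly."""
    return value if isinstance(value, (set, frozenset)) else {value}


def facet_values(items, facet):
    """Return each value of a facet with the number of items carrying it."""
    counts = {}
    for item in items:
        for value in as_set(item[facet]):
            counts[value] = counts.get(value, 0) + 1
    return counts


def select(items, **chosen):
    """Keep the items that match every chosen facet value."""
    return [
        item for item in items
        if all(value in as_set(item[facet]) for facet, value in chosen.items())
    ]


material_counts = facet_values(items, "materials")
steel_in_p = [item["name"] for item in select(items, materials="steel", location="City P")]
print(material_counts, steel_in_p)
```

Because the facets are independent taxonomies, each chosen value simply adds one more filter, whatever the order in which the user picks them.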

Unknown. A data model always exists, so this option covers the case where the way it is presented to the user is unknown. For example, the data model may be described in a paper in textual form with no explanation of how it is shown to the user. Another example is paper [26], which hides the database and tries to guess the paths for the query from the entities chosen by the user.

2.2 Visual Representation Adopted to Express the Queries

This feature has been borrowed from the work of Catarci [8] and we have adapted it to the object of the survey by adding the Faceted value.

The ways the queries are formulated are briefly described next and a more detailed explanation of some of the papers is given in [20].

Diagram-based. The diagram-based option means that the query is expressed on a diagrammatic representation of the data model.

Icon-based. The icon-based option includes two cases. In the first case, the system offers icons for representing the elements involved in the query. For building a query, the user drags and drops the appropriate icons into a canvas. The second case is the same as in [8], where the icons ‘denote both the entities of the real world and the available functions of the system’.

Form-based. Another way to facilitate the query is the form option where the user composes the query by completing options of different elements of a form. The drawback is that the query logic of the end-user does not always fit into a form.
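The form option and its drawback can be illustrated with a small sketch. It assumes a hypothetical form whose filled-in fields are combined with AND into a parameterized SQL query; the table, the field names and the data are invented for the illustration and do not come from any reviewed system.

```python
import sqlite3

# Hypothetical form fields; each one maps to a column of the same name.
ALLOWED_FIELDS = ("architect", "location", "material")


def form_to_sql(form):
    """Turn the filled-in fields of a form into a parameterized query.

    Every filled field becomes an equality condition and all conditions are
    joined with AND, which is the only query logic this form can express.
    """
    filled = [(field, form[field]) for field in ALLOWED_FIELDS if form.get(field)]
    where = " AND ".join(f"{field} = ?" for field, _ in filled)
    sql = "SELECT name FROM works"
    if where:
        sql += f" WHERE {where}"
    sql += " ORDER BY name"
    return sql, [value for _, value in filled]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (name TEXT, architect TEXT, location TEXT, material TEXT)")
conn.executemany(
    "INSERT INTO works VALUES (?, ?, ?, ?)",
    [
        ("Work A", "Architect X", "City P", "stone"),
        ("Work B", "Architect X", "City Q", "steel"),
        ("Work C", "Architect Y", "City P", "steel"),
    ],
)

sql, params = form_to_sql({"architect": "Architect X", "material": "steel"})
rows = [row[0] for row in conn.execute(sql, params)]
print(sql, params, rows)
```

A request such as ‘works made of stone or located in City Q’ has no counterpart in this form, which is one way the query logic of the end user may fail to fit.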

Faceted. We have added as a new value ‘Faceted’ for describing a system which includes data and metadata in the same page. There, the user specifies the query by clicking on the appropriate links. We have found this situation only in one paper [11].

2.3 Web Orientation

For the web orientation, we have selected two features which are not mutually exclusive. The first feature is whether the prototype works on the web or has been conceived to be used in local mode. Its two values are: There is no web orientation and Available on the web. The latter means that the end user can query the database by means of a prototype which is working on the web. The second feature indicates whether the user can query data formatted for the web, and its values are: Data not formatted for the web, Query XML data, Query RDF data. Since the two features are not exclusive, a paper can take a web value for both. This is the case, for example, of paper [7].

2.4 Validation

The validation of an idea can be done from several points of view. Regarding query systems, there are, at least, two dimensions: usability and performance.

For example, paper [10] focuses on performance and explains query rewriting techniques that improve the query evaluation performance so that the query execution time is reduced. However, in this paper we concentrate on the usability dimension, that is, the experiments made with users in order to determine the ease of use of the proposed prototype. For this feature, the list of values is: Only prototype, Prototype tested with users, Prototype tested in a real environment, Commercial tool.

Next, we describe briefly each value of this feature. The option only prototype means that a prototype has been built but no test has been made with users. The value prototype tested with users means that several experiments have been carried out in order to determine the usability of the prototype. The value prototype tested in a real environment means that it has been used for real tasks in a particular setting, for example in a department of a university. Finally, the option commercial tool means that the VQS has been fully implemented, offered to the public and is in real use in diverse installations.
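The features and values of Sect. 2 can be summarized as a record type. The sketch below is one possible encoding, not the authors' tooling. The numeric ordering of the validation values and the helper `web_and_real_use` are assumptions. The entry for [34] uses only the values this paper states for it (form-based database representation, web implementation, no web data format, commercial tool), and its query representation is left unset because the text does not state it.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class Representation(Enum):
    DIAGRAM = "Diagram-based"
    ICON = "Icon-based"
    FORM = "Form-based"
    FACETED = "Faceted"
    UNKNOWN = "Unknown"


class DataFormat(Enum):
    NOT_WEB = "Data not formatted for the web"
    XML = "Query XML data"
    RDF = "Query RDF data"


class Validation(IntEnum):
    # Assumption of this sketch: the values are ordered by increasing maturity.
    ONLY_PROTOTYPE = 1
    TESTED_WITH_USERS = 2
    REAL_ENVIRONMENT = 3
    COMMERCIAL_TOOL = 4


@dataclass(frozen=True)
class SurveyEntry:
    reference: str
    database_repr: Representation
    query_repr: Optional[Representation]  # None when the value is not stated
    available_on_web: bool
    data_format: DataFormat
    validation: Validation

    def web_and_real_use(self) -> bool:
        """True when the system is on the web and validated in real use."""
        return self.available_on_web and self.validation >= Validation.REAL_ENVIRONMENT


entry_34 = SurveyEntry(
    "[34]", Representation.FORM, None, True,
    DataFormat.NOT_WEB, Validation.COMMERCIAL_TOOL,
)
# A purely hypothetical entry, used only for contrast.
hypothetical = SurveyEntry(
    "hypothetical", Representation.DIAGRAM, Representation.DIAGRAM, False,
    DataFormat.XML, Validation.TESTED_WITH_USERS,
)

print(entry_34.web_and_real_use(), hypothetical.web_and_real_use())
```

Keeping web availability and data format as separate fields reflects the fact that a system can take a web value for both.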

Table 1. Visual query systems (1997–2003)
Table 2. Visual query systems (2004–2015)

3 Discussion

The arrival of the web brought with it more facilities for users to query databases. As a consequence, users expect easy access through the web to databases situated anywhere in the world.

For expert users, one solution is to express queries in query languages such as SQL or XQuery. However, for novice users, whose main concern is to extract data from the database and not the query languages themselves, learning SQL or XQuery is a huge task.

One solution for novice users is to hide the complexity of query languages behind a visual interface, on the assumption that visual metaphors soften that complexity. This is the idea of Visual Query Systems (VQS), defined in [8] as “systems for querying databases that use a visual representation to depict the domain of interest and express related requests”.

In this paper, we have reviewed basic features of Visual Query Systems, such as the representation of databases and the representation of queries. We have also considered the feature of accessing data formatted for the web. Finally, we have reviewed two features we consider relevant to determine whether the VQSs ease querying for novice users: web availability and validation. Next, we discuss the results for each of these features.

The majority of papers offer a diagrammatic representation of the database; only four papers offer an iconic one [2, 13, 25, 33] and one paper a form representation [34]. For several reasons, there are many papers whose database representation is unknown. For example, paper [26] hides the database and tries to guess the paths for the query from the entities chosen by the user.

With respect to the query representation, the distribution is more balanced between the icon (12 papers), the diagram (11 papers) and the form (8 papers) representation. A special form of query, the faceted one, appears only in one paper [11].

Regarding the data format, there are 9 papers [1, 4, 7, 10, 12, 16, 22, 27, 30] out of 34 which query XML data and only two papers which query RDF data [15, 17]. The rest of the papers do not query web data.

The rest of the features we have identified deal with the main question we have formulated in this paper, that is, to what extent have VQSs been the solution for novice users who need to query databases?

For answering this question with respect to the web availability, we can distinguish two periods. From 1997 to 2003 (see Table 1), when the web usage was beginning to spread, there was only one paper oriented to the web [13]. This was very understandable because of the time needed for reorienting the research into the new web setting. In the period 2004 to 2015, only papers [6, 7, 11, 34] propose a web implementation (see Table 2). Although the number of web oriented papers in this period is greater than in the 1997–2003 period, the low number of papers indicates that web orientation has scarcely been taken into account.

For the validation feature, we have found a great number of papers which have only a prototype or have been tested with users in reduced experiments. Only three prototypes have been tested in real environments [16, 29, 36] and we have found only one commercial tool [34]. So, few papers go beyond testing the prototype with a few users.

As a conclusion of these two features, very few papers are web oriented and also very few papers offer a prototype which has been tested in a real environment. In fact, the combination of both features is only found in paper [34]. Thus, although visual query systems seem to be a great idea for easing the query process for novice users, the reality is that very few papers describe real implementations.

So, the answer to the main question of the paper is that, for the moment, VQSs have not been a widely accepted solution for novice users. From this observation a new, more general question arises: Is there any solution for easing the specification of queries?

If the answer is no, novice users have to learn query languages by themselves or ask computer experts to specify their queries. In that case, no new research would be needed in this field. If the answer is ‘we do not know’, then new research is required in order to find simple visual query languages which help novice users.

We strongly believe that the idea of VQSs is a good one and that the research should continue in this direction. Recent papers such as [19] also support the idea that a solution for naive users is not available but is necessary in this world in which the use of databases is democratized. The paper proposes as a solution visual systems in which the user writes examples of queries and the system extracts and specifies the desired query in the corresponding query language.