1 Introduction

In [6,7,8] we studied tag clouds in depth and proposed a standard method to generate them. Those works focused on the extraction of valuable information stored in textual databases. The main aim was to present this information to users with no previous knowledge of the content of a textual database.

With this objective in mind, we established a complete methodology for text processing. This methodology includes the tasks of syntactic and semantic preprocessing, generation of an intermediate form, postprocessing, and visualization through a tag cloud. The novelty of the proposal was the preservation of text semantics: related terms could remain together in the visualization, that is, the generated tag cloud was a multi-term tag cloud.

To evaluate the tag cloud obtained through this methodology when used for text retrieval, we used the precision, recall, and F1 score metrics [2]. However, to evaluate it as a content representation tool, we did not find adequate metrics in the literature, so we modified some of those proposed in [9] to assess the tag cloud obtained from the query results.

In this work, we establish a formal definition for the modified metrics, “coverage” and “overlap”, and propose a new one, the “disparity”. The latter fixes the shortcomings found in other existing metrics such as balance or entropy; we discuss these shortcomings in the next section. In addition, we give a formal definition for the balance, establishing a new way of calculating it through OWA operators.

This paper is organized as follows: Sect. 2 presents a brief summary of the existing metrics to evaluate tag clouds, discussing their shortcomings for evaluating content representation. Section 3 proposes new metrics and illustrates the proposal with an example. Finally, Sect. 4 gives some conclusions and outlines future work.

2 Existing Metrics

There are few metrics in the literature to evaluate the goodness of a tag cloud, and even fewer for when it works as a tool for content representation.

The first one we found is in [1], where the authors define the entropy of a tag cloud as follows:

Let t \(\in \) T be a tag in a tag cloud T:

$$\begin{aligned} Entropy(T)=-\sum _{t \in T}p(t)\log p(t) \end{aligned}$$

where

$$\begin{aligned} p(t)=\frac{weight(t)}{\sum _{t' \in T}weight(t')} \end{aligned}$$

Entropy quantifies the weight disparity between tags. If it is low, the tag cloud is significant, or effective. If, on the contrary, it is high, the weights of the tags are uniform, which is visually not very informative. A tag cloud is effective if it consists of significant tags.
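As an illustration, the following is a minimal Python sketch of this computation, assuming the tag weights are given as a dictionary; the names `entropy` and `weights` are ours, not from [1]:

```python
import math

def entropy(weights):
    """Shannon entropy of a tag cloud, given weights as {tag: weight}."""
    total = sum(weights.values())
    probs = [w / total for w in weights.values() if w > 0]  # p(t)
    # The natural logarithm is assumed here; the formula leaves the base open.
    return -sum(p * math.log(p) for p in probs)

# Uniform weights give the maximal entropy log(n); a skewed cloud scores lower.
print(entropy({"tag1": 10, "tag2": 10, "tag3": 10}))  # ~1.099 = log(3)
print(entropy({"tag1": 100, "tag2": 1, "tag3": 1}))   # ~0.11
```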

The inconvenience of this metric is that it is not bounded to a fixed interval (its maximum grows with the number of tags), so it is difficult to know when its value should be considered high or low.

A set of metrics to capture the structural properties of the tag cloud generated from the query results is defined in [3, 9]. These metrics are:

  1. Coverage. Gives the fraction of the query set \(C_q\) covered by the tag cloud S. This metric takes values between 0 and 1. If it is close to 0, the tag cloud covers few objects in the query set; if it is close to 1, it covers many objects.

  2. Overlap. Different tags in S may be associated with the same objects in \(C_q\). This metric evaluates the extent of such redundancies. It also takes values in the interval [0, 1]. If it is close to 0, there is little overlap. If, on the contrary, its value is close to 1, the overlap is high and the tags are not very different from each other.

  3. Cohesiveness. Measures the closeness of the objects in the query set associated with each tag in the tag cloud, according to the relationships between these objects.

  4. Relevance. Defined as the overlap between the set of results obtained with the query (\(C_q\)) and the set of objects retrieved with each tag (\(C_t\)). It is calculated as the fraction of results in \(C_t\) that are also in \(C_q\).

  5. Popularity. A tag in S is popular in \(C_q\) if it is associated with many objects in \(C_q\).

  6. Independence. Two tags in S are independent if the objects they retrieve are not similar to each other. The metric is similar to cohesiveness, but it is calculated for each pair of tag sets.

  7. Balance. A tag cloud S is balanced if its tags represent a similar number of objects in \(C_q\). The balance takes values in the range [0, 1]; a tag cloud is considered balanced if the value of this metric is close to 1. It is calculated as the ratio between the minimum and the maximum number of objects retrieved through a tag, so only two values are considered in its calculation.

Of these metrics, only coverage, overlap, and balance are suitable for adaptation to evaluate a tag cloud generated with the purpose of representing text content. The other metrics are only useful for tag clouds coming from query sets, or when they are calculated for one or two isolated tags.

Furthermore, the balance metric has the inconvenience that only two tags are considered in its calculation: the tag representing the most objects and the one representing the fewest. A tag cloud is said to be unbalanced if this metric is close to 0. But consider, for example, a tag cloud in which all tags represent the same number of objects except for one tag that differs greatly from the others (see the tag cloud in Fig. 1, where tag8 is much smaller than the others): this metric would declare the tag cloud unbalanced when it is not. For this reason, this metric does not seem appropriate to us.

Fig. 1. Example of a balanced tag cloud said to be unbalanced according to [9]

In [5] the metric Selectivity is also proposed, which measures the number of objects filtered in a tag cloud when a tag with no relation to the previously selected tag is chosen. Other metrics, such as Simplicity or Detailedness, can be found in [4] to evaluate the grouping of clusters within tag clouds.

3 Metrics Proposed for the Evaluation of the Tag Cloud as a Tool of Content Representation in Textual Databases

In [7, 8] we defined a methodology for text processing in databases whose final step was the visualization of the text through a tag cloud. This representation helps with content identification and with querying and browsing tasks for textual data.

To evaluate the goodness of the information retrieved through the tags, we used the precision, recall, and F1 score metrics [2], which are very standard measures for these purposes. But for evaluating the tag cloud as a content representation tool, we found only a few specific metrics, and they do not meet our requirements.

We take the metrics “coverage” and “overlap” from [9] and adapt them to evaluate the tag cloud as a tool of content representation. The metric “balance” has the inconvenience of using only two values in its calculation, the minimum and maximum weights, so it is strongly influenced by extreme values, which could lead to erroneous conclusions.

A way to calculate the balance while avoiding such abnormal weight values is to first exclude the outliers, taking as outliers the values lying more than 1.5 times the interquartile range beyond the quartiles. In this way the influence of values due to errors in the data is avoided, but still only two values are considered in the calculation of the balance.
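A minimal sketch of this exclusion step, using Tukey's rule with quartiles from the Python standard library (the name `exclude_outliers` is ours):

```python
import statistics

def exclude_outliers(values):
    """Drop values lying more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]

print(exclude_outliers([8, 9, 9, 10, 10, 11, 40]))  # 40 is dropped
```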

Next, we define three metrics to evaluate the tag cloud when it is used as a tool for content representation in textual databases: the coverage, the overlap, and the disparity. The third one is related to the balance and the entropy, but it is bounded and considers all the tags in its calculation. In addition, a new way of calculating the balance through OWA operators is proposed, excluding the outliers first as exposed above.

3.1 Coverage, Overlap and Disparity

Let \(x_i\) be a tag of a cloud X and \(t_i\) a tuple in a set of tuples T. We denote by \(T(x_i)\) the set of tuples associated with the tag \(x_i\).

We calculate the coverage of X as:

$$\begin{aligned} cov(X)=\frac{card(\cup _{x_i \in X}T(x_i))}{card(T)} \end{aligned}$$
(1)

This metric takes values in the interval [0, 1]. A value close to 1 indicates that the tag cloud represents most of the content of the database.

The overlap is calculated through the expression:

$$\begin{aligned} over(X)=avg_{i \ne j}\left( \frac{card(T(x_i) \cap T(x_j))}{min\{card(T(x_i)),card(T(x_j))\}}\right) \end{aligned}$$
(2)

This metric also takes values between 0 and 1. A value close to 0 means that tags in the tag cloud represent different objects in the database.

And finally, we calculate the disparity with the next expression:

$$\begin{aligned} dis(X)=avg_{i \ne j}\left( \frac{|card(T(x_i))-card(T(x_j))|}{max\{card(T(x_i)),card(T(x_j))\}}\right) \end{aligned}$$
(3)

Like the other two metrics, the disparity takes values between 0 and 1. A value close to 1 indicates that the disparity between the weights of the tags is high, which is necessary in order to highlight the relative importance of the tags through their weights.
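The sketch below computes the three metrics, assuming each tag is mapped to the set of identifiers of the tuples it represents; this representation and the names used (`tag_sets`, `all_tuples`) are our own illustration, not part of the formal definitions. Since the quantities inside the averages are symmetric in i and j, averaging over unordered pairs gives the same result as averaging over all \(i \ne j\).

```python
from itertools import combinations

def coverage(tag_sets, all_tuples):
    """Eq. (1): fraction of the tuples covered by the union of the T(x_i)."""
    covered = set().union(*tag_sets.values())
    return len(covered) / len(all_tuples)

def overlap(tag_sets):
    """Eq. (2): average, over tag pairs, of |intersection| / smaller set size."""
    pairs = list(combinations(tag_sets.values(), 2))
    return sum(len(a & b) / min(len(a), len(b)) for a, b in pairs) / len(pairs)

def disparity(tag_sets):
    """Eq. (3): average, over tag pairs, of |size difference| / larger size."""
    pairs = list(combinations(tag_sets.values(), 2))
    return sum(abs(len(a) - len(b)) / max(len(a), len(b)) for a, b in pairs) / len(pairs)

# Toy cloud: three tags over ten tuples (identifiers 1..10).
T = {"love": {1, 2, 3, 4, 5}, "war": {4, 5, 6}, "paris": {7, 8}}
print(coverage(T, set(range(1, 11))))  # 8/10 = 0.8
print(overlap(T))                      # avg(2/3, 0, 0) ~ 0.22
print(disparity(T))                    # avg(2/5, 3/5, 1/3) ~ 0.44
```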

3.2 Balance Calculated Through OWA Operators

Ordered Weighted Averaging (OWA) Operators. An OWA operator of dimension n is a mapping [11]:

$$\begin{aligned} f:R^n \rightarrow R \end{aligned}$$

that has an associated weighting vector W of dimension n:

$$\begin{aligned} W=[w_1~~ w_2~~ \ldots ~~ w_n]^T \end{aligned}$$

such that

$$\begin{aligned}&1.~~ w_i \in [0,1] \\&2.~~ \sum _i w_i=1 \end{aligned}$$

Furthermore \(f(a_1,\ldots ,a_n)=\sum _j w_jb_j\) where \(b_j\) is the jth largest of the \(a_i\).

An important aspect of this operator is the re-ordering step: a weight is associated with a particular ordered position in the aggregation, that is, the vector \(a_1,\ldots ,a_n\) has to be sorted in descending order beforehand.
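A minimal sketch of this aggregation, with the descending re-ordering made explicit (the function name `owa` is ours):

```python
def owa(w, a):
    """OWA aggregation: f(a) = sum_j w_j * b_j, with b the descending sort of a."""
    assert abs(sum(w) - 1) < 1e-9 and all(0 <= wi <= 1 for wi in w)
    b = sorted(a, reverse=True)  # b_j is the j-th largest of the a_i
    return sum(wj * bj for wj, bj in zip(w, b))

a = [3, 7, 1, 5]
print(owa([1, 0, 0, 0], a))  # all weight on the 1st position: max = 7
print(owa([0, 0, 0, 1], a))  # all weight on the last position: min = 1
print(owa([0.25] * 4, a))    # uniform weights: arithmetic mean = 4.0
```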

In [10] Yager pointed out three important special cases of OWA aggregations:

  1. \(F^*\): In this case \(W=W^*=[1~~0~~\ldots ~~0]^T\). Then \(F^*(a_1,\ldots ,a_n)=Max_i(a_i)\),

  2. \(F_*\): In this case \(W=W_*=[0~~0~~\ldots ~~1]^T\). Then \(F_*(a_1,\ldots ,a_n)=Min_i(a_i)\),

  3. \(F_{Ave}\): In this case \(W=W_{Ave}=[1/n~~\ldots ~~1/n]^T\). Then \(F_{Ave}(a_1,\ldots ,a_n)=\frac{1}{n}\sum _i a_i\).

Balance Calculated Through OWA Operators. If we consider the first two special cases of OWA operators pointed out in [10], we can express the balance as:

$$\begin{aligned} bal(S)=\frac{F_{*}(C)}{F^{*}(C)} \end{aligned}$$
(4)

where \(F_{*}\) and \(F^{*}\) are OWA operators of size n with associated weighting vectors \(W_{*}=[0~ 0~\ldots 1]^T\) and \(W^{*}=[1 ~0~\ldots 0]^T\), respectively, and \(C=\{card(T(x_1)), card(T(x_2)),\ldots ,card(T(x_n))\}\) is the set of cardinalities of the sets \(T(x_i)\) associated with each tag \(x_i\).

Before applying this formula we exclude the outliers from C, identifying them as the values lying more than 1.5 times the interquartile range beyond the quartiles.

The balance takes values in the interval [0, 1]. A value close to 0 indicates that the tag cloud is unbalanced.
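Combining the `exclude_outliers` and `owa` sketches given above, a sketch of Eq. (4) could look as follows; the helper names are ours, and the input is the list of cardinalities \(card(T(x_i))\):

```python
def balance(cardinalities):
    """Eq. (4): F_*(C) / F^*(C) after excluding outliers from C."""
    c = exclude_outliers(cardinalities)  # 1.5 * IQR rule, as defined above
    n = len(c)
    w_min = [0.0] * (n - 1) + [1.0]  # W_*: selects the minimum
    w_max = [1.0] + [0.0] * (n - 1)  # W^*: selects the maximum
    return owa(w_min, c) / owa(w_max, c)

# A cloud like Fig. 1: uniform cardinalities except one much smaller tag.
print(balance([10, 10, 11, 9, 10, 10, 2]))  # 2 is excluded -> 9/11 ~ 0.82
# The naive min/max ratio without exclusion would be 2/11 ~ 0.18.
```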

3.3 Example of Calculation of Coverage, Overlap, Disparity and Balance Through OWA Operators

Table 1 presents several movie titles selected from FilmAffinity in the category of romance.

Table 1. Titles of movies selected from FilmAffinity

After cleaning the text in Table 1 and following the methodology explained in [8], two tag clouds have been generated considering different supports in terms of absolute frequency. We can see them in Fig. 2.

For both tag clouds, the metrics proposed in Subsects. 3.1 and 3.2 have been calculated. The values obtained are shown in Table 2.

The details of the calculation of the balance through OWA operators are given in Table 3, where B is the ordered vector of items in C.

Fig. 2. Tag clouds for the text in Table 1

Table 2. Coverage, overlap and disparity for tag clouds in Fig. 2
Table 3. Details of the balance calculation

As we can see, Tag Cloud 1 performs better in coverage and disparity, but worse in overlap. Tag Cloud 1 has a smaller support than Tag Cloud 2, so a greater number of tags appears in its visualization, increasing the coverage. The larger number of tags causes redundant information to be represented, so the overlap increases as well. The disparity also increases with the appearance of tags with smaller weights, and the tag cloud becomes more unbalanced. Disparity appears to be a softer measure than balance, since it considers all the tags in its calculation while the balance considers only two.

The choice of one tag cloud or the other will depend on the preferences in each case.

4 Conclusions

In order to evaluate the tag cloud as a content representation tool, three metrics have been proposed: coverage, overlap, and disparity. All three take values in the interval [0, 1] and use all the tags in the visualization for their calculation. In addition, a new way to calculate the balance through OWA operators has been proposed.

Coverage and overlap are usually in tension with each other: when the coverage increases, the overlap also increases, as more tags appear in the cloud, representing a larger amount of information and bringing more redundancies. Disparity and balance can also be affected by variations in coverage and overlap. Disparity seems to be a softer measure than balance. The trade-off between these values will be established according to the requirements of each case.

As future work, we plan to continue researching tag clouds generated over textual databases, considering the inclusion of fuzzy logic in the tags, their grouping into clusters, the adoption of OWA operators for tag cloud comparison, and the study of the properties of these operators. We also intend to use the tag cloud for entity search and to introduce other languages in order to create a multilingual tool based on ontologies.