1 Introduction

Digital libraries are collections of information with associated services offered to user communities through a variety of technologies. The collections can hold scientific, business or personal data, represented in different formats such as digital text, images, audio, video, or any other digital media. This information can be digitized or born digital, and the services built on this content vary and can be offered to individuals or to user communities. Internet access has made digital libraries increasingly used by diverse communities for various purposes, in which sharing and collaboration have become important social elements. As digital libraries have become common spaces, and their contents and services more varied, people expect more sophisticated services from them [6, 9, 23]. Digital libraries rely on human resources (staff) who are responsible for managing and enabling access to the documents most interesting to users, taking into account both their areas of interest and their needs [18]. The library staff search, evaluate, select, catalogue, classify, preserve and schedule access to the digital documents [14]. Digital libraries have been incorporated into many environments, but we focus on the academic context. Specifically, we talk about University Digital Libraries (UDL), which provide information resources and services to students, faculty and staff in an environment that supports learning, teaching and research [7, 25].

The exponential growth of Web sites and documents makes it hard for users to find the information they are looking for in a simple and time-effective way. Users need tools to help them deal with the large amount of information available to them on the Web [12]. Therefore, Web search and mining techniques are becoming vital. Furthermore, the Web influences the development of other information media, for example, newspapers, journals and books, and specifically the development of academic digital libraries [25]. As on the Web, the exponential growth of information is the major problem of these libraries, because the staff have difficulty delivering the information to the users. For this reason, tools from the Web context can be used in UDL to facilitate the tasks of the staff and therefore improve access to information for students, teachers and researchers.

A traditional search function is an essential part of any digital library, but the frustration of users increases as their needs become more complex and the volume of information handled by the library grows. Digital libraries should move from being passive to being more proactive in offering and tailoring information for individuals and communities, and in supporting community efforts to capture, structure and share knowledge [6]. In this way, digital libraries can anticipate the users’ needs and recommend resources that could be of interest to them. Given these features, a particularly important service in a UDL is the selective dissemination of information, or filtering. Users develop their interest profiles, so when new materials are added to the collection, the UDL can notify the users of relevant items [14]. Because of information overload, it is sometimes difficult to obtain useful or relevant information when it is needed, even though a great abundance of information is available. When the users of a UDL try to access useful information, they often obtain irrelevant information or information that does not meet their needs. So, users need easier access to the thousands of resources that are available yet hard to find [17, 24].

As on the Web, we can use recommender systems to facilitate access to information. A recommender system attempts to discover information items that are likely to be of interest to a user. Recommender systems are especially useful when they identify information that a person was previously unaware of. Furthermore, recommender systems are personalized services because they may treat each user in a different way. These recommender systems play an important role in highly rated Web sites, such as Amazon, YouTube, Netflix, Tripadvisor or IMDb [8].

The provision of personalized recommendations requires that the system knows something about every user, such as the ratings provided by the users on the items they have explored [4, 28]. This implies that the system must maintain user profiles containing the users’ preferences or needs. The way in which this information is acquired and exploited depends on the particular recommendation approach. The system could acquire implicit information about the users by analyzing their behavior, or it might ask the users to state their preferences explicitly. Another question to consider is what additional information the system requires, and how this information is processed and managed to generate a list of personalized recommendations.

Following these ideas, in this paper we review and analyze different proposals that favor the dissemination of information in UDL. Given the success shown by the application of recommender systems, we focus on proposals based on these recommendation techniques [24]. These proposals also face the problem of the wide variety of representations and evaluations of information, which is more pronounced when users are part of the process, as is the case in UDL. Therefore, we also present the fuzzy linguistic modelling that helps to represent and efficiently manage the qualitative information present in the communication processes, as in previous proposals in which fuzzy approaches were applied [25, 26]. Specifically, we analyze the multi-granular approach, which gives greater flexibility in the system-user interaction [16, 19].

We analyze four proposals, each of them improving the performance of the previous one. The first one proposes a fuzzy linguistic recommender system that recommends both specialized resources from the user's area of interest and complementary resources that could be interesting for forming multi-disciplinary groups [22]. The second one proposes a new method for acquiring the user profiles that reduces the great effort required by previous proposals; users provide their preferences on some research resources (by means of incomplete fuzzy linguistic preference relations) and from this information the system obtains their respective preference vectors on topics of interest [21]. The third one improves the previous proposals with a recommender system that uses a memory to avoid the information overload problem still persistent in UDL; the main idea is to use previously selected items to make a new selection in a new recommendation round [20]. Finally, the last proposal treats the generation of recommendations about research resources as a task with two distinct elements: on the one hand, finding research resources that are relevant to the users, and on the other hand, finding research resources that are valid from the standpoint of item quality [28].

The paper is structured as follows. Section 2 reviews the preliminaries needed to understand the analyzed proposals. In Sect. 3 we analyze several proposals to improve the dissemination of information in digital libraries. Finally, Sect. 4 points out some conclusions and future research.

2 Preliminaries

2.1 Basis of Recommender Systems

Recommender systems try to guide the users in a personalized way towards suitable items among a wide range of possible options [4, 28]. Personalized recommendations rely on some knowledge about the users, which might be tastes, preferences or the ratings of previously explored items. This information may be acquired implicitly, by analyzing the users’ behavior, or explicitly, when users directly provide their preferences.

Another aspect to take care of is the way of generating recommendations. In the literature we find two main categorizations [4, 27]. In the first one, authors consider two different approaches: on the one hand, content-based approaches generate the recommendations taking into account the characteristics used to represent the items and the ratings that a user has given to them. On the other hand, collaborative approaches generate recommendations using explicit or implicit preferences from many users, ignoring the representation of the items. The second one extends the categorization with three more approaches: demographic systems, knowledge-based systems and utility-based systems [4].

Each approach has certain advantages and disadvantages, depending on the setting in which it is applied. To reduce the disadvantages of each approach and exploit their benefits, a widespread solution is to combine several of them, which is known as a hybrid approach [4].

2.2 Fuzzy Linguistic Approach

The fuzzy linguistic approach is a tool based on the concept of linguistic variable proposed by Zadeh [29]. This theory has given very good results in modelling qualitative information and has proven useful in many problems. We briefly describe the approaches used in the reviewed proposals.

The 2-Tuple Fuzzy Linguistic Approach. In order to reduce the loss of information of other methods, such as the classical or ordinal ones, a continuous model of information representation based on 2-tuple fuzzy linguistic modelling was proposed in [10]. To define it, both the 2-tuple representation model and the 2-tuple computational model, used to represent and aggregate the linguistic information, have to be established.

Let \(S=\{s_0,...,s_g\}\) be a linguistic term set with odd cardinality. We assume that the semantics of labels is given by means of triangular membership functions and consider all terms distributed on a scale on which a total order is defined. In this fuzzy linguistic context, if a symbolic method aggregating linguistic information obtains a value \(\beta \in [0,g]\), and \(\beta \notin \{0,...,g\}\), we can represent \(\beta \) as a 2-tuple \((s_i,\alpha )\), where \(s_i\) is the linguistic label closest to \(\beta \), and \(\alpha \in [-0.5,0.5)\) is a numerical value expressing the symbolic translation, i.e., the difference between \(\beta \) and the index i. The translation between numerical values and 2-tuples is carried out by two functions: \(\varDelta (\beta )=(s_i,\alpha )\) and \(\varDelta ^{-1}(s_i,\alpha )=\beta \in [0,g]\) [10].
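As an illustration, the two translation functions can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation of [10]: labels are identified by their index i, the label names are hypothetical, and rounding half up is chosen so that the translation stays in [-0.5, 0.5).

```python
import math

def delta(beta, g):
    """Translate beta in [0, g] into a 2-tuple (i, alpha).

    i is the index of the closest label s_i and alpha = beta - i is the
    symbolic translation, which lies in [-0.5, 0.5).
    """
    if not 0 <= beta <= g:
        raise ValueError("beta must lie in [0, g]")
    i = math.floor(beta + 0.5)  # round half up (Python's round() rounds half to even)
    return i, beta - i

def delta_inv(i, alpha):
    """Translate a 2-tuple (i, alpha) back into its numerical value beta."""
    return i + alpha

# Hypothetical term set with g = 4, used only for display.
labels = ["none", "low", "medium", "high", "total"]
i, alpha = delta(2.8, g=4)
print(labels[i], round(alpha, 2))  # prints: high -0.2
```

With this convention, `delta_inv(*delta(beta, g))` recovers `beta` up to floating-point rounding, which is the "no loss of information" property the model relies on.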

In order to establish the computational model, negation, comparison and aggregation operators are defined. Using the functions \(\varDelta \) and \(\varDelta ^{-1}\), any of the existing aggregation operators can easily be extended to deal with linguistic 2-tuples without loss of information [10]. For instance, the arithmetic mean, the weighted average operator or the linguistic weighted average operator could be used.
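The following sketch shows how such operators can be obtained by going through \(\varDelta ^{-1}\), operating on numbers, and coming back through \(\varDelta \). It assumes 2-tuples are stored as `(index, alpha)` pairs; the function names are illustrative and not taken from [10].

```python
import math

def delta(beta):
    """Numerical value -> 2-tuple (label index, symbolic translation)."""
    i = math.floor(beta + 0.5)
    return i, beta - i

def delta_inv(two_tuple):
    """2-tuple (label index, symbolic translation) -> numerical value."""
    i, alpha = two_tuple
    return i + alpha

def mean_2tuple(values):
    """Arithmetic mean of a non-empty list of 2-tuples."""
    return delta(sum(delta_inv(v) for v in values) / len(values))

def weighted_mean_2tuple(values, weights):
    """Weighted average of 2-tuples with numerical (non-linguistic) weights."""
    total = sum(weights)
    return delta(sum(delta_inv(v) * w for v, w in zip(values, weights)) / total)

def neg_2tuple(value, g):
    """Negation on a term set s_0..s_g: Neg(s_i, alpha) = delta(g - beta)."""
    return delta(g - delta_inv(value))

# Comparison needs no extra code: Python orders tuples lexicographically,
# i.e. by label index first and by alpha second, as in the 2-tuple model.
assert (2, 0.3) < (3, -0.4)
```

For example, `mean_2tuple([(1, 0.0), (2, 0.0), (4, 0.0)])` yields label index 2 with a translation of about 0.33, instead of being rounded to a plain label.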

Multi-granular Linguistic Information Approach. A modelling problem arises when different experts have different degrees of uncertainty about the same phenomenon, or when an expert has to evaluate different concepts. In such situations several linguistic term sets with different granularities of uncertainty are necessary, and we need tools to manage multi-granular linguistic information [11]. In [11] a multi-granular 2-tuple fuzzy linguistic modelling based on the concept of linguistic hierarchy is proposed. A linguistic hierarchy LH is a set of levels \(l(t,n(t))\), where each level t is a linguistic term set with a different granularity n(t). In [11] a family of transformation functions between labels from different levels was introduced. To establish the computational model, we select one level in which the information is made uniform, so that the operators defined for the 2-tuple model can be used. These transformation functions guarantee that the transformations between levels of a linguistic hierarchy are carried out without any loss of information.
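A transformation between two levels can be sketched as a proportional rescaling of the numerical value of the 2-tuple. The sketch below assumes that the transformation has the form \(\varDelta \big (\varDelta ^{-1}(s_i,\alpha )\cdot (n(t')-1)/(n(t)-1)\big )\); the level sizes 3, 5 and 9 in the usage note are only an example of a hierarchy.

```python
import math

def transform(index, alpha, n_from, n_to):
    """Move a 2-tuple from a level with n_from labels to one with n_to labels.

    Assumed form: rescale beta = index + alpha by (n_to - 1) / (n_from - 1)
    and translate the result back into a 2-tuple of the target level.
    """
    beta = (index + alpha) * (n_to - 1) / (n_from - 1)
    i = math.floor(beta + 0.5)
    return i, beta - i

# A user rates with 5 labels; the system unifies everything at 9 labels.
unified = transform(2, 0.25, n_from=5, n_to=9)   # -> (5, -0.5)
restored = transform(*unified, n_from=9, n_to=5)  # -> (2, 0.25)
```

Going from the 5-label level to the 9-label level and back returns the original 2-tuple, which illustrates why a single unification level is enough for the computations.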

Incomplete Fuzzy Preference Relations. A fuzzy preference relation P on a set of alternatives \(X=\{x_1,..,x_n\}\) is a fuzzy set on the product set \(X \times X\), i.e., it is characterized by a membership function \(\mu _{P}:X \times X \longrightarrow [0,1].\) When the cardinality of X is small, the preference relation may be conveniently represented by the \(n \times n\) matrix \(P=(p_{ij})\), where \(p_{ij}=\mu _{P}(x_{i},x_{j})\ (\forall i,j \in \{1,\ldots ,n\})\) is interpreted as the preference degree of the alternative \(x_{i}\) over \(x_{j}\): \(p_{ij}=1/2\) indicates indifference between \(x_i\) and \(x_j\), \(p_{ij}=1\) indicates that \(x_i\) is absolutely preferred to \(x_j\), and \(p_{ij}>1/2\) indicates that \(x_i\) is preferred to \(x_j\).

As the analyzed proposals integrate the multi-granular fuzzy linguistic modelling based on 2-tuples, a linguistic preference relation must be defined. Let \({X=\{x_1,..,x_n\}}\) be a set of alternatives and S a linguistic term set. A linguistic preference relation \(P=(p_{ij})\ (\forall i,j \in \{1,\ldots ,n\})\) on X is:

$$\begin{aligned} \mu _{P}: X \times X \longrightarrow S \times [-0.5,0.5) \end{aligned}$$
(1)

where \(p_{ij}=\mu _{P}(x_i,x_j)\) is a 2-tuple which denotes the preference degree of alternative \(x_i\) over \(x_j\).

However, in many problems the experts are not able to provide all the preference values that are required. In order to model these situations, incomplete fuzzy preference relations are used [1, 2, 15]. A function \(f :X \longrightarrow Y\) is partial when not every element in the set X necessarily maps onto an element in the set Y. When every element from the set X maps onto one element of the set Y, we have a total function. A 2-tuple fuzzy linguistic preference relation P on a set of alternatives X with a partial membership function is an incomplete 2-tuple fuzzy linguistic preference relation.

3 Proposals to Improve the Dissemination of Information in Digital Libraries

3.1 A Multi-disciplinar Recommender System to Advice Research Resources in University Digital Libraries

The first proposal was presented in [22]. It describes a fuzzy linguistic recommender system that recommends two types of resources: specialized resources from the user's research area, and complementary resources from related areas that could lead to interesting collaboration possibilities with other researchers and to the formation of multi-disciplinary groups. The vector model [13] is used to represent both the resource scope and the topics of interest that characterize the user profiles. A classification composed of 25 disciplines is used, and each position of the resource or user vector stores a linguistic 2-tuple value representing the importance degree of the discipline with regard to the resource or to the user's topics of interest. The recommendation approach is based on a matching process among the terms used in the user and resource representations [13]. Since the system works with linguistic values, a linguistic similarity measure \(\sigma _l(V_1,V_2)\) is defined, based on the cosine measure but defined in a linguistic context. The recommendation strategy has two phases:

  • To generate recommendations for a resource i, \(\sigma _l(V_i,V_j)\) is computed between the resource scope vector (\(V_i\)) and all the resources stored in the system (\(V_j\), \(j=1 \dots m\), where m is the number of resources). If \(\sigma _l(V_i,V_j) \ge \alpha \) (a linguistic threshold value used to filter out information), the resource j is chosen. Next, the system searches for the users who were satisfied with these chosen resources. To obtain the relevance of the resource i for a selected user x, the system aggregates (using the arithmetic mean) \(\sigma _l(V_i,V_j)\) with the assessments previously provided by x about the similar resources and with the assessments provided by other users. If the calculated relevance degree is greater than a linguistic threshold \(\mu \), the system sends the resource information and its calculated linguistic relevance degree to the selected users. If not, the system proceeds to estimate whether the resource could be interesting as a complementary recommendation.

    To obtain the complementary recommendations, the system computes \(\sigma _l(V_i,V_x)\) between the resource i and the user x (for all users). Then, it applies a multi-disciplinary function to the value \(\sigma _l(V_i,V_x)\). This function must give the greatest weights to middle similarity values (near 0.5), because values of total similarity yield accurate recommendations that are probably already known to the users. Likewise, null similarity values indicate no relationship between areas. In the proposed system a triangular function g(x) is used. Next, if the obtained multi-disciplinary value is greater than a previously defined linguistic threshold \(\gamma \), the system recommends the complementary resource.

  • The process of generating recommendations for a user x is similar, but it computes \(\sigma _l(V_x,V_y)\) between the topics of interest vector of the new user (\(V_x\)) and those of all users in the system (\(V_y\), \(y=1..n\), where n is the number of users). If \(\sigma _l(V_x,V_y) \ge \delta \) (a linguistic threshold value), the user y is chosen as a near neighbor of x. Next, the system searches for the resources that satisfied these users. To obtain the relevance of a resource i for the user x, the system aggregates \(\sigma _l(V_x,V_y)\) with the assessments previously provided about i by the nearest neighbors of x. If the calculated relevance degree is greater than the linguistic threshold \(\mu \), the system recommends to the new user the resource information and its calculated linguistic relevance degree. If not, the system proceeds to estimate whether the resource could be interesting as a complementary recommendation for the user. The system computes \(\sigma _l(V_x,V_i)\) between the user x and the resource i (for all resources). Then, it applies the multi-disciplinary function g(x) to the value \(\sigma _l(V_x,V_i)\). If the obtained multi-disciplinary value is greater than the linguistic threshold \(\gamma \), the system recommends the resource as complementary.
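The decision made in both phases, first the relevance test against \(\mu \) and then the complementary test against \(\gamma \), can be sketched as follows. This is a simplified sketch: all values are assumed to be already normalized to [0, 1] instead of being linguistic 2-tuples, and the exact shape of the triangular function g (peak of 1 at 0.5, 0 at both ends) is an assumption, since the text only states that middle similarities must receive the greatest weight.

```python
def multidisciplinary(similarity):
    """Assumed triangular function g on [0, 1]: 0 at 0 and 1, maximum 1 at 0.5."""
    return 2 * similarity if similarity <= 0.5 else 2 * (1 - similarity)

def classify(relevance, similarity, mu, gamma):
    """Decide how a resource is recommended to a user.

    relevance:  aggregated relevance degree of the resource for the user
    similarity: sigma_l between the resource vector and the user vector
    mu, gamma:  thresholds for specialized and complementary recommendations
    """
    if relevance > mu:
        return "specialized"
    if multidisciplinary(similarity) > gamma:
        return "complementary"
    return None  # the resource is not recommended to this user

print(classify(relevance=0.4, similarity=0.5, mu=0.7, gamma=0.6))  # prints: complementary
```

Under this shape of g, a resource from a moderately related area passes the complementary test, whereas one that is either unrelated or almost identical to the user profile does not.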

3.2 Dealing with Incomplete Information in a Fuzzy Linguistic Recommender System to Disseminate Information in a University Digital Library

The second proposal is presented in [21]. The problem of the previous proposal is that users must directly specify their profiles by providing their preferences on all topics of interest, which requires too much effort from the user. To reduce that effort and make the acquisition of user preferences easier, the system presented in [21] allows users to provide their preferences by means of incomplete fuzzy linguistic preference relations [1], which facilitates the determination of user profiles. The system shows the users only a selection of the most representative resources, and the users establish their preferences about these resources by means of an incomplete fuzzy preference relation. Furthermore, according to the results presented in [2], it is enough that the users provide only one row of the preference relation. Then the method proposed in [2] is used to complete the relation. Once the system completes the fuzzy linguistic preference relation provided by the user, it is possible to obtain a vector representing the user preferences on the topics of interest.
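To give an intuition of how a single row can be enough, the sketch below completes a numerical preference relation from its first row under two assumptions that are not spelled out here: reciprocity (\(p_{ji}=1-p_{ij}\)) and additive consistency (\(p_{jk}=p_{j1}+p_{1k}-0.5\)). It works with degrees in [0, 1] rather than linguistic 2-tuples and clips out-of-range estimates, so it is a simplification and not necessarily the exact procedure of [2].

```python
def complete_from_first_row(row):
    """Estimate a full n x n preference relation from its first row.

    row[k] is the preference degree of x_1 over x_(k+1); row[0] is the
    diagonal value 0.5. Under reciprocity and additive consistency,
    p_jk = p_1k - p_1j + 0.5.
    """
    n = len(row)
    relation = [[0.5] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j != k:
                estimate = row[k] - row[j] + 0.5
                relation[j][k] = min(1.0, max(0.0, estimate))  # clip to [0, 1]
    return relation

# The user only states how x_1 compares with x_2 and x_3.
relation = complete_from_first_row([0.5, 0.7, 0.9])
```

In this example the first row is kept as provided, and the estimated degree of \(x_2\) over \(x_3\) is about 0.7, while that of \(x_3\) over \(x_2\) is about 0.3.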

The recommendation strategy is based on a matching process between user profiles and resource representations, using a linguistic similarity measure based on the cosine measure, \(\sigma _l(V_1,V_2)\). To generate the recommendations for a resource i, \(\sigma _l(VR_i,VU_j)\) is computed between the representation vector of the resource (\(VR_i\)) and all the user preference vectors, \(\{VU_1,\ldots ,VU_m\},\) where m is the number of users in the system. If \(\sigma _l(VR_i,VU_j) \ge \psi \) (a previously defined linguistic threshold), the user j is selected to receive recommendations about resource i. For users who want it, the system also recommends collaboration possibilities. The linguistic compatibility degree is obtained by computing \(\sigma _l(VU_x,VU_y)\) between each pair of users x and y who want to collaborate.
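The matching step can be sketched as a cosine similarity followed by a threshold filter. In this sketch each vector position is assumed to hold the numerical value \(\varDelta ^{-1}\) of the corresponding 2-tuple, and the similarity and the threshold \(\psi \) are kept as plain numbers in [0, 1]; the user names and vectors are made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equally long numerical vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def users_to_notify(resource_vec, user_vecs, psi):
    """Return the users whose profile matches the resource with similarity >= psi."""
    return [user for user, vec in user_vecs.items()
            if cosine(resource_vec, vec) >= psi]

# Three disciplines only, to keep the example short.
profiles = {"ana": [4, 0, 0], "ben": [0, 4, 0], "eva": [3, 3, 0]}
print(users_to_notify([4, 0, 0], profiles, psi=0.7))  # prints: ['ana', 'eva']
```

The same `cosine` function applied to two user vectors gives a numerical counterpart of the compatibility degree used for collaboration possibilities.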

3.3 An Improved Recommender System to Avoid the Persistent Information Overload in a University Digital Library

Third, we analyze the proposal presented in [20]. Although the two previous techniques succeeded in mitigating the information overload problem, the number of electronic resources generated daily keeps growing and the problem arises again; that is, information overload persists. The idea is to use a memory to remember items that were selected but not recommended previously, so that the system can incorporate them in future recommendations to complete the set of recommendations, for example when there are few items to recommend or when the user wishes to receive outputs that combine items selected in different recommendation rounds. Users are asked to express restrictions on the number of items to receive in each recommendation round and on the novelty of such items.

This system works in two phases:

  1. To generate the recommendations using the recommendation approach of the previous proposal [21].

  2. To apply a second filter or selection process according to the user’s restrictions, taking into account the number of recommendations that the user would like to receive:

    (a) If there are not enough resources to satisfy the number of recommended resources specified by the user, the system recalls the items that were previously selected but not recommended and that could be recommended now. The system then repeats the recommendation process detailed in phase 1, but now incorporating these remembered resources.

    (b) If the number of selected resources is enough, the system checks the restrictions regarding the novelty of the resources, that is, whether the user is also interested in previous resources that are still valid and could be more interesting than a new resource. If the user wants both kinds of resources, the system repeats the recommendation process of the first phase, but now incorporating these remembered resources.

    Finally, the system shows the users the resource information and its calculated linguistic relevance degree, and for the users who want to collaborate, the system sends the resource information, its calculated linguistic relevance degree and the collaboration possibilities characterized by their linguistic compatibility degrees.
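The second phase can be sketched as a filter over two pools of items, the fresh selection and the memory. This is a simplified sketch under stated assumptions: items are `(resource_id, relevance)` pairs with numerical relevance, "repeating the recommendation process" is reduced to re-ranking the merged pool by the stored relevance, and the parameter names are hypothetical.

```python
def second_filter(fresh, memory, max_items, include_previous):
    """Apply the user's restrictions to the items selected in phase 1.

    fresh:            items selected in the current round
    memory:           items selected in earlier rounds but not recommended
    max_items:        number of recommendations the user wants per round
    include_previous: True if the user also accepts older, still valid items
    Returns (recommended, remembered), where remembered is the new memory.
    """
    use_memory = len(fresh) < max_items or include_previous
    pool = list(fresh) + (list(memory) if use_memory else [])
    ranked = sorted(pool, key=lambda item: item[1], reverse=True)
    recommended = ranked[:max_items]
    # Whatever is not recommended now is kept for future rounds.
    remembered = ranked[max_items:] + ([] if use_memory else list(memory))
    return recommended, remembered

recommended, remembered = second_filter(
    fresh=[("r1", 0.9)], memory=[("r0", 0.8), ("r9", 0.6)],
    max_items=2, include_previous=False)
# recommended == [("r1", 0.9), ("r0", 0.8)], remembered == [("r9", 0.6)]
```

Case (a) corresponds to `len(fresh) < max_items`, and case (b) to `include_previous=True`; in both, the remembered items compete with the fresh ones for the available slots.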

3.4 A Quality Based Recommender System to Disseminate Information in a University Digital Library

Finally, we analyze the proposal presented in [28]. An analysis of the previous proposals revealed different aspects that may limit their performance. In practice they worked as information retrieval systems based on matching functions between the resource representations and the user profiles, which limited their performance. Furthermore, the number of electronic resources generated daily grows continuously, so the overload problem appears again and the system performance decreases.

In this proposal the system implements a hybrid recommendation strategy based on a switching hybrid approach [3], which switches between a content-based recommendation approach and a collaborative one in order to combine the user's individual experience with social wisdom. With this dual perspective, the cold-start problem is minimized because the system switches from one approach to the other, depending on the circumstances.
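A switching hybrid can be sketched as a dispatcher that picks one recommender per request. The switching criterion is not detailed here, so the sketch assumes a simple and common one, the number of ratings available for the user; `min_ratings` and the two recommender callables are hypothetical.

```python
def switching_recommend(profile, ratings, content_based, collaborative,
                        min_ratings=5):
    """Choose a recommendation approach for one user.

    Assumed criterion: with few ratings (cold start) fall back to the
    content-based recommender, which only needs the user profile;
    otherwise use the collaborative recommender on the ratings.
    """
    if len(ratings) >= min_ratings:
        return collaborative(ratings)
    return content_based(profile)

# Toy recommenders standing in for the real ones.
by_content = lambda profile: ["content-based list"]
by_neighbours = lambda ratings: ["collaborative list"]

print(switching_recommend({"topics": [4, 0, 1]}, {}, by_content, by_neighbours))
# prints: ['content-based list']
```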

In addition, the generation of recommendations is now a task with two distinct elements: on the one hand, finding resources that are relevant to the users and, on the other hand, finding resources that are valid from the standpoint of item quality [5]. The system incorporates a new module that performs a re-ranking process, taking into account the estimated relevance of an item along with its quality. The problem is how to obtain the resource quality without much interaction from users, so a new way to evaluate the quality of resources is proposed. It is based on the idea that a resource which is usually preferred over others has a certain quality. To do that, the system incorporates the method presented in [21], in which the users are asked to provide their preferences on five research resources by means of an incomplete fuzzy preference relation. Then, the system completes this preference relation. This method is used to obtain the user profiles, but it is also used to estimate the quality of these resources, under the assumption that resources usually preferred over others have a higher quality. The system can therefore count the times that each resource has been selected to be shown as well as the times that each resource has been preferred over another. The displayed resources vary over time, so the system must record each time a resource is selected and each time a resource is preferred to another. The quality of a resource is estimated as the probability that the resource is chosen over another.
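One possible reading of this estimate is a ratio of counts accumulated over all completed preference relations. The sketch below takes that reading: each pairwise comparison in which a resource takes part counts once, and a win is a preference degree above 0.5. It uses numerical degrees instead of 2-tuples, and the class and method names are illustrative.

```python
from collections import Counter

class QualityTracker:
    """Estimate resource quality as wins / pairwise comparisons."""

    def __init__(self):
        self.compared = Counter()   # comparisons each resource took part in
        self.preferred = Counter()  # comparisons each resource won

    def record(self, resources, relation):
        """Register one completed preference relation over the shown resources.

        relation[i][j] is the degree to which resources[i] is preferred
        to resources[j]; values above 0.5 count as a win for resources[i].
        """
        for i, res in enumerate(resources):
            for j in range(len(resources)):
                if i != j:
                    self.compared[res] += 1
                    if relation[i][j] > 0.5:
                        self.preferred[res] += 1

    def quality(self, resource):
        """Estimated probability that the resource is preferred to another one."""
        seen = self.compared[resource]
        return self.preferred[resource] / seen if seen else 0.0

tracker = QualityTracker()
tracker.record(["a", "b", "c"],
               [[0.5, 0.7, 0.9], [0.3, 0.5, 0.7], [0.1, 0.3, 0.5]])
print(tracker.quality("a"), tracker.quality("b"), tracker.quality("c"))
# prints: 1.0 0.5 0.0
```

Because the counters persist across calls to `record`, the estimate keeps adapting as different sets of resources are displayed over time.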

Once a research resource is considered relevant for a user, and both the estimated relevance degree and the resource quality score have been computed, the last step is to aggregate both into a single score. To do this, the system uses a multiplicative aggregation in which the estimated relevance is multiplied by the translated quality score (with the corresponding linguistic transformations). Then, the system recommends these resources to the user along with these final estimated scores to justify the recommendations.
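The multiplicative aggregation can be sketched as follows, assuming that the relevance is a 2-tuple stored as `(index, alpha)` and that the quality has already been translated to a number in [0, 1]; how that translation is done is not covered by this sketch.

```python
import math

def final_score(relevance, quality):
    """Multiply a 2-tuple relevance by a quality in [0, 1].

    The relevance is converted to its numerical value, scaled by the
    quality, and translated back into a 2-tuple (index, alpha).
    """
    index, alpha = relevance
    beta = (index + alpha) * quality
    i = math.floor(beta + 0.5)
    return i, beta - i

print(final_score((4, 0.0), 0.5))  # prints: (2, 0.0)
```

A resource with the highest relevance label but a quality of 0.5 therefore ends up in the middle of the scale, so that it can be outranked by a slightly less relevant resource of higher quality.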

3.5 Comparative Analysis

We now include some brief comments about the rating prediction capacity of the proposals. To obtain comparable data, the Mean Absolute Error (MAE) was computed for the different proposals, i.e., the average absolute deviation between a predicted rating and the user’s true rating. The first two approaches achieve similar performance, but the advantage of the second is the lower participation required from users, which improves user satisfaction. The third approach presents a small performance improvement, with greater precision. But it is the fourth approach that performs best. Therefore, the predictions obtained by using the quality of resources are better than the predictions obtained only with the relevance or the memory. Specifically, an improvement of 4.80% is obtained. That is, the predictions generated with the new system are closer to the users’ preferences.
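For reference, the MAE over N predictions \(p_k\) with true ratings \(r_k\) is \(\frac{1}{N}\sum _{k=1}^{N}|p_k-r_k|\), so lower values are better. A minimal sketch with made-up ratings (not data from the reviewed experiments):

```python
def mean_absolute_error(predicted, actual):
    """Average absolute deviation between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical ratings on a numerical scale.
mae = mean_absolute_error([3, 4, 2], [4, 4, 1])  # (1 + 0 + 1) / 3
```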

4 Conclusions and Future Work

In a UDL the selective dissemination of information about research resources is a particularly important service. The UDL staff and users need tools to help them in their information discovery processes because of the large amount of information available in these systems. Recommender systems have been successfully applied in academic environments to assist users in their access to relevant information. For this reason, in this paper we have reviewed and analyzed several proposals based on recommender systems that help students, teachers and researchers to find information. These proposals can improve the services provided by the UDL to their users. The four proposals reviewed follow an evolution over time. All of them are based on the application of recommender systems and use fuzzy linguistic modelling, and each one improves the performance of its predecessors.

From the analysis of these proposals we conclude that, although some progress has been made, it is fundamental to continue working to solve the information overload problem, which is even more pressing with the continuous advances in technology and especially in social networks. Regarding future research, we believe that a promising direction is to study automatic techniques to establish the representation of resources. Moreover, given the current intensive use of social networks, another idea is to explore new methodologies for the generation of recommendations, for example, extracting knowledge from the information shared in social networks.