
1 Introduction

Case-based reasoning (CBR) is a subfield of Artificial Intelligence rooted in the works of Roger Schank in the early 80s, on dynamic memory and the central role that the recall of earlier episodes (cases) and scripts (situation patterns) has in problem solving and learning [1]. In spite of its maturity as a field of research, in 2003 when we started to develop colibri [2], there was no open source tool that would serve as a reference implementation for the common techniques and algorithms developed and refined in the CBR community over the years. Our goal was to fill this gap by developing an open source framework that could fulfil several goals: to provide a readily usable implementation of common techniques in CBR useful for academic environments; to foster collaboration among research groups by providing an extensible framework which their prototypes can use and extend; and to promote the repeatability of results achieved by other researchers.

Nowadays, we can conclude that colibri is fulfilling the goal of becoming a reference tool for academia, having, as of this writing, hit the 10,000 download mark with users in 100 different countries. Although the main target of the platform is the research community, a relevant feature of colibri is its support for large-scale commercial applications. It means that our software can be used not only to do rapid prototyping of CBR systems but also to develop applications that will be deployed in real scenarios.

To illustrate the usefulness of the tool, we outline some applications, in both industry and academia, that take advantage of the platform. Aquastress is a tool to mitigate water stress problems in Europe, developed by a consortium of companies and universities under a European project [3]. Its authors point out the out-of-the-box support of colibri for building CBR systems and its compatibility with web applications. Another application developed under a European project that integrates colibri into a web environment is Ami-Moses, a web system for energy efficiency optimisation in manufacturing companies [4]. Another relevant system that uses colibri to develop standard CBR systems is DEEP, from the US Air Force Research Laboratory, which uses past experiences to suggest courses of action for new situations [5].

An important feature of the colibri platform is its support for different families of specialized CBR systems: Textual, where the cases are given as text in natural language; Knowledge-intensive, where a rich domain model is available and thus the system requires a smaller case base; Data-intensive, where cases are the main source of information with no domain knowledge available; and Distributed, where different agents collaborate to reach conclusions based on their particular case bases. Accordingly, several applications use the extensions provided by colibri to build specialized systems. An application for assisting criminal justice [6] uses the textual capabilities of the platform. A recommender system developed within a European project supports users during negotiation processes with recommendations on how best to conduct a given negotiation [7]. Many of the applications use the knowledge-intensive capabilities and highlight the facilities for incorporating ontologies and semantic resources into CBR applications. Some examples are a digital library management system [8], a Web-based catalogue for the electronic presentation of Bulgaria’s cultural-historical heritage [9], and a medical application for classifying breast cancer [10].

This chapter describes how colibri supports the development of such applications and provides several examples of working CBR systems that use the features provided by the platform. This overview should serve to motivate and guide those readers that plan to develop CBR systems and are looking for a tool that eases this task.

colibri has been designed following a layered architecture where users can find two different but complementary tools: the jcolibri framework, which provides the components required to build CBR systems programmatically; and the colibri studio development environment, which aids users in the generation of those systems through several graphical tools. The first tool is targeted at developers who prefer to build applications by programming directly, whereas the second has been created for designers who prefer high-level composition tools.

colibri offers a collaborative environment where users can share their efforts in implementing CBR applications. It is an open platform where users can contribute different designs or components that will be reused by other users. colibri proposes a software development process based on the idea of reusing previous designs to aid users in the generation of software systems. In general terms, this process (named the colibri development process) proposes and promotes collaboration among the independent entities involved in the CBR field: research groups, educational institutions and companies. The benefits are manifold; the most significant are code reuse, automation of system generation, systematic exploration of the CBR design space, validation support, optimization and reproducibility. Additionally, such capabilities have enormous advantages for educational purposes, as students can easily understand and implement complex CBR systems.

The next section introduces the colibri platform and its layered architecture. Its development process is then presented in Sect. 3 together with the high-level composition tools. The following sections present the functionality provided by the platform to implement different CBR systems. The basic components are introduced in Sect. 4, and Sects. 5–9 describe how to build specialized CBR systems. Finally, Sect. 10 contains the related work and Sect. 11 concludes this chapter.

2 The COLIBRI Platform

COLIBRI is an advanced platform that involves several elements. The first building block is the set of components required to create tangible applications. These components are provided by the jcolibri framework, a mature software framework for developing CBR systems in Java that has evolved over several years of experience [11].

Once jcolibri was sufficiently mature and had become a reference tool in the CBR community, we developed the second building block of our platform: an Integrated Development Environment (IDE) that includes graphical tools to aid users in the generation of CBR systems. This environment is called colibri studio and follows a development process based on the idea of reusing previous designs to aid users in the generation of software systems. This development process defines several activities to interchange, publish, retrieve, instantiate and deploy workflows that conceptualize CBR systems. These workflows are called templates and comprise CBR system designs which specify behaviour but do not explicitly define functional details.

This way, the colibri platform follows a two-layer architecture where the building blocks of a CBR system are provided by jcolibri, and colibri studio enables their composition through the colibri development process. A schema of this organization is shown in Fig. 1. The bottom layer contains the basic components of the jcolibri framework: cases, similarity functions, methods and the interfaces required to implement the core components of a CBR system. The top layer presents an overview of the tools included in colibri studio.

Fig. 1 Two-layer architecture of colibri

Fig. 2 Screenshot of the template generation tool

Fig. 3 Screenshot of the template retrieval tool

Fig. 4 Screenshot of the template adaptation tool

The following section motivates and presents colibri studio and its associated development process, before moving on to the capabilities provided by the jcolibri framework.

3 The COLIBRI Development Process

The colibri development process identifies several roles that approach the development of CBR systems from different points of view. Senior researchers design the behaviour of the application and define the algorithms to be implemented as software components, which are then composed to assemble the final system. Developers, in turn, implement these systems and components. Furthermore, during the last few years there has been an increasing interest in using colibri as a teaching tool, and consequently our platform also eases this task. These activities make up the colibri development process and are supported by the tools in colibri studio. Next, we describe these activities, tools and user roles:

Template generation. A template is a workflow-based representation of a CBR system where several tasks are linked together to define the desired behaviour. They should be generated by ‘expert’ users although other users may create them. This is the first activity in our software development process. Figure 2 shows a simple template in our specialized tool to design templates.

Template publishing. Templates can be shared with the community. Therefore, there is a second tool that lets users publish a template in the colibri central repository.

Template retrieval and adaptation. Although the publication of templates is a key element in the platform, the main use case consists of retrieving and adapting a template to generate a new CBR system. Here the actors are not only CBR experts: developers, teachers, students, or inexperienced researchers may perform these activities. Due to their importance, these activities are referred to as Template Based Design (TBD). The TBD begins with the retrieval of a template to be adapted from the central repository. This retrieval is performed by a recommender system which proposes the most suitable template depending on the features of the target CBR system. Figure 3 shows a screenshot of this recommender system. It follows a “navigation by proposing” approach where templates are suggested to the user. Next, the adaptation of the retrieved template consists of assigning a component that solves each task. This activity is supported by the tool shown in Fig. 4, where users can click on a task, obtain a list of compatible components and select the most suitable one. The inputs and outputs of the components can then be configured graphically. The components assigned to tasks are the ones provided by the jcolibri framework. For a detailed description of the TBD we point readers to [12].

Component development. The design of components is closely related to the advance of CBR research, as they implement the different algorithms being discovered by the community. Therefore, this is the second main task of expert researchers. However, expert researchers are not expected to implement the components themselves; this task is delegated to users in a ‘developer’ role. We also contemplate the role of ‘junior researcher’, who could design and even implement their own experimental components. Again, these components can be shared with the community by means of the publication tool, which uploads them to the colibri repository.

System Evaluation. As we have mentioned, one of the most relevant benefits of colibri is that it provides an easy to use experimental environment to test new templates and components. Consequently another activity in the development process is the evaluation of the generated systems. It enables the comparison through cross-validation of the performance of different system configurations.
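The adaptation step of the Template retrieval and adaptation activity (pick a task, list the compatible components, assign one) can be sketched as follows. This is a minimal illustration in Python, not the Java API of jcolibri or colibri studio: the names `Component`, `Template`, `compatible`, `bind` and `is_complete` are hypothetical, and compatibility is reduced to a plain match on the task name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    """A reusable piece of behaviour (hypothetical, for illustration only)."""
    name: str
    solves: str  # name of the task this component is able to solve


@dataclass
class Template:
    """A workflow of task names plus the components assigned to them."""
    tasks: List[str]
    bindings: Dict[str, Component] = field(default_factory=dict)

    def compatible(self, task: str, library: List[Component]) -> List[Component]:
        # Components offered to the user when a task is selected.
        return [c for c in library if c.solves == task]

    def bind(self, task: str, component: Component) -> None:
        # Assign a component to a task, rejecting incompatible choices.
        if task not in self.tasks:
            raise ValueError(f"unknown task: {task}")
        if component.solves != task:
            raise ValueError(f"{component.name} does not solve {task}")
        self.bindings[task] = component

    def is_complete(self) -> bool:
        # A template can be turned into a system once every task is bound.
        return all(task in self.bindings for task in self.tasks)
```

A template with the tasks `retrieve` and `reuse` becomes complete only after one compatible component has been bound to each of them, which mirrors the click-and-select interaction described for Fig. 4.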

The colibri development process is supported by the tools included in colibri studio, which is integrated into the popular Eclipse IDE. This way, it takes advantage of the facilities Eclipse provides to manage projects and Java source code. It enables the generation of “CBR projects” where the required libraries are automatically configured. It also allows users to compile and run the source code generated by the tools in colibri studio. To begin using colibri studio we provide several wizards and tools that guide the user through the development activity to be performed. For example, the standard wizard lets users configure the following elements of a CBR system: persistence, case structure, similarity measures and in-memory organization of the case base. These tools, partially shown in Fig. 5, are also available in an Eclipse perspective that displays all of them together. Specifically, Fig. 5 contains screenshots of the tools used to define the case structure and configure the persistence of the case base.

Fig. 5 Screenshots of the wizard tools to configure the case structure (left) and persistence (right)

The first advantage of our proposal is the reduction of the development cost through the reuse of existing templates and components. This is one of the aspirations of the software industry: that software development advances, at least in part, through a process of reusing components. In this scenario, the problem consists of composing several software components to obtain a system with a certain behaviour. To perform this composition, it is possible to take advantage of previously developed systems. This process has obvious parallels with the CBR cycle consisting of retrieval, reuse, revise and retain. The expected benefits are improvements in programmer productivity and in software quality.

The colibri development process [13] has an additional advantage: the collaboration among users promotes the repeatability of the results achieved by other researchers. Nowadays, the reliability of the experimental results must be backed up by the reproducibility of the experiments. This feature ensures the advance in a research area because further experiments can be easily run by extending the existing ones. It is a development process that promotes the reproducibility of experiments for the CBR realm.

An extended description of the colibri development process and colibri studio can be found in [14]. We point readers to that paper for details. Now that we have outlined how to build CBR systems with the colibri platform we detail what kind of systems can be implemented. The implementation is based on the components available in the jcolibri framework, therefore its capabilities are explained next.

4 A Functional Description of the jcolibri Framework

Addressing the task of developing a CBR system raises many design questions: How are cases represented? Where is the case base stored and how are the cases loaded? How should algorithms access the information inside cases? How is background knowledge included? and so on. The best way to solve these issues is to turn to the expertise obtained from previous developments. Thus, colibri defines how to design CBR systems and their composing elements. This definition of the structure of a CBR system is a key element in the platform as it enables the compatibility and reuse of components and templates created by independent sources. The main elements of this structural design are:

  1. Organization into persistence, core and presentation layers. It follows the structure used by J2EE, where persistence, business logic and presentation are independent layers.

  2. Definition of a clear and common structure for the basic objects found in a CBR application: cases, queries, connectors, similarity metrics, case-base organization, etc. For example, in colibri a query is the description of a problem, and a case is an extension of a query that adds the solution to that problem, a justification of that solution and, probably, a record of the result of applying the case in the real world. This organization was decided after a revision of several works found in the CBR literature [15, 16].

  3. Run-time structuring of the CBR systems: jcolibri organizes the behaviour of the CBR systems into: precycle, where required resources (mostly cases) are loaded; cycle, which performs the 4 R’s tasks that structure the reasoning (retrieve, reuse, revise and retain [17]); and postcycle, which releases resources.
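The second and third elements of this structural design can be illustrated with a small sketch. It is written in Python for brevity and is not the jcolibri Java API: the classes `Query`, `Case` and `MiniCBR`, the helper `overlap`, and the use of a plain list as a stand-in for a connector are all assumptions made for this example. Retrieval simply counts matching attribute values, and the revise step is left as a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class Query:
    """The description of a problem."""
    description: Dict[str, Any]


@dataclass
class Case(Query):
    """A query extended with solution, justification and result."""
    solution: Any = None
    justification: Optional[str] = None
    result: Optional[str] = None


def overlap(case: Case, query: Query) -> int:
    # Toy similarity: number of query attributes with the same value in the case.
    return sum(1 for k, v in query.description.items()
               if case.description.get(k) == v)


class MiniCBR:
    def __init__(self, source: Iterable[Case]):
        self.source = source            # stands in for a connector
        self.case_base: List[Case] = []

    def precycle(self) -> None:
        # Load the required resources (here, only the cases).
        self.case_base = list(self.source)

    def cycle(self, query: Query) -> Case:
        # Retrieve: pick the most similar stored case.
        best = max(self.case_base, key=lambda c: overlap(c, query))
        # Reuse: copy its solution onto the new problem description.
        proposed = Case(dict(query.description), solution=best.solution,
                        justification="copied from the most similar case")
        # Revise: placeholder, the proposed solution is accepted unchanged.
        # Retain: store the solved problem as a new case (in memory only).
        self.case_base.append(proposed)
        return proposed

    def postcycle(self) -> None:
        # Release resources.
        self.case_base = []
```

Calling `precycle()` once, `cycle(query)` for each incoming query and `postcycle()` at shutdown reproduces the three-stage run-time structure described in the list.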

Fig. 6 Persistence organization in jcolibri. It includes connectors for databases, textual files, ontologies and Weka’s ARFF format

This structural design of the CBR systems requires a reference implementation that enables users to create tangible applications. This implementation is provided by the jcolibri framework. It includes the components required to build CBR systems either programmatically or through the colibri studio composition tools:

  • Connectors. Cases can be stored using different media. Common media are databases or plain text. However, there may be any other potential source of cases that can be exploited by the framework, such as ontologies, which are accessed through Description Logic reasoners like RACER [18]. Therefore, jcolibri defines a family of components called connectors to load the cases from different media into the in-memory organization: the case base. This division into two layers enables an efficient management of cases, an issue that becomes more relevant as the size of the case base grows. This design based on connectors and in-memory storage of the case base is shown in Fig. 6.

  • Cases. jcolibri represents the cases using Java Beans. Using Java Beans in jcolibri, developers can design their cases as normal Java classes, choosing the most natural design. This representation simplifies programming and debugging CBR applications, and configuration files become simpler because most of the metadata of the cases can be extracted using the introspection capabilities of the Java platform. Java Beans also offer automatically generated user interfaces that allow the modification of their attributes, and direct persistence into databases and XML files. It is important to note that many Java web applications use Java Beans as a base technology, so the development of web interfaces is very straightforward. Moreover, Hibernate, the library used to develop the database connector in jcolibri, uses Java Beans to store information in a database. Java Beans and Hibernate are core technologies in the J2EE platform. By using these technologies in jcolibri, we guarantee the possibility of integrating CBR applications developed using this framework into large scale commercial systems. [19] provides an extended description of case representation in the framework.

  • Case base. There are several components to organize the case base once cases have been loaded from the persistence media. These organizations use different data structures such as linear lists, trees, hash maps, etc.

  • Retrieval methods. The most important retrieval method is Nearest Neighbour scoring. It uses global similarity functions to compare compound attributes and local similarity functions in order to compare simple attributes. Although this method is very popular, there are other methods that are also included in the framework. For example, we implement Expert Clerk Median scoring from [20] and a filtering method that selects cases according to boolean conditions on the attributes. Both methods belong to the recommenders field and will be explained in Sect. 6. In the textual CBR field, we also find specialized methods using several Information Retrieval or Information Extraction libraries (Apache Lucene, GATE, ...) that will be detailed in Sect. 5. Regarding knowledge intensive CBR, we enable retrieval from ontologies (Sect. 7). Data intensive retrieval is addressed by means of clustering algorithms and a connector for the Weka ARFF format (Sect. 8). And, finally, jcolibri provides the infrastructure required to retrieve cases in a distributed architecture of agents (Sect. 9).

    Once cases are retrieved, the best ones are selected. The framework includes methods to select the best scored cases but also provides diversity metrics. Most of these methods belong to the recommender domain, therefore they are detailed in Sect. 6.

  • Reuse and revision methods. These two stages are coupled to the specific domain of the application, so jcolibri only includes simple methods to copy the solution from the case to the query, to copy only some attributes, or to compute direct proportions between the description and solution attributes. There are also specialized methods to adapt cases using ontologies that will be explained in Sect. 7.

  • Retain. These methods are in charge of adding new cases to the case base. There are strategies to reflect the changes to the persistence media or just modify the in-memory storage.

  • Evaluation tools measure the performance of a CBR application. jcolibri includes the following cross-validation strategies: Hold Out, Leave One Out and N-Fold. A detailed explanation of these evaluation tools is presented by [19].

  • Maintenance. These methods allow developers to reduce the size of the case base. Components provided are: BBNR (Blame-based noise reduction) and CRR (Conservative Redundancy Removal) [21], RENN (Repeated Edited Nearest Neighbour) [22], RC (Relative Cover) [23], and ICF (Iterative Case Filtering) [24].

  • Visualization. These methods graphically represent the similarity between cases. This tool serves to debug the fitness of the similarity measure and is shown in Fig. 7. It assigns a color to each type of solution and lays out the cases according to their similarity.
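The Nearest Neighbour scoring described under Retrieval methods combines local similarity functions on simple attributes with a global function on the compound. The following is a minimal sketch of that idea in Python, not the jcolibri implementation: the local functions `equal` and `interval`, the weighted average used as the global function, and the flat dictionary representation of cases are assumptions chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

LocalSim = Callable[[Any, Any], float]


def equal(a: Any, b: Any) -> float:
    # Local similarity for symbolic attributes: 1 if identical, else 0.
    return 1.0 if a == b else 0.0


def interval(width: float) -> LocalSim:
    # Local similarity for numeric attributes, linear in the distance
    # and clipped at 0 once the distance exceeds `width`.
    def sim(a: float, b: float) -> float:
        return max(0.0, 1.0 - abs(a - b) / width)
    return sim


def weighted_average(local_sims: Dict[str, float],
                     weights: Dict[str, float]) -> float:
    # Global similarity: weighted mean of the local similarities.
    total = sum(weights[attr] for attr in local_sims)
    return sum(weights[attr] * s for attr, s in local_sims.items()) / total


def nn_score(query: Dict[str, Any], cases: List[Dict[str, Any]],
             local_functions: Dict[str, LocalSim],
             weights: Dict[str, float], k: int = 1
             ) -> List[Tuple[float, Dict[str, Any]]]:
    # Score every case against the query and return the k best.
    scored = []
    for case in cases:
        local_sims = {attr: f(query[attr], case[attr])
                      for attr, f in local_functions.items()}
        scored.append((weighted_average(local_sims, weights), case))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Every case is compared with the query, so the cost grows linearly with the size of the case base; this is one reason why the in-memory organization of the case base matters.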

Summarizing, jcolibri offers 5 different retrieval strategies with 7 selection methods and provides more than 30 similarity metrics. It provides around 20 adaptation and maintenance methods plus several extra tools like system evaluation or the visualization of the case base.
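To make the cross-validation strategies listed under Evaluation tools more concrete, the sketch below shows N-Fold splitting, with Leave One Out obtained as the special case where the number of folds equals the number of cases. It is an illustrative Python sketch, not the jcolibri evaluation API: `n_fold_splits`, `leave_one_out`, `accuracy` and the toy solver `nearest_solution` are hypothetical names, and cases are simplified to `(description, solution)` pairs with a numeric description.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

CasePair = Tuple[float, str]  # (numeric description, solution label)


def n_fold_splits(items: Sequence[CasePair], n_folds: int
                  ) -> Iterator[Tuple[List[CasePair], List[CasePair]]]:
    # Partition the cases into n_folds groups; each group is used once
    # as the query set while the remaining groups form the case base.
    folds = [list(items[i::n_folds]) for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def leave_one_out(items: Sequence[CasePair]):
    # Leave One Out is N-Fold with one case per fold.
    return n_fold_splits(items, len(items))


def accuracy(splits, solve: Callable[[List[CasePair], float], str]) -> float:
    # Fraction of held-out cases whose solution is predicted correctly.
    hits = total = 0
    for train, test in splits:
        for description, expected in test:
            hits += int(solve(train, description) == expected)
            total += 1
    return hits / total


def nearest_solution(train: List[CasePair], description: float) -> str:
    # Toy 1-NN solver: reuse the solution of the closest stored case.
    return min(train, key=lambda case: abs(case[0] - description))[1]
```

A Hold Out evaluation would use a single fixed split instead of iterating over folds; it is omitted here to keep the sketch short.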

Fig. 7 Screenshot of the evaluation tool [19]

Now that we have described the structure and functionality provided by the framework, the following sections detail how to build specialized CBR systems. The components required to build such systems are delivered as packages named extensions, which offer additional services and behaviour beyond basic CBR processes. The most relevant extensions are included in the main distribution, but others can be obtained separately at the web site. This web site also provides contributions, i.e. extensions developed by third-party research teams. Extensions include several examples to let developers understand the services provided, and the API documentation details how to use and integrate each component into an existing development. The following sections describe the most relevant extensions provided by jcolibri to offer a complete overview of the framework’s capabilities.

5 Textual CBR Applications

Textual CBR (TCBR) is a subfield of CBR concerned with research and implementation on case-based reasoners where some or all of the knowledge sources are available in textual format. It aims at using these textual knowledge sources in an automated or semi-automated way to support problem-solving through case comparison [25].

Although it is difficult to establish a common functionality for TCBR systems, several researchers have attempted to define the different requirements for TCBR [25, 26]: how to assess similarity between textually represented cases, how to map from texts to structured case representations, how to adapt textual cases, and how to automatically generate representations for TCBR.

The textual CBR extension of jcolibri provides methods to address some of these requirements using different strategies. These sets of methods are organized according to:

  1. A library of CBR methods solving the tasks defined by the Lenz layered model [27]. The goal of these layers is to extract the information from the text into a structured representation that can be managed by a typical CBR application. This way, the CBR application can perform a retrieval algorithm based on the similarity of the extracted features and use them to adapt the text to the query.

    These layers are typical processes in Information Extraction (IE) systems: stemming, stop words removal, Part-of-Speech tagging, text extraction, etc. jcolibri includes several implementations of these theoretical layers. A first group of methods uses the Maximum Entropy algorithms provided in the OpenNLP package. The second implementation uses the popular GATE library for text processing. Reference [28] exemplifies these methods to annotate web pages and perform semantic retrieval. This way, we provide a structured representation of cases that can be managed by standard similarity matching techniques from CBR. The main disadvantage of these methods is that they can only be used where texts are mapped to cases with a fixed structure (cases always have the same attributes). An example is the restaurant adviser we use in [29]. Another drawback is that they require the definition of IE rules for each concrete domain, increasing the development cost. We point readers to that paper [29] for further details about the semantic TCBR methods in jcolibri.

  2. A natural language interaction module that facilitates querying CBR systems by using written natural language. IE techniques and Description Logics (DLs) reasoning [30] play a fundamental role in this interaction module, which analyses a textual query from the user to generate a structured representation using the relevant information from the text. The understanding process, from the textual query to the structured query, follows the steps of IE, Synonyms, Reasoning with Ontologies and User Confirmation, where the user validates or corrects the extracted information. The details and an evaluation of this module are presented by [31]. Figure 8 shows a screenshot of the application described in that paper.

  3. There is another group of textual CBR methods that can be used when cases are texts without a fixed structure [32]. These methods are based on Information Retrieval (IR) together with clustering techniques, where reuse and retrieval are performed in an interleaved way. This third approach is clearly different from the two previous approaches, which are based on IE techniques to “capture the meaning”, i.e., to engineer structured case representations. This set of textual methods belongs to the statistical approach that has given such good results in the IR field. jcolibri methods are based on the Apache Lucene search engine [33]. Lucene uses a combination of the Vector Space Model (VSM) of IR and the Boolean model to determine how relevant a given document is to a user’s query. The main advantages of this search method are its positive results and its applicability to non-structured texts. The big drawback is the lack of knowledge regarding the semantics of the texts.
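The combination of the Boolean model and the Vector Space Model mentioned in the third item can be sketched as a two-stage search: a Boolean filter first selects the documents that contain the required terms, and the survivors are then ranked by cosine similarity between TF-IDF vectors. The Python code below is a textbook illustration only. It does not call Lucene and does not reproduce Lucene's actual scoring formula; the function names, the whitespace tokenizer and the particular IDF variant are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    # Deliberately naive: lower-case and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs: Sequence[str]):
    # Build one sparse TF-IDF vector per document, plus the IDF table.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query: str, docs: Sequence[str],
           must_have: Sequence[str] = ()) -> List[Tuple[float, int]]:
    # Returns (score, document index) pairs, best first.
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0)
         for t, tf in Counter(tokenize(query)).items()}
    results = []
    for i, vec in enumerate(vectors):
        if all(term in vec for term in must_have):   # Boolean model
            results.append((cosine(q, vec), i))      # Vector Space Model
    return sorted(results, reverse=True)
```

As the text notes, this kind of ranking works on unstructured text but knows nothing about meaning: two reports that describe the same failure with different vocabulary would receive a low score.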

Fig. 8 Natural language interaction module

5.1 A TCBR Application Built with colibri

Based on the new methods included in jcolibri, in [32] we present a complete Textual CBR application to deal with the automatic generation of failure reports. This system retrieves textual reports and guides the user to adapt them to the current situation. The absence of experts leads us to use IR+Clusters instead of Information Extraction. Figure 9 contains a screenshot of this application, named Challenger, that illustrates the steps followed by the user to adapt a text.

Fig. 9 Screenshot of the Challenger application

Although statistical IR methods give good retrieval results, they do not provide any kind of explanation about the documents returned. One way to solve this problem is to cluster the retrieval results into groups of documents with common information. Clustering algorithms like hierarchical clustering or K-means [34] usually group the documents but do not provide a comprehensible description of the resulting clusters. Lingo [35] is a clustering algorithm implemented in the Carrot2 framework that groups search results and also gives the user a brief textual description of each cluster. Lingo is based on the Vector Space Model, Latent Semantic Indexing and Singular Value Decomposition: it first ensures that there are human-readable descriptions of the clusters and then assigns documents to each one. jcolibri provides wrapper methods that hide the complexity of the algorithm and offer a simple way of managing Carrot2.

This labeled-clustering algorithm can be applied to TCBR in the retrieval step to make it easier to choose the most similar document. However, in [32] we present an alternative approach that uses the labels of the clusters to guide the adaptation of the texts. The adaptation of texts is a knowledge intensive task, especially for non-structured and technical texts. So, we propose a software solution whereby a user is provided with interactive tools to assist with adapting a retrieved report. This way, the user is in charge of the adaptation although the system finds the knowledge required to perform this task by looking for similar pieces of texts.

This method is described as a transformational reuse method where one copy of the case retrieved is used as the source case. Then the user makes changes using a text editor: deleting text components (words, phrases, paragraphs, or sections); writing new text components; or substituting text components using other similar text components. Our method assists the user in the substitution step, as it uses the clusters to show which are the text components that are related to the piece of text that the user is currently adapting. To clarify, we are proposing a simple way of adaptation where the user makes the changes from the choices we provide. These choices are based on the IR and clustering techniques previously explained so that, in a way, the adaptation process is an extension of retrieval, where the system retrieves good candidates to do substitutions.

This approach would be quite similar to the one presented in [36], although our supervised approach allows us to deal with texts with different structure.

For interested readers, [19] presents an extended tutorial on textual CBR with jCOLIBRI.

6 Recommender Systems

CBR has played a key role in the development of several classes of recommender systems [37, 38]. The jcolibri extension for building recommender systems is based in part on the conceptual framework described in the paper by [37] that reviews different recommendation approaches. The framework distinguishes between collaborative and case-based, reactive and proactive, single-shot and conversational, and asking and proposing. Within this framework, the authors review a selection of papers from the case-based recommender systems literature, covering the development of these systems over the last ten years (we could cite [20, 39–42] as illustrative examples). Based on this review, jcolibri includes the following set of methods:

  • Methods for retrieval. Different approaches to obtain and rank items to be presented to the user: typical similarity-based retrieval [39], filter-based retrieval based on boolean conditions, ExpertClerk’s median method [20] and a collaborative retrieval method based on users’ ratings to predict candidate items [43, 44].

  • Methods for case selection. Approaches to select one or more items from the set of items returned by the retrieval method: (1) Select all or k cases from the retrieval set. (2) The Compromise-driven selection [45] method chooses cases according to a number of attributes compatible with the user’s preferences. (3) Finally, the Greedy Selection [46] method considers both similarity and diversity (inverse of similarity).

  • Methods for navigation by asking. Navigation by asking is a conversational strategy where the user is repeatedly asked about attributes until the query is good enough to retrieve relevant items. There are different methods based on heuristics to select the next best attribute to ask about. For example, Information Gain, which returns the attribute with the greatest information gain in a set of items [40, 47], and Similarity Influence Measure [40], which selects the attribute that has the highest influence on Nearest Neighbor similarity.

  • Methods for navigation by proposing. The navigation by proposing strategy asks the user to select and critique one of the items recommended. The selected item is modified according to the critique and produces a new query. There are different strategies to modify the query as enumerated by [48]. The More Like This (MLT) strategy replaces the current query with the description of the selected case. Partial More Like This (pMLT) strategy partially replaces the current query with the description of the selected case but it only transfers a feature value from the selected case if none of the rejected cases have the same feature value. Another option is to use MLT but weighting the attributes (Weighted More Like This, wMLT). Less Like This (LLT) is a simple one: if all the rejected cases have the same feature-value combination, which is different from the preferred case, then this combination can be added as a negative condition. Finally, More + Less Like This (M+LLT) combines both More Like This and Less Like This.
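The query revision strategies listed above can be sketched in a few lines. This is a minimal illustration, not the jcolibri implementation: it assumes that queries and cases are flat feature-value dictionaries, the function names are hypothetical, and Less Like This is simplified to work feature by feature rather than on feature-value combinations.

```python
def more_like_this(query, selected, rejected):
    """MLT: the description of the selected case replaces the query."""
    return dict(selected)


def partial_more_like_this(query, selected, rejected):
    """pMLT: copy a feature value from the selected case only if
    no rejected case has that same value for the feature."""
    revised = dict(query)
    for feature, value in selected.items():
        if all(case.get(feature) != value for case in rejected):
            revised[feature] = value
    return revised


def less_like_this(query, selected, rejected):
    """LLT (per-feature simplification): a value shared by every rejected
    case, and different from the selected case, becomes a negative condition."""
    negatives = {}
    if rejected:
        for feature, value in rejected[0].items():
            shared = all(case.get(feature) == value for case in rejected)
            if shared and selected.get(feature) != value:
                negatives[feature] = value
    return negatives


# Toy critique round (invented data): the user picks one movie and rejects two.
query = {"genre": "comedy"}
selected = {"genre": "drama", "decade": "1990s", "country": "UK"}
rejected = [
    {"genre": "comedy", "decade": "1990s", "country": "US"},
    {"genre": "action", "decade": "2000s", "country": "US"},
]
print(more_like_this(query, selected, rejected))
print(partial_more_like_this(query, selected, rejected))
print(less_like_this(query, selected, rejected))
```

In this toy round pMLT does not transfer the decade, because one rejected case shares it, while LLT records the country common to both rejected cases as a negative condition. M+LLT would simply combine the revised query of MLT with those negative conditions.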

There are other methods to display item lists, make critiques, display attribute questions, manage user profiles, etc. Details are provided by [49] and the jcolibri code examples.
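As an illustration of the Greedy Selection idea mentioned in the list above, the following sketch selects k items one at a time, scoring each candidate by its similarity to the query multiplied by its average dissimilarity to the items already selected. The scoring formula, the function names and the toy one-dimensional similarity are assumptions made for this example; they are not taken from the jcolibri code.

```python
def relative_diversity(candidate, selected, sim):
    """Average dissimilarity (1 - similarity) to the items already selected."""
    if not selected:
        return 1.0
    return sum(1.0 - sim(candidate, s) for s in selected) / len(selected)


def greedy_select(query, cases, k, sim):
    """Pick k cases, trading off similarity to the query against diversity."""
    remaining = list(cases)
    selected = []
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: sim(query, c) * relative_diversity(c, selected, sim),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def sim(a, b):
    """Toy similarity for numbers in the range 0..10 (an assumption)."""
    return 1.0 - abs(a - b) / 10.0


# Plain top-2 retrieval would return 5 and 5.5, which are nearly identical.
print(greedy_select(5, [5, 5.5, 8, 1], 2, sim))
```

The second pick favours an item that is still reasonably similar to the query but different from the first pick, which is the behaviour the method is meant to provide.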

6.1 A Recommender Application Built with colibri

To illustrate the capabilities of the recommender extension, we present the HappyMovie application. HappyMovie is a recommender system that provides a recommendation for a group of people who wish to go to the cinema together. Figure 10 includes a screenshot of the system. It is a Facebook application that exploits social knowledge to reproduce more closely the real argumentation processes of real groups of users. Its generic architecture, named ARISE (Architecture for Recommendations Including Social Elements), is depicted in Fig. 11.

Fig. 10 Screenshot of the HappyMovie system

Fig. 11 Generic architecture of the HappyMovie system

First, it uses a personality test in order to obtain the different roles that people play when interacting in a decision-making process. Additionally, the tie strength (or trust) between users is also included in the recommendation method by extracting specific information from their profiles in the social network.

The satisfaction database represents the “memory” of the system for future recommendations. This module stores all the recommendations that have been made for every user and every group. Keeping this memory allows the system to avoid repeating previous recommendations and also to ensure a certain degree of fairness. If one member accepts a proposal that she is not interested in, she will be given some kind of preference next time, so that in the long run all the members of the group are equally satisfied. This module is actually a CBR system where cases are the previous recommendations.

The recommendation strategies in HappyMovie predict the rating that each user would assign to every item in the catalogue and then these estimated ratings are combined to obtain a global prediction for the group. Then, the movie with the highest prediction is proposed. Therefore, a basic building block of this application is the module in charge of computing individual predictions. These individual predictions are obtained by applying the retrieval methods in jcolibri that take into account the personal preferences of the user and the available movies.
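The combination step described above can be sketched as follows. The text does not state which aggregation function HappyMovie uses, so the average (and the alternative "least misery" minimum) are assumptions chosen for illustration; the personality and trust factors are left out, and all names and ratings are invented.

```python
def recommend_for_group(predicted, aggregate=None):
    """predicted maps each user to {movie: estimated rating}.
    Returns the movie with the highest group score and all group scores."""
    if aggregate is None:
        aggregate = lambda ratings: sum(ratings) / len(ratings)  # plain average
    users = list(predicted)
    # Only movies with a prediction for every member can be scored.
    movies = sorted(set.intersection(*(set(predicted[u]) for u in users)))
    group_scores = {
        m: aggregate([predicted[u][m] for u in users]) for m in movies
    }
    best = max(movies, key=lambda m: group_scores[m])
    return best, group_scores


# Invented individual predictions for a two-person group.
predicted = {
    "ana": {"A": 5.0, "B": 3.0, "C": 4.0, "D": 5.0},
    "luis": {"A": 1.0, "B": 3.5, "C": 4.0, "D": 3.5},
}
print(recommend_for_group(predicted))        # average of the members' ratings
print(recommend_for_group(predicted, min))   # least-misery alternative
```

The two calls can propose different movies for the same individual predictions, which is why the choice of aggregation strategy matters in group recommendation.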

Additional details of this system can be found in [50].

7 Knowledge Intensive CBR

Our research groupFootnote 15 has been working for more than a decade on knowledge intensive CBR (KI-CBR) using ontologies [51–54]. Commonly, KI-CBR is appropriate when developers do not have enough experiences available but there is a considerable amount of knowledge of the domain. [19] provides further details about the implementation of KI-CBR applications in jcolibri.

We argue that the formalization provided by ontologies is useful to the CBR community for different purposes, and therefore jcolibri supports the following services:

  1. Persistence: jcolibri provides a connector that loads cases represented as concepts or individuals in an ontology (see Fig. 6). This connector delegates to our library for managing ontologies named OntoBridge.

  2. Definition of the case structure through elements in ontologies. This extension provides a specialized data type used to represent attributes of the case structure that point to elements in an ontology. For example, an attribute city used in the representation of a case can be linked to the concept City in an ontology. This way, cases following that structure will store the values Madrid, London, N.Y., Tokyo in the attribute city because they are the individuals belonging to the concept City in the ontology.

    This approach that links case attributes to elements from an ontology can be used either if the cases are embedded as individuals in the ontology itself, or if the cases are stored in a different persistence medium, such as a database, but some attributes contain values from the ontology.

  3. Retrieval and similarity. There are different strategies to compute the local similarity based on ontologies [55–57]. Following the previous example, a more elaborate ontology will classify cities according to continents. Therefore, the concept City will be specialized by the subconcepts EuropeanCity, AmericanCity, AsianCity, ..., with the individuals organized accordingly. This way we can use this hierarchy to compute similarity according to the distance between individuals. In this case, the similarity between Madrid and London will be higher than that between Madrid and Tokyo because Madrid and London belong to the same subconcept. This approach to computing similarity based on distances, named concept-based similarity, can be performed in different ways [55], and jcolibri provides implementations of these similarity metrics to developers.

  4. Adaptation. The usage of ontologies is especially interesting for case adaptation [51], as they facilitate the definition of domain-independent, but knowledge-rich adaptation methods. For example, imagine that we need to modify the city attribute of a retrieved case because the current value Madrid is not compatible with the restrictions of the query. According to the ontology, the best substitute is London because it is also a EuropeanCity. Here again we use the structure of the ontology to adapt the case. Because this schema is based on distances within the ontology, jcolibri offers several domain-independent methods to perform this kind of adaptation. These methods only need to be configured with the ontology (which contains the domain knowledge). The following subsection presents two adaptation methods included in jcolibri. See [51, 58, 59] for a detailed description of different adaptation proposals.

  5. Learning [60]. As ontologies are used as a persistence medium, they can be reused to store the experiences learnt. This is performed by means of the connector able to manage ontologies.
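To make the idea of concept-based similarity from the list above more concrete, the following sketch computes a similarity between individuals from their positions in the city hierarchy. The particular measure (depth of the deepest common ancestor divided by the depth of the deeper individual) is only one simple possibility chosen for illustration; it is not claimed to be one of the metrics of [55], and the dictionary encoding of the ontology is an assumption.

```python
# Hypothetical encoding of the example hierarchy: each element maps to its parent.
PARENT = {
    "City": None,
    "EuropeanCity": "City",
    "AmericanCity": "City",
    "AsianCity": "City",
    "Madrid": "EuropeanCity",
    "London": "EuropeanCity",
    "N.Y.": "AmericanCity",
    "Tokyo": "AsianCity",
}


def path_to_root(node):
    """Return the list [node, parent, ..., root]."""
    path = [node]
    while PARENT[node] is not None:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    return len(path_to_root(node)) - 1


def concept_similarity(a, b):
    """Depth of the deepest common ancestor relative to the deeper element."""
    if a == b:
        return 1.0
    ancestors_b = set(path_to_root(b))
    # path_to_root(a) goes from a up to the root, so the first shared element
    # is the deepest common ancestor.
    common = next(n for n in path_to_root(a) if n in ancestors_b)
    return depth(common) / max(depth(a), depth(b))


print(concept_similarity("Madrid", "London"))
print(concept_similarity("Madrid", "Tokyo"))
```

With this measure Madrid and London, which share the subconcept EuropeanCity, score higher than Madrid and Tokyo, whose only common ancestor is the root concept City.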

7.1 A KI-CBR Application Built with colibri

A KI-CBR system using the adaptation methods included in jcolibri is described in [59]. This paper presents a folk tale generation system that analyzes the role of reuse in CBR systems in originality-driven tasks, where a new solution has to be not only correct but also noticeably different from the ones known in the case base. Each case is a story plot that is formalized by its actions, and each action by its properties, like the participant characters and their roles (Donor, Hero, FalseHero, Prisoner, Villain), the place where the action takes place (City, Country, Dwelling), and the involved objects, attributive elements or accessories (a ring, a horse). Cases must keep a semantic coherence that ensures the dependencies between actions, characters and objects, for example between the Release-from-Captivity and Kidnapping functions, or between the Resurrection and Dead functions. Therefore, ontologies are the best approach to represent this kind of case. Each case is composed of a great number of interrelated individuals, i.e. instances of concepts, from the ontology. A visual representation of the tales ontology is presented in Fig. 12.

To generate new tales, the system applies two broadly different Reuse techniques, one based on transforming an existing solution into a new solution and another based on generating or constructing a new solution.

Transformational Reuse, or Transformational Adaptation (TA), is the most widely used approach to case reuse. Typically, a new case is solved by retrieving the most similar case and copying the solution (although some techniques may use solutions from multiple cases); then a transformational process using domain knowledge and/or case-derived knowledge modifies that copy (which we consider a form of search) until a final solution adequate to the current problem is found. Basically, a node in the “working case” is substituted by finding another related node in a taxonomic hierarchy: e.g. a sword is a type of weapon in the folk tale generation domain, and may be substituted by another weapon like a crossbow. Moreover, Transformational Reuse is able to modify more than a single node: deep substitution allows a whole subgraph in the solution to be modified. For example, when substituting a character like the evil wolf for an evil wizard, the constituent aspects of the character (role, sex, dwelling, physical appearance) are also substituted.
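The single-node substitution described above can be sketched as a search over a taxonomy that starts with the siblings of the current value and widens to more distant relatives only if needed. The taxonomy below is a hypothetical fragment (the root concept Object and the grouping into Weapon and Accessory are assumptions), and the function names are invented for this example; deep substitution of whole subgraphs is not covered.

```python
# Hypothetical taxonomy fragment for the folk tale domain.
CHILDREN = {
    "Object": ["Weapon", "Accessory"],
    "Weapon": ["sword", "crossbow"],
    "Accessory": ["ring", "horse"],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def leaves_under(node):
    """All leaf elements below node (or node itself if it is a leaf)."""
    kids = CHILDREN.get(node)
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves_under(kid)]


def substitute(value, is_allowed):
    """Return the closest relative of value accepted by is_allowed, or None.
    Siblings are tried first, then elements under higher ancestors."""
    ancestor = PARENT.get(value)
    while ancestor is not None:
        for candidate in leaves_under(ancestor):
            if candidate != value and is_allowed(candidate):
                return candidate
        ancestor = PARENT.get(ancestor)
    return None


# The working case is a copy of the retrieved solution; only one node changes.
working_case = {"hero_object": "sword", "place": "City"}
working_case["hero_object"] = substitute("sword", lambda v: v != "sword")
print(working_case)
```

Because the search widens one ancestor at a time, the substitute stays as close as possible to the original value in the hierarchy, which is the property that makes this kind of adaptation domain independent.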

Generative or Constructive Reuse builds a new solution for the new case while using the case base as a resource for guiding the constructive process. Constructive Adaptation (CA) [61] is based on a heuristic search-based process where the heuristic function guiding search is derived from a similarity measure between the query and the case base. This method takes a problem case and translates it into an initial state in the state space; i.e. it transforms a case representation into a state representation. Then, a heuristic search process expands a search tree where each node represents a partial solution, until a final state (with a complete and valid solution) is found. Notice that final but non-valid states can be reached, but this simply means the search process will backtrack to expand other pending states. This process is guided by a heuristic based on comparing the similarity from states (represented in the state space) to cases (represented in the space for cases). The nodes with higher similarity are expanded first during the search process. The result is that CA adds one node to a partial solution as it moves from one state to the next; that is to say, it builds a solution by piecemeal copies of nodes from similar cases. Notice that there is neither retrieval nor “single case adaptation” here since the component nodes are incrementally copied from multiple cases in the case base, depending only on the similarity measure that works on the whole case base.
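The search process just described can be sketched as a best-first search over partial solutions. Everything in the example is a toy stand-in: the two three-action plots, the validity rules and, in particular, the heuristic (the fraction of requested actions already present in the state) are assumptions made to keep the sketch short, not the similarity measure used in [61].

```python
import heapq
from itertools import count


def constructive_adaptation(initial, expand, is_final, is_valid, similarity):
    """Best-first search: the most similar pending state is expanded first."""
    tie = count()  # tie-breaker so that states themselves are never compared
    frontier = [(-similarity(initial), next(tie), initial)]
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if is_final(state):
            if is_valid(state):
                return state
            continue  # final but not valid: backtrack to other pending states
        for successor in expand(state):
            heapq.heappush(frontier, (-similarity(successor), next(tie), successor))
    return None


# Toy case base: each case is a plot made of three actions.
CASES = [
    ("Death", "Journey", "Resurrection"),
    ("Kidnapping", "Fight", "Release"),
]
QUERY = {"Kidnapping", "Journey"}  # actions the new tale should contain


def expand(state):
    """Add one action, copied from the same position in some case."""
    return [state + (case[len(state)],) for case in CASES]


def is_final(state):
    return len(state) == 3


def is_valid(state):
    """Toy coherence rules between dependent actions."""
    return ("Release" not in state or "Kidnapping" in state) and (
        "Resurrection" not in state or "Death" in state
    )


def similarity(state):
    return len(QUERY & set(state)) / len(QUERY)


print(constructive_adaptation((), expand, is_final, is_valid, similarity))
```

In this run the search first reaches a complete plot that ends with a Resurrection but contains no Death, discards it as non-valid, and then returns a plot whose actions come from both cases, which mirrors the piecemeal copying described above.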

Fig. 12 Semantic dependencies in the folk tales ontology

Now that we have explained the main methods in jcolibri for KI-CBR we can move on to an opposite family of CBR systems where many cases are used instead of using few but rich ones.

8 Data Intensive CBR

One of the problems to solve when dealing with real world problems is the efficient retrieval of cases when the case base is huge and/or it contains uncertainty and partial knowledge. There are many examples of domains and applications where a huge amount of data arises; for example, image processing, personal records, recommender systems, textual sources, and many others. Many authors have focused on proposing case memory organizations to improve retrieval performance. For example, there are different proposals to manage huge case memories organized in clusters such as the ones by [62, 63]. However, none of the existing tools has incorporated capabilities to efficiently manage large case bases. jcolibri provides an extension called Thunder to address this issue. ThunderFootnote 16 allows CBR experts to manage case memories organized in clusters and incorporates a case memory organization model based on Self-Organizing Maps (SOM) [64] as the clustering technique. This extension includes a graphical interface to test the components provided, as shown in Fig. 13.

Fig. 13 jcolibri test GUI for the data intensive extension

Clustering is implemented by grouping cases according to their similarities and representing each one of these groups by prototypes. Thus, the retrieve phase carries out a selective retrieval focused on using only the subset of cases potentially similar to the new case to solve. The new case retrieval procedure consists of (1) selecting the most suitable group of cases by comparing the input case with the prototypes and, (2) comparing the new input case with the cases from the selected clusters. The benefits of such an approach are both the reduction of computational time and improved robustness with uncertain data. Nevertheless, some open issues remain such as to what extent the accuracy rate is degraded due to the cluster-based retrieval, and furthermore how many clusters and cases should be used according to given requirements of computational time and accuracy degradation.
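The two-step retrieval procedure can be sketched as follows. The sketch assumes that the clusters and their prototypes have already been computed (the SOM training itself is not shown), that cases are numeric vectors, and that similarity is derived from Euclidean distance; the function names and data are invented for illustration and do not reflect the Thunder API.

```python
import math


def similarity(a, b):
    """Toy similarity between numeric vectors: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def clustered_retrieve(query, clusters, n_clusters=1, k=2):
    """clusters is a list of (prototype, cases) pairs.
    Returns the k most similar cases and how many cases were compared."""
    # Step 1: rank clusters by the similarity of their prototype to the query.
    ranked = sorted(clusters, key=lambda c: similarity(query, c[0]), reverse=True)
    # Step 2: compare the query only with the cases of the selected clusters.
    candidates = [case for _, cases in ranked[:n_clusters] for case in cases]
    candidates.sort(key=lambda case: similarity(query, case), reverse=True)
    return candidates[:k], len(candidates)


clusters = [
    ((0.0, 0.0), [(0.0, 1.0), (1.0, 0.0), (-1.0, 0.0)]),
    ((10.0, 10.0), [(9.0, 10.0), (10.0, 11.0)]),
]
best, compared = clustered_retrieve((0.5, 0.2), clusters)
print(best, compared)
```

The parameter n_clusters exposes the trade-off mentioned above: a larger value compares more cases and loses less accuracy, while a smaller one reduces computation further.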

To support a uniform way for loading these large case bases, the Thunder extension provides a connector compatible with the ARFF format. This format is a standard defined by the popular Weka Footnote 17 toolkit for data mining and machine learning [65].

8.1 A DI-CBR Application Built with colibri

To illustrate the DI-CBR methods in colibri we present an application, described in [66], that automatically classifies documents from electronic journals into different categories: laws, history, medicine, ... This system is used by librarians to assign proper bibliographic categories when a new text is included in the catalogue. The application is also a Textual CBR system as it manages textual data. In this case, a statistical similarity function is used to compare documents. The system uses a corpus of 1,500 documents belonging to 20 different categories. Documents are processed to remove stop words and extract word stems. Then the TF*IDF filter is applied to select the 1,000 most relevant terms in the corpus. This corpus is a clear example of a huge case base with uncertain and incomplete cases.

This CBR system uses a majority-voting approach that assigns the most repeated category among the k nearest neighbors. The performance tests run over the case base show the improvement in the efficiency of the system when using a clustered memory of cases. A key parameter of this system is the number of clusters being considered. For example, when retrieval is restricted to the 7 clusters whose prototypes are most similar to the query, the precision decreases by only 10 %, while the number of cases being compared is about half the size of the case base (47 %). This important reduction in the cases involved in the retrieval process implies a significant improvement in the efficiency of the CBR system.

Figure 14 illustrates this process with a reduced number of cases. Each color represents a different category. As we can observe, the clustering algorithm tends to group similar cases that should belong to the same category. Once each cluster has been identified, the query is compared with the corresponding prototype.

Fig. 14 Visualization of a clustered case-base
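The majority-voting classification used by this application can be sketched as below. The sketch assumes that each document is already represented as a dictionary of term weights (for instance TF*IDF values) and uses cosine similarity as a stand-in for the statistical similarity function of [66]; the terms, weights and function names are invented.

```python
import math
from collections import Counter


def cosine(doc_a, doc_b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * doc_b.get(term, 0.0) for term, weight in doc_a.items())
    norm_a = math.sqrt(sum(w * w for w in doc_a.values()))
    norm_b = math.sqrt(sum(w * w for w in doc_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query, cases, k=3):
    """cases is a list of (document, category) pairs.
    Returns the most repeated category among the k nearest neighbours."""
    neighbours = sorted(cases, key=lambda c: cosine(query, c[0]), reverse=True)[:k]
    votes = Counter(category for _, category in neighbours)
    return votes.most_common(1)[0][0]


cases = [
    ({"court": 1.0, "law": 0.8}, "laws"),
    ({"law": 1.0, "statute": 0.5}, "laws"),
    ({"patient": 1.0, "therapy": 0.7}, "medicine"),
    ({"war": 1.0, "empire": 0.6}, "history"),
]
print(classify({"law": 1.0, "patient": 0.2}, cases))
```

In the clustered setting, the list passed as cases would contain only the documents of the selected clusters rather than the whole case base.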

Next we describe the last extension of jcolibri: an infrastructure for developing distributed CBR systems.

9 Distributed CBR

Research efforts in the area of distributed CBR concentrate on the distribution of resources within CBR architectures and study how it is beneficial in a variety of application contexts. In contrast to single-agent CBR systems, multi-agent systems distribute the case base itself and/or some aspects of the reasoning among several agents. [67] categorized the research efforts in the area of distributed CBR using two criteria: (1) how knowledge is organised/managed within the system (i.e. single vs. multiple case bases), and (2) how knowledge is processed by the system (i.e. single vs. multiple processing agents).

Much of the work in distributed CBR assumes multi-case base architectures involving multiple processing agents that differ in their problem solving experiences [68]. The “ensemble effect” [69] shows that a collection of agents with uncorrelated case bases improves on the accuracy of any individual agent. Multiple sources of experience exist when several CBR agents need to coordinate, collaborate, and communicate. For this purpose, jcolibri provides two extensions to design deliberative and distributed multiagent CBR systems where the case base itself and/or some aspects of the reasoning process are distributed among several agents. A deliberative system can predict the future state that will result from the application of intentional actions. These predicted future states can be used to choose between different possible courses of action in an attempt to achieve system goals [70]. Our work focuses on distributed retrieval processes working in a network of collaborating CBR systems.

The basic extension to support distributed CBR applications in jcolibri is called ALADIN (Abstract Layer for Distributed Infrastructures). This layer defines the main components of every distributed CBR system: agents, directory, messages, etc. and could be implemented by using different alternatives: JADEFootnote 18, sockets, shared memory, ... It was defined after reviewing the existing literature on distributed CBR and is mostly compatible with IEEE FIPAFootnote 19 standards for multiagent systems. Because ALADIN is only composed of interfaces that define the behaviour of the system, we have developed an implementation of this abstract layer using standard network sockets. This extension is called SALADIN (Sockets implementation of ALADINFootnote 20) and provides a fully functional multi-agent environment for building distributed CBR systems that can be particularized in many ways.

9.1 A Distributed CBR System Built with colibri

In this section we describe a distributed CBR application in the domain of music recommendation, a classical example of successful recommender applications where there are many users interested in finding and discovering new music that fulfils their preferences. Moreover, the users of this kind of application tend to exchange recommendations with other users that have similar preferences. These relationships form social networks that reflect both the similarity in the preferences of the users and the confidence in the recommendations, and allow users to discover new items. This way, the system can measure the confidence between two users depending on their corresponding distance in the social network.

The case base is a catalogue of songs, and each user may have a part of this catalogue in its internal list of rated items. Every user interacts with its corresponding recommender agent. When a recommender agent receives a query from the user, it forwards the query to the other agents in the system it is connected to. Agents are organized according to a social network that ideally reflects the similarity and confidence between users. Then, these agents use their rated items to recommend songs that fulfil the preferences of the query. This organization according to the social network yields a higher performance than the typical all-connected configurations. Both organizations are illustrated in Fig. 15.

Fig. 15 Agents organization in a distributed CBR system
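The forwarding scheme described in this section can be sketched with plain objects in a single process. This is only an illustration of the idea: the class and method names are hypothetical and unrelated to the ALADIN or SALADIN interfaces, queries are reduced to a genre, the received ratings are merged by a simple average, and the weighting by confidence (distance in the social network) is omitted.

```python
class RecommenderAgent:
    def __init__(self, name, rated_items):
        self.name = name
        self.rated_items = rated_items  # {song: (genre, rating)}
        self.neighbours = []            # agents this one is connected to

    def local_answer(self, genre):
        """Songs of the requested genre taken from this agent's own ratings."""
        return {
            song: rating
            for song, (song_genre, rating) in self.rated_items.items()
            if song_genre == genre
        }

    def recommend(self, genre, top_n=2):
        """Forward the query to connected agents and merge their answers."""
        ratings = {}
        for agent in self.neighbours:
            for song, rating in agent.local_answer(genre).items():
                ratings.setdefault(song, []).append(rating)
        # Rank by average rating; song title breaks ties deterministically.
        ranked = sorted(
            ratings, key=lambda s: (-sum(ratings[s]) / len(ratings[s]), s)
        )
        return ranked[:top_n]


# Invented users: ana is connected to ben and cleo, but not to dan.
ana = RecommenderAgent("ana", {})
ben = RecommenderAgent("ben", {"Song A": ("jazz", 5), "Song B": ("rock", 4)})
cleo = RecommenderAgent("cleo", {"Song A": ("jazz", 3), "Song C": ("jazz", 5)})
dan = RecommenderAgent("dan", {"Song D": ("jazz", 5)})
ana.neighbours = [ben, cleo]
print(ana.recommend("jazz"))
```

An all-connected configuration would correspond to setting the neighbours of every agent to all the other agents, so the two organizations differ only in how the neighbours lists are filled.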

For further details we refer the interested reader to the paper by [71].

10 Related Work

colibri is nowadays the most popular CBR platform due to its broad scope, number of applications, users and contributors. There are other related CBR tools in the literature, with myCBR [72] being one of the most closely related to jcolibri. myCBR is also an open-source tool for developing CBR systems, although there are some important differences in its scope and architecture. myCBR is intended primarily for prototyping applications that focus on the similarity-based retrieval step while colibri includes components for supporting the whole CBR cycle, including retrieval, reuse, revision and retention. To a certain extent myCBR and colibri can be used in collaboration, using myCBR to define the case structure and similarity measures through its Protégé-based interface, and having colibri import those definitions from the XML files generated by myCBR, through a number of wrapper methods that were developed in collaboration by both project teams.Footnote 21

The Indiana University CBR Framework (IUCBRFFootnote 22) was conceived with a similar goal of providing an open-source solution for academic CBR projects but never reached a sufficiently mature state of development and has not been actively maintained in recent years [73].

Although in a different domain, jcolibri is also related to Weka, a collection of machine-learning algorithms for data-mining tasks. Weka originally served as an inspiration for colibri, which was intended to play a similar role in the CBR community, as an open-source reference tool in academia, like the one played by Weka in the data mining community. colibri is also influenced by Weka in the way it is designed to facilitate the construction of different configurations of a CBR system which can be compared along different dimensions. The two tools also share the idea of including both a collection of reusable methods and a software composition tool that allows a running system to be assembled without writing a line of code. They even share some common methods, since Weka also includes an implementation of the Nearest Neighbours algorithm for instance-based classification. Nevertheless, and in addition to the obvious differences coming from covering different application domains, a key difference between the tools is that colibri is designed for supporting not only classification but also problem-solving tasks, and can deal with complex object-based data and not only simple attribute-value pairs like Weka.

11 Conclusions

In this chapter, we have described the colibri platform for building CBR systems. The main goal of our research is to cover the need for open software engineering tools for CBR. We can conclude that colibri is fulfilling the goal of becoming a reference tool for academia, having, as of this writing, hit the 10,000 download mark with users in 100 different countries. The colibri platform comprises a framework, jcolibri, that provides the components required to build a CBR system. Then, several high level tools, packaged into the colibri studio IDE, support a novel development process for generating CBR systems based on the idea of reusing previous system designs. The colibri development process is an approach for composing CBR systems in a semi-automatic way. It is based on the idea of reusing templates, which are workflows that represent in an abstract way the behaviour of several CBR systems.

Having presented the platform and its development process, we have detailed the functionality provided by our tool. We have presented the basic functionality together with several complements that extend this basic functionality, namely textual CBR (see Sect. 5), recommendation (Sect. 6), knowledge-intensive CBR (Sect. 7), data-intensive CBR (Sect. 8) and distributed CBR (Sect. 9). Each extension has been illustrated with the description of a working application that uses that specific functionality.

We hope that this complete description of the colibri platform may encourage readers to use our tool.