1 Introduction

Case-based reasoning (CBR) (e.g., López de Mántaras et al. 2005) solves new problems by retrieving stored prior cases capturing the solutions to similar prior problems, and adapting their solutions to fit new circumstances, based on the differences between the new and old situations. When CBR is applied to synthesis tasks in knowledge-rich domains, as for case-based planning, it often relies on sophisticated strategies for adapting cases. However, when instance-based or case-based methods are applied to regression tasks for numerical prediction, it is common to use simple adaptation methods (e.g., Kibler et al. 1989; Aha et al. 1991). For example, k-Nearest Neighbor (k-NN) regression approaches often generate target values by distance-weighted averaging of the values of the k cases closest to the input problem.

Using simple case adaptation methods helps to alleviate the knowledge acquisition problem for case adaptation knowledge, and in practice can achieve good performance (e.g., Kibler et al. 1989; Aha et al. 1991). However, the demonstrated value of rich case adaptation for other CBR tasks suggests the question of whether more sophisticated case adaptation methods could improve the performance of case-based regression. This in turn raises the question of how the performance of “knowledge light” case adaptation could be improved, without increasing the knowledge acquisition burden for the developers of CBR systems. This article presents methods developed for this purpose.

The article proposes and evaluates new methods for case adaptation for regression tasks, using ensembles of case adaptation rules generated from knowledge contained in the cases in the case base. It presents a method for selecting the adaptation rules to apply, and also presents an experimental comparison of alternative strategies for using local and global information in both adaptation rule learning and rule application, to illuminate the relative performance benefits of local and global approaches. The article is adapted from Jalali and Leake (2013b) with additional perspective and extensions.

The article is organized as follows. Section 2 provides a brief overview of case-based reasoning, how case-based reasoning is applied to regression tasks, and how the case difference heuristic method (Hanney and Keane 1997) generates case adaptation rules from the case base. Section 3 introduces EAR (Ensembles of Adaptations for Regression), our ensemble-based case adaptation method. EAR is a general technique for augmenting k-NN for regression tasks. EAR automatically generates adaptation rules, using the case difference heuristic, selects a subset of the rules to apply, and uses the ensemble of rules to generate a new solution. Sections 3.6 and 3.7 focus on the adaptation rule selection and solution generation processes. Section 4 presents an evaluation comparing alternative versions of EAR with k-NN, linear regression, and locally weighted linear regression for estimating solutions in five domains. The study shows encouraging results for accuracy and for the method's ability to rely on local information for generating adaptation rules, rather than more computationally expensive use of extensive global information. This suggests the practicality of "lazy generation" of adaptation rules based on local information, as needed. Section 5 summarizes extensions to the basic EAR approach. Section 6 compares related work on using ensemble techniques in CBR and knowledge-light methods for generating and applying adaptations for case-based regression tasks. Section 7 presents conclusions and future work.

2 Background

2.1 Case-based reasoning

Case-based reasoning generates solutions to new problems by retrieving and adapting the solutions of similar prior problems, recorded in cases. A case contains at least two components, a problem—a description of the prior problem solved—and a solution. Often, a case contains a third component, a description of the outcome when the prior solution was applied (e.g., Kolodner and Leake 1996). Case-based reasoning systems may generate new solutions from single cases, or may combine the solutions of multiple cases. Prior cases are stored in a library referred to as a case base. We will refer to the input problem as the query.

A predefined criterion, normally a similarity metric, is applied to estimate the relevance of stored cases to the query, with the most similar case or cases returned, for their solutions to be adjusted by the adaptation step. Case adaptation is often performed by rules whose antecedents describe differences between a query and a problem description, and which adjust past solution values and/or modify the past solution’s structure to account for the differences. To “adapt a case to a problem” means to adapt its solution to fit the new problem.

2.2 Case-based regression

This article focuses in particular on the application of CBR to numerical prediction tasks, also referred to as regression tasks. Case-based regression predicts the solution for a regression problem (i.e. predicts a numerical value), based on the solutions of similar problems. Case-based regression normally generates solutions by combining the values of k “nearest neighbor” (most similar) cases to the query, for some predefined k. Often, the solutions of those cases are simply averaged.

For case-based regression, the problem part of cases is often a vector of numerical feature values. For example, for the regression task of real estate appraisal, which predicts house price based on house characteristics, each case would correspond to a house of known value. The problem description of the case would describe house features, such as the number of rooms, square footage, and lot size. Given a new house description as a query, a set of cases for similar houses would be retrieved from the case base, and the solution would be built by adapting and combining the solutions of the selected cases, e.g., by taking an average of their prices, weighted by their houses’ similarity to the query.
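To make the combination step concrete, the following is a minimal sketch of distance-weighted k-NN regression over the house-price example. The cases, feature values, and the inverse-distance weighting scheme are illustrative assumptions of this sketch, not the article's exact setup; in practice the features would first be standardized, as in Section 4.1.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_regress(query, case_base, k=3, eps=1e-9):
    """Estimate a value by inverse-distance-weighted averaging of the
    solutions of the k cases nearest to the query."""
    neighbors = sorted(case_base, key=lambda c: euclidean(query, c[0]))[:k]
    weights = [1.0 / (euclidean(query, c[0]) + eps) for c in neighbors]
    return sum(w * c[1] for w, c in zip(weights, neighbors)) / sum(weights)

# Problem part: (rooms, square footage, lot size); solution part: price.
cases = [((6, 1500, 4000), 210000),
         ((7, 1800, 4500), 255000),
         ((5, 1200, 3500), 175000),
         ((8, 2200, 6000), 320000)]
print(knn_regress((7, 1700, 4400), cases))  # price estimate for the query house
```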

For k-NN, there is often no adaptation of solutions of individual cases, beyond the combination function used to collect individual solutions from the k cases. However, prior to combination, adaptation rules could adjust solutions based on problem differences. For example, a rule could increase predicted price if the query property has more rooms than the property of the most similar retrieved case.

Figure 1 illustrates 3-NN. In the figure, \(A:CB \times Queries \rightarrow \mathbb {R}\) is a function which, given as input a case in the case base and a query, returns a candidate solution for that query. In systems using adaptation, A would be the adaptation function. However, in k-NN without case adaptation, A simply returns the prior solution. The prior solutions are averaged by CombineVals.

Fig. 1 Illustration of the generic case-based regression process

Some research has pursued more sophisticated adaptation methods for case-based regression. For example, Fuchs et al. (2014) propose a framework which decomposes the differences between a retrieved case and target problem into multiple components and uses operators inspired by differential calculus to adapt the components, forming an adaptation path from the source case to the target problem.

2.3 Generating adaptation rules with the case difference heuristic

A classic challenge for case-based reasoning is how to acquire case adaptation rules. Manual generation of adaptation rules requires domain knowledge, so in many domains rule generation is a difficult and expensive manual task. To help alleviate the rule generation problem, “knowledge light” methods have been developed to automatically generate adaptations from a case base. A highly influential approach to case adaptation generation is the case difference heuristic method introduced by Hanney and Keane (1997) and further explored by others (e.g., McSherry 1998; McDonnell and Cunningham 2006). In the case difference heuristic approach, a new rule is generated by selecting a pair of cases from the case base, comparing their input features and solutions, and generating rules which adapt cases with similar problem differences to achieve similar solution differences. We call the case to be adapted the source case. We call the case to which the source case’s solution is to be adapted the target case. We can view the case difference heuristic as aiming to generate the specific rule needed to adapt the solution of the source case to the problem of the target case; the rule need not be generalized. However, the new rule is expected to apply to a broader range of case pairs.

To illustrate the case difference heuristic, consider the task of estimating the fuel efficiency (MPG) of automobiles, and suppose one vehicle differs from another in being 200 pounds heavier, and the heavier vehicle's gas mileage is 4% lower. From this, it would be possible to hypothesize various rules, such as the extremely simple rule that a 200 pound increase in weight decreases the MPG by 4%. Given a new vehicle whose MPG must be predicted based on that of a vehicle which is highly similar, but lighter, the rule could suggest how to adapt the old vehicle's MPG to compute the new vehicle's MPG. Note that because the most similar case will be selected for adaptation, the rule will normally only be applied to cases similar to the query. The two parts of Fig. 2 depict the generation of an adaptation rule for MPG, using the case difference heuristic, and an application of that rule.
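A minimal sketch of this example follows. The numbers match the text, but the proportional-scaling rule form and the function names are assumptions of this sketch; the article does not commit to a specific rule representation.

```python
def make_rule(source, target):
    """Compare two (weight, mpg) cases and hypothesize a rule mapping
    a weight difference to a percentage change in MPG."""
    weight_diff = target["weight"] - source["weight"]
    pct_change = (target["mpg"] - source["mpg"]) / source["mpg"]
    def rule(case, query):
        # Scale the observed percentage change by the ratio of the query's
        # weight difference to the difference the rule was learned from.
        scale = (query["weight"] - case["weight"]) / weight_diff
        return case["mpg"] * (1 + pct_change * scale)
    return rule

lighter = {"weight": 2800, "mpg": 30.0}
heavier = {"weight": 3000, "mpg": 28.8}    # 200 lb heavier, 4% lower MPG
rule = make_rule(lighter, heavier)

retrieved = {"weight": 2900, "mpg": 29.5}  # most similar stored case
query = {"weight": 3100}                   # new vehicle, MPG unknown
print(rule(retrieved, query))              # adapted MPG estimate: 28.32
```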

Fig. 2 Illustration of using the case difference heuristic to generate an adaptation rule and to generate an MPG estimate

3 EAR: Learning and applying ensembles of adaptation rules

Ensembles of Adaptations for Regression (EAR) is a method for applying the case difference heuristic approach. Any application of the case difference approach depends on making choices such as which pairs of cases will be used to generate adaptation rules and how rules will be generated from a given case pair. Using the rules depends on making choices about how to select source cases to adapt and the rules to use to adapt them. When many candidate adaptation rules are available, the problem of rule selection becomes particularly important. EAR addresses the rule selection problem with a novel method for adaptation rule ranking, and addresses the problem of potential quality variations in generated rules by using an ensemble of adaptations, applied to an ensemble of cases, for each adaptation problem. We have tested some configurations of EAR as a lazy approach to adaptation rule generation, in which adaptations are generated only as needed for particular problems, rather than in advance.

In this section we discuss EAR’s strategies, and in Section 6 we compare EAR’s approaches to related work. We note that EAR’s general ensembles of adaptations approach is not restricted to being used with rules generated by the case difference heuristic: It could be applied to adaptation rules generated by any method, or even generated by hand. However, the adaptation ranking strategy is tailored specifically to rules generated by the case difference heuristic.

3.1 Overview of EAR’s process

Given a query, EAR retrieves similar source cases and generates ensembles of adaptation rules as needed to adapt them, from cases in the case base. EAR is guided by user-selected criteria for (1) the locality of the neighborhood from which to select source cases in the case base to adapt, and (2) the locality of cases from which to generate rules for adapting each source case. For each source case to adapt, the generated adaptation rules are ranked, with the top r rules applied, for a user-specified value for r. The candidate solutions generated from each of the adapted source cases are then combined. The following sections discuss each component: EAR’s neighborhood selection criteria, its process for generating adaptation rules, its rule ranking approach, and its solution combination. Algorithm 1 summarizes the overall process.

Algorithm 1 EAR’s basic algorithm

Input:

Q: query

n: number of source cases to adapt to solve query

r: number of rules to be applied per source case

CB: case base

Output: Estimated solution value for Q

    CasesToAdapt ← NeighborhoodSelector(Q, n, CB)

    NewRules ← RuleGenerator(Q, CasesToAdapt, CB)

    for c in CasesToAdapt do

            RankedRules ← RankRules(NewRules, c, Q)

            SolutionEstimate(c) ← CombineAdaptations(RankedRules, c, r)

    end for

    return CombineVals(∪c∈CasesToAdapt SolutionEstimate(c))
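The following is a minimal Python sketch of Algorithm 1 under simplifying assumptions: cases are (feature vector, value) pairs, rules are (problem difference, solution delta) pairs generated local-local, and ranking uses only difference similarity, omitting the adaptation context factor of Section 3.6. It shows the control structure, not a faithful reimplementation.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighborhood_selector(query, n, case_base):
    # Source cases to adapt: the n nearest cases to the query.
    return sorted(case_base, key=lambda c: distance(query, c[0]))[:n]

def rule_generator(query, cases_to_adapt, case_base, k=5):
    # Local cases-local neighbors: compare ordered pairs of cases near the query.
    local = sorted(case_base, key=lambda c: distance(query, c[0]))[:k]
    rules = []
    for src in local:
        for tgt in local:
            if src is not tgt:
                diff = [t - s for s, t in zip(src[0], tgt[0])]
                rules.append((diff, tgt[1] - src[1]))  # (problem diff, solution delta)
    return rules

def rank_rules(rules, case, query):
    # Favor rules whose problem difference resembles the current one
    # (the adaptation context factor of Section 3.6 is omitted here).
    current_diff = [q - c for c, q in zip(case[0], query)]
    return sorted(rules, key=lambda rule: distance(rule[0], current_diff))

def ear_estimate(query, case_base, n=3, r=5):
    cases_to_adapt = neighborhood_selector(query, n, case_base)
    rules = rule_generator(query, cases_to_adapt, case_base)
    estimates = []
    for c in cases_to_adapt:
        top = rank_rules(rules, c, query)[:r]
        # CombineAdaptations: average the values proposed by the top r rules.
        estimates.append(sum(c[1] + delta for _, delta in top) / len(top))
    return sum(estimates) / len(estimates)  # CombineVals
```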

3.2 Neighborhood selection: Selecting source cases to adapt

We consider three general alternatives for selecting source cases to adapt for a given query, defined by the locality of the cases they select:

  1. Nearest: Select only the single nearest neighbor to the query (1-NN)

  2. Local: Select the k nearest neighbors to the query (k-NN, for a small value of k greater than 1)

  3. Global: Select all cases in the case base

Previous CBR research has considered adaptation learning methods using nearest and local case sets (see Section 6). However, the global approach is unusual. Because the global approach may consider cases quite dissimilar from the query, its feasibility depends on the quality of the adaptation and combination strategies used.

3.3 Rule generation: Selecting cases from which to generate adaptation rules

Our approach to adaptation rule generation starts from the case difference heuristic (Hanney and Keane 1997). That approach builds adaptation rules by comparing pairs of cases, attributing the difference in the cases’ solutions to the differences in their problem features, and generating a rule which—given a source case with a similar feature difference from a query—will perform a similar adjustment in that case’s solution.

For each case selected as a source case to be adapted, we consider three strategies for selecting pairs of cases to use to generate adaptation rules, as listed below. The names of the strategies have two components (e.g., as described below, "Local cases–Global neighbors"). The first component (e.g., "Local cases") describes the neighborhood of cases around the source case from which to draw the first element of pairs for generating adaptation rules.

For each first element of the pair, a second element must be chosen, in order to compare the two elements to generate a rule by the case difference heuristic. The second component of the strategy name (e.g., "Global neighbors") describes the neighborhood, around the first element of the pair, from which the second elements of the pairs will be drawn. Each comparison results in a different rule, so a single first element may participate in the formation of many rules.

We consider the following strategies for case selection for adaptation rule generation:

  1. Local cases–Local neighbors: Generating adaptation rules by comparing each pair of cases in the local neighborhood of the query

  2. Global cases–Local neighbors: Generating adaptation rules by comparing each case in the case base with its k nearest neighbors

  3. Global cases–Global neighbors: Generating adaptation rules by comparing all cases in the case base

Figure 3 provides an intuitive illustration of the sets of cases considered by the three methods. In each figure, the case corresponding to the query is at the center. Circles enclosed by dotted lines show neighborhoods of cases from which adaptations will be generated. A sample first case in an adaptation rule pair is connected to the second cases with which it will be compared. Each comparison generates an adaptation rule.

Fig. 3 Illustration of (a) Local cases–Local neighbors, (b) Global cases–Local neighbors, and (c) Global cases–Global neighbors
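To make the three strategies concrete, the following sketch enumerates the (first, second) case pairs each strategy would feed to the case difference heuristic. The helper functions and the (feature vector, value) case layout are assumptions of this sketch.

```python
import math
from itertools import permutations

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(case, cases, k):
    # The k nearest neighbors of `case` among `cases`, excluding itself.
    others = [c for c in cases if c is not case]
    return sorted(others, key=lambda c: distance(case[0], c[0]))[:k]

def local_local(query, case_base, k):
    # Compare every ordered pair within the query's local neighborhood.
    neighborhood = sorted(case_base, key=lambda c: distance(query, c[0]))[:k]
    yield from permutations(neighborhood, 2)

def global_local(case_base, k):
    # Compare each case in the case base with its own k nearest neighbors.
    for first in case_base:
        for second in nearest(first, case_base, k):
            yield first, second

def global_global(case_base):
    # Compare every ordered pair in the case base: up to n(n-1) rules.
    yield from permutations(case_base, 2)
```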

3.4 Combining strategies for source case selection and rule generation case selection

Combining each of the three source case selection strategies with one of the three adaptation generation strategies gives nine possible alternatives, any of which could be chosen as an overall strategy for EAR. Each approach has potential ramifications for efficiency and accuracy of adaptation.

Ramifications for efficiency

The global case approaches normally require considering more rules than the local approaches. Using global cases–global neighbors, adaptation rules are generated for each possible ordered pair of cases in the case base (for any two cases \(c_1\) and \(c_2\) with different solutions, the transformation needed to adapt \(c_1\) to \(c_2\) is different from that to adapt \(c_2\) to \(c_1\), so a different rule could be generated from the pair \((c_1, c_2)\) than from the pair \((c_2, c_1)\)). Consequently, from a case base with n cases, up to \(2 {n \choose 2} = n(n-1)\) rules could be generated. As an example of the difference in processing costs, in sample leave-one-out tests with the 391-case MPG case base from the UCI repository (Frank and Asuncion 2010) on a 2.7 GHz CPU, average solution generation time with the local-local method was 0.7 seconds, with local-global was 15.8 seconds, and with global-global was 640.2 seconds. No attempt was made to optimize these computations, and these times could be improved. However, it is clear that for large case bases, exhaustively generating adaptation rules would not be feasible. Mitigating the uncontrolled growth of rules generated by EAR is outside the scope of this article, but it is addressed in the authors' work on adaptation rule retention (Jalali and Leake 2014c) and adaptation-guided case base maintenance (Jalali and Leake 2014a). The methods introduced there aim at minimizing the set of adaptations and cases retained by the system, while preserving the system's competence.

Ramifications for accuracy

It is more difficult to predict the accuracy effects of the strategy choices. One hypothesis might be that generating adaptation rules from local cases would improve accuracy because the adaptations are being generated from the same area of the domain space as the input query, making them more likely to properly address the differences between the input query and the problems addressed by the source case(s). A contrasting hypothesis would be that limiting the scope of adaptations to the context of the input query sacrifices the benefit of considering distant cases for which adaptations may be similar (even if the cases themselves are not similar), decreasing accuracy. We examine the accuracy ramifications in the experiments.

How locality affects accuracy in this setting relates to an interesting general question concerning the locality of case adaptation knowledge. Even if the CBR principle "similar problems have similar solutions" means that solution characteristics are correlated with regions of the problem space, it is possible that adaptation knowledge could still apply globally (i.e., the relationships between problem feature changes and solution changes could be similar regardless of the region of the case base). This stance that adaptations are global has long been taken implicitly in CBR systems, because CBR systems are generally designed with a single set of adaptation rules, adapting the same difference in the same way, regardless of the part of the case base in which the problem falls. To our knowledge, the question of locality of adaptation knowledge has not been studied previously. The experiments shed some light on this question as well.

3.5 Applying EAR with eager or lazy rule generation

Another design alternative concerns whether to generate adaptation rules in an “eager” fashion, in advance of encountering the adaptation problems for which they will be used. The capability to generate adaptation rules automatically enables, in principle, the use of EAR with a lazy approach to adaptation rule generation: It would be possible to generate adaptation rules only when needed. Whether such an approach is feasible may depend on the locality of the cases used for rule generation. The global cases–global neighbors approach requires generating so many adaptation rules that re-generating them might not be practical if rapid response is needed, requiring an eager rule generation approach. Although having a complete set of rules would depend on re-generating the rules as new cases are acquired, rules could be re-generated only periodically, or only if accuracy with the current rule set fell below acceptable levels. For example, global cases–global neighbors could be applied to the system’s initial cases to generate a static set of adaptation rules, to be updated only if needed.

In contrast to methods considering global cases, local cases–local neighbors is amenable to a lazy approach with just-in-time generation of adaptation rules. Such an approach offers the benefit of only generating adaptation rules which are needed for problems the system actually encounters, and enables adaptation rule generation to take into account any new cases added to the case base in the region of the query. An important question is whether the local approach retains sufficient accuracy despite considering fewer cases. We examine this in the experiments.

3.6 Ranking adaptation rules

The adaptation rule generation approaches from the previous sections may result in the generation of many adaptation rules with varying reliability or applicability to the query. EAR exploits the availability of multiple adaptation rules by applying an ensemble-based approach to case adaptation, and uses a rule ranking method to select a subset of the rules to include in each ensemble, both to increase adaptation efficiency and to apply the most on-point rules.

EAR’s adaptation rule ranking considers two factors, the similarity of the current adaptation problem to the adaptation problem from which a rule was generated, and the adaptation context.

Similarity of the adaptation problem to the problem used to generate the rule

Given a query, EAR favors adaptation rules generated to adapt a pair of cases with problem differences similar to those of the current adaptation problem. The similarity of the current adaptation problem to the adaptation problem used to generate a rule is calculated from the similarity of two pairs of cases. The first pair is the current source case and the query to which it must be adapted; we designate these as (s, Q). The second pair is the pair of cases used to generate the rule by the case difference heuristic. The first element of that pair is the case that was treated as the source case for generating the rule, and the second is the case treated as the query for generating it; we designate these as (s′, q′). EAR calculates two difference vectors, one for the difference between s and Q, and the other for the difference between s′ and q′. Euclidean distance is then used to calculate the distance between those two vectors.
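A minimal sketch of this first ranking factor, assuming problems are numeric feature vectors:

```python
import numpy as np

def rule_problem_distance(s, Q, s_prime, q_prime):
    """Distance between the current adaptation problem (adapting s to Q) and
    the adaptation problem a rule was generated from (adapting s' to q'),
    computed as the Euclidean distance between the two difference vectors."""
    current = np.asarray(Q) - np.asarray(s)               # difference to adapt now
    original = np.asarray(q_prime) - np.asarray(s_prime)  # difference the rule addressed
    return float(np.linalg.norm(current - original))
```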

Favoring adaptation rules from similar adaptation contexts

The second factor reflects similarity of the current “adaptation context” to the adaptation context when the rule was generated. We define the adaptation context of a case to reflect the local adaptation characteristics of the case base, i.e., how small changes in the problem description tend to affect its solution value, in the neighborhood of a query. The rationale for considering adaptation context is that the same feature difference between two cases may require different adaptations in different neighborhoods, and, consequently, favoring rules from neighborhoods with similar adaptation contexts may improve adaptation results.

Note that if the adaptations are generated from local cases–local neighbors, all rules are generated from the same neighborhood, and hence have similar context, so in that situation adaptation context has less effect on rule selection.

3.6.1 Calculating adaptation context

EAR represents problems as feature vectors. EAR represents adaptation context as a vector with one element for each feature of the problem description. EAR generates the adaptation context description for a case by comparing it with the n nearest cases in the case base, for some fixed user-selected n. Given a case C, for each feature in its problem description, EAR calculates the covariance between the feature and the case solution, over the set of cases in the neighborhood, and the covariance values become the values of the corresponding elements in the context vector.

Formally, let \(C_0\) be the case for which the adaptation context is being calculated, and let \(C_i\), for i = 1,...,n, denote the ith case in the selected neighborhood. Let \(Prob_j(C_i)\) denote its jth problem feature, and \(Sol(C_i)\) denote its solution value. Let CaseMeanVal denote the mean of the solution values of the cases in the neighborhood and \(FeatureMeanVal_j\) represent the mean of the jth feature of the cases in the neighborhood. For any case \(C_0\), let \(Cov_j(C_0)\) represent the jth element of the covariance vector for case \(C_0\). \(Cov_j\) is calculated as follows:

$$ Cov_{j}(C_{0}) = \frac{1}{n} \sum\limits_{i=1}^{n} (Prob_{j}(C_{i}) - FeatureMeanVal_{j}) (Sol(C_{i}) - CaseMeanVal) $$
(1)

For a problem description with f features, for any case \(C_0\), we define \(C_0\)'s adaptation context \(AdaptContext(C_0)\) to be the vector \((Cov_1(C_0), Cov_2(C_0), \ldots, Cov_f(C_0))\).
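A minimal numpy sketch of this computation (Eq. (1)), with the neighborhood passed in as arrays:

```python
import numpy as np

def adapt_context(neighborhood_problems, neighborhood_solutions):
    """neighborhood_problems: (n, f) array of problem feature vectors for the
    n neighbors; neighborhood_solutions: length-n array of solution values.
    Returns the f-dimensional adaptation context vector of Eq. (1)."""
    P = np.asarray(neighborhood_problems, dtype=float)
    s = np.asarray(neighborhood_solutions, dtype=float)
    centered_P = P - P.mean(axis=0)   # Prob_j(C_i) - FeatureMeanVal_j
    centered_s = s - s.mean()         # Sol(C_i) - CaseMeanVal
    return centered_P.T @ centered_s / len(s)  # one covariance per feature j
```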

3.6.2 Combining similarity and context to generate a rule ranking score

EAR’s adaptation rule ranking method balances similarity of adaptation problems and similarity of adaptation contexts. For each query, it generates a single vector combining the adaptation context and the feature differences between the retrieved case to adapt and the query, by taking the Hadamard (element-wise) product of the feature difference vector and the adaptation context vector.

EAR generates an analogous single vector combining the adaptation context for the case that the retrieved rule was generated to adapt (the first case in the pair of cases from which the rule was generated), and the feature differences between that case and the problem the rule was generated to solve (the second case in the pair). Each rule’s ranking score is based on the Euclidean distance between the two combination vectors.

More formally, suppose query Q is to be solved by adapting the case \(C_i\). Let \({\Delta}_i\) represent the difference vector of the features of the query Q and \(C_i\), and let \(RuleProb_r\) be the problem the rth adaptation rule was generated to address. Let \(\circ\) represent the Hadamard product of two vectors and \(R_r\) denote the rule to be scored. Then the second component of EAR's rule scoring method is calculated by:

$$ d({\Delta}_{i} , R_{r}) = distance((AdaptContext(C_{i}) \circ {\Delta}_{i}), (AdaptContext(C_{r}) \circ R_{r})) $$
(2)

The final rank score for an adaptation rule is a weighted average of the distance between the adaptation problems and the distance between the combined context vectors:

$$ RuleScore(R_{r}) = a \times distance({\Delta}_{i} , RuleProb_{r}) + (1-a) \times d({\Delta}_{i} , R_{r}) $$
(3)

where 0 ≤ a ≤ 1. The value of a is set to tune the ranking for different domains.
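A minimal sketch combining the two factors into the rank score of Eq. (3). The argument names are this sketch's own, the rule's problem-difference vector stands in for \(R_r\) in Eq. (2), and lower scores rank higher:

```python
import numpy as np

def rule_score(delta_i, rule_prob, ctx_case, ctx_rule, a):
    """Rank score of Eq. (3) for one rule; lower is better.
    delta_i: feature-difference vector between the query and the case to adapt;
    rule_prob: the difference vector the rule was generated from;
    ctx_case / ctx_rule: adaptation context vectors of the case to adapt and of
    the rule's source case; a: domain-tuned weight in [0, 1]."""
    delta_i = np.asarray(delta_i, dtype=float)
    rule_prob = np.asarray(rule_prob, dtype=float)
    problem_term = np.linalg.norm(delta_i - rule_prob)          # first term of Eq. (3)
    context_term = np.linalg.norm(np.asarray(ctx_case) * delta_i
                                  - np.asarray(ctx_rule) * rule_prob)  # Eq. (2)
    return a * problem_term + (1 - a) * context_term
```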

3.7 Estimating solutions using ensembles of adaptation rules

In its simplest form, k-NN estimates the value of a query by averaging the values of its k nearest neighbors. If Q is the query, \(Est(Q)\) represents its estimated solution, and \(Sol(C_i)\) represents the known solution value of the ith neighbor of Q, for i = 1,...,k, then k-NN estimates the value of Q as:

$$ Est(Q) = \frac{1}{k}\sum\limits_{i=1}^{k} Sol(C_{i}) $$
(4)

Instead of averaging the nearest neighbors' values directly, for each neighbor EAR calculates an adapted result (by averaging the results of adapting that neighbor to the query with the top-ranked adaptation rules) and then averages those results over the neighbors. Specifically, for each neighbor case C, EAR averages the values proposed by each of the n adaptation rules for that case ranked highest by (3). If \(R_{i}:CB \times Queries \rightarrow \mathbb {R}\), for 1 ≤ i ≤ n, are the n top-ranked adaptation rules in order of descending rank score, then:

$$ SuggestedVal(C,Q) = \frac{1}{n}\sum\limits_{i=1}^{n} R_{i}(C,Q) $$
(5)

The value for the query is then simply

$$ Solution(Q) = \frac{1}{k}\sum\limits_{i=1}^{k} SuggestedVal(C_{i}, Q) $$
(6)

In the next section we report the experimental results of using EAR to estimate solutions in a set of test domains.

4 Experimental results

We conducted experiments to address the following questions about the EAR approach:

  1. Can using automatically-generated ensembles of adaptations improve accuracy over using a single adaptation?

  2. How does accuracy compare for adaptations generated from local vs. global case knowledge?

  3. How does EAR's accuracy compare to that of the baseline regression methods locally weighted linear regression and k-NN?

  4. How does EAR's accuracy compare to that of case-based regression using standard feature subset ensemble methods?

  5. How does EAR's rule ranking (based on adaptation context and case similarity) affect its performance compared to the baselines of (1) random selection of adaptation rules and (2) ranking solely considering case similarity?

4.1 Data sets and experimental design

Our experiments use five data sets from the UCI repository (Frank and Asuncion 2010) that are commonly used for testing regression methods in the literature: Automobile (A), Auto MPG (AM), Housing (H), Abalone (AB), and Computer Hardware (CH). For all data sets, records with unknown values were removed. To enable comparison with linear regression, only numerical features were used. (Note that if the use of adaptation context in EAR were disabled, EAR could be used for cases with symbolic features as well.) For each feature, values were standardized by subtracting that feature’s mean value from each individual feature value and dividing the result by the standard deviation of that feature. Table 1 summarizes the characteristics of the test domains.

Table 1 Characteristics of the test domains

The experiments applied EAR in the Auto, MPG, Housing, Abalone, and Hardware domains. The respective values to estimate are price, mpg, MEDV (median value of owner-occupied homes in $1000s), rings (for the Abalone data set we selected only cases with rings ranging 1-8), and PRP (published relative performance).

The linear regression baseline was the simple linear regression class from Hall et al. (2009); the locally weighted linear regression baseline was Weka’s locally weighted learning class (using the linear regression class as the base learner).

Hill climbing was used to select, for each domain, the best neighborhood size based on the training data (as for calculating adaptation context), the setting of the weighting factor a in (3), and the number of adaptations to consider. Hill climbing individually tested a range of parameter settings for each method, by leave-one-out testing on the training data, to select the optimal parameter settings for that method. The experiments compared nine variants of EAR, EAR1–EAR9, one for each possible combination of locality choices for selecting cases to adapt and selecting cases for generating adaptation rules. The variants are summarized in Table 2. For example, EAR1 is listed as "nearest, local-local:" It solves problems by adapting the nearest case to the problem, using adaptation rules generated by the local-local strategy.
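A minimal sketch of this tuning loop, substituting a simple scan over candidate settings for the article's hill climbing; predict() and the candidate grid are placeholders assumed by this sketch:

```python
def loo_mae(predict, training_cases, setting):
    # Leave-one-out mean absolute error of `predict` under one parameter setting.
    errors = []
    for i, (problem, solution) in enumerate(training_cases):
        rest = training_cases[:i] + training_cases[i + 1:]
        errors.append(abs(predict(problem, rest, setting) - solution))
    return sum(errors) / len(errors)

def tune(predict, training_cases, candidate_settings):
    # Keep the setting with the lowest leave-one-out error on the training data.
    return min(candidate_settings,
               key=lambda s: loo_mae(predict, training_cases, s))
```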

Table 2 MAE of EAR, k-NN, LWLR and LR for the sample domains

Hill climbing was also used to determine the number of adaptations used for each variant, for each set of training data. The number of adaptations used ranges from a minimum of one (for EAR9) to at most 40 (for EAR1, EAR2, and EAR3), unless explicitly mentioned otherwise. In all experiments Euclidean distance is used as the distance function in (2).

For each query, the 5 % of cases in the case base closest to the query are used as source cases to adapt. For the local-local adaptation generation method, the closest 5 % of cases to the input query are used for generating adaptations. For the global-local adaptation method, for each case in the case base, the closest 5 % of cases to that case are used for generating adaptation rules.

For all domains, the number of rules applied per source case is determined by hill climbing. The final estimates are generated by combining adapted solutions of the selected source cases.

4.2 Effects of ensembles, local vs. global knowledge, and comparison to baselines

The first three experimental questions are closely coupled, as they involve comparisons of different configurations of EAR and of EAR to baseline methods. To address them, we conducted tests comparing the results achieved by each of the 9 versions of EAR, k-NN, linear regression (LR) and locally weighted linear regression (LWLR) in the sample domains. Table 2 summarizes the results, which we discuss below as they relate to each of the three questions. In the following tables, the best values are indicated in bold.

4.2.1 Question 1: Accuracy using ensembles vs. single adaptations

In the experiments, EAR4 (local, local-local), EAR5 (local, global-local), EAR6 (local, global-global) and EAR9 (global, global-global) usually yield the best results, suggesting the benefit of generating adaptations based on multiple cases and selecting adaptations from their results to combine. For example, EAR4 (local, local-local) usually yields its best results (in all domains except Abalone) when five to nine adaptations are combined. There were some exceptions to the general pattern of ensembles of adaptations improving performance. For EAR9 (global, global-global), using one adaptation per case usually yields the best results in the Auto, MPG and Housing domains (for the Hardware domain, often two are used). For the Abalone domain the optimal number of adaptations based on the training data is on the order of 20, but the difference between using one adaptation rule and more rules is minimal (1%).

Effect of domain characteristics on EAR’s performance

We observed that EAR showed less benefit for the Abalone data set than for the other data sets, with the performance of EAR often comparable to k-NN. We hypothesize that the level of improvement from EAR over k-NN could be related to the diversity of case solutions in the case base. If relatively many cases share identical solutions, and the standard deviation of solutions is low, using an appropriate similarity measure with, e.g., k-NN, may be sufficient to generate good solutions simply by averaging values. However, with more diversity, more adaptation may be needed. For the test domains used, Table 1 shows the average number of cases sharing the same solution and the standard deviation of the solutions. This shows that these characteristics of the Abalone data set are substantially different from those of the other data sets. However, more examination is needed.

4.2.2 Question 2: Local vs. global knowledge for generating adaptations

Table 2 shows that in most domains, the performance of EAR4 (local, local-local) is competitive with the other versions of EAR, and is superior to the baseline methods, despite the fact that it uses limited (local) information. For example, comparing EAR4 to the most global method, EAR9 (global, global-global), the MAEs are 1.38 vs. 1.37, 1.9 vs. 1.95, 2.04 vs. 2.25, 0.60 vs. 0.59, and 28.74 vs. 28.18 (in the Auto, MPG, Housing, Abalone, and Hardware domains, respectively). Because EAR4 uses limited information, it is computationally much less expensive than the global methods. Thus the local method's performance at worst has a minimal accuracy penalty, and sometimes is substantially better. Also, EAR4 has the benefit of reducing computational cost and making practical a lazy approach to adaptation rule generation.

EAR7 (global, local-local) and EAR8 (global, global-local) usually yield the worst results. In those two methods all cases are considered as source cases for estimating the target value, so adaptations generated from neighbors of possibly distant cases may not be appropriate for addressing the differences between the input problem and the source cases.

4.2.3 Question 3: EAR vs. LWLR and k-NN

In all domains, the performance of EAR4 surpasses or equals that of the baseline methods, sometimes substantially so. EAR4 has the least benefit in the Abalone and Hardware domains, where its performance is almost identical to k-NN. In all domains, one of the nine versions of EAR has the best performance.

Table 2 shows the absolute results obtained in each domain. To give a clearer comparative view, Fig. 4 contrasts the relative improvement of EAR4 over k-NN (14 %, 5 %, 26 %, 0 % and 1 %) with the relative improvement of LWLR over k-NN (0 %, -1 %, 7 %, -13 % and -6 %) in the Auto, MPG, Housing, Abalone and Hardware domains respectively.

Fig. 4 Percent of improvement in MAE by EAR and LWLR over k-NN

A one-sided paired t-test was used to assess the significance of the results for EAR4. For the comparison with k-NN, the null hypothesis was that the MAE of EAR4 was greater than that of k-NN. EAR's performance improvements were significant for the Auto domain (A) (p<.007), Housing (H) (p<.001), and Hardware (CH) (p<.001). EAR's improvement over k-NN in the MPG domain (AM) was not significant (p<.062). k-NN had an extremely small edge over EAR4 in the Abalone domain (AB), with a decrease in MAE of .01%. Using the null hypothesis that the MAE of k-NN was greater than that of EAR4, we determined that this difference was not significant (p<.51).

In our tests, EAR4 always outperformed LWLR. The p values, in the order of the graph, are p<.051 (not significant), p<0.05, p<.001, p<.001, and p<.18 (not significant).

4.2.4 Question 4: EAR vs. feature subset ensembles

We also compared EAR4's performance to a previous instance-based ensemble approach, feature subset ensembles (FSE) (Bay 1998). FSE uses a combination of k-NN predictors, each of which predicts based on a different subset of the case features (all subsets are of a fixed size). The feature subsets are selected randomly with replacement (each subset includes at least two features). In our tests, each ensemble contains predictors based on 100 different subsets of features, and FSE and EAR4 are evaluated by ten-fold cross validation. Both EAR4 and the feature subset ensemble method were compared with their best parameter settings, as determined by hill climbing and leave-one-out testing on the training data for each fold. For feature subset ensembles, this determined the number of predictors (k) to use and the number of features to use. For EAR4, this determined the number of source cases to adapt and the number of adaptation rules to use. For each domain, the local neighborhoods were set to contain the top 5% nearest neighbors of the input query. Learning was disabled for both methods. Table 3 shows the mean absolute error for k-NN, FSE and EAR4 (local, local-local) on the test domains.
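A minimal sketch of the FSE baseline as described above: each ensemble member is an unweighted k-NN predictor restricted to a random feature subset, and member predictions are averaged. Subset handling is simplified (sampling without replacement within each subset), and the parameter defaults are illustrative.

```python
import math
import random

def knn_predict(query, cases, k, features):
    # k-NN restricted to the given feature indices.
    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in features))
    nearest = sorted(cases, key=lambda c: dist(query, c[0]))[:k]
    return sum(c[1] for c in nearest) / k

def fse_predict(query, cases, k=3, n_members=100, subset_size=3):
    n_features = len(query)
    predictions = []
    for _ in range(n_members):
        # Each member gets a random feature subset (at least two features).
        features = random.sample(range(n_features), max(2, subset_size))
        predictions.append(knn_predict(query, cases, k, features))
    return sum(predictions) / n_members
```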

Table 3 MAE of EAR4, k-NN and the Feature Subset Ensemble method for the sample domains

The absolute performance results in Table 3 show that EAR outperformed FSE in all test domains. For the Abalone domain, k-NN slightly outperformed both ensemble methods, which we hypothesize to be due to that data set's lack of solution diversity, as mentioned previously. Figure 5 shows the relative improvement of EAR4 (local, local-local) and FSE over k-NN in the test domains.

Fig. 5 Percent of improvement in MAE by EAR4 and FSE over k-NN

4.3 Question 5: Effect of context-based rule ranking

We tested how EAR’s context-based adaptation rule ranking approach affects performance by an ablation study comparing EAR4 and EAR6’s performance with three different adaptation rule ranking methods: (1) random ranking of adaptation rules, (2) rule ranking by difference similarity only, and (3) EAR’s approach, balancing difference similarity and adaptation context similarity.

As Table 4 shows, random ranking has the worst performance, with especially bad performance for the global-global methods, which generate more rules. The comparative difference appears to increase for domains in which values have higher standard deviation (e.g. Hardware), and is lowest for Abalone, which has the largest average number of cases per unique solution and the lowest solution standard deviation. There the random method shows the same performance as the distance-only method.

Table 4 MAE of EAR4, EAR6 and three ablated versions

Expanding the pool of adaptation rules, as done by the global methods, decreases accuracy for the distance-only method in nearly all domains, while EAR is more robust. This supports the conclusion that the use of contextual information in EAR enables it to select more appropriate adaptations from the global pool.

4.4 When EAR is most beneficial

The experimental results suggest some circumstances in which EAR may be likely to be particularly beneficial. First, in the tests, EAR's edge over other methods tended to be greater when solutions tended to be unique, i.e., the ratio of possible solutions to cases was low; we hypothesize that diversity increases the importance of adaptation for coverage, increasing the relative benefit of EAR. Second, because the effectiveness of the global-local adaptation method depends on accurate context estimation, and EAR's context computation depends on covariance, the global-local method will be most accurate to the extent that the rate of change of the target value with the feature values can be approximated by a local linear relation. However, such a relationship is not essential: in spaces for which similar problems have similar solutions (a basic assumption of CBR), the availability of cases near the query can help decrease the need for adaptation. Finally, following well-known properties of ensemble methods in general (Dietterich 2000), we expect that EAR's ensemble method could be helpful for noisy case bases.

5 Extensions to EAR

The previous evaluation is encouraging for the effectiveness of EAR’s core methods. We have been exploring additional refinements and extensions of the basic EAR approach, which we summarize in this section.

5.1 Refining context-aware retrieval of adaptations

EAR’s adaptation rule selection procedure uses a covariance-based criterion to enable selecting adaptations which were generated for similar adaptation problems, even if the cases from which they were generated were outside of the neighborhood of a query. We have also explored a more refined adaptation context model (Jalali and Leake 2013a), CAAR (Context-Aware Adaptation Retrieval), which derives adaptation context by locally weighted linear regression done both at the source case and the query case. CAAR was tested on a set of standard and synthetically generated domains (to have more control over the domain characteristics) and shown to improve performance compared both to the alternative baseline methods and to two ablated versions of CAAR with a less sophisticated context model. Experiments on the synthetically generated domains showed that highly variable domains, for which context changes more rapidly, decrease performance, but in tests for those domains CAAR still outperformed alternatives.

5.2 Reflecting confidence in rule ranking

When the case base contains noisy cases, the quality of adaptation rules generated by the case difference heuristic may suffer if rules are generated based on inaccurate solution values. Consequently, we have developed a method (ConfCBR) (Jalali and Leake 2013c) which extends EAR by favoring high-confidence source cases when solving problems and considering rule confidence (derived from the confidence of the values of the case pair from which the rule is generated) in rule ranking. Experimental comparisons of methods including EAR, ConfCBR, k-NN, and k-NN enhanced with confidence considerations provided support for the benefit of ConfCBR, which increased as variations in case confidence levels increased.

5.3 Guiding source case retrieval by adaptability

Adaptation-guided retrieval (Smyth and Keane 1998) replaces similarity-based case retrieval—which aims to retrieve similar cases, under the assumption that more similar cases will be more adaptable—with case retrieval guided directly by reasoning about adaptability. We have explored the integration of adaptation-guided retrieval into EAR with the variant EAGR (Jalali and Leake 2014b) (ensemble of adaptations-guided retrieval), an approach in which source cases to retrieve are chosen based on the availability of an ensemble of suitable rules for adapting them. EAGR showed improved accuracy compared to EAR, k-NN and another adaptation-guided retrieval based method. However, there is a tradeoff between EAR and EAGR, in terms of accuracy versus time complexity. Because adaptation-guided retrieval extends the pool of potentially relevant cases—with the right adaptation, any case in the case base might be relevant—EAGR must consider many more cases than EAR. Consequently, EAR is more suitable for large case bases.

6 Related work

The EAR approach relates both to research on ensemble methods in CBR and to automatic adaptation rule generation for case-based regression.

6.1 Ensemble methods in CBR

Ensemble methods aggregate results from a set of models. A number of general-purpose ensemble approaches have been proposed, such as bagging (Breiman 1996), boosting (Schapire 1990) and random forests (Breiman 2001). In CBR research, ensemble methods have been applied to improve accuracy by combining solutions from multiple subsets of a case base or from multiple case bases. For example, Cunningham and Zenobi (2001) propose improving accuracy of nearest neighbor classifiers by using an ensemble of classifiers, each based on different feature subsets. Arshadi and Jurisica (2005) present an ensemble method for combining predictions of a set of classifiers built based on disjoint subsets of cases from the original case base, for which the case features are selected locally by using logistic regression. Li and Sun (2012) propose using an ensemble of CBR systems, with randomly generated feature subsets used for similarity assessment in each individual CBR system, and forming the final solution by combining the results of those individual systems. Wiratunga et al. (2006) apply an ensemble of adaptations, learned by machine learning methods, to adapt symbolic values. However, to our knowledge, previous CBR research has not considered the use of ensembles of case adaptation rules for regression.

6.2 Learning adaptations from the case base

When case-based reasoning systems retrieve a stored case to solve a new problem, they adapt its solution in light of the differences between the new problem and the problem it solved. Often, the strategies for adapting cases to new situations are hand coded, which can be effort- and knowledge-intensive. To help alleviate the burden of manual adaptation knowledge acquisition, a number of approaches have been proposed for automatically generating adaptation rules from the cases in the case base. For reasons of space, we limit our discussion to methods which learn adaptations from cases in the case base for regression tasks, rather than more knowledge-intensive approaches for other types of domains.

6.2.1 Adaptation learning based on differences

Wilke et al. (1997) discuss general issues for designing a learning algorithm for "knowledge light" learning of case adaptation knowledge. They use their framework for two different approaches: weighted majority voting and the case difference heuristic proposed by Hanney and Keane (1997). However, their case difference method differs from EAR's in its strategies for ranking rules and composing the final solutions. It also differs in that research on EAR explores the effects of generating rules from different neighborhoods of the case base, while their method generates adaptation rules by comparing similar cases only. In addition, the rule application process in EAR is based on applying an ensemble of adaptation rules to each difference, while Hanney and Keane's method aims to find a set of rules, each one addressing a disjoint difference (or set of differences).

For each new adaptation problem, their approach attempts to find an adaptation rule which applies to all differences. If no such rule is found, for each difference between the target and retrieved case, a separate rule is retrieved for that difference. This gives all rules that could potentially address the differences between one or more features of the base and the target case. This set of rules is then generalized. Next, using the resulting rules as building blocks, their method attempts to find an adaptation path that addresses all differences between the base and target cases. If a complete path cannot be found, their system uses the partial path that addresses the most differences.

McSherry's (1998) CREST (Case-based Reasoning for ESTimation) also generates adaptations from case differences. Given a case to adapt, CREST attempts to retrieve a case which differs from the input query only in the value of a single feature, called the distinguishing attribute. Next, a pair of cases is retrieved whose values for the distinguishing attribute match those of the query and the case to be adapted, respectively, and the solution of the retrieved case is adjusted based on their difference. Because more than one similar case may be retrieved for an input query, the final estimate of the target value is calculated by averaging the different estimates generated by the same method. McSherry's method is similar to EAR's local approach in generating adaptations based on neighbors of the input query. However, CREST adjusts the solution of each source case by applying a single adaptation, while EAR uses an ensemble of adaptations.

McDonnell and Cunningham (2006) refine the case difference heuristic by considering context in retrieving adaptations. Their method generates adaptations by a local method, comparing the input query to nearby cases, selecting cases for which the gradient is similar to the target case (using local linear regression to approximate the gradients), and deriving adaptations from those cases. This gradient-based approach is in the spirit of EAR's context-based approach, described in Section 6.2.3, but without the use of ensembles of adaptations. Thus EAR shares the goal of addressing issues for the case difference heuristic, but uses different methods for defining context and for generating, ranking, and applying adaptation rules, and generates its adaptations on demand. Also, it applies ensembles of rules rather than aiming to select a single "best" rule.

6.2.2 Non-difference-based adaptation learning methods

Research on adaptation learning for regression has also investigated non-difference-based adaptation learning. For example, Policastro et al. (2008) propose a method for learning and applying adaptation knowledge from a case base using two components, estimators and a combiner. As estimators they use a multi-layer neural network, an M5 regression tree, and a support vector machine. As combiners, they consider the same three techniques, applied to combine the estimators' values.

Craw et al. (2001), Jarmulak et al. (2001), and Wiratunga et al. (2002) propose automated acquisition of local adaptation knowledge by repeatedly partitioning the case base to form a small set of probe cases, retrieving k similar cases for each probe case, and building adaptation rules based on pairs of probe cases and their top k neighbors. For each set, their method creates rule sets, each one containing adaptation cases that concentrate on differences for a single feature. From those, their method selects rules whose decision tree indexes have above-average predictive accuracy. An initial solution is generated by averaging, with possible refinement by adaptation rules each addressing differences in a single feature.

Patterson et al. (2002) propose a rule acquisition process based on k-NN and linear regression analysis. Given a new problem, its k nearest neighbors are retrieved and combined in a new generalized case in which features are the distance-weighted average of the individual case features. The k nearest neighbors are also used to train a linear regression model for predicting the difference between case solutions, which is applied to the generalized case to predict the target value for the input. Like EAR, this method is compatible with lazy generation of adaptations; it differs in that it relies on linear regression instead of case differences, and on single adaptations rather than on ensembles of adaptations.

6.2.3 Role of context in adaptation rule ranking

McDonnell and Cunningham (2006) propose considering context in case adaptation for regression, defining the context of a point in problem space by approximating the rate of change (i.e., the gradient) of the regression system’s target value function at that point. This approach considers the context for the input problem and for the corresponding case used to generate the adaptation rule. Our work considers context for both the input problem and the source case. We have compared that method, implemented in the system CAAR (Jalali and Leake 2013a) (briefly discussed in Section 5.1), to other alternatives such as considering context for the input problem or source case only and not considering context at all. The experimental results in a set of synthetic and standard domains showed that CAAR can improve accuracy over the other alternatives mentioned. For reasons of space, we do not discuss context-related issues further here.

7 Conclusions and future research

This article has presented EAR, an approach for automatically generating sets of adaptation rules from a case base according to case differences, and for selecting ensembles of adaptations to apply. EAR aims to help alleviate the problem of acquiring case adaptation knowledge for CBR by augmenting the well-known case difference heuristic approach to adaptation generation with new strategies for selecting the cases from which to generate rules and for selecting the rules to apply, as well as by using ensembles of rules to help overcome variations in rule quality.

A key question for the case difference heuristic is how to select the cases from which to generate rules. A particular focus of the experiments was assessing the tradeoffs between using local or global knowledge when generating adaptation rules, and whether taking adaptation context into account could enable non-locally generated rules to be applied effectively.

An experimental evaluation of nine variants of the EAR approach showed that EAR variants generally increased accuracy over baseline case-based regression and linear regression approaches, and that rule generation based on local information was sufficient to obtain accuracy competitive with the best performance obtained. This supported the sufficiency of a limited set of adaptations to provide good performance, suggesting the practicality of on-demand adaptation generation, a novel aspect of EAR which enables it to exploit the most recent case information and only to generate adaptation rules which are actually needed. Likewise, an ablation study provided support for the benefit of EAR’s context-based rule ranking approach as a means to select the right adaptations to apply.

Opportunities for future research include exploring further refinements of EAR’s context-based approach and examining alternative rule generation strategies and solution combination techniques. A long-term direction is to develop ways to apply the ideas of the approach to domains beyond regression. Although some specifics of EAR require numeric features, the high level approach could in principle be applied to knowledge-rich domains with symbolic solutions. Applying it to adaptation generation methods from such domains, and developing rule ranking methods for them, is an exciting future challenge.