1 Introduction

The origins of case-based reasoning (CBR) can be traced back to the late 1970s (Schank 1983). Roger Schank and colleagues at Yale University in the USA proposed representing knowledge by means of scripts, which is regarded as the beginning of CBR research. Since then, CBR has progressed from simple application-oriented research to a mature theoretical framework (Kolodner and Simpson 1989; Navarro et al. 2011; Müller and Bergmann 2015; Homem et al. 2020; Le et al. 2020). It originated in the fields of cognitive science (CS) and artificial intelligence (AI). Typically, target cases represent current problems or situations, and source (historical) cases represent problems or situations that have already occurred. CBR solves a problem by recalling previous successful cases, comparing the similarities and differences between the source cases and the target case, finding the successful cases most similar to the current situation, and then adapting and applying their solutions to the current problem (Armengol et al. 2001). In particular, CBR plays a pivotal role in application domains where there is no known standard, no known cycle, and no complete domain theory (Schmidt and Gierl 2005). CBR can simplify knowledge acquisition, improve problem-solving efficiency and solution quality, and accumulate knowledge. It provides a method that closely resembles human problem solving (Jian et al. 2015).

At present, CBR is widely used in AI and has become a new methodology for problem solving and learning (Liu et al. 2019). As its theories and methods have matured, the applications of CBR have been extended to various fields, including medical treatment (Holt et al. 2005; Georgopoulos and Stylios 2008; Begum et al. 2010; Ramos-González et al. 2017; Torrent-Fontbona et al. 2019), planning (Pinto et al. 2018; Jiang et al. 2019), assessment (Liang et al. 2012; Hong et al. 2015), forecasting (Kwon et al. 2020; Xu et al. 2021), games (Catalá et al. 2014), recommendation systems (Alshammari et al. 2017), management (González-Briones et al. 2018), and others (Floyd et al. 2015; Le Ber et al. 2018).

Many scholars have developed CBR models with the intention of providing a better understanding of the CBR process. One representative model was introduced by Aamodt and Plaza (1994), who propose a process for solving a new problem. Before reasoning, an appropriate method must be chosen to build a case base (Bergmann et al. 2005). For a problem to be solved, one or more similar cases are retrieved from the existing case base according to the characteristics of the problem. The solutions of the retrieved cases are used to create a solution to the new problem, which is then tested, modified, and evaluated to determine its effectiveness. Solutions that satisfy the user are learned and stored in the case base. The CBR cycle is illustrated in Fig. 1 and is called the 4-R lifecycle model.

Fig. 1 The 4-R lifecycle model of CBR (Aamodt and Plaza 1994)

According to this model, the CBR reasoning process is divided into four main stages: retrieve (R-1), reuse (R-2), revise (R-3), and retain (R-4) (De Mantaras et al. 2005).

  • R-1: RETRIEVE information from source case base and select potentially available source cases.

  • R-2: REUSE the solutions of retrieved source cases in new problems or cases.

  • R-3: REVISE the proposed solutions.

  • R-4: RETAIN the solutions to the problem for use in subsequent reasoning.

The 4-R cycle model is summarized as follows: analyze the features of existing problems, retrieve one or more similar cases, try to reuse cases, and retain new cases in case base in light of their importance after the solution is revised and applied.

The 4-R cycle model shows that the quality of the case retrieval strategy largely determines how well a CBR system performs (Kang et al. 2013). The retrieval method directly affects retrieval speed and accuracy (Petrovic et al. 2016), and whether the retrieval strategy is reasonable directly affects the effectiveness of the whole CBR system. Case retrieval is therefore the key to problem solving. Existing retrieval strategies include knowledge guidance (Rallabandi and Sett 2008), genetic algorithms (Abualigah and Hanandeh 2015; Abualigah and Khader 2017), text metaheuristics (Abualigah 2019), iterative methodology (Marcos-Pablos and García-Peñalvo 2020), and nearest neighbor (Cover and Hart 1967; Guo et al. 2014).

Judging from the state of research on case retrieval at home and abroad (Greene et al. 2010), the K-nearest neighbor (KNN) retrieval strategy is widely used at present (Schmidt et al. 2001). KNN means that each sample can be represented by its K closest neighbors (Cover and Hart 1967). In the feature space, if most of the K samples closest to a sample belong to a certain category, then the sample is also classified into that category. The KNN retrieval strategy calculates the similarity between the target case and the source cases according to the attribute weights and attribute values (Li et al. 2009) and then selects one or more source case solutions with high similarity as the basis for case reuse (Lin and Chen 2011). In the similarity calculation, the weight distribution has a significant influence on the results and on the quality of the solution. Attributes that play a major role are generally assigned greater weight, and less important attributes are assigned less. KNN generally uses the average weight method. Although this method is simple and easy to apply, it is sensitive to noise and irrelevant data, which affects the reliability of the results. Resolving this problem usually relies on a reasonable allocation of attribute weights, so weight allocation has become an important research direction.

Although similarity-based retrieval has attracted the attention of researchers and is widely used in CBR, it does not fully match the actual reasoning process. It is easily disturbed by low-probability events, and the overall result is easily affected by a single term. Moreover, because CBR systems are developed to support decision-makers (DMs), they need to reflect DMs’ personal attitudes in different situations. However, the attitude characteristics of DMs are often neglected in similarity calculation, which is unreasonable. It is therefore necessary to further explore mechanisms for optimal weight allocation in order to improve the quality of problem solving.

Based on the above analysis and inspired by the soft likelihood functions (SLFs) introduced by Yager et al. (2017), this study proposes a new case retrieval algorithm using SLFs (abbreviated as CBR-SLFs), which offers a new perspective on case retrieval. SLFs use ordered weighted average (OWA) aggregation to soften the strong likelihood requirement that all pieces of information be compatible and, at the same time, provide weights for attitude features, allowing optimistic or pessimistic results. Because SLFs are more flexible than general algorithms, they are called soft likelihood functions. The basic idea of case retrieval with the proposed method is as follows: first, the local similarity between the target case and each source case is calculated for every attribute; then, the CBR-SLFs proposed in this paper are used to calculate the overall similarity, yielding a set of potentially useful source cases with high similarity; finally, the source case solution closest to the target case is obtained through KNN and reused. As a flexible method for calculating global similarity, this strategy has stronger robustness and practicability in case retrieval (Tian et al. 2020). Furthermore, the SLFs-based case retrieval algorithm introduces an attitudinal characteristic to reflect the subjective preference of DMs, which allows different types of DMs to make more flexible choices.

The rest of the article is organized as follows: Section 2 introduces likelihood functions in case retrieval, basic calculations of the OWA aggregation operator, and local similarity measurement methods for heterogeneous information. Section 3 introduces the application of soft likelihood functions in case retrieval, then takes feature similarity into consideration, and gives some examples. Section 4 presents simulated experiments on benchmark data sets. Finally, Section 5 summarizes the article and proposes future research directions.

2 Preliminaries

This part first presents likelihood functions in case retrieval and OWA aggregation and then introduces local similarity measurement methods for case information.

2.1 Using likelihood functions in case retrieval

In a CBR system, existing knowledge or experience needs to be represented as a case library, which typically includes multiple cases. Each case is generally composed of two parts: the problem description and the corresponding solution. For convenience of description, the following notation is used.

$$\begin{aligned} C_i = \{D_i, S_i\}, i = 1,2,\ldots ,n \end{aligned}$$
(1)

\(C = \{C_1, C_2,\ldots ,C_n\}\) denotes the n historical cases in the case base, where \(C_i\) is the ith case (\(i\in \{1,2,\ldots ,n\}\)) comprising the problem description \(D_i\) and the corresponding solution \(S_i\). \(\mathcal {C*}\) is the target case, and its problem description is \(\mathcal {D*}\). Let \(SIM_i\) denote the similarity between \(C_i\) and the target case, and let \(Sim_j(\mathcal {D*},D_i)\) denote the similarity between the problem description \(\mathcal {D*}\) of the target case and the problem description \(D_i\) of the historical case \(C_i\) with respect to characteristic attribute j. Below, this local similarity is abbreviated as \(sim_{ij}\), where \(j\in \{1,2,\ldots ,q\}\) and q is the number of attributes.

In case-based reasoning, our goal is to rank the historical cases by their similarity to the target case, so that the source cases with the highest similarity can be selected as candidates for further revision and use. In other words, the more similar a historical case is, the more willing we are to reuse it. One way to calculate the similarity of a case is to take the product of the local similarities of its attributes.

$$\begin{aligned} SIM_i = \prod _{j=1}^qsim_{ij} \end{aligned}$$
(2)

We can see that each additional feature can only reduce the likelihood that case \(C_i\) is the best candidate. If any \(sim_{ij} = 0\) for \(j = 1, \ldots , q\), then \(SIM_i = 0\). For any case \(C_i\), a single low local similarity value greatly reduces the overall similarity. This is a kind of logical “anding” for a given \(C_i\). This formulation is too strong, because it requires all local similarities of \(C_i\) to be consistently high before the historical case can be regarded as similar. Therefore, this paper adopts OWA aggregation to obtain a softer formula for the similarity of candidate cases. In what follows, \(\lambda _i\) is an index function such that \(\lambda _i(k)\) is the index of the attribute with the kth largest local similarity of \(C_i\); that is, \(sim_{i\lambda _i(k)}\) is the kth largest local similarity of case \(C_i\). We let

$$\begin{aligned} Prod_i(j) = \prod _{k=1}^jsim_{i\lambda _i(k)} \end{aligned}$$
(3)

Here, \(Prod_i(j)\) is the product of the j largest local similarities. Because every \(sim_{i\lambda _i(k)}\in [0,1]\), \(Prod_i(j)\) is non-increasing in j and \(Prod_i(j)\in [0,1]\). The likelihood function of Eq. (2) can now be expressed as \(SIM_i=Prod_i(q)\).
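The ordered products of Eq. (3) can be sketched in a few lines of Python. This is only an illustration: the function names and the sample similarities are hypothetical and do not come from the paper.

```python
from itertools import accumulate
from operator import mul


def ordered_products(local_sims):
    """Return [Prod_i(1), ..., Prod_i(q)] for one source case.

    local_sims holds the q local similarities sim_ij, each in [0, 1].
    """
    ranked = sorted(local_sims, reverse=True)  # sim_{i, lambda_i(k)}, k = 1..q
    return list(accumulate(ranked, mul))       # running products of the ranked values


def strong_likelihood(local_sims):
    """Eq. (2): the product of all local similarities, i.e. Prod_i(q)."""
    return ordered_products(local_sims)[-1]


# Hypothetical case: three good attributes and one poor one.
sims = [0.9, 0.8, 0.95, 0.1]
print(ordered_products(sims))
print(strong_likelihood(sims))
```

The single low value (0.1) pulls the full product far below every other local similarity, which is the "anding" effect that the softer formulation is meant to relax.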

2.2 Ordered weighted averaging aggregation

Below, we use the OWA aggregation operator to construct a class of SLFs. To do so, we first briefly describe OWA.

Ordered weighted averaging aggregation was first proposed by Yager (1988). An OWA operator of dimension n is a mapping \(R^n\rightarrow R\) defined by \(OWA_w(a_1,a_2,\ldots ,a_i,\ldots ,a_n)=\sum _{j=1}^nw_ja_{\lambda (j)}\), where \(W=(w_1, w_2,\ldots ,w_n)^T\) is the weighting vector associated with the operator, with \(w_j\in [0,1]\) and \(\sum _jw_j=1\) (\(j\in \{1, 2,\ldots ,n\}\)), and \(a_{\lambda (j)}\) is the jth largest element of \(a_1, a_2,\ldots ,a_n\). The function OWA is called the ordered weighted averaging operator.

The characteristic feature of OWA is that it rearranges the given data \((a_1, a_2,\ldots ,a_i,\ldots ,a_n)\) into \((a_{\lambda (1)},a_{\lambda (2)}, \ldots ,a_{\lambda (i)},\ldots ,a_{\lambda (n)})\) in descending order and then aggregates \((a_{\lambda (1)}, a_{\lambda (2)},\ldots ,a_{\lambda (i)}, \ldots ,a_{\lambda (n)})\) with the given weight vector. The weight \(w_j\) is not tied to any particular element \(a_i\); it is associated only with the jth position in the aggregation process, so the weighting vector W can also be called a position weighting vector.

Some special operators are worth noting (Yager 1988):

  1. \(W^*=(1, 0,\ldots ,0)\), the OWA is reduced to the max operator, \(OWA(a_1,\ldots ,a_n)=a_{\lambda (1)}=max_i(a_i)\).

  2. \(W_*=(0, 0,\ldots ,1)\), the OWA is reduced to the min operator, \(OWA(a_1,\ldots ,a_n)=a_{\lambda (n)}=min_i(a_i)\).

  3. \(W_n=\left( \frac{1}{n}, \frac{1}{n},\ldots ,\frac{1}{n}\right) \), the OWA is reduced to a simple arithmetic average operator, \(OWA(a_1,\ldots ,a_n)=\frac{1}{n}\sum _{i=1}^na_i\).

  4. \(W_{n-2}=\left( 0, \frac{1}{n-2}, \frac{1}{n-2},\ldots ,\frac{1}{n-2}, 0\right) \), the OWA is reduced to an arithmetic average operator that removes the extremum, \(OWA(a_1,\ldots ,a_n)=\frac{1}{n-2}(\sum _{i=1}^na_i-max_i(a_i)-min_i(a_i))\).

  5. \(W_k=(0,\ldots ,1,\ldots ,0)\), \(OWA(a_1,\ldots ,a_n)=a_{\lambda (k)}\).
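The OWA operator and its special cases can be checked with a short Python sketch. The function name `owa` and the sample values are illustrative assumptions, not part of the original formulation.

```python
def owa(values, weights):
    """Ordered weighted average: weights are applied by rank, not by element."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ranked = sorted(values, reverse=True)  # a_{lambda(1)} >= ... >= a_{lambda(n)}
    return sum(w * a for w, a in zip(weights, ranked))


a = [0.3, 0.9, 0.5, 0.7]  # hypothetical arguments
n = len(a)

print(owa(a, [1, 0, 0, 0]))      # max operator
print(owa(a, [0, 0, 0, 1]))      # min operator
print(owa(a, [1 / n] * n))       # arithmetic mean
print(owa(a, [0, 0.5, 0.5, 0]))  # mean with both extremes removed
```

Because the arguments are sorted before weighting, permuting the input list does not change the result; only the weight vector controls the behavior.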

When more weight is allocated to the \(w_j\) near the top of W, the aggregated value is larger; when more weight is allocated to the \(w_j\) near the bottom of W, the aggregated value is smaller. The weighting vector W thus reflects the tendency of the DMs to be optimistic or pessimistic, and it determines how OWA aggregates. The attitudinal character is defined as (Yager 1996):

$$\begin{aligned} AC(W) = \sum _{j=1}^n\frac{n-j}{n-1}w_j \end{aligned}$$
(4)

\(AC(W)\in [0,1]\), and its value determines the degree of optimism. In other words, the more optimistic the DM is, the greater the attitudinal character and the higher the aggregated value.
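As a quick illustration of Eq. (4), the following Python sketch evaluates the attitudinal character for a few weight vectors. The function name is a hypothetical choice made for this example.

```python
def attitudinal_character(weights):
    """AC(W) of Eq. (4): 1 for the max operator, 0 for the min operator."""
    n = len(weights)
    if n < 2:
        raise ValueError("AC(W) needs at least two weights")
    # enumerate from 1 so that j matches the position index of Eq. (4)
    return sum((n - j) / (n - 1) * w for j, w in enumerate(weights, start=1))


print(attitudinal_character([1, 0, 0, 0]))  # fully optimistic (max)
print(attitudinal_character([0, 0, 0, 1]))  # fully pessimistic (min)
print(attitudinal_character([0.25] * 4))    # neutral (arithmetic mean)
```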

The OWA weights \(w_j\) can be obtained as follows. Assume a monotonic function f: \([0,1]\rightarrow [0,1]\) such that \(f(x)>f(y)\) when \(x>y\), \(f(0)=0\), and \(f(1)=1\). We obtain

$$\begin{aligned} w_j=f\left( \frac{j}{n}\right) -f\left( \frac{j-1}{n}\right) \end{aligned}$$
(5)

We get \(w_j\in [0,1]\) and \(\sum _{j=1}^nw_j=1\), so \(w_j\) satisfies all properties required of OWA weights (Yager 1996).

This method of obtaining OWA weights is called the function method; the function f and the cardinality n jointly determine \(w_j\) and the associated attitudinal character. For the function itself, the attitudinal character is defined as (Yager 1996):

$$\begin{aligned} Opt(f) = \int _0^1f(x)dx \end{aligned}$$
(6)

As n becomes large, AC(W) converges to Opt(f).
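One way to see this relationship, assuming the weights of Eq. (5) with \(f(0)=0\), is to apply summation by parts to Eq. (4); the following is a sketch of that step rather than a derivation taken from the cited sources:

$$\begin{aligned} AC(W)&=\sum _{j=1}^n\frac{n-j}{n-1}\left[ f\left( \frac{j}{n}\right) -f\left( \frac{j-1}{n}\right) \right] \\&=\frac{1}{n-1}\sum _{j=1}^{n-1}f\left( \frac{j}{n}\right) \;\longrightarrow \;\int _0^1f(x)dx=Opt(f)\quad (n\rightarrow \infty ) \end{aligned}$$

The second equality holds because consecutive coefficients \(\frac{n-j}{n-1}\) differ by \(\frac{1}{n-1}\), the coefficient for \(j=n\) is 0, and \(f(0)=0\). The remaining sum is a Riemann sum of the monotonic function f on [0,1].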

A convenient choice is \(f(x)=x^m\) with \(m\ge 0\); for this function,

$$\begin{aligned} \alpha = \int _0^1x^mdx=\left. \frac{x^{m+1}}{m+1}\right| _0^1=\frac{1}{m+1} \end{aligned}$$
(7)

Hence \(m = \frac{1-\alpha }{\alpha }\) with \(\alpha \in (0,1]\). The larger \(\alpha \) is, the more optimistic the attitude of the user: \(m=1\) when \(\alpha =0.5\); \(m=0\) when \(\alpha =1\); \(m\rightarrow \infty \) when \(\alpha \rightarrow 0\).

Using the function form described above, we can get

$$\begin{aligned} w_j=f\left( \frac{j}{n}\right) -f\left( \frac{j-1}{n}\right) =\left( \frac{j}{n}\right) ^m-\left( \frac{j-1}{n}\right) ^m \end{aligned}$$
(8)

Once \(\alpha \) is given, we obtain

$$\begin{aligned} w_j=\left( \frac{j}{n}\right) ^{\frac{1-\alpha }{\alpha }} - \left( \frac{j-1}{n}\right) ^{\frac{1-\alpha }{\alpha }} \end{aligned}$$
(9)

Next, we use OWA to derive softer formulas for computing similarity.

2.3 Local similarity measurement methods for case information

CBR is very similar to the way humans solve problems. When a new problem is encountered, CBR retrieves and selects possible source cases from the case bases by some retrieval method (Cunningham 2008). CBR can not only give full play to the advantage of the immediacy of computer processing information, but also improve the scientific nature and effectiveness of decision-making (El-Sappagh et al. 2019). In the CBR system, whether all the follow-up work can play its due role largely hinges on the quality of the cases retrieved, so case retrieval is very critical.

The information or data in a CBR system are usually heterogeneous, and heterogeneity indicates a difference in the type and nature of information or data (Yu et al. 2017). The key link in the decision-making process is to process heterogeneous information (Yahong and Xiuli 2018; Wan et al. 2016). As case events are usually characterized by risk, complexity, and uncertainty (Nikpour and Aamodt 2021), and the environment is imprecise, decision information often cannot be expressed as precise numbers (Fei et al. 2021) and may instead take the form of Boolean values, interval values, fuzzy values, and so on. Furthermore, because of the fuzziness of human thinking, it is sometimes very hard to express all decision information with quantitative values, so qualitative language is also used to describe attributes (Fei and Deng 2020; Fei and Feng 2021).

Suppose \(Sim_j(\mathcal {D*},D_i)\) represents the similarity between the target case \(\mathcal {D*}\) and the historical case \(D_i\) about the characteristic attribute j. Heterogeneous decision information contains many types of attribute information such as numerical features, Boolean features, symbolic features with orders, symbolic features without orders, string features, fuzzy features, and interval features, and its similarity is calculated as follows (Tan et al. 2020):

  • For numerical features, the similarity between \(\mathcal {D*}\) and \(D_i\) can be obtained as

    $$\begin{aligned} Sim_j(\mathcal {D*},D_i) = 1 - \frac{\vert \mathcal {D*}-D_i \vert }{\max } \end{aligned}$$
    (10)
  • For Boolean features, the similarity between \(\mathcal {D*}\) and \(D_i\) can be obtained as

    $$\begin{aligned} Sim_j(\mathcal {D*},D_i) = \left\{ \begin{array}{rcl} 0 &{} &{} {\mathcal {D*}\ne D_i}\\ 1 &{} &{} {\mathcal {D*} = D_i} \end{array} \right. \end{aligned}$$
    (11)
  • For symbolic features with orders, the similarity between \(\mathcal {D*}\) and \(D_i\) can be obtained as

    $$\begin{aligned} Sim_j(\mathcal {D*},D_i) = 1 - \frac{\vert \mathcal {D*}-D_i \vert }{g} \end{aligned}$$
    (12)

    where g is the number of value levels.

  • For symbolic features without orders, the similarity between \(\mathcal {D*}\) and \(D_i\) can be obtained as

    $$\begin{aligned} Sim_j(\mathcal {D*},D_i) = \frac{num( \mathcal {D*}\wedge D_i )}{num( \mathcal {D*}\vee D_i )} \end{aligned}$$
    (13)
  • For string features, the similarity between \(\mathcal {D*}\) and \(D_i\) can be obtained as

    $$\begin{aligned} Sim_j(\mathcal {D*},D_i) = \frac{t\times l}{\max (len(\mathcal {D*}),len(D_i))} \end{aligned}$$
    (14)

    where t is the matching number, l is the matching length, and len is the string length.

  • For fuzzy features, the similarity between \(\mathcal {D*}\) and \(D_i\) can be obtained as

    $$\begin{aligned} \begin{aligned}&Sim_j(\mathcal {D*},D_i) =1- \{(n_{i}-n^{'}_{i})^{2}\\&\quad +\frac{1}{9}[(m_{i}-m^{'}_{i})^{2}+(r_{i}-r^{'}_{i})^{2} -(m_{i}-m^{'}_{i})(r_{i}-r^{'}_{i})]\\&\quad -\frac{1}{2}(n_{i}-n^{'}_{i})[(m_{i}-m^{'}_{i})-(r_{i}-r^{'}_{i})]\}^{\frac{1}{2}} \end{aligned}\end{aligned}$$
    (15)

    where \(\mathcal {D*}\) and \(D_i\) are triangular fuzzy numbers, \(\mathcal {D*} =(n_{i},m_{i},r_{i})\) and \(D_i = (n^{'}_{i},m^{'}_{i},r^{'}_{i})\).

  • For interval features, the similarity between \(\mathcal {D*}\) and \(D_i\) can be obtained as

    $$\begin{aligned}&Sim_j(\mathcal {D*},D_i)\nonumber \\&= \frac{len(\mathcal {D*}\bigcap D_i)}{len(\mathcal {D*})+len(D_i)-len(\mathcal {D*}\bigcap D_i)} \end{aligned}$$
    (16)

    where len is the interval length and \(\mathcal {D*}\bigcap D_i\) is the overlapping interval.
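Several of the measures above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, the denominator "max" of Eq. (10) is assumed to be the value range of the attribute, ordered symbols are assumed to be encoded as integer levels, and the handling of empty sets and zero-length intervals is a choice made here. The string and fuzzy measures are omitted.

```python
def sim_numeric(target, source, value_range):
    """Eq. (10); value_range is ASSUMED to be the denominator written as 'max'."""
    return 1.0 - abs(target - source) / value_range


def sim_boolean(target, source):
    """Eq. (11): 1 if the two values agree, 0 otherwise."""
    return 1.0 if target == source else 0.0


def sim_ordinal(target_level, source_level, levels):
    """Eq. (12): levels is g, the number of value levels (integer-coded symbols)."""
    return 1.0 - abs(target_level - source_level) / levels


def sim_unordered(target_set, source_set):
    """Eq. (13): shared symbols divided by all symbols that occur."""
    union = target_set | source_set
    if not union:  # assumption: two empty sets are treated as identical
        return 1.0
    return len(target_set & source_set) / len(union)


def sim_interval(target, source):
    """Eq. (16): overlap length divided by the length of the union."""
    (a1, b1), (a2, b2) = target, source
    overlap = max(0.0, min(b1, b2) - max(a1, a2))
    union = (b1 - a1) + (b2 - a2) - overlap
    if union == 0.0:  # assumption: degenerate intervals match only if equal
        return 1.0 if target == source else 0.0
    return overlap / union


print(sim_numeric(30, 40, 50))
print(sim_ordinal(2, 4, 5))
print(sim_unordered({"a", "b"}, {"b", "c"}))
print(sim_interval((0, 4), (2, 6)))
```

Each function returns a value in [0, 1] when its inputs respect the stated ranges, so the results can be fed directly into the global similarity calculation as \(sim_{ij}\).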

3 Case retrieval strategy

We first give a global similarity calculation method based on soft likelihood functions that integrates the similarity of each attribute; then, considering feature similarity, we give an SLFs case retrieval algorithm that incorporates it. Our retrieval strategy combines the SLFs-based case retrieval algorithm with KNN, thus improving the performance of case retrieval.

3.1 Case retrieval method based on SLFs

In the previous section, we have obtained local attribute similarity between target case and historical cases under a variety of heterogeneous information environments. The global similarity is then calculated to retrieve the historical cases that are most similar to the target case from the case base. We apply SLFs based on OWA to case retrieval process and propose an original global similarity calculation method to improve the previous case retrieval strategy.

Let us consider using SLFs based on OWA as a retrieval strategy for CBR. For each source case \(C_i\), we denote the global similarity by \(SIM_{i,W}\) and calculate it from W and \(Prod_i(j)\). Here, W is the weighting vector, \(W = \{w_{1},\ldots ,w_{q}\}\), with \(w_j\in [0,1]\) and \(\sum _{j=1}^qw_j=1\). We define

$$\begin{aligned} SIM_{i,W} = \sum _{j=1}^qw_jProd_i(j) \end{aligned}$$
(17)

As noted above, \(Prod_i(j) = \prod _{k=1}^jsim_{i\lambda _i(k)}\), where \(\lambda _i\) is an index function and \(\lambda _i(k)\) is the index of the attribute with the kth largest local similarity of case \(C_i\).

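For each \(C_i\), \(Prod_i(j) = Prod_i(j-1)sim_{i\lambda _i(j)}\); since \(sim_{i\lambda _i(j)}\le 1\), \(Prod_i(j)\) is non-increasing in j, so the values \(Prod_i(1),\ldots ,Prod_i(q)\) are already in the descending order required by OWA. Therefore, aggregating the \(Prod_i(j)\) with W is an OWA aggregation: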

$$\begin{aligned} SIM_{i,W}= & {} \sum _{j=1}^qw_jProd_i(j) \nonumber \\= & {} OWA_W\{Prod_i(1),\ldots ,Prod_i(q)\} \end{aligned}$$
(18)

The SLFs are thus determined by the weighting vector W, whose entries depend only on position. Some special weighting vectors are as follows.

(1):\(W^{*} = \{w_{1}=1,w_{j}=0|j=2,\ldots ,q\}\)

$$\begin{aligned} SIM_{i,W^{*}} =Prod_i(1)=sim_{i\lambda _i(1)} \end{aligned}$$
(19)

This is the maximum possible value, which is equal to the largest local similarity of case \(C_i\).

(2):\(W_{*} = \{w_{q}=1,w_{j}=0|j=1,\ldots ,q-1\}\)

$$\begin{aligned} SIM_{i,W_{*}} =Prod_i(q)=\prod _{j=1}^qsim_{ij} \end{aligned}$$
(20)

This is the form of a strong likelihood function that requires all attributes of \(D_i\) to be compatible with the target case.

(3):\(W_{n} = \{w_{j}=\frac{1}{q}|j=1,\ldots ,q\}\)

$$\begin{aligned} SIM_{i,W_{n}} =\frac{1}{q}\sum _{j=1}^qProd_i(j)=\frac{1}{q}\sum _{j=1}^q\left( \prod _{k=1}^jsim_{i\lambda _i(k)}\right) \end{aligned}$$
(21)

This is the simple average.

(4):\(W_{q-2} = \{w_{1}=0,w_{q}=0,w_{j}=\frac{1}{q-2}|j=2,\ldots ,q-1\}\)

$$\begin{aligned}&SIM_{i,W_{q-2}} =\frac{1}{q-2}\left( \sum _{j=1}^qProd_i(j)-Prod_i(1)-Prod_i(q)\right) \nonumber \\&\quad =\frac{1}{q-2}\left( \sum _{j=1}^q\left( \prod _{k=1}^jsim_{i\lambda _i(k)}\right) -sim_{i\lambda _i(1)}-\prod _{j=1}^qsim_{ij}\right) \end{aligned}$$
(22)

This is the arithmetic mean with the two extreme values removed.
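The four special cases can be reproduced with a small Python sketch of Eq. (17). The function name `slf_owa` and the sample similarities are hypothetical and serve only to illustrate the formula.

```python
from itertools import accumulate
from operator import mul


def slf_owa(local_sims, weights):
    """SIM_{i,W} of Eq. (17): position weights applied to the ordered products."""
    if len(local_sims) != len(weights):
        raise ValueError("one weight per attribute is required")
    # Prod_i(1..q); already non-increasing, so no further sorting is needed
    prods = accumulate(sorted(local_sims, reverse=True), mul)
    return sum(w * p for w, p in zip(weights, prods))


sims = [0.6, 0.9, 0.8, 0.5]  # hypothetical local similarities, q = 4
q = len(sims)

print(slf_owa(sims, [1, 0, 0, 0]))      # (1) largest local similarity
print(slf_owa(sims, [0, 0, 0, 1]))      # (2) strong likelihood (full product)
print(slf_owa(sims, [1 / q] * q))       # (3) simple average of the products
print(slf_owa(sims, [0, 0.5, 0.5, 0]))  # (4) average without the two extremes
```

Here the ordered products are 0.9, 0.72, 0.432, and 0.216, so the four results range from the most optimistic value (0.9) down to the strong likelihood (0.216).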

DMs who are more optimistic about the likelihood assign more weight to the \(w_j\) with smaller indices; DMs who are more pessimistic assign more weight to the \(w_j\) with larger indices. Because \(SIM_{i,W}\) depends on W, the likelihood function depends on \(\alpha \), which determines the weighting vector. If the user is more optimistic, \(\alpha \) is closer to 1 and \(SIM_{i,W}\) is larger; if the user is more pessimistic, \(\alpha \) is closer to 0 and \(SIM_{i,W}\) is smaller.

As discussed above, \(w_j = f\left( \frac{j}{q}\right) -f\left( \frac{j-1}{q}\right) \) with \(f(x) = x^m\), and \(m = \frac{1-\alpha }{\alpha }\) expresses the desired degree of optimism \(\alpha \). As a result, users’ attitudes can be expressed by a softer likelihood function that is more in line with reality. We obtain

$$\begin{aligned} SIM_{i,\alpha } =\sum _{j=1}^q\left( \left[ \left( \frac{j}{q}\right) ^{\frac{1-\alpha }{\alpha }}-\left( \frac{j-1}{q}\right) ^{\frac{1-\alpha }{\alpha }}\right] \prod _{k=1}^jsim_{i\lambda _i(k)}\right) . \end{aligned}$$
(23)

Because of their physiological and cognitive limitations, DMs are boundedly rational in reality (Simon 1955). DMs’ reasoning is influenced not only by the information in historical cases but also by their personal wisdom, emotion, attitude, cognition, etc. Psychological characteristics make a difference to the decision-making process of DMs (Mi et al. 2021). Attitude characteristics therefore play a significant role in CBR, and it is necessary to take DMs’ attitude characteristics into account in case retrieval. On the one hand, the use of attitude characteristics is subjective and highly dependent on users. An optimistic decision-maker and a pessimistic decision-maker tend to make different judgments about the same issue. On the other hand, if the description of the target case is accurate and the calculation of similarity is accurate, an optimistic attitude should be adopted. If there is reason to doubt the accuracy of the similarity between cases, a pessimistic attitude should be adopted. Therefore, the attitude characteristics of users can be considered as finding a balance between risks and benefits.

Next, we give an example to illustrate our case retrieval algorithm.

Example 1

Consider \(q=6\) primary attributes. The local similarities between the source case and the target case for the 6 attributes are: \(C = \{sim_{i1}=0.7, sim_{i2}=0.4, sim_{i3}=0.9, sim_{i4}=1, sim_{i5}=0.5, sim_{i6}=0.8\}\). Sorting them in descending order gives \(\lambda _i(1)=4, \lambda _i(2)=3, \lambda _i(3)=6, \lambda _i(4)=1, \lambda _i(5)=5, \lambda _i(6)=2\). Then, we can compute \(Prod_i(j)=\prod _{k=1}^jsim_{i\lambda _i(k)}\); the results are given in Table 1.

Table 1 Probability products

The value of \(\alpha \) is different for different users, and we can calculate some typical \(SIM_{i,\alpha }\). For \(q=6\), \(w_j=\left( \frac{j}{6}\right) ^{\frac{1-\alpha }{\alpha }}-\left( \frac{j-1}{6}\right) ^{\frac{1-\alpha }{\alpha }}\) and \( SIM_{i,\alpha } = \sum _{j=1}^6w_jProd_i(j)\).

(1) For an optimistic attitude, \(\alpha = 0.8\): \(m=\frac{1-\alpha }{\alpha }=0.25\) and \(w_j=\left( \frac{j}{6}\right) ^{0.25}-\left( \frac{j-1}{6}\right) ^{0.25}\). The results are given in Table 2.

Table 2 The numerical example of \(\alpha = 0.8\)
Table 3 The numerical example of \(\alpha = 0.2\)

So \(SIM_{i,\alpha } = 0.8553\) when \(\alpha = 0.8\).

(2) For a neutral attitude, \(\alpha = 0.5\): \(m=\frac{1-\alpha }{\alpha }=1\) and \(w_j=\left( \frac{j}{6}\right) -\left( \frac{j-1}{6}\right) =\frac{1}{6}\). We can get: \(SIM_{i,\alpha } =\frac{1}{6}\sum _{j=1}^6Prod_i(j)=\frac{1}{6}(1+0.9+0.72+0.504+0.252+0.1008)=0.579\). So \(SIM_{i,\alpha } = 0.579\) when \(\alpha = 0.5\).

(3) For a pessimistic attitude, \(\alpha = 0.2\): \(m=\frac{1-\alpha }{\alpha }=4\) and \(w_j=\left( \frac{j}{6}\right) ^{4}-\left( \frac{j-1}{6}\right) ^{4}\). The results are given in Table 3. So \(SIM_{i,\alpha } = 0.2393\) when \(\alpha = 0.2\).

We can find from these examples that as \(\alpha \) increases, so does \(SIM_{i,\alpha }\). We see that the order of \(C_i\) basically depends on the order of \(sim_{ij}\).
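The figures of Example 1 can be recomputed with a short Python sketch of Eq. (23). The function name is a hypothetical choice; the data are the six local similarities given in the example.

```python
from itertools import accumulate
from operator import mul


def soft_likelihood(local_sims, alpha):
    """SIM_{i,alpha} of Eq. (23) for attitude alpha in (0, 1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    q = len(local_sims)
    m = (1.0 - alpha) / alpha

    def f(x):
        return 0.0 if x == 0.0 else x ** m  # f(0) = 0 also when m = 0

    # Prod_i(1..q): running products of the local similarities in descending order
    prods = accumulate(sorted(local_sims, reverse=True), mul)
    return sum((f(j / q) - f((j - 1) / q)) * p
               for j, p in enumerate(prods, start=1))


example_1 = [0.7, 0.4, 0.9, 1.0, 0.5, 0.8]
for alpha in (0.8, 0.5, 0.2):
    print(alpha, round(soft_likelihood(example_1, alpha), 3))
```

Rounded to three decimals, the three calls give 0.855, 0.579, and 0.239, in line with the values stated in the example.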

3.2 SLFs case retrieval algorithm combined with feature similarity

When CBR is carried out, the attributes of target cases and source cases are not necessarily identical (Li et al. 2006), that is, we need to consider the feature similarity (McSherry 2011). To solve the global similarity, both local similarity and feature similarity should be taken into consideration. In case retrieval, feature similarity is represented by different reliability of each attribute. Therefore, the reliability of each attribute should be taken into consideration in the case retrieval algorithm of SLFs.

The reliability of the attributes is represented by \(\{R_{i1},R_{i2},\ldots ,R_{iq}\}\) with \(R_{ij}\in [0,1]\), where \(R_{ij}\) (\(j\in \{1,2, \ldots ,q\}\)) represents the reliability of attribute j of historical case i. Within a single case search, the reliability of each attribute does not change, so the value of \(R_{ij}\) depends only on j, not on i. Next, we describe the SLFs case retrieval algorithm considering reliability (Yager et al. 2017).

Table 4 Probability reliability
Table 5 Probability products

The total reliability is \(R_i = \sum _{j=1}^qR_{ij}\), and then we use this to obtain the normalized reliability \(r_{ij} = \frac{R_{ij}}{R_i}\). Obviously, \(\sum _{j=1}^qr_{ij} = 1\).

We need to consider the products of the local similarity (compatibility probability) and the normalized reliability associated with case \(C_i\). We define an index function \(\sigma _i\) such that \(\sigma _i(k)\) is the index of the kth largest of these products. Thus \(sim_{i\sigma _i(k)}\times r_{i\sigma _i(k)}\) is the kth largest of the \(sim\times r\) products, where \(sim_{i\sigma _i(k)}\) is the corresponding local similarity and \(r_{i\sigma _i(k)}\) is its associated reliability.

The order of local similarity for a certain \(C_i\) is depending on the product of compatible probability of the local similarity of each attribute and the reliability of each attribute. Either a small compatible probability or a small reliability can lead to a lower ordering. If reliability of all the attributes is identical, then index \(\sigma _i(k)\) depends only on the probabilities. We have

$$\begin{aligned} Prod_i(j)=\prod _{k=1}^jsim_{i\sigma _i(k)} \end{aligned}$$
(24)

where \(Prod_i(j)\) is the product of the first j ordered probabilities and \(\sigma _i\) induces the order.

$$\begin{aligned} N_{ij}= \sum _{k=1}^jr_{i\sigma _i(k)} \end{aligned}$$
(25)

where \(N_{ij}\) is the sum of the normalized reliabilities associated with the j largest \(sim\times r\) products for case \(C_i\).

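We define f(x) as the weight generating function and write \(S_{ij}=N_{ij}\) for the cumulative normalized reliability of Eq. (25), with \(S_{i0}=0\). Then, for \(j=1,\ldots ,q\), we calculate the OWA weights:

$$\begin{aligned} w_{ij}=f(S_{ij})-f(S_{i(j-1)}) \end{aligned}$$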
(26)

Then, the soft likelihood function of case \(C_i\) considering reliability is

$$\begin{aligned} SIM_{i,f} = \sum _{j=1}^qw_{ij}Prod_i(j) \end{aligned}$$
(27)

If the reliability \(r_{i\sigma _i(j)}\) is 0, then \(S_{ij} = S_{i(j-1)}\) and \(w_{ij} = f(S_{ij})-f(S_{i(j-1)}) = 0\). If all the reliabilities are \(r_{ij} = \frac{1}{q}\), then \(S_{ij} = \frac{j}{q}\) and \(w_{ij} = f\left( \frac{j}{q}\right) -f\left( \frac{j-1}{q}\right) \), which is the same as the case in which reliability is not considered.

When \(f(x) = x^m\) and \(m = \frac{1-\alpha }{\alpha }\), we get \(f(x) = x^{\frac{1-\alpha }{\alpha }}\) and the weight is

$$\begin{aligned} w_{ij}=S_{ij}^{\frac{1-\alpha }{\alpha }} - S_{i(j-1)}^{\frac{1-\alpha }{\alpha }} \end{aligned}$$
(28)

Next, we give an example to illustrate our case retrieval algorithm.

Example 2

Consider \(q=6\) primary attributes. The local similarities between the source case and the target case for the 6 attributes are the same as in Example 1: \(C = \{sim_{i1}=0.7, sim_{i2}=0.4, sim_{i3}=0.9, sim_{i4}=1, sim_{i5}=0.5, sim_{i6}=0.8\}\). The associated non-normalized evidence reliability is: \(R = \{R_{i1}=1, R_{i2}=0.7, R_{i3}=0.4, R_{i4}=0.5, R_{i5}=0.9, R_{i6}=0.6\}\). The normalized reliability is: \(r_{ij} = \frac{R_{ij}}{\sum _{k=1}^qR_{ik}} = \frac{R_{ij}}{4.1}\)

We calculate the probability reliability products, as given in Table 4.

Then, the index function \(\sigma _i(k)\) is: \(\{\sigma _i(1)=1, \sigma _i(2)=4, \sigma _i(3)=6, \sigma _i(4)=5, \sigma _i(5)=3, \sigma _i(6)=2\}\).

We can calculate \(Prod_i(j)=\prod _{k=1}^jsim_{i\sigma _i(k)} = Prod_i(j-1)sim_{i\sigma _i(j)}\) as shown in Table 5.

We can use \(N_{ij} = \sum _{k=1}^jr_{i\sigma _i(k)} = N_{i(j-1)}+r_{i\sigma _i(j)}\) to calculate the cumulative sums of the normalized reliability in the order given by the index \(\sigma _i\), as shown in Table 6.

Table 6 Sum of normalized reliabilities

For different \(\alpha \), we use the weights \(w_{ij}=S_{ij}^{\frac{1-\alpha }{\alpha }} - S_{i(j-1)}^{\frac{1-\alpha }{\alpha }}\) and \( SIM_{i,\alpha } = \sum _{j=1}^qw_{ij}Prod_i(j)\) to calculate \(SIM_{i,\alpha }\) with the reliabilities associated with the attributes. We now calculate \(SIM_{i,\alpha }\) for some typical values of \(\alpha \).

(1) For an optimistic attitude, \(\alpha = 0.8\): \(m=\frac{1-\alpha }{\alpha }=0.25\). We can get Table 7. So \(SIM_{i,\alpha } = 0.617\) when \(\alpha = 0.8\).

Table 7 Numerical example of \(\alpha = 0.8\)

(2) For a neutral attitude, \(\alpha = 0.5\): \(m=\frac{1-\alpha }{\alpha }=1\). We can get Table 8. So \(SIM_{i,\alpha } = 0.441\) when \(\alpha = 0.5\).

Table 8 Numerical example of \(\alpha = 0.5\)

(3) For a pessimistic attitude, \(\alpha = 0.2\): \(m=\frac{1-\alpha }{\alpha }=4\). We can get Table 9. So \(SIM_{i,\alpha } = 0.202\) when \(\alpha = 0.2\).

Table 9 Numerical example of \(\alpha = 0.2\)

It can be clearly observed from Table 10 that the soft likelihood value increases as the attitude value \(\alpha \) increases.

Table 10 As \(\alpha \) increases
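The computation in Example 2 can be sketched in Python as follows. This is a minimal illustration under our reading of Eqs. (24)-(28), not a reference implementation: the function name `soft_likelihood` and its arguments are hypothetical, and the sketch assumes \(0<\alpha <1\) and \(S_{i0}=0\).

```python
# Sketch of Eqs. (24)-(28); names are illustrative, not from the paper.

def soft_likelihood(sims, rels, alpha):
    """Reliability-aware soft likelihood for one source/target pair."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("this sketch assumes 0 < alpha < 1")
    total = sum(rels)
    r = [x / total for x in rels]                  # normalized reliabilities r_ij
    # sigma_i: attribute indices sorted by sim * r, largest product first
    order = sorted(range(len(sims)), key=lambda k: sims[k] * r[k], reverse=True)
    m = (1.0 - alpha) / alpha                      # exponent of f(x) = x**m
    prod, s_prev, result = 1.0, 0.0, 0.0
    for k in order:
        prod *= sims[k]                            # Prod_i(j), Eq. (24)
        s_curr = s_prev + r[k]                     # cumulative reliability, Eq. (25)
        w = s_curr ** m - s_prev ** m              # OWA weight, Eq. (28)
        result += w * prod                         # running sum of Eq. (27)
        s_prev = s_curr
    return result


# Data of Example 2
sims = [0.7, 0.4, 0.9, 1.0, 0.5, 0.8]
rels = [1.0, 0.7, 0.4, 0.5, 0.9, 0.6]
for alpha in (0.2, 0.5, 0.8):
    print(alpha, round(soft_likelihood(sims, rels, alpha), 3))
```

With the data of Example 2, the sketch gives approximately 0.202, 0.441, and 0.617 for \(\alpha = 0.2\), 0.5, and 0.8, in agreement with the values reported above.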

The attitude characteristic \(\alpha \) of the DMs is determined by the weight generating function through \(\alpha =\int _0^1f(y)dy\). The closer \(\alpha \) is to 1, the more optimistic the DM is; the closer \(\alpha \) is to 0, the more pessimistic the DM is; \(\alpha =0.5\) corresponds to a neutral attitude.
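For the power function \(f(y)=y^m\) of Eq. (28), this integral can be evaluated directly, which shows where the choice \(m=\frac{1-\alpha }{\alpha }\) comes from (a short check added here, assuming \(m\ge 0\)):

$$\begin{aligned} \alpha =\int _0^1y^mdy=\frac{1}{m+1}\quad \Longrightarrow \quad m=\frac{1-\alpha }{\alpha } \end{aligned}$$

For instance, \(m=0.25\) gives \(\alpha =0.8\), \(m=1\) gives \(\alpha =0.5\), and \(m=4\) gives \(\alpha =0.2\), matching the three cases of Example 2.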

Our retrieval strategy combines the SLF-based case retrieval algorithm developed above with KNN. It replaces the traditional KNN strategy, which aggregates local similarities with the ordinary mean or the weighted average, with the aim of improving the accuracy of case retrieval.
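One possible way to combine the SLF-based global similarity with KNN is sketched below. Several details are assumptions made only for illustration and are not taken from the paper: the local similarity of a numeric attribute is taken as \(1-|a-b|/\text{span}\), all attributes have equal reliability (so \(S_{ij}=j/q\)), and the solution is chosen by majority vote among the k most similar source cases. The names `slf_similarity`, `local_similarity`, and `retrieve` are hypothetical.

```python
from collections import Counter


def slf_similarity(local_sims, alpha=0.5):
    """Soft likelihood of local similarities with equal attribute reliability."""
    q = len(local_sims)
    m = (1.0 - alpha) / alpha                      # f(x) = x**m, assumes 0 < alpha < 1
    prod, total = 1.0, 0.0
    for j, s in enumerate(sorted(local_sims, reverse=True), start=1):
        prod *= s                                  # product of the j largest similarities
        total += ((j / q) ** m - ((j - 1) / q) ** m) * prod
    return total


def local_similarity(a, b, span):
    """Assumed local similarity for a numeric attribute: 1 - |a - b| / span."""
    return 1.0 - abs(a - b) / span if span > 0 else 1.0


def retrieve(target, case_base, spans, k=11, alpha=0.5):
    """Return the majority label among the k most similar source cases."""
    scored = []
    for features, label in case_base:
        sims = [local_similarity(x, y, s) for x, y, s in zip(target, features, spans)]
        scored.append((slf_similarity(sims, alpha), label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy case base: (features, label) pairs with attribute spans of 1.0
case_base = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
             ((1.0, 1.0), "b"), ((0.9, 0.8), "b")]
print(retrieve((0.05, 0.05), case_base, spans=(1.0, 1.0), k=3))
```

Because the SLF multiplies the ordered similarities, a single very dissimilar attribute pulls the global similarity down more strongly than it would under a plain mean, which is the behavior the retrieval strategy relies on.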

4 Experimental verification

In this section, the proposed algorithm is evaluated experimentally to assess the effectiveness of the case retrieval method. We selected 10 classification data sets from the UCI repository for the classification experiments. The UCI repository is a machine learning database maintained by the University of California, Irvine; it contains a large number of real data sets and is a commonly used standard benchmark (Arthur and David 2007). Table 11 lists the abbreviated name, sample size, number of classes, number of attributes, and other information for each data set. Detailed descriptions of each data set are omitted here.

Table 11 General information of used data sets

This study develops a case retrieval algorithm and applies the proposed CBR-SLFs method to KNN to obtain a new CBR retrieval strategy. To make a fair and detailed comparison, we contrast its performance with that of traditional retrieval strategies, which at present generally use an average-based method.

The experimental process is as follows. We use tenfold cross-validation to divide each data set into a training set and a test set. The training set serves as the case base, and every case in the test set serves as a target case. Based on the case base, we use the different case retrieval strategies to calculate a solution for each target case. If the calculated result is consistent with the original solution of that case, we consider it valid; otherwise, we consider it invalid. The effectiveness of each retrieval strategy is measured by the ratio of the number of cases with valid solutions to the number of cases in the test set. For each data set, the above procedure is repeated 100 times and the simple average is recorded.
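The evaluation procedure described above can be sketched as a small harness. This is an illustration under stated assumptions rather than the authors' experimental code: the fold assignment (a shuffle followed by a strided split), the pooling of correct predictions over the ten folds, and the names `cross_validated_accuracy` and `predict` are our own choices.

```python
import random


def cross_validated_accuracy(cases, predict, folds=10, repeats=100, seed=0):
    """Mean accuracy of predict(target, case_base) over repeated k-fold CV.

    cases   -- list of (features, label) pairs
    predict -- any retrieval strategy returning a label for a target case,
               given the training cases as the case base
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        shuffled = cases[:]
        rng.shuffle(shuffled)
        correct = 0
        for f in range(folds):
            test = shuffled[f::folds]                           # held-out fold
            train = [c for i, c in enumerate(shuffled) if i % folds != f]
            correct += sum(predict(x, train) == y for x, y in test)
        accuracies.append(correct / len(shuffled))              # valid / total
    return sum(accuracies) / len(accuracies)                    # simple average


# Toy check with a constant predictor on 15 "a" cases and 5 "b" cases
toy = [((i,), "a") for i in range(15)] + [((i,), "b") for i in range(15, 20)]
print(cross_validated_accuracy(toy, lambda x, train: "a", repeats=3))
```

Any of the retrieval strategies compared in this section could be passed as `predict`, so that all of them are scored on identical folds.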

To verify the effect of the proposed CBR-SLFs case retrieval strategy on CBR classification accuracy, the following five case retrieval algorithms were compared:

(1) The KNN retrieval strategy based on the mean operator, denoted as KNN-Mean;

(2) The KNN retrieval strategy based on the trimmed mean operator, denoted as KNN-Trim;

(3) The KNN retrieval strategy based on the weighted average operator, denoted as KNN-Weight;

(4) The KNN retrieval strategy based on the SLFs operator proposed in this paper, denoted as KNN-SLFs;

(5) The KNN retrieval strategy based on the SLFs operator considering attribute reliability proposed in this paper, denoted as KNN-RESLFs.
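For comparison, the three baseline aggregation operators can be sketched as follows. Their exact form is not spelled out here, so the definitions below are assumptions: the trimmed mean drops the single smallest and single largest local similarity, and the weighted average uses the reliabilities of Example 2 as weights purely for illustration. All function names are hypothetical.

```python
def mean_similarity(sims):
    """Global similarity as the plain mean of the local similarities."""
    return sum(sims) / len(sims)


def trimmed_mean_similarity(sims, trim=1):
    """Mean after dropping the `trim` smallest and `trim` largest values."""
    ordered = sorted(sims)
    kept = ordered[trim:len(ordered) - trim] or ordered   # fall back if nothing is left
    return sum(kept) / len(kept)


def weighted_similarity(sims, weights):
    """Weighted average of the local similarities; weights need not be normalized."""
    total = sum(weights)
    return sum(s * w for s, w in zip(sims, weights)) / total


# Local similarities and reliabilities of Example 2
sims = [0.7, 0.4, 0.9, 1.0, 0.5, 0.8]
rels = [1.0, 0.7, 0.4, 0.5, 0.9, 0.6]
print(round(mean_similarity(sims), 3))            # 0.717
print(round(trimmed_mean_similarity(sims), 3))    # 0.725
print(round(weighted_similarity(sims, rels), 3))  # 0.676
```

On the data of Example 2 all three averages lie between 0.67 and 0.73, whereas the neutral soft likelihood is 0.441: the averaging operators are far less sensitive to the low local similarities of 0.4 and 0.5.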

Since the data sets do not provide attribute reliabilities, we generate the reliability of each attribute randomly.
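A minimal sketch of such a random generation step is given below. The sampling distribution is not specified in the text, so a uniform draw is assumed here; the lower bound of 0.05, the seed handling, and the function name `random_reliabilities` are likewise our own illustrative choices.

```python
import random


def random_reliabilities(q, seed=None):
    """Draw q raw reliabilities uniformly and normalize them to sum to 1."""
    rng = random.Random(seed)
    raw = [rng.uniform(0.05, 1.0) for _ in range(q)]   # assumed range, avoids zeros
    total = sum(raw)
    return [x / total for x in raw]                    # normalized r_ij


r = random_reliabilities(6, seed=42)
print(len(r), round(sum(r), 6))
```

Fixing the seed keeps the generated reliabilities identical across the compared retrieval strategies, so that differences in accuracy are not caused by different random draws.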

For KNN, we study values of k between 5 and 20. As can be seen from Fig. 2, the accuracy of the retrieval results fluctuates slightly with k but is basically flat, indicating that the retrieval strategy is insensitive to k. In the comparison test, we take \(k=11\).

Fig. 2 Performance of the retrieval strategy with the KNN-RESLFs algorithm under different k

The SLFs involve the DMs’ attitude parameter \(\alpha \). Figure 3 shows the influence of \(\alpha \) on the accuracy of the retrieval strategy as \(\alpha \) varies from 0 to 1, that is, as the DMs’ attitude moves from pessimistic to optimistic. It can be seen that both the parameter value and the type of data set affect the retrieval performance, so the value of \(\alpha \) should be chosen according to the characteristics of the actual decision-maker and the domain of the case. In the comparison test, we take the DMs’ attitude as neutral, i.e., \(\alpha =0.5\).

Fig. 3 Performance of the retrieval strategy with the KNN-RESLFs algorithm under different \(\alpha \)

Table 12 Performance of CBR with different retrieval strategies

Table 12 shows the accuracy of the five retrieval strategies on each data set. For a clearer comparison, the average accuracy of each retrieval strategy across all data sets is also listed separately. As can be seen from Table 12:

(1) In all data sets, the trimmed-mean-based retrieval strategy (KNN-Trim) performs worst or nearly worst;

(2) The retrieval strategies KNN-SLFs and KNN-RESLFs perform better than the other retrieval strategies;

(3) Ranking the retrieval strategies by average accuracy over all data sets gives: KNN-RESLFs \(\approx \) KNN-SLFs > KNN-Weight > KNN-Mean > KNN-Trim.

The above analysis illustrates the superiority of the retrieval strategy proposed in this paper. In the experiment, the performance of KNN-SLFs is very similar to that of KNN-RESLFs. In practical applications, however, the reliability of each attribute is not random but is determined by the importance of the attribute itself or given by experts. The accuracy of the KNN retrieval strategy based on the SLFs operator considering attribute reliability may therefore be higher in practice.

5 Conclusion

We introduce the SLFs based on the OWA operator into CBR and propose a retrieval strategy for the CBR process. It can reduce the interference of small-probability events and takes the attitudinal characteristics of DMs into account, which is more consistent with the actual decision-making process. We mainly present a method of defining global similarity for retrieving the case most similar to the target case. Global similarity includes local similarity and feature similarity. The similarity between variables of a given feature type is represented by local similarity, and the similarity between features is represented by feature similarity. CBR-SLFs are used to aggregate local similarity and feature similarity to obtain the global similarity between cases. Experimental results on real data sets show that the proposed retrieval strategy is superior to the traditional KNN method.

However, this paper also has some limitations. The method is put forward only at the theoretical level and has not yet been applied in practice. Moreover, in the experimental verification, the reliability of the attributes is generated randomly, which is a simplification; in practice, this step is usually completed by decision-makers or experts.

In future research, the CBR-SLFs retrieval strategy will be further improved. Firstly, the relevant parameters of the algorithm can be studied further, both theoretically and experimentally, to improve the adaptability and reliability of the method. Secondly, only a limited number of attribute types were included in this study. Given that various data types may exist in the actual CBR process, further research can explore richer feature types. Thirdly, the attributes of a case are not completely unrelated, so the interaction between attributes can be studied in combination with the characteristics of specific research problems. Finally, CBR can be applied to complicated practical problems such as disease diagnosis and image recognition.