1 Introduction

Data mining is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad et al., 1996). Many data mining systems discover the patterns and models which are descriptive or predictive. Usually, domain experts have to do a considerable amount of extra work, for these patterns possibly lead to suitable decision-making in the corresponding domain (Cao, 2012; Adomavicius & Tuzhilin, 1997). Furthermore, most of data mining systems do not consider respective domain needs and constraints to guide their mining process, so often mined patterns are not interesting to domain experts even though they are interesting to technical experts (Cao, 2010; Zhu et al., 2008). In other words, the systems suffer from two major drawbacks, namely, quantity and quality, the former states that the generated patterns are abundant while the latter indicates that they cannot be integrated seamlessly into the business domain (Paul et al., 2014). Thus, mined patterns would not be applicable in the corresponding domain. It is exactly what leads data mining systems to not being used as much as expected in real-world problems. Actionable Knowledge Discovery (AKD) is a paradigm shift from data-driven data mining to domain-driven data mining and aims to discover the knowledge which not only is of technical significance, but also satisfies business expectations, and further can be immediately applied to an operation in the corresponding domain (Cao, 2012; Alharbi & Saraee, 2016). Up to now, many studies have been done in the AKD area. However, due to the difficulty in finding comparable methods, most of the researchers did not consider some of the competing methods to evaluate their proposed methods. The problem is because of the lack of clear boundaries and definitions in the area. In other words, the AKD area lacks exact definitions for input data, actionability, expectations and so forth making it difficult to choose the similar methods which are comparable with each other. In this paper, in order to clarify AKD boundaries to make ease of detecting comparable AKD approaches, some generic viewpoints are specified. In addition, a comprehensive study on different AKD approaches in several dimensions is established, and existing methods are categorized; the methods which belong to a category are comparable to each other. To the best of our knowledge, there is no generic comprehensive study encompassing a wide range of methods as well as viewpoints to clarify comparable methods. He et al. (2005) partitioned some AKD approaches based on the data mining method which they used. It is clear that the internal data mining method by which an AKD system works cannot be useful for detecting similar methods to compare their results.

In this paper, AKD methods are categorized from the various viewpoints, such as Input Data, Extracted actions by AKD methods, Integration with mining process, Generality of actions, Level of domain dependency and also similarities and differences between various approaches are specified. The paper is concluded with a characterization table which summarizes the study of AKD methods and states the strengths and weaknesses of them.

The rest of this paper is organized as follows: Section 2 defines AKD. Section 3 contains the terminology and definitions needed for later sections. Section 4 introduces viewpoints and their importance for characterization. Section 5 summarizes evaluation measures used in AKD methods. Section 6 presents domains of AKD methods. Section 7 presents a review of AKD methods. Section 8 proposes a characterization table that can be used to study actionable knowledge algorithms and systems. Section 9 concludes the paper and presents future works.

2 Actionable knowledge discovery

Mined patterns often cannot support real user needs in decision making because the corresponding methods rarely consider business needs, constraints, and interestingness in the mining process. Thus, there is a big gap between the values of technical and business measures for a pattern. Actionable Knowledge Discovery (AKD) aims to bridging the gap, and to support business decision making (Cao et al., 2009). Particularly, AKD is concerned with discovering the knowledge that is not only interesting but also applicable, preferably with the minimal further effort required of domain experts. AKD approaches usually focus on some of the following aspects:

  • Defining interestingness metrics

  • Involving domain and prior knowledge in the searching process of actionable patterns

  • Converting and extracting actions

  • Developing new theoretical frameworks catering for and measuring the actionability of extracted patterns

AKD methods can be categorized based on type of discovered knowledge. It relates to whether a standard or novel type of knowledge is returned. Silberschatz and Tuzhilin (1995) defined a measure of interest called actionability, and filtered patterns resulting from traditional methods based on this measure. Piatetsky-Shapiro and Matheus (1994) defined actionability of a key finding (pattern) regarding the estimated benefits in taking corrective act(s) that restores the deviation back to its norm. Although actionability could be considered a standard requirement in data mining (it is roughly the same as “usefulness”, and no one wants to find useless patterns), most traditional systems do not explicitly maximize some well-defined actionability criterion. Moreover, the patterns they return do not explicitly express meaningful decision-making acts; further inference is needed to identify those. Other methods try to learn a new type of knowledge, often called actions, from data. We here focus on the second type.

3 Terminology

As terminology and definitions in the literature vary somewhat, here, we formally define concepts as they are used in this paper.

Assume O states object space and S(o) denotes the status of an object oO or a set of objects \(o \subseteq O\) (which I briefly refer as object/s). As an example, the status S(o) can be class label/s of the object/s o. Consider sg as a desirable status. An action is a type of knowledge that specifies for users exactly what they do. In other words, an action suggests some acts to the user. If Γo states an action for object/s o and S(oΓ) denotes the object/s status after applying the action, then S(oΓ) = sg. If α is an act, the action Γo is defined as follows:

$$ {\varGamma}_{o}: \{\alpha_{1}, \alpha_{2}, ..., \alpha_{k}\} \qquad \text{ if } k \geq 1 \text{ and } S(o{\varGamma}) = s_{g} $$
(1)

4 Viewpoints

As mentioned above, one of the most important problems of AKD area is the lack of clear boundaries that makes it hard to compare AKD methods. In order to solve this problem, at first, the boundaries are defined and then, the position of each method with respect to the boundaries is found. Here, some viewpoints are presented that serve as boundaries and are used to categorize AKD methods. The viewpoints are Data, Extracted actions by AKD methods, Integration with common data mining, Generality of actions and Level of domain dependency that are explained in details as follows.

4.1 Data

Here, different input data used in AKD methods are examined.

4.1.1 Types of data

Since different methods in this area use various data types, the methods cannot be categorized without taking into account their input data type. In this section, different types of input data used in AKD methods are partitioned as follows:

  • Record Data (RD): This data type is a collection of records; each consists of a set of data attributes. For the most basic form of RD, every record has the same set of attributes. For example, consider a customer information database which is a collection of customer’s records, and each contains features such as age, sex, deposit, and so on. Transactional data is a specific type of RD. RD may be object-based, contains multiple observations for each object, or temporally-based in which each instance is attached to a timestamp. Most of the methods (Tsay & Gurdal, 2013; Raś & Dardzińska, 2006; Raś et al., 2007) assume attributes are of two types: condition attributes and decision attributes. Decision attributes are attributes that profit in the domain is related to them directly. Condition attributes are attributes that can influence decision attributes in some ways.

  • Cost Data (CTD): applying each action in a domain needs to pay some cost. Cost data consists of costs of acts which are corresponded to an action (Yang & Cheng, 2002; Kalanat et al., 2011a, 2011b). Typically, cost data is denoted by cost matrixes (Yang et al., 2006) and cost functions (Kalanat & Khanjari, 2020b; Cui et al., 2015). For an example, consider cost matrix C for each feature f, in which Ci,jR is the cost of the act of changing from the ith partition value of the feature f to the jth partition value.

  • State-Action Data (SAD): This data consists of state-action sequences of objects. The set of attribute-value of an object at any instant of time is referred to its state, and the state may change as a result of an action. Each object is mapped to only one state while some objects might be in the same state. Thus, this data type consists of responses of objects to past actions (Yang & Cheng, 2003a, b). Table 1.b shows state-action sequences for some objects. For example, the state-action sequence of Jure is S0,A1,S1 which means sending an email would change Jure’s state from S0 to S1.

    Table 1 An example of state-action data
  • Influence Data (ID): This data shows the link between actions and changes of attribute values and also shows correlations between some attributes. In order to model this data, influence matrix (Wang et al., 2006; Tzacheva & Ras, 2010; Kuang & Raś, 2016; Touati et al., 2015) or ontology (Tzacheva et al., 2013) can be used. For example, Table 2. shows action A1 changes attribute value of feature F1 to be in range of [2-5] and also changes attribute value of feature F2 from e to b.

    Table 2 An example of influence data

4.1.2 Data quality

The information about objects in a database can be incomplete and contains missing, noisy, or multi-values. Assuming the data is complete may lead to the unreal estimation which may result in some of non-applicable actions as well as losing some of the possible actions. Tsay et al. introduced this problem in Tsay and Raś (2006) and presented a method, DEAR3, which handled data with missing attribute values and uncertain attribute values and pruned outliers at its earlier stage.

4.1.3 Distributed data

Most AKD approaches assume that the data can be provided from a single source. If data is provided from different physically distributed locations, these methods require a data center which gathers data from distributed locations. Sometimes, transmitting large amounts of data to a data center is expensive and even impractical. Therefore, distributed and parallel AKD methods have been proposed to solve this problem. These methods combine actions identified in different datasets. Ras et al. have introduced distributed data problem in AKD (Raś & Gupta, 2002). They have shown that it is wise to search for actions at remote sites when those extracted at the client site cannot be implemented in practice (suggested actions are too expensive or too risky). They have defined in Raś and Dardzińska (2002) the composition of two actions, not necessarily extracted from the same site. Authors have given assumptions guaranteeing the correctness of such a composition. One of these assumptions requires that semantics of attributes, including the interpretation of null values, have to be the same at both sites. They have relaxed this assumption in Raś and Tzacheva (2005) and allowed different granularities of the same attribute at involved sites.

4.2 Extracted actions by AKD’s methods

Extracted actions by AKD methods can be partitioned based on the knowledge presented by the actions. Based on this viewpoint presented methods are partitioned into three following categories: Action Discovering Methods (ADM), Action Selection Methods (ASM) and Hybrid Methods (HM).

4.2.1 Action Discovering Methods (ADM)

These methods try to extract actions from RD. In most of the methods, attributes can be categorized into two following categories based on the attribute flexibility (being changeable or not):

  • Stable attributes whose values cannot change at sensible cost like “sex”. (Some authors call them hard attribute (Yang et al., 2003; Ling et al., 2002)).

  • Flexible attributes which their values can change at reasonable cost e.g. “service level”. This change can be influenced and controlled by the user (Some authors call them soft attribute (Yang et al., 2003; Ling et al., 2002)).

  • Asymmetric attributes which changing some of their values is sensible and others is not, i.e. “experience level” that it can be changed from low to high by spending some money and time but it can’t be changed from high to low at a sensible cost (Kalanat et al., 2011a, 2011b). Some attributes such as age or height are a function of time, and undergo deterministic changes (Dardzinska, 2012).

An action discovered by the ADM method states how the value of some attributes should be changed in order to move an object or group of objects from an undesirable state to a more desirable one. Assume F denotes a feature, and f states a value of its domain Dom(F). In this case, the act α in (1) is changing the value of attribute F from f to \(f^{\prime } (\alpha : (F, f \rightarrow f^{\prime }))\).

Some of the ADM methods (Hajja et al., 2014; Hajja & Ras, 2015) deliver action as the rule which is applicable to a group of objects. The rule is called action rule (Ras & Wieczorkowska, 2000). Assume Fd states a decision feature which h is its desired value. Consider β denotes condition feature F have the value of f(β : (F,f)). Then action rule Γo is defined as follow:

$$ {\varGamma}_{o}: \{\alpha_{1}, \alpha_{2}, ..., \alpha_{n}, \beta_{1}, \beta_{2}, ..., \beta_{m}\} \Longrightarrow (F_{d} \rightarrow H) \qquad \text{if } n \geq 1, m\geq 0 $$
(2)

For example, in bank loan system an action rule could be like this: “If we can change marital status of male customers from single to married in some way then the probability of they pay back their loan will be higher”.

4.2.2 Action Selection Methods (ASM)

Unlike Action Mining Methods which extract actions from RD, Action Selection Methods extract actions by use of meta actions. Meta action is a type of action which triggers the value changes in some flexible attributes. For instance, the temperature of a patient can be lowered if he takes a drug used for that purpose. Taking aspirin is an example of a Meta action which triggers such a change. The associations Meta actions and changes of attribute values they trigger are represented by ID and specified by domain experts. Clearly, experts do not know the correlations between the attributes and status of objects.

These methods can be further partitioned into two following subgroups based on considering the sequence of Meta actions or not.

  • Meta Action Mining(MAM): Meta action mining methods try to extract correlations between attributes and status of objects and Meta actions from RD and ID. These methods are based on the idea that if a change in some attributes of object/s leads to transfer it/them to desirable status, if some Meta actions can influence those attributes, this influence will cascade to transfer them into desirable status through the correlation. Meta action mining methods (Wang et al., 2006; Jiang et al., 2005; Tarnowska et al., 2019; Kuang et al., 2015) present the action that summarize those Meta actions and objects where such cascaded influences transfer it/them to desirable status. In this case, in (1), the act α is a Meta action.

  • Plan Mining(PM): Unlike Meta action mining methods that do not consider the order of applying Meta actions on object/s, these methods present sequences of Meta actions (Kuang & Raś, 2016; Yang & Cheng, 2003a, 2003b). Each Meta action in the sequence changes the state of object/s so that after applying all of its Meta actions, corresponding object/s be converted to a desirable state. This sequence of Meta actions called plan. In this case, the act α is a Meta action and Γo is an ordered set. Plan mining methods extract plans from State-Action Data. For example, the method proposed in Pednault et al. (2002) is a PM method which uses reinforcement learning for extracting actions. It considers a customer, with all her past history of purchases and promotions, is in a certain state at any given point in time. When a retailer takes an action, the customer then makes a probabilistic transition to another state, possibly generating a reward. This process continues throughout the life of the customer’s relationship with the retailer. The reward at each state transition is the net profit to the retailer. It takes into account both the purchases by the customer in response to the retailer’s action and the cost of that action. The reward can thus be negative if the customer makes no purchases, which represents a net loss. Application of reinforcement learning to this problem maximize the net present value of profits and losses over the life cycle of a customer.

4.2.3 Hybrid Methods (HM)

Few methods are the combination of both groups or have some aspects of each of them. For example, Tzacheva et al. in (2010) have presented a method that first discovers valid action rules from RD and then applies these action rules on objects that are related to each of these rules. Applying extracted action rules to mentioned objects leads to change the status of objects. The method applies these changes to RD. Candidate action rules can be extracted from the new version of RD, and the process will be continued until the fixed point is reached (RD will not be changed). Finally, the sequence of actions without the use of preexisting State-Action Database will be found.

4.3 Integration with mining process

AKD methods are partitioned into two below groups based on the extent of integration with the mining process.

  • Post-processing (PP): In this group, at first common data mining techniques are done on data, then post-processing techniques are done on the result, and finally, the action would be extracted (Raś et al., 2007; Yang et al., 2003; Ling et al., 2002; Kalanat et al., 2011a). For example, Ras and Wieczorkowska (2000) proposed a method to produce candidate action rules by merging classification rules. Yang et al. (2006) presented a method to extract optimal action from the decision tree.

  • Integrated processing (IP): Unlike PP, in this category, real-world factors are integrated into the mining process from the beginning (He et al., 2005; Im and Raś, 2008; Raś & Dardzińska, 2008; Tsay et al., 2008). For example, the proposed method in Ras et al. (2008) considers each item set as an action set of length one. It generates candidate action sets of length k by merging action sets of length k-1. Then it prunes the candidates which have an infrequent sub action set.

As mentioned post-processing approaches work on the output of common data mining techniques. As a result, the quality and quantity of the discovered actions by these approaches strictly depend on the adopted data mining techniques. Besides, used data mining process may ignore some information so PP approaches may fail to discover all actions or do not have good precision to discover them.

4.4 Generality of actions

Since some AKD methods are aimed to find an action for each different object and others to find a single action for a group of object, AKD methods can be categorized into two following groups:

  • Transductive Methods (TM): These methods aim to find optimal action for a given object such that the best decision is taken. This is an expensive process that is only applied to valuable objects. For example, the methods proposed in Yang et al. (2006) and Ling and Li (1998) are transductive methods. They use a traditional decision tree to extract actions. For each new sample, they search optimal action moving the sample from the current leaf of the tree to a leaf with high probability of desired label node.

  • Inductive Methods (IM): Unlike TM methods that present different actions for different objects, IM methods are aimed to find a single action with a maximal chance of success for a group of objects. Essentially, this process can be considered as building a statistical model based on past data and using the model to formulate a pattern to be executed on a group of objects (Yang & Cheng, 2003a, 2003b; Jiang et al., 2005; Raś & Dardzińska, 2009).

TA is the best strategy for applying on a specific object; this strategy cannot be generalized to other objects. Unlike the TA, the IA can apply on different objects. But when an IA is used for a specific object may not be as effective as a TA for that object.

4.5 Level of domain dependency

All presented approaches are divided into two following groups based on their interestingness and generality of being applicable in different domains:

  • Domain Dependent (DD): The category includes the methods which are based on domain constraints and assumptions. Thus, a domain dependent approach may work well for the specific application, but it cannot be generalized to other application. For example, Almardini et al. (2015) used predicting the readmissions that a new patient is expected to go through according to his or her diagnoses to choose the best treatment in order to achieve the desired outcomes. They examined the problem of predicting the following readmission by proposing two approaches to personalize patients by clustering them into subgroups that exhibit similar properties. They then devised a metric system that evaluated the level of desirability for readmissions along readmission paths, followed by a mapping that transferred the scores to the extracted clusters by calculating the entropy to measure the predictability of the next readmission. This allowed them to find desired patients transitions between clusters, which would result in reducing the number of anticipated readmissions for new patients. This approach works well for the specific healthcare application, but cannot be generalized to other application domains.

  • Domain Independent (DI): The category includes the methods which are based on the Data and independent of a specific domain. Unlike a DD approach, a domain independent approach can widely adopt to different domains (Gao and Yao, 2017; Han et al., 1999; Zhang et al., 2008). But when a DI approach is used for a specific application may not be as effective as a DD approach for that domain.

5 Evaluation measures

Some researchers have introduced some interesting measures to discover effective actions. The measures are as follow:

  • Cost: Each action contains acts and for applying the acts we need to pay some costs. Therefore, an action which needs lower cost is more interesting. The cost of the action Γ is defined as:

$$ C({\varGamma}) = \sum\limits_{\alpha \in \text{ set of actions of } {\varGamma}}C(\alpha) $$
(3)
  • Validity: Assume given a training set DD, the algorithm A returns a model A(D) ∈Θ. In addition, consider 𝜃∈Θ as the target model which result in desired output. Validity estimates how likely the output of the method A transforms to the desired one when suggested action is applied to the input data D. To evaluate it, the underlying method A could be rebuilt, incorporating the action in the input data namely \(D^{\prime }\). Assume running A on the training set \(D^{\prime }\) results in \(\theta ^{\prime }\). The validity of an action can be estimated by measuring how closely \(\theta ^{\prime }\) is to 𝜃.

  • Profit: : An action suggests some acts to the user to gain a profit in his/her domain. Assume pg > 0 is the profit associated with the status sg which is specified by a domain expert and action Γ incurs some cost \(C(W,W^{\prime })\). The net profit np of applying the action on the object oO is defined as below:

$$ np({\varGamma}, o)=\left\{\begin{array}{cl} p_{g} - C({\varGamma}),& \qquad S(o) \neq s_{g} \text{ and } S(o {\varGamma})=s_{g}\\ - C({\varGamma}),& \qquad \text{otherwise} \end{array}\right. $$
(4)
  • Action rules measures: For action rules measures of support, confidence, lift, contribution and interestingness have been extended which are explained in Cao et al. (2010) and Tsay (2014).

6 Domain

Most of the proposed methods in the field of AKD summarized in Table 3 are applied in following domains:

  1. 1.

    Business includes banking, customer credit, corporate profitability, market and so on.

  2. 2.

    Health includes clinical, hospital, mental health, biology and so on.

  3. 3.

    Social includes a variety of social networks and user-generated data on the web, such as forums, queries, searches, conversations, and so on.

  4. 4.

    Art includes methods which search actions to change the emotion and to improve artist sales in the market.

Table 3 Domains of AKD methods

7 Review of AKD methods

Ras and Wieczorkowska (2000) made one of the first efforts to propose an inductive method for action mining. They defined the concept of action rules and proposed a method to produce theme using pairs of classification rules. The method is based on comparing profiles of two groups of instances that are desirable and those that are undesirable to find what changes within attributes are needed to achieve the goal. Afterward, In order to simplify the process required to compute the support of an action rule, they extended the representing power of action rules and proposed the E-Action Rule and a method to extract those from data, namely the DEAR System (Ras & Tsay, 2003). To decrease the number of compression of System DEAR, Tsay et al. presented System DEAR2 (2005) and then, Ras et al. proposed Action Tree (System DEAR2.2 (Raś & Tsay, 2008)) to reduce the search space and time complexity in comparison with DEAR2. They showed (Raś & Dardzińska, 2006) that action rules do not have to be built from pairs of classification rules and single classification rules are sufficient to achieve the same goal. However, they only proposed a theoretical framework without giving any detailed algorithm for action rules construction. Then, they proposed ARAS (Raś et al., 2007) to construct action rules from a single classification rule with a very simple LERS-like algorithm. They also showed using LERS as the pre-processing module for ARAS decreased the overall time complexity of ARAS system lower than DEAR. Trepos et al. proposed a general-to-specific algorithm, DAKAR, Trepos et al. (2005) which learns action rules using a beam search strategy approach in the space of possible actions. In order to avoid generating similar actions and generate actions which lead to the situations covered by the rules of the desired label, they enhanced DAKAR to DAKAR2 (Trépos et al., 2013). The rules of the desired label are organized with a graph structure. The algorithm browses the set of cliques in order to build actions.

He et al. in (2005) formally introduced the problem of mining action rules without pre-existing classification rules and formulated it as a search problem in a support-confidence-cost framework. They also proposed Apriori-like algorithms for learning action rules directly from the database. However, they did not take into account the dependencies between attribute values which are naturally linked with the cost of rules. Ras et al. presented ARED (Im and Raś, 2008) to identify certain relationships between granules defined by the indiscernibility relation on instances in the dataset; some of these relationships uniquely define action rules. The proposed algorithm generates a complete set of shortest action rules without using preexisting classification rules. Afterward, they proposed a bottom-up strategy which has some similarity to systems ERID and LERS (Raś & Dardzińska, 2008) to extract the action rules which have minimal attribute involvement. Tsay et al. proposed Strategy Generator (2008) which is similar to Apriori and aims to find all cause-effect rules.

Ras et al. proposed Build-and-Search method Raś and Tzacheva (2005) to extract action rules from a distributed data with autonomous sites. The method constructed an action rule by replacing one or some specific acts of a certain action rule extracted from a site with conditional part of another action rule extracted from another site which the acts are decision part of these rules. Tree-based lowest cost action (Tzacheva & Tsay, 2008) enhanced Build-and-Search. MR-AAD (Tzacheva et al., 2016) is a distributed processing algorithm based on Map-Reduce framework to produce action rules from a distributed massive data. The method allows parallel discovery of Action Rules in a distributed environment, based on MapReduce paradigm through count distribution. They used Hadoop, as a scalable and distributed framework for implementing this method. The method uses an Apriori strategy for action rules extraction. Apriori algorithm finds all possible Action Set combinations by scanning the database time after time, which consume a lot of time and memory space for massive data. With Count Distribution stated as the best way to parallelize the Apriori algorithm, they proposed an implementation of MR-Apriori Count Distribution Algorithm for parallel action rules discovery in this paper. The computations involve the construction of action rules and finding their support and confidence. Bagavathi et al. (2017) presented an approach based on Grabbing Strategy to build Action Rules for large datasets using Apache Spark framework.

Hajja et al. in (2012a) proposed a method to extract action rules from temporal and object-based data. To avoid extracting action rules from instances come from different distributions, the method divided entire data into multiple independent sub-data from which action rules are extracted. They extended the method in Hajja et al. (2012b) to find valid action rules by dropping the assumption that the instances within one object are observed independently. Although this approach leads to find more accurate action rules, when the data does not contain many observations for unique objects, the actions may be lacking high degrees of diverseness. To overcome the limitation, they proposed a hybrid approach (Hajja et al., 2014) which builds a hierarchy of clusters of sub-data. Su et al. in (2012) introduced actionable behavioral rule mining which aims to extract action rules for influencing an entity behavior. In their approach, objects represent observations of an entity while in previous approaches objects are members of an entity. In addition, there is a current observation where each change of a proposed action rule needs to change an attribute value of the entity from the current observation. Su et al. presented MABR (Su et al., 2012) to find action rules from the object-based data in a framework of support and profit. Zeng et al. proposed a liner-function-based observation-weighting method (Su et al., 2015) which handled the problem of non-uniform contribution for different instances to support an action rule. They proposed a power-function-based observation-weighting method (Su and Mao, 2015) to find the best one of a family of power-function-based models which includes the liner-function-based model. In mining such action rules, it often occurs that different rules may suggest the same acts with different expected profit called conflicting rules. To resolve the conflicts, Su et al. (2014) utilized rule ranking procedure for selecting the rule with the highest profit. Gao presented four algorithms in Gao and Yao (2017) to search action rule with the maximum profit without cost limitation, action rule with the minimum cost while the maximum profit, action rule with the maximum profit under limited action cost and action rule with the minimum cost to obtain the desired profit. These algorithms have polynomial time complexity. To guarantee the reliability of the actionable behavioral rules, mentioned approaches need to find frequent action sets. However, this will result in high time complexity. In Su et al. (2017), to handle this problem, a decision-tree-classifier-based mining method is proposed. Firstly, with each decision attribute, decision tree classifiers are build. Then, action sets are generated by recursive traversal of each tree. Finally, the actionable behavioral rules are constructed based on the action sets.

Yang et al. presented three approaches (Yang and Cheng, 2002) for mining the transductive actions: 1- Instance-based approach which is time-consuming due to using the entire positive population. 2- cluster-based approach which considers only the positive class distribution and needs to be specified the number of clusters. 3- SVM-based approach which overcome to problems of cluster-based.

Yang et al. (2006) presented a transductive method for action mining. This method first learns a decision tree (DT) and then analyzes the tree to find the optimal actions for each object. They inferred the optimal action in the sense of profitability. Alam and Alam (2012) proposed post-processing of some DTs for discovering more profitable actions. Cui et al. (2015) presented a framework to post process any Additive Tree Model classifier to extract the optimal action. They formulated the problem in an integer linear programming. Their formulation encompasses several models for classification and regression, including random forests, adaboost, and gradient boosting trees. In Lu et al. (2017), they presented a state space graph formulation to model the problem as a optimization problem that can be solved by graph search (Lu et al., 2017). They first proposed an optimal heuristic search method which intends to find an optimal solution. To take a good balance between the search time and action quality, they also presented a sub-optimal heuristic search. The approach is applied on bank data and customer credits data. Tolomei presented Feature Tweaking algorithm (Tolomei et al., 2017) which operates on top of any tree-based ensemble binary classifier and discovers the exact optimal action by using the internals of the ensemble to derive a feedback loop from building a set of candidate acts. In ensemble decision tree methods, overlaps of attributes among the multiple trees may significantly affect derived actions in the latter stage. To solve this problem, Subramani et al. proposed RDADT method (Subramani et al., 2016) to construct trees in a reordered fashion which forces all the attributes to participate in the decision committee based on ranking order. This method guarantees the trees are diversified and unique at both the parent and children node level. The methods did not consider attribute’s relations. Shamsinejadbabaki et al. (2013) used causal networks to extract actions using cause and effect relationships between attributes. They argued that using causal relationships can result in more applicable actions in real-world problems.

Ranganathan et al. (2018) proposed an efficient system, to generate meta-actions by implementing specific action rule discovery based on Grabbing Strategy and applying it on Twitter data for semantic analysis. The proposed Spark-based approach has six steps of data collection, data preprocessing, classification, semantic analysis, generating action rules and summarizing. Stanford Natural Language Analysis Core (Java) is used for semantic analysis. In the classification stage, the LERS algorithm (Grzymała-Busse et al., 2013) is used to classify tweets. In the action exploration phase, the algorithm (SARGS) is used which is similar to AroGS (Ras and Wyrzykowska, 2007) (the difference is that the algorithm fills missing data at an optimal time). Kuang and Raś (2016) proposed a hierarchical strategy to sort meta actions. The purpose of the proposed method is to build a forest of trees, each tree consisting of meta nodes. The algorithm is designed to organize the advanced matrix in a hierarchical manner and generate a list of meta-nodes T, so it is obvious that an advanced matrix M should be given. At the beginning of the algorithm, meta-list and rules are generated, which are a list of meta actions descendingly sorted by their popularity in M and a set of all action rules in M respectively. Meanwhile, a map Meta-Map and a list T are initialized as well. Meta-Map is created to store the mappings (entries) from sets of meta-actions to their sets of action rules during the entire process. Generally speaking, the content in an entry or mapping in Meta-Map is identical to a meta-node at the end of the algorithm and each action rule can be involved with only one mapping. The list T will be used to store the continually generated meta-nodes which is the extracted action.

Touati et al (Touati et al., 2014) strive to extract negative side effects patterns from multivalued features. They also present a patients clustering scheme based on similar negative side effects (negative action sets). Negative side effects are represented by action sets which model the appearance of certain diagnoses when applying a meta-action on specific patients. The negative action sets are part of the meta-action effects, and they are best captured by the reverse set difference between the prior and posterior state of the patient. Side-effects based on action terms in the context of action rules are the unintended changes in the values of some flexible features that meta-actions trigger on objects. Almardini et al. (2015) defined procedure paths as the sequence of procedures that a given patient undertakes to reach a desired treatment. In this paper, they first introduced two approaches for anticipating the state of the patient that he or she will end up in after performing some procedure. By clustering patients into subgroups that exhibit similar properties, they improved the predictability of their procedure paths, which they evaluated by calculating the entropy to measure the level of predictability of following procedure. The clustering approach used is essentially a way of personalizing patients according to their properties. Then, they further devised a metric system that will evaluate the level of desirability for procedures along procedure paths, which they would subsequently map to a metric system for the extracted clusters. This will allow us to find desired transitions between patients in clusters, which would result in reducing the number of anticipated readmissions for new patients.

8 Summary of AKD’s researches

Here a characterization table (Table 4) is presented which summarizes the study of AKD methods. The table can be used to understand differences and similarities between various presented AKD methods and the strengths and weaknesses of them. The presented table can be used as a clustering tool for those who want to choose similar methods to their method to evaluate its results. This table contains various columns such as method, Data, Method functionality, actions generality, Integration, Domain dependency, Advantage, Disadvantage where each of them is described in the following. Method column is used to mention the presented methods or systems. Input data column is portioned to three following sub-columns: type, quality and distributed where type sub-column is used to specify the data type which is used by each method, quality sub-column is used to specify whether the method assumes that data is complete (CD) or not (ICD), distributed column is used to specify whether the method is distributed () or not (×). Extracted action column is used to specify action preexistence category which the method is belong it, Action Discovering Methods (ADM), Action Selection Methods (ASM) or Hybrid Methods (HM). Besides, if the method is an ASM, in this column is specified whether it is a Meta Action Mining (MAM) or Plan Mining (PM) method. Actions generality column is used to specify generality of discovered actions by the method, inductive or transductive action. Integrity column is used to specify the time of considering real-world factors in the mining process of the method, post-process (PP) or integrated (IP). Domain dependency column is used to specify that the method is domain dependent (DD) or domain independent (DI). Advantage, Disadvantage columns are used to state advantage and disadvantage of presented approaches.

Table 4 Characterization table

9 Conclusion and future works

Actionable knowledge discovery is almost a new and quite necessary concept in knowledge engineering and has attracted a lot of attention recently. Until now many works have been proposed in this field, but because of ambiguity and broadness of actionable knowledge discovery area, AKD boundaries are not clear and comparing AKD methods is hard. Up to now, there is neither comprehensive survey about AKD techniques nor a precise and a fair categorization of them. Thus, in this paper, the various AKD techniques have been reviewed and classified to make it possible to compare them fairly.

Here various viewpoints are described by which different techniques have been partitioned as well as the importance of considering them. Finally, a characterization table has been proposed that can be used to study actionable knowledge algorithm and systems. The table describes similar works, states differences between various approaches and highlights the strengths and weaknesses of them.

AKD is an open area for groundbreaking research works. The future works can be listed as follow:

  • Few presented approaches are the integrated process (IP), and most of them are post-processing. Since PP approaches depend on the results of a data mining technique, they may fail to discover some useful actions so IP approaches may work better than PP approaches. Thus researches on mining actionable knowledge should pay more attention to integrated techniques.

  • Until now few researches have considered incomplete data while this is a real-world constraint, and the goal of domain driven data mining is developing data mining for real-world problems. Therefore, further research on mining actionable knowledge should pay attention to these settings.

  • Most of domain dependent approaches usually work well for the specific domain but cannot be generalized to other application domains. While a domain independent approach is flexible and widely adopted domains, it may do not work as well as a domain dependent approach. We have to tradeoff between our aims and select one of these approaches.

  • More investigations are needed for applying AKD methods in real domains such as web, text, social networks and so forth.

  • Parallel and distributed AKD algorithms have to be developed to overcome the huge size of many databases, the wide distribution of data, and the computational complexity of some data mining methods. Such algorithms divide the data into partitions, which are processed in parallel. The results from the partitions are then merged. Moreover, the high cost of some AKD processes promotes the need for incremental algorithms that incorporate database updates without having to mine the entire data again. Such algorithms perform knowledge modification incrementally to amend and strengthen what was previously discovered.

  • Databases may contain complex data objects, hypertext and multimedia data, spatial data, temporal data, or transaction data. Actionability should be defined for each specific kind of data and AKD systems should be constructed for them.

  • Another topic for further work is the evaluation of action miners. This evaluation requires experimental control over the data, which means it cannot be performed using existing datasets. Ideally, one should have access to real-life situations that allow for active experimentation. This may be difficult to obtain in practice. Advanced and well-studied methods for simulating such situations would be a welcome alternative.

There are few software packages available for discovering action rules. For instance, Action4ft-Miner module of the Lisp-Miner project developed by Jan Rauch’s group discovers action rules under different constraints (Rauch, 2012). More software packages are needed to actionable knowledge discovery.