1 Introduction

The most common types of information systems in the enterprise computing environment are transaction processing systems (TPS), management information systems (MIS), decision support systems (DSS), executive information systems, and enterprise information systems (EIS), also called enterprise systems (ES) or Enterprise Resource Planning (ERP) systems [9]. Together, these systems help enterprises accomplish both routine and special tasks, from recording sales and processing payrolls to supporting decisions in various departments and providing alternatives for business operations. However, as businesses continue to use these systems for a growing number of functions in today’s competitive environment, most enterprises face challenges in processing and analyzing huge amounts of data and turning them into useful information. They have too much detailed operational data, yet, because the data is scattered, they cannot extract the satisfactory answers they need from these large volumes of information to react quickly to changing circumstances.

To deliver the correct information in the correct format to the correct people at the correct time for decision-making purposes, Business Intelligence (BI) has emerged. This set of techniques, technologies, tools, and solutions is designed to enable users to efficiently extract useful business information from huge amounts of data. The concept of BI was first introduced in the 1990s and referred to tools and technologies including data warehouses. Today, business intelligence is regarded as a powerful solution, an extremely valuable tool, and a key approach to increasing the value of the enterprise, and more and more business enterprises are deploying advanced business intelligence systems to enhance their competitiveness.

2 Information systems in enterprises

Information systems are used in business operations in more organizations than ever before. For example, in finance and accounting divisions they are used to forecast revenues [49], monitor business activity, determine the best uses of funds, manage financial resources [102, 103], analyze investments, and perform audits to make sure that financial reports and documents are accurate. In sales and marketing departments, information systems are used to check inventory, identify the best sales approaches, and set appropriate product prices. In general, the types of information systems used within organizations can be classified into (1) TPS; (2) MIS; (3) DSS, OLAP (Online Analytical Processing), and BI; (4) EIS/ES; and (5) other special-purpose systems.

As one of the most fundamental information systems in many enterprises, a TPS handles the large volume of business transactions that occur daily within an organization. MIS are information systems that not only support business processes and operations but also aid competitive, tactical decision-making; an MIS uses the data from a TPS to generate useful information for management at the tactical level.

Decision support systems are a class of computer-based or knowledge-based information systems that support strategic decision-making activities [18, 19, 53, 81, 83, 89, 94, 97]. A DSS is a collection of people, procedures, data, and models used to support specific business decision-making tasks. Distributed DSS, intelligent DSS, and web-based DSS have appeared through integration with networking technology, artificial intelligence, and the Internet. A DSS differs from an MIS in the support given to users, the decision emphasis, the development approach, the system components, and the outputs. DSS marked the beginning of information systems specifically designed for decision support in complex environments.

Business intelligence focuses not only on real-time data but also on real-time analysis that can be performed to instantaneously change the parameters of business processes. BI does not provide the same functionality as traditional information systems; instead, it operates on data extracted from operational data sources and provides an effective means to propagate actions back into business processes and operations.

Enterprise systems, enterprise information systems, or Enterprise Resource Planning systems are sets of integrated programs capable of managing a company’s vital business operations for an entire enterprise [80, 95, 104]. ERP is a term originally derived from Manufacturing Resource Planning (MRP); MRP evolved into ERP when routings and the company’s capacity planning activity became a major part of the standard software [46]. ERP systems typically handle the manufacturing, logistics, inventory, invoicing, accounting, and distribution for a company. ES or EIS software commonly aids in the control of many business activities, such as production, quality management, inventory, sales, marketing, delivery, and human resources management. As a component of ES, a workflow system is oftentimes rule-based management software that directs, coordinates, and monitors the execution of an interrelated set of tasks arranged to form a business process [99]. More specifically, workflow is the operational aspect of a work procedure: how tasks are structured, who performs them and how they are performed, what their relative order is, and how they are synchronized. Its primary purpose is to provide users with tracking, routing, and other capabilities designed to improve business processes. Graph-based formalisms such as Petri nets are used to model and analyze workflow issues [13, 55], as in the sketch below.
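
As a hedged illustration of the graph-based view of workflow mentioned above, the following minimal Python sketch models a hypothetical two-step approval process as a Petri net with places, transitions, and a firing rule; it is a toy, not a workflow engine, and all names are invented for the example.

```python
# Minimal Petri-net sketch of a two-step approval workflow (illustrative only).
# Places hold tokens; a transition is enabled when all of its input places are marked.
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    marking: dict                                      # place -> token count
    transitions: dict = field(default_factory=dict)    # name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        # Firing consumes one token from each input place and adds one to each output place.
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


# Hypothetical purchase-order workflow: submitted -> reviewed -> approved.
net = PetriNet(marking={"submitted": 1})
net.add_transition("review", inputs=["submitted"], outputs=["reviewed"])
net.add_transition("approve", inputs=["reviewed"], outputs=["approved"])
net.fire("review")
net.fire("approve")
print(net.marking)   # {'submitted': 0, 'reviewed': 0, 'approved': 1}
```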

Other special-purpose information systems include expert systems, knowledge-based systems, virtual systems, e-business systems, and other systems, such as social networks, that will not be discussed in this paper [8, 20–22, 44, 47, 52, 54, 74, 76, 88, 93, 100]. Electronic business (e-business) is also referred to as e-commerce [23, 45, 60, 78]. It mainly consists of the distributing, buying, selling, marketing, and servicing of products over electronic systems such as the Internet or other computer networks. E-commerce involves business transactions executed electronically between entities such as business-to-business, business-to-consumer, and others. E-commerce offers more opportunities to enterprises by enabling them to market and sell worldwide at low cost, thus allowing even small enterprises to enter the global market right from start-up; many large companies and major retailers also offer their products online.

The BI market has been growing continuously and reached 10.7 billion dollars by 2011 [17]. Meanwhile, we can see a significant boost in BI research in the area of information technology. Figure 1 and Table 1 show the number of BI-related papers published in the Information Technology and Management journal for the period 2000–2011 and the sources of these papers. This paper is intended as a brief review of BI, with emphasis on its relevance to information technology and management.

Fig. 1 The trend of BI-related papers published in IT&M

Table 1 Papers related to BI published in IT&M

3 Technical framework of BI in enterprise computing environment

It is generally accepted that the technology categories of a business intelligence system mainly encompass data warehouses, data marts, OLAP, and data mining. Figure 2 shows the architecture of BI systems in the enterprise computing environment. The data warehouse is the fundamental infrastructure of business intelligence systems, and data mining is its core component, one that allows users to analyze data, identify data patterns, and detect trends. OLAP, on the other hand, is a set of front-end analysis tools. Those who wish to construct a sound enterprise intelligence computing environment should consider at least the following: the delivery of accurate, valid, integrated, and timely data, and the means by which the data can be transformed into decision information. However, neither high-quality data nor an effective means is easily acquired. An effective technical framework can be used to address these two issues. The framework consists of an operational applications tier, a data acquisition tier, a data warehouse tier, a platforms-and-enterprise-BI-suites tier, and an extended corporate performance management tier. The operational applications tier includes, for example, legacy systems, CRM, ES, and SCM. Extraction, transformation, and loading (ETL) belong in the data acquisition tier, as sketched below. Besides data warehousing, the data warehouse tier includes data marts and an operational data store. Data warehousing, OLAP, and data mining are three of the most significant technologies in the BI arena. A data warehouse can be defined as a large repository of historical data pertaining to an organization. OLAP refers to the technologies for performing complex analysis over the information stored in a data warehouse; the complexity of the queries required to support OLAP applications makes it difficult to implement OLAP using standard relational database technology. Data mining is the process of identifying and interpreting patterns in data to solve a specific business problem.
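
As a schematic illustration of the data acquisition tier, the sketch below runs a toy extract-transform-load step in Python; the CSV export, its column names, and the SQLite staging table are hypothetical stand-ins for real operational sources and warehouse infrastructure.

```python
# Toy ETL step: extract rows from an operational CSV export, cleanse them,
# and load them into a warehouse staging table (SQLite used as a stand-in).
import csv
import io
import sqlite3

SAMPLE_EXPORT = """order_id,customer,amount
1001, alice smith ,250.00
1002, bob jones ,
1003, carol lee ,99.50
"""

def extract(text):
    """Extract: read the operational export as dictionaries."""
    return csv.DictReader(io.StringIO(text))

def transform(rows):
    """Transform: normalize names and drop records without an amount."""
    for row in rows:
        if not row["amount"].strip():
            continue
        yield (row["order_id"], row["customer"].strip().title(), float(row["amount"]))

def load(records, conn):
    """Load: append the cleansed records to a staging table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_staging "
                 "(order_id TEXT, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_staging VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SAMPLE_EXPORT)), conn)
print(conn.execute("SELECT * FROM sales_staging").fetchall())
```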

Fig. 2 Architecture of BI systems in enterprise computing environment

A data warehouse can be viewed as a database that holds business information, such as sales, products, or other data about day-to-day operations, covering the company’s processes, production, and customers, or other data sources in the enterprise. This data is poured into the data warehouse on a regular schedule, after which management can perform complex queries and analysis (normally data mining or OLAP) on the information without slowing down operational systems. The data warehouse provides users with a multi-dimensional view of the data they need to analyze business conditions. It is designed specifically to support managerial decision-making, rather than simply meeting the needs of transaction processing systems. A data warehouse typically starts out as a very large database, containing millions or even hundreds of millions of data records, and it is common for it to hold several years of current and historical data. To remain current and accurate, the data warehouse receives regular updates; the updating process must be efficient, automated or semi-automated, and as fast as possible owing to the colossal amount of data involved. Web warehousing [75] is the combination of data warehousing and World Wide Web technology: the Internet has made it possible to apply web technology to traditional data warehousing, which has resulted in improved cost savings and productivity. According to Nemati et al. [57], the basic purpose of a data warehouse is to empower knowledge workers with information that allows them to make decisions based on a solid foundation of fact. However, only a fraction of that information exists on computers; the vast majority of a firm’s intellectual assets, such as institutional knowledge, exists in the minds of people [100]. Therefore, a new generation of knowledge systems is required that can capture, cleanse, store, organize, leverage, and disseminate not only data and information but also the knowledge of the firm. As an extension of the data warehouse model, the knowledge warehouse is likely to point in this new direction.

A data mart is a subset or a specialized version of a data warehouse. A data mart contains a subset of the data for a single aspect of a company’s business, e.g., finance, inventory, or personnel, instead of storing all of the enterprise data in one database. A data warehouse holds summary data that can be accessed by an entire enterprise, whereas a data mart is helpful for small groups who want to access detailed data. Much like a data warehouse, a data mart helps business people strategize based on analyses of past trends and experiences, but it can typically be deployed on less powerful hardware with smaller storage devices. The key difference between a data warehouse and a data mart, however, is that the creation of a data mart is based on a specific, predefined need for a certain grouping and configuration of selected data. Since a data mart emphasizes easy access to relevant information, the star schema or multi-dimensional model is a fairly popular design choice, because it enables a relational database to emulate the structure and analytical functionality of a multi-dimensional database.
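
The star schema mentioned above can be illustrated schematically: a fact table carrying measures and foreign keys is joined to its dimension tables and then aggregated along a dimension. In the sketch below, the table and column names are hypothetical, and pandas stands in for the relational engine of a real data mart.

```python
# Illustrative star schema for a small sales data mart: one fact table joined
# to two dimension tables, then rolled up by quarter and product category.
import pandas as pd

dim_product = pd.DataFrame({"product_id": [1, 2],
                            "category": ["Hardware", "Software"]})
dim_date = pd.DataFrame({"date_id": [20110101, 20110102],
                         "quarter": ["Q1", "Q1"]})
fact_sales = pd.DataFrame({"product_id": [1, 2, 1],
                           "date_id": [20110101, 20110101, 20110102],
                           "revenue": [1200.0, 800.0, 400.0]})

# Join the fact table to its dimensions (the "star"), then aggregate.
star = (fact_sales
        .merge(dim_product, on="product_id")
        .merge(dim_date, on="date_id"))
print(star.groupby(["quarter", "category"])["revenue"].sum())
```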

The notion of OLAP, introduced in the 1990s, refers to the techniques of performing complex analysis of the information stored in a data warehouse. In general, OLAP applications are characterized by the rendering of enterprise data into multi-dimensional perspectives. This is achieved through complex queries that aggregate and consolidate data on a frequent basis, often using statistical formulae. For example, a supermarket may be interested in comparing its total sales for this year with those of previous years, or in identifying sequences of 3 years or more in which its sales have increased or decreased. It has been claimed that relational database technology is well suited to fulfilling the needs of OLAP; however, the major use of relational technology so far has been in traditional transaction management. OLAP, by contrast, provides quick answers to analytical queries that are dimensional in nature, offering on-line analytical support for which the relational model is ill-equipped. OLAP is part of the broader category of business intelligence, which also includes extract, transform, load (ETL). Indeed, readers can easily gauge the limitations of the relational model by trying to answer such queries in a relational language such as SQL.
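
As a rough analogue of an OLAP roll-up, the sketch below aggregates a tiny, hypothetical sales cube along store and year dimensions with pandas; a production OLAP tool would of course operate against a data warehouse rather than an in-memory frame.

```python
# Rough analogue of an OLAP roll-up: aggregate a sales "cube" along the
# store and year dimensions, including grand totals (all data hypothetical).
import pandas as pd

sales = pd.DataFrame({"store": ["North", "North", "South", "South"],
                      "year":  [2010, 2011, 2010, 2011],
                      "sales": [150.0, 180.0, 90.0, 75.0]})

cube = sales.pivot_table(index="store", columns="year",
                         values="sales", aggfunc="sum", margins=True)
print(cube)   # per-store, per-year totals plus "All" margins
```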

Data mining is the process of identifying and interpreting patterns in data to solve a specific business problem [10, 14, 24, 87]. It is an information analysis tool that involves the automated discovery of patterns and relationships in a data source. Data mining makes use of advanced statistical techniques and machine learning to discover facts in data warehouses or data marts, including databases on the Internet. Unlike query tools, which require users to formulate and test hypotheses, data mining uses analysis tools to automatically generate hypotheses about the patterns found in the data and then to predict future behavior. The objective is to discover patterns, trends, and rules from data warehouses in order to evaluate business operations, tactics, or strategies, which in turn should improve the competitiveness and profitability of enterprises and optimize business processes. BI vendors such as Oracle, SAS, and others are all incorporating data mining functionality into their products. Data mining strategies for BI include classification, time series analysis, clustering, association analysis, decision tree induction, support vector machines [106], k-nearest neighbor, genetic algorithms [33, 37, 38, 85], rough sets [108], fuzzy sets [96, 109], k-means, case-based reasoning [25, 26, 34, 71, 90–92], feature space theory [41, 42], Bayesian networks [84], and particle swarm optimization [83].

4 Business intelligence algorithms

The algorithms of data mining are the major components of business intelligence systems. Data mining strategies include classification, clustering, association analysis, and many others.

4.1 Association rule

Association rule learning is also known as market basket analysis. Market basket analysis is used to determine the items most likely to be purchased by a customer during a shopping trip. Questions such as “Supposing a customer purchases product A, how likely is the customer to purchase product B?” or “What kinds of items is the customer likely to purchase together?” are answered by association-finding algorithms. The output of market basket analysis is generally a set of associations about customers’ purchasing behavior, given in the form of a special set of rules known as association rules, which are used to help determine appropriate product marketing strategies. Association rules take the form “when a set of n items appears in a group, a set of m other items appears in the same group.” For example, association rules can tell us that if a customer owns saving and checking accounts, the customer will also own a certificate of deposit with a certain frequency. While association rules do not warrant inferences of causality, they may point to relationships among items or events that could be studied further using more appropriate analytical techniques to determine the structure and nature of any causalities that may exist. Unlike traditional classification, association rule generators allow the consequent of a rule to contain one or several attribute values, whereas traditional classification rules usually limit the consequent to a single attribute. In addition, with an association rule generator an attribute may appear as the precondition of one rule and as the consequent of another, which traditional classification does not allow. However, allowing many attributes to appear in rule consequents can make the generation process unmanageable, owing to the large number of possible conditions for the consequent of each rule.
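
Association rules of this kind are conventionally quantified by support and confidence. The minimal sketch below computes both measures for a single candidate rule over a handful of hypothetical customer “baskets”; the item names are invented for illustration.

```python
# Support and confidence for the candidate rule {saving, checking} -> {cd},
# computed over a small hypothetical set of customer baskets.
baskets = [
    {"saving", "checking", "cd"},
    {"saving", "checking"},
    {"checking", "cd"},
    {"saving", "checking", "cd"},
]
antecedent, consequent = {"saving", "checking"}, {"cd"}

n_antecedent = sum(antecedent <= b for b in baskets)
n_rule = sum((antecedent | consequent) <= b for b in baskets)

support = n_rule / len(baskets)        # how often the whole rule occurs
confidence = n_rule / n_antecedent     # how often the consequent follows the antecedent
print(f"support={support:.2f}, confidence={confidence:.2f}")   # 0.50, 0.67
```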

Candidate-generation-and-test algorithms such as the Apriori algorithm [2] have been developed to generate association rules efficiently. This influential algorithm is used to find, or mine, frequent item sets, i.e., attribute-value combinations that meet a specified coverage requirement; combinations that do not meet the user’s requirement are discarded, so the rule generation process can be completed in a reasonable amount of time. Apriori association rule generation is typically a two-step process: the first step generates the frequent item sets, and the second step uses the generated item sets to create a set of association rules.
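
In the spirit of the candidate-generation-and-test idea, the following sketch performs level-wise item set generation with pruning by a minimum support count; the transactions and the threshold are hypothetical, and the full Apriori subset-pruning optimization is omitted for brevity.

```python
# Level-wise generate-and-test in the spirit of Apriori: k-item candidates are
# built from frequent (k-1)-item sets and pruned by a minimum support count.
from itertools import chain

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
min_count = 3

def support_count(itemset):
    return sum(itemset <= t for t in transactions)

items = sorted(set(chain.from_iterable(transactions)))
frequent = {frozenset([i]) for i in items if support_count({i}) >= min_count}
all_frequent, k = set(frequent), 2

while frequent:
    # Join step: combine frequent (k-1)-item sets into k-item candidates.
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    # Test step: keep only candidates that meet the minimum support count.
    frequent = {c for c in candidates if support_count(c) >= min_count}
    all_frequent |= frequent
    k += 1

print(sorted(tuple(sorted(s)) for s in all_frequent))
```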

However, candidate-generation-and-test algorithms suffer from both the generation of huge numbers of candidates and repeated scanning of the database. Thus, another approach, called pattern growth, has been proposed. The FP-tree algorithm is such a method, used to mine the complete set of frequent item sets without candidate generation. It does not need to generate a huge number of candidate sets, retains the item set association information so that mining can proceed separately, and scans the data set only a few times. As a result, in most cases, algorithms based on the pattern-growth approach find frequent patterns faster than those based on the candidate-generation-and-test approach. Hirate proposed a new mining algorithm, called “TF2P-growth” [27], which does not require any thresholds: it mines patterns in descending order of their support values and returns frequent patterns to users sequentially, with short response times. Contrast set learning [56] is another form of associative learning; contrast set learners use rules that differ meaningfully in their distribution across subsets.

Association rules are particularly popular because of their ability to find relationships in large databases without the restriction of having to choose a single dependent variable. However, it is still important to minimize the work required by an association rule algorithm, since large volumes of data are often stored for market basket analysis.

4.2 Classification and prediction

A classification algorithm is simply a model for predicting a categorical variable that assumes one of a predetermined set of values. These values can be either nominal or ordinal, though ordinal variables are typically treated the same way as nominal ones in these models. When a problem is easy to classify but its boundary function is more complicated than it needs to be, the boundary is likely to over-fit; analogously, when a problem is hard and the classifier is not powerful enough, the boundary under-fits. Classification describes the assignment of data records to predefined categories and discovers the relationship between the other variables and the target category. When a new record is input, the classifier determines the category to which the record belongs and the probability of that membership. Examples of classification algorithms include linear classifiers (e.g., Fisher’s linear discriminant, logistic regression, the naive Bayesian classifier), quadratic classifiers, k-nearest neighbor, boosting, decision trees, neural networks [11, 39, 40, 67, 82, 110–112, 114], Bayesian networks, support vector machines, hidden Markov models, and so on.

However, no classification method stands out over the others across all data types and domains. Empirical comparisons of classification methods were presented by Lim [50] and Shavlik [65]. Classification and prediction methods can be compared and evaluated according to several criteria [24]: predictive accuracy, robustness, scalability, and interpretability. Predictive accuracy refers to the ability of the model to correctly predict the target category. Robustness indicates the ability of the model to make correct predictions given noisy data or data with missing values. Scalability refers to the ability to construct the model efficiently given large amounts of data. Interpretability denotes the degree of interpretation or visualization provided by the classification model.

The decision tree model is a flow-chart-like structure in which leaves represent classifications, each inner node denotes a test on an attribute, and branches represent conjunctions of features that lead to those classifications. The ability not only to predict the value of a categorical variable but also to use categorical variables directly as input or predictor variables is perhaps the decision tree’s single greatest advantage. Decision trees are by their very nature well suited to dealing with large numbers of input variables, handling a mixture of data types, and handling data that is not homogeneous, i.e., whose variables do not have the same interrelationships throughout the data space. They also provide insight into the structure of the data space and the meaning of a model, a result at times as important as the accuracy of a model. It should be noted that a variation of decision trees called regression trees can be used to build regression models rather than classification models, enjoying the same benefits just described. The basic algorithm for decision tree induction is a greedy algorithm that constructs decision trees in a top-down recursive divide-and-conquer manner [24]. This algorithm is known as a version of ID3 [7], a well-known decision tree induction algorithm. C4.5, a later version of the ID3 algorithm, uses the training samples to estimate the accuracy of each rule. Since this use results in an optimistic estimate of rule accuracy, C4.5 uses a pessimistic estimate to compensate for the bias. Alternatively, a set of test samples independent of the training set can be used to estimate rule accuracy.
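
The attribute-selection step of ID3-style induction chooses the split with the highest information gain. The sketch below computes entropy and information gain for a single candidate attribute; the tiny “outlook” data set is invented purely for illustration.

```python
# Entropy and information gain for one candidate split attribute,
# the core calculation behind ID3-style decision tree induction.
from collections import Counter
from math import log2

records = [  # (outlook value, class label)
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rain", "yes"),
    ("rain", "yes"), ("rain", "no"), ("overcast", "yes"), ("sunny", "yes"),
]

def entropy(labels):
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(records):
    labels = [label for _, label in records]
    base = entropy(labels)                       # impurity before the split
    remainder = 0.0
    for value in {v for v, _ in records}:        # weighted impurity after the split
        subset = [label for v, label in records if v == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder

print(f"gain(outlook) = {information_gain(records):.3f}")   # about 0.27
```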

Relatively new to the field of data mining, support vector machines (SVM) and kernel methods have been successfully applied to a variety of domains. SVM is a promising method for classification and regression analysis owing to its solid mathematical foundations, which yield two desirable properties: margin maximization and nonlinear classification using kernels. Despite these two distinguishing properties, however, SVM is usually not chosen for large-scale data mining problems because its training complexity is highly dependent on the size of the data set. Unlike traditional pattern recognition and machine learning, real-world data mining applications often involve huge numbers of data records, so it is too expensive to perform multiple scans over the entire data set and infeasible to hold the entire data set in memory.

SVM excels at supervised learning: it tries to maximize generalization by maximizing the margin while supporting nonlinear separation using advanced kernels, and in doing so it tries to avoid both over-fitting and under-fitting. The margin in SVM denotes the distance from the boundary to the closest data points in the feature space. In SVM, the problem of computing a margin-maximizing boundary function is specified by the following quadratic programming (QP) problem [79]:

$$ \begin{aligned} \min_{\alpha} \quad & \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_{i} y_{j} \alpha_{i} \alpha_{j} k(x_{i}, x_{j}) - \sum_{j=1}^{l} \alpha_{j} \\ \text{s.t.} \quad & \sum_{i=1}^{l} y_{i} \alpha_{i} = 0 \\ & 0 \le \alpha_{i} \le C, \quad i = 1, \ldots, l \end{aligned} $$

where \( l \) denotes the number of training data, \( \alpha \) denotes a vector of \( l \) variables, and each \( \alpha_{i} \) corresponds to a training datum \( (x_{i}, y_{i}) \). \( C \) is the soft-margin parameter, controlling the influence of outliers (or noise) in the training data. The kernel \( k(x_{i}, x_{j}) \) for a linear boundary function is \( x_{i} \cdot x_{j} \), the scalar product of two data points. The nonlinear transformation of the feature space is performed by replacing \( k(x_{i}, x_{j}) \) with an advanced kernel, such as the polynomial kernel \( (x^{T} x_{i} + 1)^{p} \) or the RBF kernel \( \exp \left( { - \frac{{|x - x_{i} |^{2} }}{{\sigma^{2} }}} \right) \). The use of an advanced kernel is an attractive computational shortcut: it is a function that operates on the input data but has the effect of computing the scalar product of their images in what is usually a much higher-dimensional, or even infinite-dimensional, feature space, which allows one to work implicitly with hyperplanes in highly complex spaces.
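
As a small numerical illustration, the sketch below (assuming NumPy) evaluates the polynomial and RBF kernels given above on two toy points; the degree p and width sigma are arbitrary example values.

```python
# The polynomial and RBF kernels from the text, evaluated on two toy points.
import numpy as np

def poly_kernel(x, z, p=3):
    return (x @ z + 1.0) ** p                             # (x^T z + 1)^p

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)     # exp(-|x - z|^2 / sigma^2)

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(poly_kernel(x, z), rbf_kernel(x, z))
```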

However, as mentioned above, most existing support vector machine implementations are not feasible for very large data sets, owing to their high complexity in the data size or to frequent accesses to such large data sets, which cause expensive I/O operations. Yu [105] presents a novel method, called Clustering-Based SVM (CB-SVM), which maximizes SVM performance for very large data sets given a limited amount of resources, e.g., memory. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once to provide the SVM with high-quality samples. These samples carry statistical summaries of the data and maximize the benefit of learning. Analysis shows that the training complexity of CB-SVM is quadratically dependent on the number of support vectors, which is usually much smaller than the size of the entire data set. Experiments on synthetic and real-world data sets show that CB-SVM is highly scalable for very large data sets and very accurate in terms of classification. However, CB-SVM is currently limited to linear kernels, since the hierarchical micro-clusters would not be isomorphic to the new high-dimensional feature space once the space is transformed by a nonlinear kernel; that is, the statistical summaries of the data, such as radii and distances computed in the input space, would not be preserved in the transformed feature space. Constructing effective indexing structures for nonlinear kernels is an interesting direction for future work, since it has high practical value, especially for classifying large business data sets. Hong and Weiss [28] reviewed the key theoretical developments in PAC and statistical learning theory that have led to the development of support vector machines and to the use of multiple models for increased predictive accuracy. Training support vector machines involves a huge optimization problem, and Boley and Cao [4] proposed an algorithm called Cluster SVM that accelerates the training process by exploiting the distributional properties of the training data.

To improve the performance of the traditional SVM on data sets with unbalanced class distributions, an improved SVM has been presented: genetic algorithm-SVM (GA-SVM), constructed by combining a genetic algorithm with the basic support vector machine, in which the parameters of the SVM are coded into chromosomes with a Gray coding strategy. The results of Huang et al. [29] indicate that GA-SVM can achieve higher classification accuracy with a faster learning speed, and that it works especially well on a well-constructed dataset.

The key difference between prediction and classification is that prediction estimates a continuous value rather than a categorical label. The prediction of continuous values can be modeled by the statistical techniques of regression. For example, we might like to develop a model to predict the salary of college graduates with 10 years of work experience, or the potential sales of a new product given its price. Many problems can be solved by linear regression, and even more can be tackled by applying transformations to the variables so that a nonlinear problem is converted into a linear one. Regression models include linear and multiple regression as well as nonlinear regression; other regression models include generalized linear models and log-linear models. Generalized linear models represent the theoretical foundation on which linear regression can be applied to the modeling of categorical response variables; logistic regression and Poisson regression are two examples. Log-linear models approximate discrete multidimensional probability distributions and may be used to estimate the probability values associated with data cube cells. Some commercial BI software packages are devoted to solving regression problems.
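
As a minimal illustration of the linear case just described, the sketch below fits an ordinary least-squares line of salary against years of work experience; the numbers are entirely hypothetical.

```python
# Ordinary least-squares regression of salary on years of work experience
# (all figures hypothetical, purely to illustrate the regression step).
import numpy as np

years = np.array([1, 3, 5, 7, 10], dtype=float)
salary = np.array([52, 60, 71, 80, 95], dtype=float)   # in thousands

X = np.column_stack([np.ones_like(years), years])      # intercept and slope terms
(intercept, slope), *_ = np.linalg.lstsq(X, salary, rcond=None)

print(f"salary = {intercept:.1f} + {slope:.1f} * years (in thousands)")
print(f"prediction at 10 years of experience: {intercept + slope * 10:.1f}")
```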

4.3 Clustering analysis

Clustering analysis is a common technique used in many fields, including machine learning, data mining, pattern recognition, image analysis, bioinformatics, and market research [15, 16, 43]. Clustering is a typical form of unsupervised learning that classifies similar objects into different groups or, more precisely, partitions a data set into clusters, so that the data in each subset ideally share some common trait. In other words, clustering analysis is “the process of grouping the data into classes or clusters so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters” [24]. Cluster analysis is a statistical process used to identify homogeneous groups of data objects. By clustering, one can identify dense and sparse regions and therefore discover overall distribution patterns and interesting correlations among data attributes. Owing to the massive sizes of enterprise data today, the implementation of any clustering algorithm must be scalable enough to complete the analysis within a reasonable amount of time; as with classification algorithms, most statistical clustering algorithms do not work well with large databases because of memory limitations and the execution times required.

In business applications, clustering helps marketers discover distinct groups and characterize customer groups based on purchasing patterns. As a data mining function, cluster analysis can be used as a stand-alone tool to gain insight into the distribution of data, to observe the characteristics of each cluster, and to focus on a set of clusters for further analysis. Alternatively, it may serve as a preprocessing step for other algorithms, such as characterization and classification, which then operate on the detected clusters. As a branch of statistics, cluster analysis has been studied extensively for many years, focusing mainly on distance-based cluster analysis. Cluster analysis tools based on k-means and several other methods have also been built into many statistical analysis software packages and systems. In machine learning, clustering is an example of unsupervised learning: as opposed to classification, clustering and unsupervised learning do not rely on predefined classes and class-labeled training examples. For this reason, clustering is a form of learning by observation, rather than learning by examples. In conceptual clustering, a group of objects forms a class only if it is describable by a concept; this differs from conventional clustering, which measures similarity based on geometric distance. In general, the major clustering analysis methods can be classified into several categories [24], such as partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.

The k-means algorithm takes an input parameter k and partitions a set of n objects into k clusters so that intra-cluster similarity is high while inter-cluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster, which can be viewed as the cluster’s center of gravity. The k-means algorithm, however, can be applied only when the mean of a cluster is defined, which may not always be the case. Similar to the k-means algorithm, the k-medoid algorithm represents each cluster by one of the cluster’s objects located near its center. Ng and Han proposed the CLARANS algorithm [66], an improved k-medoid method.
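
The assign-and-update iteration at the heart of k-means can be sketched in a few lines; the toy two-dimensional data, the choice of k = 2, and the fixed number of iterations below are illustrative assumptions rather than part of any cited method.

```python
# Minimal k-means: alternately assign points to the nearest centroid and
# recompute each centroid as the mean (center of gravity) of its cluster.
import numpy as np

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.5, (20, 2)),   # two loose 2-D blobs
                    rng.normal(3.0, 0.5, (20, 2))])
k = 2
centroids = points[[0, -1]].copy()                   # one seed point from each blob

for _ in range(10):
    # Assignment step: label each point with its nearest centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points.
    centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])

print(np.round(centroids, 2))   # approximately the two blob centers
```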

Wavelet-based clustering (WaveCluster) [66] can be efficiently applied to detect clusters of arbitrary shape. A good clustering approach should be insensitive to noise, outliers, and the input order of the data; moreover, it should work efficiently on both low-dimensional and high-dimensional large datasets. WaveCluster is a grid-based and density-based algorithm that uses the multi-resolution property of the wavelet transform. It can handle large datasets efficiently, identify arbitrarily shaped clusters at varying degrees of detail, and perform efficiently on very large databases, meeting most of the desirable properties of a good clustering technique mentioned above. Figure 3 presents the clustering result produced by WaveCluster on an example of an arbitrarily shaped data distribution; it is evident that WaveCluster is powerful in handling sophisticated patterns and removing noise.

Fig. 3 a Original space. b WaveCluster results

4.4 Time-related analysis and mining

Time-series databases are popular in many applications, such as studying daily fluctuations of the stock market, tracing dynamic production processes, and the like. Time-related analysis and mining comprises the mining techniques applied to the analysis of time-ordered data records; these techniques attempt to detect similar sequences or subsequences in the ordered data.

Time-series databases and sequence databases are two typical kinds of time-related data. A time-series database consists of sequences of values or events changing with time, with the values typically measured at equal time intervals. A time-series database is also a sequence database, but a sequence database is any database that consists of sequences of ordered events, with or without concrete notions of time. Trend analysis, similarity search, and the mining of sequential patterns and periodic patterns are several important aspects of time-related analysis and mining.

There are four major components, or movements, used to characterize time-series data [24]: long-term or trend movements, cyclic movements or cyclic variations, seasonal movements or seasonal variations, and irregular or random movements. Similarity searches in time-series analysis are typically helpful for the analysis of financial markets (such as stock data analysis). Sequential pattern mining is the discovery of frequent patterns related to time or other sequences. Since many business transactions, telecommunications records, and production processes are time-sequenced data, sequential pattern mining is useful in the analysis of such data for understanding marketing, developing customer retention strategies, and so on.
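
As a small illustration of isolating the trend component, the sketch below smooths a hypothetical monthly sales series with a centered moving average; the series values and window length are invented for the example.

```python
# Simple trend extraction: a centered moving average damps the seasonal and
# irregular movements in a hypothetical monthly sales series.
import numpy as np

sales = np.array([12, 14, 13, 18, 22, 21, 25, 28, 26, 31, 35, 33], dtype=float)
window = 3

trend = np.convolve(sales, np.ones(window) / window, mode="valid")
print(np.round(trend, 1))   # one smoothed value per interior month
```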

5 Analysis process of business intelligence

Business intelligence generally follows a continuous cycle that begins with a precise description of the business issue. Once the data and the mining techniques have been selected, data miners conduct the mining and evaluate the results. Further iterations of data selection and the application of different mining techniques may be necessary to arrive at a satisfactory solution. If the mining effort effectively addresses the original business problem, it becomes necessary to deploy the results so that the work leads to concrete actions.

When users apply standard statistical techniques or reporting tools to explore the data in databases, they form a hypothesis about the business issue they are addressing and then attempt to prove or disprove it by looking for data that supports or contradicts it.

Data mining uses an alternative approach that begins with the premise that we do not know what patterns exist in the data. Many business and research fields have proven to be excellent candidates for data mining, for example banking, insurance, retail, telecommunications, manufacturing, pharmaceuticals, biotechnology, and the like, and significant benefits have been derived in these areas. Well-known applications are customer profiling in retail, loan delinquency and fraud detection in banking and finance, customer retention in telecommunications, and patient profiling in health care. Data mining is about the discovery of patterns and relationships in data, and all of these applications use the same data mining concepts while applying them in different ways. That is not to say that data mining is magic and omnipotent: we still have to understand the overall business process. The process starts with defining the business problem that we want to solve, so that a mining expert can concentrate on the right solution. This involves gathering relevant data and discovering hidden patterns using mining algorithms. Once the analysis is complete, the new knowledge extracted from the data can be put into action. The process is depicted in Fig. 4.

Fig. 4 Analysis process in applying business intelligence

5.1 Step 1: Create a precise description of the business issue

The first step is to identify the business issue that we want to address and then determine how it can be translated into a question, or set of questions, that data mining can tackle. As we formulate the business issue, we also need to think about whether we have access to the right data; it is important to recognize that the data we hold may not contain the information required to answer the question.

5.2 Step 2: Map the business issue to a model

When data is used routinely to support a specific business application, the data and metadata together form what we call the data model that supports the application. Defining data models for any application is a complex task: very often we are not sure at the outset which variables are important and therefore exactly what is required, so mapping the business issue to a data model can become a time-consuming activity. The alternative is to use common data models that have been developed to solve business issues similar to the ones we are trying to address. While these models may not initially provide all of the information we require, they are usually designed to be extendable to include additional variables.

The main advantage of using a common data model is that it provides us with a way of quickly seeing how data mining can be used.

5.3 Step 3: Data preprocessing

Most data preprocessing comes in the form of data cleaning, which involves dealing with noisy and missing information. Ideally, the majority of data preprocessing takes place before data is permanently stored in a structure such as a data warehouse. Common problems with noisy data, which often represents random error, include incorrect attribute values and duplicate records; data smoothing is a common remedy. In very large datasets, noise can come in many shapes and forms. Though some automated graphical tools may assist with data cleaning, the responsibility for the data transition still lies in the hands of the data warehouse specialist. For example, a numeric value of -1 for an attribute such as weight or blood pressure is an obvious error; such errors often occur when data is missing and default values are assigned to fill in for the missing items. If the dataset is large and only a few incorrect values exist, finding such errors can be difficult, so some data analysis tools allow the user to specify a valid range of values for numerical data. Data smoothing is both a data cleaning and a data transformation process. Several data smoothing techniques attempt to reduce the number of values for a numeric attribute. Some classifiers, such as neural networks, use functions that perform data smoothing during the classification process. Another kind of data smoothing, which takes place prior to classification, is external data smoothing; rounding and computing mean values are two simple external techniques. Mean value smoothing is appropriate when we wish to use a classifier that does not support numerical data: all numerical attribute values are replaced by the corresponding class mean. Another common data smoothing technique attempts to identify and remove atypical instances from the dataset. In most cases, missing attribute values indicate lost information.

For example, a missing value for the attribute “age” certainly indicates a data item that exists but is unaccounted for. However, a missing value for “salary” may be an un-entered data item, but it could also indicate an individual who is unemployed. Some data mining techniques can deal directly with missing values, but many classifiers require all attributes to contain a value. Possible options for dealing with missing data before it is presented to a data mining algorithm include discarding records with missing values, replacing missing real-valued items with the class mean, and replacing missing attribute values with the values found in other, highly similar instances.
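
Two of the options just listed can be sketched with pandas; the “class” and “salary” columns and their values are hypothetical, and real preprocessing would of course be driven by the semantics of the missing values discussed above.

```python
# Handling missing values: (1) discard incomplete records, or (2) replace a
# missing numeric value with the mean of its class (toy data, hypothetical).
import pandas as pd

records = pd.DataFrame({
    "class":  ["approved", "approved", "rejected", "rejected", "approved"],
    "salary": [52000.0, None, 31000.0, 29000.0, 58000.0],
})

dropped = records.dropna(subset=["salary"])          # option 1: discard the record

filled = records.copy()                              # option 2: class-mean replacement
filled["salary"] = filled.groupby("class")["salary"].transform(
    lambda s: s.fillna(s.mean()))

print(dropped)
print(filled)
```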

5.4 Step 4: Apply matched data mining algorithms

When it comes to solving a particular problem, we have several techniques to choose from. The question becomes, how do we know which data mining technique to use?

We can determine an appropriate data mining technique from a set of data containing attributes and values, together with information about the nature of the data and the problem to be solved. For a given business issue, selecting the mining techniques or algorithms involves not only defining the appropriate technique, or mix of techniques, to use, but also the way in which the techniques should be applied. Data mining techniques can generally be divided into two broad categories: discovery mining and predictive mining. Discovery mining refers to a range of techniques whose primary objective is to find patterns inside the business data without any prior knowledge of what patterns exist; clustering and sequence analysis are typical examples. Predictive mining refers to a range of techniques that find relationships between a specific variable, called the target variable, and the other variables in the data; classification and regression are examples.

5.5 Step 5: Visualization, interpretation, and evaluation

Visualization, interpretation, and evaluation determine whether a learning model is acceptable and robust enough to be applied to problems outside the test environment [48]. If acceptable results are achieved, it is at this stage that the acquired knowledge is translated into terms understandable by users. It should be noted that vendors and products such as IBM Intelligent Miner Visualization, NCR Teradata, and MSMiner [68, 98] are well suited to this purpose.

Performing any type of data mining can produce a wealth of information that is difficult to interpret. This interpretation step often requires assistance from a business expert who can translate the mining results back into the business context, since the business analyst is unlikely to be a mining expert. It is therefore important that the results be presented in such a way that they are relatively easy to interpret, and users need a range of tools that enable them to visualize the results and provide the statistical information necessary for facilitating the interpretation.

5.6 Step 6: Act on the analysis results and reach goals

We create mathematical representations of the data and call them models; they contain the rules and patterns found by the mining algorithm. These models provide us with deeper insight into our business and can be deployed or used by other business processes. A number of possible actions may result from a successful application of the knowledge discovery process. Applying what has been learned or mined is the ultimate goal of business intelligence, and the deployment of the results of data analysis and mining is possibly the most important step of all.

6 Making BI solutions more effective

As mentioned above, business intelligence can give users the ability to gain insight into a business or organization by helping them understand the company’s information assets. These data assets can include customer data; supply chain data; manufacturing, sales, and marketing data; and any other sources of data critical to operations. BI also allows users to integrate disparate data sources into a single coherent framework for real-time reporting and detailed analysis. Several trends are dramatically driving the market need for better business intelligence tools: daily rising data volumes, geographically dispersed users, and existing tools that are difficult to use. Existing business intelligence systems still lack the maturity and breadth of deployment needed to meet business demands, and broader deployment of business intelligence systems throughout enterprises will occur only if users can learn an application, deploy it, and manage it effectively. These are some of the reasons for the difficulties in broadly delivering business intelligence systems in every enterprise.

Very often, business intelligence systems take a long time to install, build, and deploy; the average implementation time for some larger BI solutions can reach about 6 months, and requirements and budgets often change over such a long installation and implementation cycle. Many business intelligence applications are still difficult to use, and because a majority of BI projects focus on implementation, adequate user training is often overlooked. As a result, a lack of end-user acceptance is a critical factor that hampers business intelligence systems. In many cases, business intelligence systems have actually increased the workload, even though they were originally conceived as a means to relieve workload through intuitive reporting and analysis; this can eventually limit the wider deployment of BI throughout the enterprise. Furthermore, the cost and benefit can also be questioned. Often, after the completion of a rather lengthy and costly implementation, demands have changed, and if the applications cannot demonstrate a return on investment in time, or if few benefits are realized, end users are likely to become disenchanted with business intelligence.

True “enterprise-wide” intelligence would ensure that users who need to access and analyze information to support a business process or a decision have a powerful yet intuitive solution or platform at their fingertips. Thus, business intelligence solutions should be broadly distributed to all users who need access to information: the more people using a technology, the more valuable the technology becomes. Once a business intelligence system is easier to use and to understand, and allows users to evaluate alternatives, draw conclusions, and make decisions, it will be more broadly implemented. The tools and techniques used to access and analyze the information must be powerful, yet easy both to learn and to use. This can only happen if the information is easy to understand, timely, and relevant to the user, and if the user has access to the resources necessary to perform his or her job. For true insight and effectiveness, an understanding of the data across organizational boundaries will help users make the business more productive. Business intelligence analysis tools should be powerful, yet simple for users to learn, deploy, and maintain, and these solutions need to be more flexible and adaptable to changes in the on-demand competitive business environment.

7 Summary

Although we are convinced that business intelligence is the proper road for business enterprises to follow, it will still be quite a journey, and the work described in this paper covers only a part of the related research. Business intelligence systems should provide not only the capability to analyze what has occurred but, more importantly, the ability to tell users what is going to happen in their enterprises, by using intelligent information processing techniques [107]. As ubiquitous computing technology increases the demand for enterprise computing, it is expected that this trend will apply to BI as well [34]. BI will continue to embrace cutting-edge technologies and techniques, and will open new applications that will impact industrial sectors [73, 101].