
1 Introduction - Background

The stock and bond markets are critical components of a capitalist economy. The efficiency, liquidity, and resiliency of these markets depend on the ability of investors, lenders, and regulators to assess the financial performance of businesses that raise capital. Financial statements prepared by such organizations play a very important role in keeping capital markets efficient: they provide meaningful disclosures of where a company has been, where it is currently, and where it is going. Most financial statements are prepared with integrity and present a fair representation of the financial position of the organization issuing them. These financial statements are based on generally accepted accounting principles (GAAP), which guide the accounting for transactions.

Unfortunately, financial statements are sometimes prepared in ways that intentionally misstate the financial position and performance of an organization. Such misstatements can result from manipulating, falsifying, or altering accounting records. Misleading financial statements cause serious problems in the market and the economy. They often result in large losses for investors, lack of trust in the market and accounting systems, and litigation and embarrassment for individuals and organizations associated with financial statement fraud.

Specifically, according to Wells (2005), financial statement fraud is harmful in many ways. It:

  • undermines the reliability, quality, transparency, and integrity of the financial reporting process;

  • jeopardizes the integrity and objectivity of the auditing profession, especially auditors and auditing firms (for example, Andersen);

  • diminishes the confidence of the capital markets, as well as market participants, in the reliability of financial information;

  • makes the capital markets less efficient;

  • adversely affects the nation’s economic growth and prosperity;

  • results in huge litigation costs;

  • destroys the careers of individuals involved in financial statement fraud;

  • causes bankruptcy or substantial economic losses for the company engaged in financial statement fraud;

  • encourages regulatory intervention;

  • causes devastation in the normal operations and performance of the alleged companies;

  • raises serious doubts about the efficacy of financial statement audits; and

  • erodes public confidence and trust in the accounting and auditing profession.

According to the Association of Certified Fraud Examiners’ (ACFE) 2014 Report to the Nations on Occupational Fraud and Abuse, the average loss from financial statement fraud reported by survey respondents exceeds US $1 million. Financial statement frauds, such as the WorldCom and Enron frauds, can overstate income by billions of US dollars.

Furthermore, “public statistics on the possible cost of financial statement fraud are only educated estimates, primarily because it is impossible to determine actual costs since not all fraud is detected, not all detected fraud is reported, and not all reported fraud is legally pursued” (Rezaee 2002). Financial statement fraud, combined with audit failure, therefore heightens the concern of investors, lenders, and regulators.

As a result, investors, lenders, and regulators need to learn how to detect financial statement fraud more effectively. This research therefore investigates how investors, lenders, and regulators can detect financial statement fraud. Section 2 reviews in detail the efforts of previous researchers to detect financial fraud. Section 3 presents the proposed methodology. Section 4 discusses our findings. Section 5 concludes the research.

We employ well-established machine learning techniques to identify the factors that are actually associated with financial statement fraud. Moreover, we provide intelligent, non-parametric models for identifying financial fraud from the observed financial data of any company. This research also compares the effectiveness of different tools for detecting fraud and identifies the gaps between the judgments of experts and the different prediction models.

2 Review of Related Literature

There are many different types of fraud, as well as a variety of data mining methods, and research is continually being undertaken to find the best approach for each case (West 2015). Data mining refers to any method that processes large quantities of data to derive an underlying meaning. Within this classification, West (2015) considers two categories of data mining: statistical and computational. Statistical techniques are based on traditional mathematical methods, such as logistic regression and Bayesian theory. Computational methods are those that use modern intelligence techniques, such as neural networks and support vector machines. West (2015) also notes that the two categories share many similarities, but the main difference between them is that computational methods are capable of learning from and adapting to the problem domain, while statistical methods are more rigid. In this research, we examine both types of data mining. Specifically, we compare the performance of two data mining methods: Naïve Bayes and K-nearest neighbours.

The first researchers who investigated fraud detection (Zhang et al. 1998) focused heavily on statistical models such as logistic regression and neural networks. Recent fraud detection research has been far more varied in the methods studied, although the former techniques are still popular (West 2015). The most recent studies of financial statement fraud, such as Kirkos et al. (2007) and Ravisankar et al. (2011), used classification methods to detect fraud. Classification is a data mining method that assigns each of a list of unknown samples to one of several discrete classes (Ngai et al. 2011). Binary classification is a simplified case in which there exist only two possible categories (such as fraudulent and non-fraudulent). In contrast, regression is a traditional statistical method that has been used extensively in data mining for many years. It aims to expose relationships between a dependent variable and a set of independent variables (Ngai et al. 2011).

Kirkos et al. (2007) compared statistical methods with neural networks to identify fraudulent Greek manufacturing companies. Ravisankar et al. (2011) compared a large range of methods to identify financial statement fraud within Chinese companies; in addition to support vector machines, they examined genetic programming, logistic regression, the group method of data handling, and a variety of neural networks. Bose and Wang (2007) compared neural networks and decision trees to explore financial statement fraud using financial items from a selection of public Chinese companies. Humpherys et al. (2011) used text mining techniques to investigate financial statement fraud in the managerial statements of US companies. Zhou and Kapoor (2011) examined common behaviours that frequently accompany financial statement fraud and created a framework for designing detection methods.

Identifying financial fraud using a first-principles approach is difficult or even impossible. According to the Institute of Internal Auditors (2001), a fraud examiner commonly uses the following techniques to identify relationships among financial data that do not appear reasonable:

  • Comparison of current period information with similar information from prior periods. Prior period amounts normally are assumed to be the expectation for the current period. A modification of this comparison is the incremental approach whereby prior period numbers are adjusted for known changes, such as significant purchases or sales of assets and changes in lines and volumes of business.

  • Comparison of current period information with budgets or forecasts. This comparison should include adjustments for expected unusual transactions and events.

  • Study of relationships among elements of information. Certain accounts vary in relation to others, both within a financial statement and across financial statements. For instance, commissions are expected to vary directly in relation to sales.

  • Study of relationships of financial information with the appropriate non-financial information. Non-financial measures are normally generated from an outside source. An example would be retail stores where sales are expected to vary with the number of square feet of shelf space.

  • Comparison of information with similar information from the industry in which the organization operates. Industry averages are reliable in stable industries. Unfortunately, industry trade associations require months to compile, analyze, and publish information; therefore, the data may not be timely.

  • Comparison of information with similar information from other organizational units. A company with several stores might compare one store with another store. The “model” store should be sufficiently audited to assure that it is an appropriate standard.

As the above procedure shows, the techniques that a fraud examiner uses to detect financial fraud leave many gaps. Computational intelligence and statistics, on the other hand, help to anticipate and quickly detect fraud so that immediate action can be taken to minimize costs.

However, we assume that there exists a relationship between specific financial attributes and the existence or absence of financial fraud (the outcome). This relationship between the factors and the outcome is not exactly known, owing to the inherent uncertainty of financial data (Parsons 1996; Ren et al. 2009). As a consequence, we treat the problem as a ‘black box’ system. The input of the system is a set of specific attributes (factors), while its output is the outcome that these attributes cause, in a way that is not exactly known. The only knowledge we have about the operation of the system arises from specific observations of which outcomes specific inputs (attributes) produce. The target of modeling is to build a model (i.e. a mathematical function) that simulates the unknown system, that is, a model that delivers the same outcome as the unknown system on the given data set of observations.

Over the years, various computational methods have been used for fraud detection and, as with other similar problems, successful implementation of the detection methods depends on having a clear understanding of the problem domain. While some prior researchers have focused on common issues such as problem representation for machine learning techniques in general, there has been almost no analysis from the perspective of fraud detection, which we aim to address here. The implementation of these techniques follows the same information flow as machine learning processes in general.

3 The Proposed Methodology

We formulate the problem of financial fraud detection as a classification problem, assuming that the existence or absence of financial fraud depends on specific quantitative financial attributes. These attributes, listed in Table 2, are the input to the classifier. The output of the classifier is either ‘1’ = FFS (fraudulent financial statement) or ‘0’ = non-FFS, indicating the existence or absence of fraud, respectively. If sufficient historical data (instances, in the form of attribute–label pairs) exist, then the classifier’s workflow can be directed at increasing the chances of capturing opportunities to prevent loss by identifying and verifying potential financial fraud.
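As a minimal illustration of this formulation (a sketch only; the column names below are hypothetical placeholders rather than the exact attributes of Table 2), the labelled instances can be arranged as a feature matrix X and a binary target y:

```python
import pandas as pd

# Hypothetical excerpt of the instance table: one row per firm-year,
# quantitative financial attributes as columns, label 1 = FFS, 0 = non-FFS.
data = pd.DataFrame({
    "debt_to_equity":       [2.8, 0.9, 1.7, 0.4],
    "net_profit_to_assets": [-0.05, 0.08, 0.01, 0.12],
    "working_capital":      [-1.2e6, 3.4e6, 0.2e6, 5.1e6],
    "label":                [1, 0, 1, 0],   # 1 = FFS, 0 = non-FFS
})

X = data.drop(columns="label")   # classifier input: financial attributes
y = data["label"]                # classifier output: existence/absence of fraud
```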

Table 1. The number of firms per sector

In this research, we follow the CRISP-DM approach, which consists of the following steps: (i) Business Understanding, (ii) Data Understanding, (iii) Data Preprocessing, (iv) Modeling, (v) Evaluation, and (vi) Deployment. The business understanding phase was presented in Sect. 2. In this section, data collection, data understanding, and modeling are discussed. Section 4 explains the findings and the evaluation and deployment phases.

3.1 Data Collection/Description

A sufficient number of samples should be collected after the candidate attributes have been defined. These samples are raw data and usually need preprocessing to detect potential outliers and missing values. Another important preprocessing step is the normalization of attributes.

The data sample was selected with the aim of creating models capable of detecting falsification in financial statements. For this reason, several factors were examined. One of the most important is the sector of the enterprises, because the sector affects their financial profile. Our main sources of data were the published financial statements and their notes from the Athens Stock Exchange database.

Initially, our sample contained data from 231 Greek companies listed on the Athens Stock Exchange over the period 2002–2015, comprising 2469 observations. Table 1 reports the number of firms per sector, after the banking, utilities, and financial services sectors were excluded from the sample.

Following Spathis et al. (2002b) and Kirkos et al. (2007), a financial statement was classified as fraudulent based on the following criteria:

  • The inclusion in the auditors’ reports of opinions of serious doubt as to the correctness of accounts,

  • The observations by the tax authorities regarding serious taxation intransigencies which seriously alter the company’s financial statements,

  • The application of Greek legislation regarding negative net worth,

  • The inclusion of the company in the Athens Stock Exchange categories of “under observation” and “negotiation suspended” for reasons associated with falsification of the company’s financial data and

  • The size of the auditor firm.

After selecting the fraud sample, we searched for a non-fraud sample from the same sources. The non-fraud enterprises were chosen using the matching method (Hunt and Ord 1988; Sibley and Burch 1979). Matching is common practice in financial classification research on topics such as bankruptcy, mergers, and acquisitions (Beaver 1966). There are two main reasons for using matching: first, the high cost and time required to select a sample (Bartley and Boardman 1990), and second, the greater information contained in a matched sample compared with a random sample (Cosslett 1981; Palepu 1986).

The main criterion for the similarity of the two samples is the period (Stevens 1973). This criterion reflects changes in a country’s macroeconomic environment, which affect economic conditions and business decision making. Two further criteria are the sector and total assets; Stice (1991) noted that sector and size are the most important factors for the matching method, as illustrated in the sketch below.
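A possible sketch of such a matching step, assuming a one-to-one pairing of each fraud firm-year with the same-sector, same-year non-fraud firm whose total assets are closest; the column names (sector, year, firm_id, total_assets) are illustrative, not the dataset’s actual field names:

```python
import pandas as pd

def match_non_fraud(fraud: pd.DataFrame, candidates: pd.DataFrame) -> pd.DataFrame:
    """Pair each fraud firm-year with the same-sector, same-year non-fraud
    firm whose total assets are closest (one-to-one matching)."""
    matches, used = [], set()
    for _, f in fraud.iterrows():
        pool = candidates[(candidates["sector"] == f["sector"]) &
                          (candidates["year"] == f["year"]) &
                          (~candidates["firm_id"].isin(used))]
        if pool.empty:
            continue  # no eligible control firm left in this sector/year
        best = (pool["total_assets"] - f["total_assets"]).abs().idxmin()
        used.add(candidates.loc[best, "firm_id"])
        matches.append(candidates.loc[best])
    return pd.DataFrame(matches)
```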

On the other hand, the matching method has attracted criticism. Ohlson (1980) argues that the criteria used for matching tend to be arbitrary and that the advantages of the matching procedure are not entirely clear. Ohlson (1980) suggests that it is preferable to use such factors as independent variables in the sample rather than as matching criteria.

3.2 Candidate Attributes

This paper adopts attributes drawn from prior research on FFS. Work carried out by Spathis et al. (2002a, b), Fanning and Cogger (1998), Persons (1995), Stice (1991), Feroz et al. (1991), Loebbecke et al. (1989), and Kinney and McDaniel (1989) suggests indicators of FFS. A number of attributes are therefore considered more likely to point to the falsification of financial statements. The financial ratios examined in this research appear in Table 2.

Table 2. The list and description of candidate attributes.

3.3 Data Preprocessing

Data preprocessing involves several steps for cleansing and normalizing the raw data before it is used for modeling. Missing values are one of the most common issues that preprocessing must address. In this work, we entirely remove a sample from the data set if one or more of its attributes have missing values. We also normalize the data by linearly mapping each attribute’s value from its actual range to the interval \( \left[ {0,1} \right] \). Finally, we treat as outliers those instances (companies) with extreme or infeasible values for some attributes; outliers were removed from the data set before any modeling technique was applied.
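A minimal preprocessing sketch consistent with these steps, using pandas and scikit-learn; the 4-standard-deviation outlier rule is an illustrative assumption, since the exact cut-off is not specified here:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    # 1. Drop any instance (company) with one or more missing attribute values.
    df = df.dropna()

    # 2. Remove outliers; here, values more than 4 standard deviations from
    #    the mean are treated as out of feasible range (an assumed rule).
    features = df.drop(columns=label_col)
    z = (features - features.mean()) / features.std(ddof=0)
    df = df[(z.abs() <= 4).all(axis=1)]

    # 3. Linearly map every attribute to the interval [0, 1].
    scaled = MinMaxScaler().fit_transform(df.drop(columns=label_col))
    out = pd.DataFrame(scaled, columns=features.columns, index=df.index)
    out[label_col] = df[label_col]
    return out
```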

We use wrapper-based methods, as they tend to deliver more accurate results than filter-based ones (Monroe and The 1993). A particular model is used as the wrapper, and different subsets of attributes are sequentially presented to it according to a forward inclusion approach.
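A sketch of wrapper-based forward inclusion, assuming scikit-learn’s SequentialFeatureSelector as the implementation and a synthetic stand-in for the data; the target subset size of seven attributes is illustrative, not the study’s setting:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the normalized attribute matrix (values in [0, 1]).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((200, 12)),
                 columns=[f"ratio_{i}" for i in range(12)])
y = (X["ratio_0"] + rng.normal(0, 0.2, 200) > 0.5).astype(int)  # toy labels

# Forward inclusion: start from an empty subset and repeatedly add the
# attribute that most improves the wrapper's cross-validated accuracy.
selector = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),  # KNN used as the wrapper model
    n_features_to_select=7,               # illustrative target size
    direction="forward",
    scoring="accuracy",
    cv=5,
)
selector.fit(X, y)
print("Attributes kept by the wrapper:", list(X.columns[selector.get_support()]))
```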

3.4 Description of Employed Models (Wrappers)

We use particular models from established machine learning paradigms and from statistics. More specifically, we use K-Nearest Neighbour as a representative of instance-based learning and, from statistics, the Naïve Bayes method from the Bayesian paradigm. Although many variations of each model exist, we apply the “principal” model that we consider representative of each paradigm.

The main advantage of the K-Nearest Neighbour classifier is that it is a very simple classifier that works well on basic recognition problems. Its main disadvantage is that the algorithm must compute distances to, and sort, all the training data at each prediction, which can be slow when there are many training examples. The Naïve Bayes classifier, on the other hand, is fast to train and fast to classify, is not sensitive to irrelevant features, handles both real-valued and discrete data, and handles streaming data well. Its main disadvantage is its strong feature-independence assumption.
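A minimal sketch of the two classifiers on synthetic data, assuming scikit-learn’s KNeighborsClassifier and GaussianNB; the neighbour count and hold-out split are illustrative choices, not the study’s configuration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Synthetic stand-in for normalized financial attributes and FFS labels.
X, y = make_classification(n_samples=400, n_features=12, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

for name, model in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                    ("Naive Bayes", GaussianNB())]:
    model.fit(X_train, y_train)                 # learn from the training split
    print(name, "test accuracy:",
          accuracy_score(y_test, model.predict(X_test)))
```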

4 Experimental Results

4.1 Comparison of Factor Importance

Table 3 presents the comparison results for the machine learning methods. It shows the fraud factors identified by the different methods, their comparison with the empirical data, and the importance of the attributes included in the prediction models. The most important category of fraud detection factors is “poor performance”. All factor effects are consistent with prior research. The top seven fraud factors are the logarithm of total debt, equity, debt to equity, the logarithm of total assets, net fixed assets to total assets, cash to total assets, and sector. Furthermore, the profitability, liquidity, solvency, activity, and structure ratios are significant predictors for fraud detection. The ratios that are most important for fraudulent financial statements appear in Table 3 and are analyzed below.

Table 3. Comparison of factor and predictive importance

Leverage proxies are significant indicators for fraud analysis. These ratios are consistent with Spathis et al. (2002b) and Fanning and Cogger (1998), who suggest that a higher debt to equity ratio is a good indicator of fraudulent firms. It follows that firms with a high total debt to total equity value have an increased probability of being classified as fraudulent. Previous studies such as Persons (1995) argued that a high debt structure can motivate FFS. In addition, Loebbecke et al. (1989) concluded that 19% of the firms in their sample exhibited solvency problems.

Lower liquidity may be an incentive for managers to engage in fraudulent financial reporting. This argument is supported by Kreutzfeldt and Wallace (1986), who found that firms with liquidity problems have significantly more errors in their financial statements than firms without such problems. In this research, the most important liquidity ratios associated with fraudulent financial statements are working capital, current assets to current liabilities, and cash to total assets.

Furthermore, lower profit may give management an incentive to overstate revenue or understate expenses. Kreutzfeldt and Wallace (1986) found that firms with profitability problems have significantly more errors in their financial statements than firms without profitability problems. This approach is based on the expectation that management will be able to maintain or improve past levels of profitability (Summers and Sweeney 1998); if actual performance falls short of this expectation, it can motivate fraudulent financial reporting. Financial distress is a motivation for fraudulent financial statements (Loebbecke et al. 1989; Kreutzfeldt and Wallace 1986). In this research, the most important profitability ratios for FFS are gross profit to total assets, net profit to total assets, net income to fixed assets, and EBIT to total assets.

Capital turnover, proxied by receivables to revenue, also shows significant results. High ratios of accounts receivable to sales and inventory to sales are consistent with research suggesting that accounts receivable are an asset with a higher incidence of manipulation. Asset composition, proxied by inventory to total assets, also shows significant results. In addition, our research concludes that firm size, measured by total assets, is statistically significant. Finally, the ratios of sales growth, sales to total assets, sales minus gross margin, inventory, net fixed assets to total assets, equity to total liabilities, and P/E are significant in detecting fraudulent financial statements.

This result is supported by the rates of correct classification analyzed in the next section.

4.2 Comparison of Predictive Performance

Performance evaluation is the final step of the framework; it measures the performance and judges the efficacy of the machine learning techniques.

The preprocessed dataset was randomly divided into training and testing sets via K-fold cross-validation, with K = 5 in a typical experiment. The sample was split using stratified 5-fold cross-validation, so that each fold contained equal proportions of fraud and non-fraud cases. In each iteration, four folds were used to set parameters and train the classifiers, while the remaining fold served as the test set. After the parameters were set and the classifiers trained, the methods were evaluated on the test folds, and the average classification accuracy over the test folds was calculated. The same 5-fold cross-validation datasets were then used by both classifiers, and the proposed ensemble of classifiers was developed and validated on the basis of the classifier results.
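A sketch of stratified 5-fold cross-validation for the two classifiers, assuming scikit-learn and synthetic stand-in data; cross_val_score rotates the held-out fold and averages the per-fold accuracies:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the preprocessed attribute matrix and labels.
X, y = make_classification(n_samples=400, n_features=12, random_state=0)

# Stratification keeps the fraud / non-fraud proportions equal in each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for name, model in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                    ("Naive Bayes", GaussianNB())]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.4f} "
          f"over {cv.get_n_splits()} folds")
```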

Besides classification accuracy, this research also considered misclassification cost. Misclassification cost is generally associated with two error types: a type I error occurs when a non-fraud case is classified as fraud, while a type II error occurs when a fraud case is classified as non-fraud. The misclassification costs associated with type II errors are reportedly much higher than those associated with type I errors (West et al. 2014). Classifying a fraud case as non-fraud may lead to incorrect decisions and economic damage, whereas classifying a non-fraud case as fraud may result in the expense and extra time of additional investigation.

The 5-fold cross-validation performances of the two classification methods were calculated and compared. KNN achieved the higher average accuracy (89.11%), while Naïve Bayes achieved the lower accuracy (68.29%).

The confusion matrices for KNN and NB are presented in Tables 4 and 5. A performance matrix reports the sensitivity and specificity of the two methods used in this research; the complement of sensitivity corresponds to the type II error rate, and the complement of specificity to the type I error rate. Sensitivity is the proportion of fraudulent companies correctly predicted as fraudulent by a model relative to the total number of actual fraudulent companies. Specificity is the proportion of non-fraudulent companies predicted as non-fraudulent by a model relative to the total number of actual non-fraudulent companies. For both methods, we report the average sensitivity, specificity, accuracy, error rate, and precision (Table 6).
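A sketch of how the confusion-matrix-based metrics and a weighted misclassification cost can be computed from test-set predictions; the cost weights are illustrative assumptions, not figures from this study:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def fraud_metrics(y_true, y_pred, cost_fp=1.0, cost_fn=10.0):
    """Sensitivity, specificity, accuracy, error rate, precision and a
    weighted misclassification cost (type II errors weighted more heavily)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    total = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn),          # fraud correctly flagged
        "specificity": tn / (tn + fp),          # non-fraud correctly cleared
        "accuracy":    (tp + tn) / total,
        "error_rate":  (fp + fn) / total,
        "precision":   tp / (tp + fp) if (tp + fp) else float("nan"),
        "misclassification_cost": cost_fp * fp + cost_fn * fn,
    }

# Usage with tiny hypothetical true labels and predictions:
print(fraud_metrics(np.array([1, 0, 1, 0, 1, 0]),
                    np.array([1, 0, 0, 0, 1, 1])))
```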

Table 4. Confusion matrix for KNN
Table 5. Confusion matrix for Naive Bayes
Table 6. Sensitivity, specificity, accuracy, error rate, precision

5 Conclusion

Reasons for committing financial statement fraud include improving stock performance, reducing tax obligations, or attempting to exaggerate performance under managerial pressure (Ravisankar et al. 2011). Financial statement fraud can be difficult to diagnose because of a general lack of understanding of the field, the infrequency with which it occurs, and the fact that it is usually committed by knowledgeable people within the industry who are capable of masking their deceit (Maes et al. 2002). This research studied intelligent approaches to fraud detection, both statistical and computational. There is also the opportunity to examine the performance of existing methods by adjusting their parameters, as well as the potential to study the cost-benefit of computational fraud detection. Finally, further research into the differences between each type of financial fraud could lead to a general framework that would greatly improve the accuracy of intelligent detection methods.