1 Introduction

Data mining or big data analysis has been recognized as an important and challenging task for many problems in daily life. To perform big data analysis or data mining, a specific dataset for a chosen target problem is collected. However, in practice, the collected dataset usually contains some proportion of incomplete data, i.e., records with one or more missing attribute values. Incompleteness in datasets can arise from a variety of sources, such as the database system itself, network failures, and improper or mistaken data entries.

According to Strike et al. (2001) and Raymond and Roberts (1987), when the dataset contains a very small amount of missing data, e.g. a missing rate of less than 10% or 15% for the whole dataset, the missing data can simply be removed from the dataset without having a significant effect on the final mining or analysis result. However, when the missing rate exceeds 15%, careful consideration needs to be given to dealing with the missing data (Acuna and Rodriguez 2004). It should be noted that not every domain problem dataset follows this kind of rule. Small amounts of missing data may contain important information that cannot be ignored, such as records of consumers who spent very large amounts of money but whose personal information, e.g. age, income, or education, is missing.

Unlike the case deletion strategy, missing value imputation (MVI) is the solution most commonly used to deal with the incomplete dataset problem. In general, MVI is a process in which statistical or machine learning techniques are used to replace the missing data with substituted values. Statistical techniques, such as mean/mode and regression, have been applied for this purpose for several decades (Little and Rubin 1987), with machine learning techniques, such as the k-nearest neighbor, artificial neural network, and support vector machine techniques, being employed in the last 10 years (Garcia-Laencina et al. 2010).

There is a variety of MVI techniques suitable for application to different domain problems. A large number of surveys of MVI from different perspectives have already appeared in the literature, such as for operations management (Tsikriktsis 2005), medical problems (Aittokallio 2009; Donders et al. 2006; Harel and Zhou 2007; Liew et al. 2011), pattern classification (Garcia-Laencina et al. 2010), and questionnaires and surveys (Baraldi and Enders 2010; De Leeuw 2001).

Most of these surveys have mainly focused on describing the basic concepts of the relevant MVI techniques. However, from the experimental design viewpoint, there are many technical issues that have not been adequately reviewed and analyzed. For example, it is not known which technique(s) are the most widely used, what kinds of domain problem datasets have been studied, what missing rates are considered in the simulations, and so on.

Therefore, unlike previous surveys, this survey provides statistical analyses of technical questions related to the experimental design procedure. Specifically, 111 journal papers published over the past decade, from 2006 to 2017, are reviewed and analyzed. In addition, some limitations of related works are discussed to indicate possible future research directions.

The rest of this paper is organized as follows. Section 2 describes the commonly used experimental design procedure for MVI. The related literature for each of the major components of the procedure is then analyzed: Sects. 3, 4, and 5 focus on the datasets used (including their missing rates and missingness mechanisms), the MVI techniques, and the evaluation metrics, respectively. Section 6 discusses the limitations of related work and Sect. 7 offers some conclusions.

2 The experimental design procedure for missing value imputation

There are three technical issues that need to be considered in the experimental design procedure for MVI outlined in Fig. 1. The first is the choice of datasets for the experiments. An experimental dataset may already contain missing data or it may be a complete dataset. For complete datasets, a missing value simulation is performed. That is, missing values are injected into the chosen dataset at different missing rates (e.g., 10% or 20%) using three different missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). This results in different incomplete datasets having different proportions of missing data.

Fig. 1 The experimental design procedure for MVI
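To make this simulation step concrete, the following minimal sketch (our own illustration, not taken from any of the surveyed papers) injects a controlled missing rate into a complete dataset under the MCAR mechanism; the function name and the pandas/NumPy usage are our assumptions.

```python
import numpy as np
import pandas as pd

def simulate_mcar(df: pd.DataFrame, missing_rate: float, seed: int = 0) -> pd.DataFrame:
    """Return a copy of a complete numerical dataset in which `missing_rate`
    of the cells are set to NaN completely at random (MCAR)."""
    rng = np.random.default_rng(seed)
    arr = df.to_numpy(dtype=float, copy=True)
    n_missing = int(round(missing_rate * arr.size))
    # MCAR: cell positions are drawn uniformly, independent of any values.
    flat_idx = rng.choice(arr.size, size=n_missing, replace=False)
    arr[np.unravel_index(flat_idx, arr.shape)] = np.nan
    return pd.DataFrame(arr, index=df.index, columns=df.columns)

# Example: simulate a 20% missing rate over a toy complete dataset.
complete = pd.DataFrame(np.random.rand(100, 5), columns=list("abcde"))
incomplete = simulate_mcar(complete, missing_rate=0.20)
```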

The second technical issue is the techniques used for missing value imputation. During the imputation process, each incomplete dataset can be divided into a set of complete data and a set of missing data. The former is used for the ‘estimation’ of suitable values by different imputation methods to replace the missing values in the set of missing data. Consequently, this produces a ‘pseudo’ complete dataset for later data mining or analysis tasks, if any.

The third technical issue is performance evaluation of imputation results. The most straightforward method to evaluate the performance of the imputation method is to assess the differences between the real values in the original dataset and the estimated values in the ‘pseudo’ dataset. Another method involves using the ‘pseudo’ dataset to perform some data mining task, such as classification or clustering, then observing the final mining performance.

In line with these three technical issues, the related literature is reviewed and analyzed in terms of the experimental datasets collected, the MVI techniques used, and the performance evaluation methods considered, as discussed in Sects. 3, 4, and 5, respectively.

3 Analysis of experimental datasets

3.1 Dataset domains

Table 1 shows the number of works using each type of dataset from the 111 related studies. As can be seen, most researchers use the UCI (University of California at Irvine) datasets for their experiments. Often, several UCI datasets covering a variety of domain problems are used in each study. Among the specific domains, medical related datasets, such as microarray or gene datasets, are the most widely considered in MVI. Other domain problems considered less often include image data, software measurement and project data, financial data, and questionnaire based data.

Table 1 The numbers of works using different domain datasets

The results indicate that in most MVI studies, more than one specific domain problem is considered. The advantage of doing this is to demonstrate the domain scalability of the MVI method used. However, further analysis of the dataset characteristics, including the number of features (i.e., attributes) and data samples, allows some limitations of past work to be identified. One major limitation is the problem of dataset size. That is, most studies use small scale UCI datasets that contain small numbers of features and/or data samples, with the number of features ranging from 4 to 89 and the number of data samples typically ranging from several hundred to several thousand. Some exceptions are Folino and Pisani (2016) and Farhangfar et al. (2007), who used very large scale datasets containing a very high number of feature dimensions (i.e., 216) and very large numbers of data samples (i.e., 581,012 and 256,000, respectively).

Another limitation of past studies is that, although there are three different types of features (categorical, numerical, and mixed), very few have analyzed differences in the performance of MVI methods across these feature types. There are, however, two exceptions, namely, Tsai and Chang (2016) and Stekhoven and Buhlmann (2012).

3.2 Missing rates

In general, most studies have examined imputation performance by performing different missing value simulations over a chosen dataset using different missing rates. Some have considered relatively small missing rates, e.g., less than 30%, while others have focused on wide ranges of missing rates, such as from 5 to 80%. Figure 2 shows the number of works that consider missing rates that are less than 30%, between 30 and 50%, and greater than 50%.

Fig. 2 Number of works using missing rates that are less than 30%, between 30 and 50%, and greater than 50%

As we can see from the figure, most studies discuss missing rates that are less than 30%, with only twelve works considering missing rates larger than 50%. Of the studies using very large missing rates, seven used UCI datasets (Eirola et al. 2013; Kapelner and Bleich 2015; Kiasari et al. 2017; Mesquite et al. 2017; Purwar and Singh 2015; Qin et al. 2009; Zhu et al. 2011), one the Digital Bibliographic Library Browser (DBLP) dataset (Li et al. 2014), one a wireless sensor network dataset (Li and Parker 2014), one a medical dataset (Janssen et al. 2010), one a traffic flow dataset (Chen et al. 2017), and one a synthetic dataset (Graham et al. 2007).

In short, most of the datasets used are relatively small, containing several hundred to several thousand data samples, in contrast to the DBLP and wireless sensor network datasets, which contain 10,000 and 12,000 data samples, respectively.

3.3 Missingness mechanisms

According to Little and Rubin (1987), there are three types of missingness mechanisms that can cause an incomplete dataset. They are missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). MCAR occurs when the probability of an instance (case) having a missing value for an attribute does not depend on either the known values or the missing data.

On the other hand, MAR occurs when the probability of an instance having a missing value for an attribute depends on the observed data but not on the missing data itself. In other words, the distribution of an instance having missing values for an attribute depends on the observed data, but does not depend on the missing data. NMAR occurs when the probability of an instance having a missing value for an attribute may depend on the value of that attribute, i.e., on the missing data itself.
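To illustrate the difference, the sketch below simulates MAR and NMAR missingness on a two-attribute numerical dataset; the logistic form for MAR and the top-quantile rule for NMAR are just one common choice among many, assumed here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))  # two numerical attributes x0 and x1

# MAR: the chance that x1 is missing depends only on the *observed* x0.
mar = X.copy()
p_miss = 0.4 / (1.0 + np.exp(-X[:, 0]))   # larger x0 -> x1 more likely missing
mar[rng.random(len(X)) < p_miss, 1] = np.nan

# NMAR: the chance that x1 is missing depends on the value of x1 itself.
nmar = X.copy()
nmar[X[:, 1] > np.quantile(X[:, 1], 0.8), 1] = np.nan  # the largest x1 values go missing
```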

Therefore, there are three different ways to artificially simulate a collected dataset as an incomplete dataset containing a controlled missing rate. Figure 3 shows the number of works detailing simulations of the three types of missingness mechanisms.

Fig. 3 Number of works discussing the different missingness mechanisms

The results show that most researchers have only used the MCAR mechanism for their experiments. Very few (i.e., 15 works) have considered all three mechanisms for each chosen dataset. Specifically, nine have used UCI datasets (Garciarena and Santana 2017; Kapelner and Bleich 2015; Pan et al. 2015; Tian et al. 2014; Twala 2009; Twala et al. 2008; Valdiviezo and Van Aelst 2015; Xia et al. 2017; Zhu et al. 2012), three software measurement/project datasets (Khoshgoftaar and Van Hulse 2008; Song et al. 2008; Van Hulse and Khoshgoftaar 2014), one a medical dataset (Armitage et al. 2015), and two synthetic datasets (Ding and Simonoff 2010; Hapfelmeier and Ulm 2014).

4 Missing value imputation techniques

In general, missing value imputation techniques can be classified into two types, namely, statistical and machine learning based techniques (Aittokallio 2009; Garcia-Laencina et al. 2010). Related studies have considered some of these techniques as the basis for relevant experiments regardless of whether the work focuses on proposing a novel imputation technique or on comparing some chosen imputation techniques for specific domain problems.

The following subsections address the questions of what kinds of baseline techniques have been used for MVI and which are the most popular. Note that describing the concepts of these statistical and machine learning based techniques in detail is not the main focus of this paper.

4.1 Statistical techniques

Table 2 lists the statistical techniques that have been used in studies published from 2006 to 2017. As we can see, expectation maximization (EM), linear regression (LR), least squares (LS), and mean/mode are the top four most widely used statistical techniques, having been applied in 23, 15, 33, and 28 works, respectively.

Table 2 The statistical techniques used in the literature

Among these top four statistical techniques, the mean and mode methods are the simplest imputation methods for imputing numerical and categorical attribute values, respectively. In the mean approach, missing attributes are filled in by the average value of that attribute in all the observed data. On the other hand, the mode approach uses the attribute value in all the observed data that appears most often to fill in the missing attribute values.
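A minimal sketch of mean/mode imputation in pandas follows (our own illustration; scikit-learn's SimpleImputer with the 'mean' or 'most_frequent' strategy provides equivalent behavior):

```python
import pandas as pd

def mean_mode_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numerical columns with the column mean and categorical columns
    with the column mode, both computed from the observed data only."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out
```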

The EM algorithm consists of two steps: the E-step calculates the expectation of the complete-data sufficient statistics given the observed data and the current parameter estimates, and the M-step updates the parameter estimates via maximum likelihood based on the current values of the complete-data sufficient statistics. The algorithm proceeds iteratively until the difference between two consecutive parameter estimates meets a convergence criterion. Based on the final parameter estimates and the observed data, the expectation of each missing value can be calculated and used as the imputed value.
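The following sketch implements a simplified version of EM imputation under a multivariate normal model (our own illustration; it omits the conditional-covariance correction that a full EM M-step would add to the sufficient statistics):

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-5):
    """Simplified EM imputation assuming the data follow a multivariate normal.
    E-step: replace each missing entry with its conditional expectation given
    the observed entries and the current (mu, Sigma).
    M-step: re-estimate mu and Sigma from the completed data."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    Xf = np.where(miss, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        mu = Xf.mean(axis=0)
        sigma = np.cov(Xf, rowvar=False, bias=True)
        X_new = Xf.copy()
        for i in np.where(miss.any(axis=1))[0]:
            m, o = miss[i], ~miss[i]
            s_oo = sigma[np.ix_(o, o)] + 1e-8 * np.eye(o.sum())  # regularize
            s_mo = sigma[np.ix_(m, o)]
            # Conditional mean of the missing block given the observed block.
            X_new[i, m] = mu[m] + s_mo @ np.linalg.solve(s_oo, Xf[i, o] - mu[o])
        if np.max(np.abs(X_new - Xf)) < tol:  # imputations have converged
            return X_new
        Xf = X_new
    return Xf
```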

For regression based imputation methods, the relationships among attributes are estimated, and the resulting regression coefficients are then used to estimate the missing attribute values. In particular, linear regression and logistic regression are used for the prediction of numerical and categorical attribute values, respectively. In general, the method of least squares (LS) (or ordinary least squares) is used in linear regression to produce the final estimation by minimizing the squared differences between the measured and predicted values of the attributes.

More specifically, for the category of least squares (LS), various estimation techniques can be used to replace ordinary least squares (OLS) to produce the final prediction result, such as ILLS, LTS, LSA, LLS, NIPALS, OLS, PLS, and SLLS.
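As a simple instance of this family, the sketch below imputes one numerical attribute by OLS regression on the remaining attributes (a minimal illustration assuming the other attributes are complete; the function name is our own):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ols_impute_column(X, target_col):
    """Impute the missing values of one numerical column by ordinary least
    squares regression on the remaining columns (assumed complete here)."""
    X = np.asarray(X, dtype=float)
    y = X[:, target_col]
    others = np.delete(X, target_col, axis=1)
    observed = ~np.isnan(y)
    # Fit on the rows where the target is observed, predict where it is not.
    model = LinearRegression().fit(others[observed], y[observed])
    y_imputed = y.copy()
    y_imputed[~observed] = model.predict(others[~observed])
    return y_imputed
```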

Figure 4 shows the year-wise distribution of works applying the top four statistical techniques. Although LS is the most widely used MVI technique, there has been no related work considering this technique within the last 3 years, except for Pati and Das (2017). Instead, researchers have tended to prefer the EM, LR, and mean/mode techniques. Of these, the mean/mode technique is the second most widely used, appearing in publications in every year from 2006 to 2017. This survey indicates that the mean/mode technique should be regarded as one of the representative baseline statistical MVI techniques.

Fig. 4 The year-wise distribution of the number of works using the EM, LR, LS, and mean/mode techniques

For the studies using one of the top four statistical MVI techniques, we can further analyze the relationships between these techniques and their experimental datasets, as shown in Table 3.

Table 3 The number of works for the relationships between the top four statistical MVI techniques and their experimental datasets

As can be seen, the major statistical MVI technique for medical domain datasets is LS. This indicates that most medical domain datasets contain numerical missing values. Moreover, these medical domain datasets are usually simulated with missing rates smaller than 30%, and related studies using the LS technique only consider the MCAR and MAR missingness mechanisms. On the other hand, EM, LR, and mean/mode are the widely used statistical MVI techniques for UCI datasets, where the simulated missing rates mostly range from 30 to 50%.

4.2 Machine learning based techniques

Table 4 lists the machine learning based techniques that have been applied in the literature from 2006 to 2017. The top four techniques are clustering, DT, KNN and RF, which have been used in 14, 17, 52, and 9 related works, respectively.

Table 4 The machine learning based techniques used in the literature

Among the top four machine learning based techniques, cluster analysis is the only unsupervised learning technique; its task is to group similar objects into the same clusters. Specifically, each cluster center (or centroid) is the mean of the objects in the same cluster. To impute the missing values, the distance between the incomplete data and the identified cluster centroids is calculated, and the closest centroid's values are used to fill in the missing values.
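A minimal sketch of this centroid-based imputation using k-means (our own illustration; the value of k and computing distances over the observed attributes only are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_impute(X, k=3, seed=0):
    """Cluster the complete rows with k-means; for each incomplete row, find
    the closest centroid using its observed attributes and copy the centroid's
    values into the missing positions."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[~miss.any(axis=1)])
    out = X.copy()
    for i in np.where(miss.any(axis=1))[0]:
        o = ~miss[i]
        # Distance to each centroid measured over the observed attributes only.
        d = np.linalg.norm(km.cluster_centers_[:, o] - X[i, o], axis=1)
        out[i, miss[i]] = km.cluster_centers_[np.argmin(d)][miss[i]]
    return out
```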

On the other hand, KNN is a representative supervised learning (classification) technique in which missing values are imputed using values calculated from the k nearest observed data. The nearest neighbors are identified by some specific distance function, usually the Euclidean distance. For missing value imputation, the incomplete instance is treated as the testing case, in which the complete and missing attributes represent the input features and the output (prediction target), respectively. Its k nearest observed data are then identified using the complete attributes, and their values for the target attribute are used to impute the missing attribute.
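A short sketch using scikit-learn's KNNImputer, which implements exactly this idea for numerical attributes (for a categorical target attribute, the majority label among the k neighbors would be used instead, as described above):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Each missing entry is replaced by the mean of that attribute over the k
# nearest neighbors, with distances computed on the mutually observed attributes.
X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)
```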

A DT is a tree-like model in which each internal node denotes a test on an attribute and each branch represents an outcome of the test. The leaf nodes represent classes or class distributions, and the upper-most node in a tree is the root node. In the tree-growing process, the attribute with the highest information gain (i.e., the greatest reduction in entropy) is chosen to split each node into child nodes. In the related literature, C4.5/5.0 and CART are used for imputing categorical and numerical attribute values, respectively.

In RF, multiple decision trees are constructed based on a bootstrapping procedure, and the final predictions are given by averaging or taking a majority vote over the individual trees' predictions. The imputation process for DT and RF is similar to that of KNN, where the internal nodes test the complete input features and the leaf nodes provide the predicted value for the missing attribute.
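The sketch below shows a missForest-style use of RF for numerical attributes (a simplified illustration: Stekhoven and Buhlmann's original algorithm also handles categorical attributes and uses an out-of-bag stopping criterion, both omitted here):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_impute(X, n_iter=5, seed=0):
    """missForest-style imputation for numerical data: initialize missing cells
    with column means, then repeatedly re-predict each incomplete column with a
    random forest trained on the remaining columns."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    out = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        for j in np.where(miss.any(axis=0))[0]:
            obs = ~miss[:, j]
            others = np.delete(out, j, axis=1)
            rf = RandomForestRegressor(n_estimators=100, random_state=seed)
            rf.fit(others[obs], out[obs, j])
            out[miss[:, j], j] = rf.predict(others[miss[:, j]])
    return out
```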

Figure 5 shows the year-wise distribution of the number of works that have used the top four machine learning based techniques. KNN is without doubt the most popular MVI technique in this category, and can be regarded as the most representative baseline machine learning based MVI technique. On the other hand, the clustering and RF techniques have recently been utilized in a number of studies, whereas DT has been consistently used each year from 2006 to 2016.

Fig. 5 The year-wise distribution for the number of works that have used the DT, clustering, KNN, and RF techniques

For the studies using one of the top four machine learning based MVI techniques, Table 5 shows the relationships between these techniques and their experimental datasets.

Table 5 The number of works for the relationships between the top four machine learning based MVI techniques and their experimental datasets

As Table 5 shows, KNN is the most widely used machine learning based MVI technique, especially for UCI and medical datasets. However, for the questionnaire datasets, related studies have never considered machine learning based MVI techniques. On the other hand, regarding missing rates, if we count the works using DT and RF (an ensemble of DTs) together, very few studies ran simulations with missing rates smaller than 30% (i.e., 3 out of 22), whereas 13 and 6 works consider the 30–50% and larger than 50% missing rates, respectively. This differs from clustering and KNN, for which most studies ran simulations with missing rates smaller than 30% (i.e., 10 out of 16 and 34 out of 60, respectively).

5 Evaluation methods

5.1 Direct evaluation

The final step after imputation of the missing values is to evaluate the imputation results. The most commonly used method is to directly assess the difference between the original values in the collected dataset and the estimated or predicted values in the simulated incomplete dataset. There are two types of attribute values, namely, discrete and continuous. For evaluating the imputation of discrete values, the percentage of missing values that have been predicted correctly (or incorrectly) is usually used, e.g. Nishanth and Ravi (2016) and Valdiviezo and Van Aelst (2015). The percentage of correct predictions (PCP) can be obtained by

$$ {\textit{PCP}} = 100 \times \frac{number\,of\,correct\,predictions}{total\,number\,of\,predictions} $$
(1)

On the other hand, for the imputation of continuous values, the mean absolute percentage error (MAPE) and/or root-mean-square error (RMSE) related measures are used, e.g. Gautam and Ravi (2015) and Silva-Ramirez et al. (2015). MAPE and RMSE can be computed by Eqs. (2) and (3), respectively.

$$ {\textit{MAPE}} = \frac{100}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\frac{{x_{i} - \hat{x}_{i} }}{{x_{i} }}} \right| $$
(2)
$$ {\textit{RMSE}} = \sqrt {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {\left( {x_{i} - \hat{x}_{i} } \right)^{2} } } $$
(3)

where \( x_{i} \) is the actual value, \( \hat{x}_{i} \) is the predicted value, and n is the total number of missing values.
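The three measures translate directly into code; a minimal sketch of Eqs. (1)–(3) follows (note that MAPE assumes nonzero actual values):

```python
import numpy as np

def pcp(y_true, y_pred):
    """Percentage of correct predictions for discrete values, Eq. (1)."""
    return 100.0 * np.mean(np.asarray(y_true) == np.asarray(y_pred))

def mape(x_true, x_pred):
    """Mean absolute percentage error, Eq. (2); assumes x_true has no zeros."""
    x_true, x_pred = np.asarray(x_true, float), np.asarray(x_pred, float)
    return (100.0 / len(x_true)) * np.sum(np.abs((x_true - x_pred) / x_true))

def rmse(x_true, x_pred):
    """Root-mean-square error, Eq. (3)."""
    x_true, x_pred = np.asarray(x_true, float), np.asarray(x_pred, float)
    return np.sqrt(np.mean((x_true - x_pred) ** 2))
```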

5.2 Classification accuracy

Another strategy for assessing imputation quality is to examine the classification performance of chosen classifiers trained on the imputed datasets. In contrast to the direct evaluation strategy, after the imputation process is completed, the imputed dataset without missing values is used to train some specific classifier(s), and a separate testing set is used to assess their classification performance. Since different imputation methods applied to the same incomplete dataset are likely to produce different imputation results, a classifier with higher classification accuracy indicates better imputation quality of its training dataset. Consequently, the better imputation methods can be identified.
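A compact sketch of this indirect evaluation pipeline (the dataset, the 20% MCAR rate, mean imputation, and the decision-tree classifier are illustrative choices, not prescriptions from the surveyed works):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Simulate ~20% MCAR missingness on the training set only.
rng = np.random.default_rng(0)
X_tr_miss = X_tr.copy()
X_tr_miss[rng.random(X_tr.shape) < 0.2] = np.nan

# Impute, train, and judge imputation quality by held-out accuracy.
X_tr_imp = SimpleImputer(strategy="mean").fit_transform(X_tr_miss)
clf = DecisionTreeClassifier(random_state=0).fit(X_tr_imp, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```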

The proportion of related works that utilize this type of evaluation strategy is much smaller than the number utilizing the direct evaluation strategy; studies which consider both evaluation strategies at the same time are even rarer. Figure 6 shows the number of works using these evaluation methods, and Table 6 lists the related works that consider the classification accuracy of classifiers, as well as those using both evaluation methods at the same time.

Fig. 6 The number of works using the different evaluation methods

Table 6 Evaluation strategies of related works

Ten different classification techniques have been employed in related works for analysis of the classification accuracy of the classifiers, as shown in Fig. 7. As can be seen, KNN, DT, NB, SVM, and MLP are the top five classifiers constructed for evaluating the imputation results.

Fig. 7 The number of works using the ten different classifiers

5.3 Computational time

In addition to the two afore-mentioned evaluation strategies, it is also important to take the computational time of each MVI method into consideration. This issue is especially critical for machine learning based MVI techniques, which require some time for the model training step. Moreover, when the size of the dataset as well as the missing rate is very large, the imputation process is likely to take a lot of time. Among the articles reviewed, only 16 examined the computational time: Huang et al. (2017), Kiasari et al. (2017), Saha et al. (2017), Valdiviezo and Van Aelst (2015), Li and Parker (2014), Shah et al. (2014), Tian et al. (2014), Aydilek and Arslan (2013), Liu et al. (2013), Rahman and Islam (2013), Stekhoven and Buhlmann (2012), Zhu et al. (2012), Zhang et al. (2011), Tuikkala et al. (2008), Farhangfar et al. (2007), and Lin et al. (2006).

5.4 Missing data simulation strategies for missing value imputation

The simulation of an incomplete dataset with a specific missing rate is usually performed several times in order to avoid producing biased imputation results. This is because the missing data in the incomplete dataset can be different for each simulation even with the same missing rate. In general, there are two strategies used to perform a missing data simulation. The first one is to directly use the whole of the chosen dataset, making it become an incomplete dataset based on a specified missing rate.

The second strategy is to first divide the chosen dataset into training and testing subsets by some method such as n-fold cross validation (CV) (Kohavi 1995), or by adopting fixed proportions for the training and testing subsets, e.g. 70% and 30%, respectively. Then, either the training or testing subset is used to perform the missing data simulation for a specific missing rate.
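The two strategies can be sketched as follows (a minimal illustration; the `inject_mcar` helper, the 20% rate, and the 5-fold split are our own assumptions):

```python
import numpy as np
from sklearn.model_selection import KFold

def inject_mcar(X, rate, rng):
    """Set roughly `rate` of the cells to NaN completely at random."""
    out = X.astype(float).copy()
    out[rng.random(out.shape) < rate] = np.nan
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))

# Strategy 1: simulate missingness over the whole dataset.
X_whole_incomplete = inject_mcar(X, rate=0.2, rng=rng)

# Strategy 2: split first (5-fold CV), then simulate missingness on the
# training folds only (the 'CV-training' variant discussed below).
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_train_incomplete = inject_mcar(X[train_idx], rate=0.2, rng=rng)
    X_test = X[test_idx]  # kept complete in this variant
    # ... impute X_train_incomplete, train a model, evaluate on X_test ...
```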

Figure 8 shows the number of works that have used the different strategies, including the whole dataset (denoted as whole), training set obtained by cross validation (CV-training), and the testing set obtained by cross validation (CV-testing). Note that studies that do not clearly describe their simulation strategies are not counted here.

Fig. 8 The number of works using the whole, CV-training, and CV-testing simulation strategies

This result shows that most studies use the whole dataset strategy, with the imputation results usually evaluated by the direct evaluation method. Only a small proportion of related studies use the cross validation method for missing data simulation. It should be noted that no study has considered both the training and testing subsets when performing missing data simulation for a specific missing rate. Such a simulation would be much closer to real world problems based on historical data, where the collected data contain some missing values and the new, unknown testing data may also contain some missing values.

6 Discussion

The above analysis of the related literature reveals some limitations related to technical issues in the experimental procedure, which can be regarded as future research directions for MVI. They are discussed in greater detail below.

6.1 The chosen datasets

The domain problem datasets for MVI can be broadly classified into two categories. The first is based on a number of UCI datasets that cover various domain problems. The second is based on specific (real world) domain datasets. Although related works in the first category use a variety of different domain datasets, these are not as large in scale as real world datasets, which often contain a very large number of feature dimensions (e.g. over 100) and/or data samples (e.g. over 100,000).

In addition, in terms of attribute values, the chosen dataset can contain categorical, numerical, or mixed categorical and numerical types of data. Different data types may affect the imputation performance of different MVI methods.

We now discuss the missing rates used in the simulations over a chosen dataset. It is very hard to define which missing rates are practical, say less than 30%. To deal with real world problems, it would be better to perform simulations with larger missing rates (e.g. 70%) or a wide range of missing rates (e.g. from 10 to 90%). The findings from this kind of simulation would be more practical.

Regarding the missingness mechanisms, missing data in different domain problem datasets may occur under the MCAR, MAR, or NMAR scenarios. Considering only one type of scenario in the incomplete dataset simulation is not enough to fully understand the imputation performance of the MVI methods.

Another issue that could affect the imputation results is whether to perform feature and/or instance selection before or after MVI. Feature and instance selection are aimed at filtering out unrepresentative features and data samples from a given dataset, respectively. Performing one or both of these tasks before MVI could make the complete dataset 'cleaner', which might lead to better imputation results.

Alternatively, performing feature and/or instance selection over an imputed dataset after MVI could make the classifier perform better than one based on the imputed dataset without feature and/or instance selection. There have been very few studies which have considered the effect of feature/instance selection on MVI (Aussem and de Morais 2010; Doquire and Verleysen 2012; Hapfelmeier and Ulm 2014; Huang et al. 2016; Sun et al. 2009; Tsai and Chang 2016).

6.2 The MVI techniques

The various baseline MVI techniques discussed in related works can be classified into two types, statistical or machine learning based techniques. The analytical results detailed in Sect. 4 clearly identify the most popular MVI techniques. However, there has been no comprehensive study comparing these well-known MVI techniques across different domain datasets containing a wide range of missing rates based on different missingness mechanisms. The findings of such a study would allow us to understand which technique(s) are more suitable for which kinds of incomplete datasets. The results could serve as guidelines for the choice of the most representative MVI technique(s) in future work.

Several novel approaches have been proposed that can handle incomplete datasets without performing the MVI process; see for example, Conroy et al. (2016), Polikar et al. (2010), and Yan et al. (2017). It would be very useful to compare the final classification accuracy of classifiers constructed with these approaches against those trained on datasets imputed by the representative baseline MVI technique(s). Such a study could answer the question: When should we perform missing value imputation?

Furthermore, most of the proposed novel (hybrid) approaches are either statistical techniques (such as the studies on dynamic Fisher's linear discrimination by Leung and Leung 2013; iterative bi-cluster based least squares by Cheng et al. 2012; and interval imputation by PCA by Zuccolotto 2012) or machine learning based techniques (such as the studies by Folino and Pisani (2016), who combined genetic programming and ensemble learning models; Zhang et al. (2015), who combined particle swarm optimization and fuzzy c-means; and Silva-Ramirez et al. (2015), who combined MLP and KNN). The results show that there have been very few studies in which a combination of both types of MVI techniques is considered.

6.3 The evaluation methods

Evaluation is critical for validating the performance of an MVI technique and reaching final conclusions. As noted in Sect. 5, the direct evaluation method, the classification accuracy of the classifiers, and consideration of the computational time are the three main ways to evaluate an MVI technique. Using all three of these evaluation metrics would allow us to fully understand the performance as well as provide suggestions for the development of better technique(s). However, this has not been the case: most related studies do not use all three evaluation metrics together, which is one of the main limitations of current work in the literature and should be considered in future research.

It is suggested that when the chosen dataset is divided into training and testing subsets for missing data simulation, it would be more practical to make both subsets incomplete rather than focusing on only one of them. For instance, consider a collected historical dataset which is incomplete. After performing the MVI process, the imputed dataset is used to train a classifier, after which a new testing dataset can be collected, ready for the classification task. However, this testing dataset may also be incomplete, so MVI is required. After performing MVI over the incomplete testing dataset, it can then be fed into the constructed classifier.

In this case, the missing rates of the training and testing datasets can significantly affect the final classification accuracy of the classifiers, and the question of which combination of MVI techniques and classifiers performs best should be answered.

7 Conclusion

Missing value imputation (MVI) for incomplete datasets is a very important problem in data mining and big data analysis. If the incomplete datasets are not well imputed, the final mining or analysis result can be adversely affected. This paper presents a literature review and analysis of 111 related journal articles published from 2006 to 2017.

The review and analysis focus on the issues encountered during the MVI process. They include (1) the chosen datasets as well as their domain problems, missing rates, and missingness mechanisms in the simulation, (2) the MVI techniques employed, and (3) the evaluation methods considered.

The analysis results show the existence of many limitations in the current literature, which can be improved upon in future work. In summary, these include the scalability of the datasets, simulations covering a wide range of missing rates under the MCAR, MAR, and NMAR missingness mechanisms, the choice of representative baseline MVI techniques, the development of novel hybrid approaches combining statistical and machine learning based techniques, the consideration of all three evaluation metrics together, and missing data simulation for both the training and testing datasets.