
1 Introduction

In recent decades, the progress in computer hardware, the increasing number of internet users and mobile device access to the internet have enabled an explosion of data. Researchers from the University of California, Berkeley estimate that about 1 Exabyte (\(10^9\) Gigabytes) of data is generated every year [1]. A recent book [2] reports that Google, Yahoo!, Microsoft, Facebook, Twitter, YouTube and other internet-based companies hold Exabytes of data due to hundreds of millions of users and billions of daily active users. Such a huge amount of data poses challenges for data analysis because current analytical techniques are not well suited to that data scale. It is therefore a high priority to create new learning algorithms that can address massive datasets. Our research aims to propose parallel learning algorithms of local support vector regression (local SVR) for dealing with large datasets.

Support vector machines (SVM), proposed by [3], and kernel-based methods have shown practical relevance for classification, regression and novelty detection. Successful applications are reported for face identification, text categorization and bioinformatics [4]. Nevertheless, SVM learning is accomplished through quadratic programming (QP), so the computational cost of an SVM approach is at least quadratic in the number of training datapoints, making SVM impractical for large datasets. There is a need to scale up SVM learning algorithms to deal with massive datasets.

In this paper, our investigation aims at developing new parallel algorithms of local SVR to efficiently handle the non-linear regression of large datasets.

Instead of building a global SVR model, which the classical algorithm can hardly do on large datasets, our kSVR algorithm [5] learns in parallel an ensemble of local models that are easily trained by a standard SVR implementation such as LibSVM [6]. The kSVR algorithm performs the training task in two main steps. The first step uses the kmeans algorithm [7] to partition the large training dataset into k clusters (subsets); the idea is to reduce the data size for training local non-linear SVR models in the second step. The algorithm then learns k non-linear SVR models in parallel on multi-core computers, where one SVR model is trained on each cluster to predict the data locally. We also propose a new ensemble-based learning algorithm, called krSVR, whose training task learns multiple random kSVR models to improve the generalization capacity of a single kSVR.

The performance analysis in terms of algorithmic complexity and generalization capacity, together with the numerical test results on five large datasets from the UCI repository [8], shows that our proposed kSVR and krSVR algorithms are faster than the standard SVR for the non-linear regression of large datasets while achieving high prediction correctness.

The paper is organized as follows. Section 2 briefly introduces the SVR algorithm. Section 3 presents our proposed parallel algorithm kSVR of local SVR models for non-linear regression on large datasets. The learning algorithm krSVR for the ensemble of T random kSVR models is described in Sect. 4. The experimental results are presented in Sect. 5, before the discussion of related works in Sect. 6. We then conclude in Sect. 7.

2 Support Vector Regression

Consider the regression task for a training dataset with m datapoints \(x_i\) (\(i=1,\ \ldots ,m\)) in the n-dimensional input space \(R^n\), having corresponding targets \(y_i \in R\). Support vector regression (SVR), proposed by [3], tries to find the best hyperplane (denoted by the normal vector \(w \in R^n\) and the scalar \(b \in R\)) that has at most \(\varepsilon \) deviation from the target values \(y_i\). The SVR pursues this goal with the quadratic programming problem (1) (Fig. 1).

$$\begin{aligned} \begin{array}{l} \min {(1/2) \displaystyle \sum _{i=1}^m \displaystyle \sum _{j=1}^m{(\alpha _i - {\alpha _i}^*)(\alpha _j - {\alpha _j}^*)K\langle x_i,x_j\rangle } - \displaystyle \sum _{i=1}^m(\alpha _i - {\alpha _i}^*)y_i + \varepsilon \displaystyle \sum _{i=1}^m(\alpha _i + {\alpha _i}^*)} \\ \\ s.t.\left\{ \begin{array}{l} \displaystyle \sum _{i=1}^m(\alpha _i - {\alpha _i}^*) = 0 \\ 0 \le \alpha _i, {\alpha _i}^* \le C \quad \forall i = 1,2,\ldots ,m \end{array} \right. \end{array} \end{aligned}$$
(1)

where C is a positive constant used to tune the trade-off between the margin and the error, and \(K\langle x_i,x_j \rangle = \langle x_i \cdot x_j\rangle \) is the linear kernel function.

Fig. 1. Linear support vector regression

The support vectors (for which \(\alpha _i, {\alpha _i}^* > 0\)) are given by the solution of the quadratic programming (1), and then, the predictive hyperplane and the scalar b are determined by these support vectors. The prediction of a new datapoint x based on the SVR model is as follows:

$$\begin{aligned} predict(x, SVR\ model) = \sum _{i=1}^{\sharp SV}{(\alpha _i - {\alpha _i}^*)K\langle x, x_i\rangle } - b. \end{aligned}$$
(2)

SVR algorithms can use different kernel functions [9] to handle variations of prediction problems. One only needs to replace the usual linear kernel function \(K\langle x_i,x_j \rangle = \langle x_i \cdot x_j\rangle \) with other popular non-linear kernel functions, including:

  • a polynomial function of degree d: \(K\langle x_i,x_j\rangle = (\langle x_i \cdot x_j\rangle + 1)^d .\)

  • a RBF (Radial Basis Function): \(K\langle x_i,x_j\rangle = e^{-\gamma \Vert x_i - x_j\Vert ^2}.\)

SVR models are highly accurate and of practical relevance, as shown by the many successful applications reported in [4].
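To make the training and prediction steps of this section concrete, the following is a minimal sketch (not the paper's C/C++ implementation) of training a non-linear SVR with an RBF kernel and predicting a new datapoint, written in Python with scikit-learn, whose SVR class wraps LibSVM; the synthetic data and hyper-parameter values are placeholders chosen only for illustration.

# Minimal sketch of non-linear SVR training (QP (1)) and prediction (Eq. (2)).
# The paper's actual implementation calls LibSVM from C/C++; scikit-learn's SVR,
# a LibSVM wrapper, is used here only for brevity.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 1))                  # 500 datapoints, 1 dimension
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.normal(size=500)    # noisy non-linear target

# Hyper-parameters: C (margin/error trade-off), gamma (RBF kernel), epsilon (tolerance)
model = SVR(kernel="rbf", C=1e5, gamma=10.0, epsilon=0.05)
model.fit(X, y)

x_new = np.array([[0.25]])
print(model.predict(x_new))    # prediction of a new datapoint x, as in Eq. (2)
print(len(model.support_))     # number of support vectors (alpha_i, alpha_i^* > 0)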

3 Parallel Algorithm of Local Support Vector Regression

The study in [10] showed that the computational cost of the SVR solution in (1) is at least \(O(m^2)\) (where m is the number of training datapoints), making standard SVM intractable for large datasets. Learning a global SVR model on a full massive dataset is challenging due to the very high computational cost.

Fig. 2. Global SVR model (left part) versus 3 local SVR models (right part)

Fig. 3. Training k local SVR models (kSVR)

3.1 Learning k Local SVR Models

Our proposed kSVR algorithm learns an ensemble of local SVR models that are easily trained by the standard SVR algorithm. As illustrated in Fig. 3, the kSVR handles the regression task in two main steps. The first step uses the kmeans algorithm [7] to partition the full training set into k clusters, and the second step trains an ensemble of local SVR models in which a non-linear SVR is learnt from each cluster to predict the data locally. We consider the simplest regression task, with a target variable y and a single predictor variable x. Figure 2 compares a global SVR model (left part) with 3 local SVR models (right part) for this regression task, using a non-linear RBF kernel function with \(\gamma = 10\), a positive constant \(C=10^5\) (i.e. the hyper-parameters \(\theta = \{\gamma , C\}\)) and a tolerance \(\varepsilon = 0.05\).

Algorithm 1. Parallel training of k local SVR models (kSVR)

Since the cluster size is much smaller than the full training data size, the standard SVR can easily perform the training task on each data cluster. Furthermore, the kSVR learns the k local models from the k clusters independently, so this training task is easily parallelized to exploit high-performance computing, e.g. multi-core computers or grids. The simplest development of the parallel kSVR algorithm is based on the shared-memory multiprocessing programming model OpenMP [11] on multi-core computers. The parallel training of kSVR is described in Algorithm 1.
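The following is a minimal Python sketch of this training procedure, assuming scikit-learn's KMeans and SVR (a LibSVM wrapper) and joblib for the parallel loop; the function and parameter names are illustrative, and the paper's actual implementation is written in C/C++ with OpenMP and LibSVM.

# Sketch of kSVR training: kmeans partition, then one local non-linear SVR per cluster,
# trained in parallel. Illustrative only; not the authors' C/C++/OpenMP code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR
from joblib import Parallel, delayed

def train_ksvr(X, y, k, gamma, C, epsilon, n_jobs=4):
    # Step 1: partition the full training set into k clusters.
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    labels = kmeans.labels_

    # Step 2: train one local SVR model per cluster, in parallel.
    def train_local(c):
        idx = np.where(labels == c)[0]
        lsvr = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=epsilon)
        lsvr.fit(X[idx], y[idx])
        return lsvr

    local_models = Parallel(n_jobs=n_jobs)(delayed(train_local)(c) for c in range(k))
    # kSVR-model = {(c_1, lsvr_1), ..., (c_k, lsvr_k)}, with the centres held by kmeans
    return kmeans, local_models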

3.2 Prediction of a New Datapoint x Using k Local SVR Models

The kSVR-model = \( \{(c_1, lsvr_1), (c_2, lsvr_2), \dots , (c_k, lsvr_k)\}\) is used to predict the target value of a new datapoint x as follows. The first step is to find the closest cluster based on the distance between x and the cluster centers:

$$\begin{aligned} c_{NN} = \mathop {\text {arg}\,\text {min}}\limits _c\ distance(x, c). \end{aligned}$$
(3)

And then, the target value of x is predicted by the local SVR model \(lsvr_{NN}\) (corresponding to \(c_{NN}\)):

$$\begin{aligned} predict(x, kSVR\ model) = predict(x, lsvr_{NN}). \end{aligned}$$
(4)

We can see that the prediction of a new datapoint x in k local SVR models depends not only on the clustering assignment but also on the prediction of the local SVR model at the final stage. It is the local SVR model that directly predicts the outcome of the new datapoint x.
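Continuing the illustrative sketch above (same hypothetical names), the prediction rule of Eqs. (3) and (4) can be written as follows; KMeans.predict returns the index of the closest cluster center, which plays the role of \(c_{NN}\).

# Sketch of kSVR prediction (Eqs. 3-4): nearest cluster centre, then the local SVR.
def predict_ksvr(kmeans, local_models, X_new):
    nearest = kmeans.predict(X_new)            # c_NN = argmin_c distance(x, c)
    return np.array([local_models[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(nearest, X_new)])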

We now analyse the performance in terms of algorithmic complexity and generalization capacity to justify the effectiveness of the kSVR learning algorithm.

3.3 Performance Analysis

Let’s start by analysing the algorithmic complexity of building k local SVR models with the parallel kSVR algorithm. The full dataset with m datapoints is partitioned into k balanced clusters, so the cluster size is about \(\frac{m}{k}\). According to [10], the computational cost of the SVR solution is quadratic in the number of training datapoints. Therefore, the training complexity of a local SVR is \(O(\frac{m^2}{k^2})\), and the algorithmic complexity of training the k local SVR models in parallel on a P-core processor is:

$$\begin{aligned} O(\frac{k}{P}\frac{m^2}{k^2}) = O(\frac{m^2}{k \cdot P}). \end{aligned}$$
(5)

This complexity analysis illustrates that the parallel learning of k local SVR models in the kSVR algorithm is kP times faster than building a global SVR model (whose complexity is at least \(O(m^2)\)).
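As a concrete illustration of the bound (5), with figures chosen purely for illustration (they are not taken from the experiments in Sect. 5): for \(m = 10^6\) datapoints, \(k = 5000\) clusters (cluster size about 200) and \(P = 4\) cores, the idealized speed-up over a global SVR is

$$\begin{aligned} \frac{O(m^2)}{O(\frac{m^2}{k \cdot P})} = k \cdot P = 5000 \times 4 = 20000. \end{aligned}$$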

Fig. 4. The regression task of SVR (left part) can be regarded as a binary classification problem of SVC (right part)

The generalization capacity of kSVR models trained by the kSVR algorithm can be explained in terms of the margin size of the binary support vector classification (SVC).

An intuitive geometric approach in [12] shows that the regression task of the SVR can be regarded as a classification problem of the SVC. Let x be the predictor variable and y the target variable. In the left part of Fig. 4, the SVR approach tries to find the optimal plane (\(w \cdot x - b = 0\)) that has at most \(\varepsilon \) deviation from the target values \(y_i\). This problem is transformed into a binary classification one, as illustrated in the right part of Fig. 4. The positive class \(D+\) is formed by increasing the target values \(y_i\) by \(\varepsilon \), and the negative class \(D-\) is formed by decreasing the target values \(y_i\) by \(\varepsilon \). The optimal plane found by the SVC approach for separating \(D+\) from \(D-\) is identical to the solution of the SVR approach. The biggest-margin solution (the largest separation boundary between the two classes) gives the safest prediction model. It means that we can explain the generalization capacity of the models trained by the kSVR algorithm in terms of the largest margin size of the binary SVC.
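The construction of the two classes is easy to state explicitly; the short sketch below (reusing the illustrative X, y and \(\varepsilon = 0.05\) from the earlier sketch) builds \(D+\) and \(D-\) by shifting the targets, which is all the transformation requires.

# Sketch of the geometric view in Fig. 4: turn the regression set into two classes.
eps = 0.05
D_plus  = np.column_stack([X[:, 0], y + eps])   # positive class: (x_i, y_i + eps)
D_minus = np.column_stack([X[:, 0], y - eps])   # negative class: (x_i, y_i - eps)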

The performance analysis in terms of the generalization capacity of such local models is presented in [13,14,15]. Recently, Do and colleagues [16, 17] showed that learning algorithms of local SVC models give a guarantee on the generalization capacity compared to the global SVC one.

To assess the generalization capacity of local SVR models, we start with Theorem 5.2 ([18] p. 139), which states that the generalization ability of the large-margin hyperplane is high. Given a training set with m datapoints separated by the maximal-margin hyperplane, the expectation of the probability of test error is bounded as follows:

$$\begin{aligned} EP_{error} \le E\left\{ min\left( \frac{sv}{m}, \frac{1}{m}\left[ \frac{R^2}{\varDelta } \right] , \frac{n}{m} \right) \right\} . \end{aligned}$$
(6)

where sv is the number of support vectors, R is the radius of the sphere containing the data, \(\varDelta \) is the size of the margin and n is the number of dimensions.

This means that the good generalization ability of the maximal-margin hyperplane is reflected in:

$$\begin{aligned} min \left( \frac{1}{m}\left[ \frac{R^2}{\varDelta } \right] \right) . \end{aligned}$$
(7)

In the training task, the kSVR algorithm splits the full dataset of m datapoints into k clusters (the cluster size \(m_k\) is about \(\frac{m}{k}\)). The generalization capacity of a local SVR model is then assessed by:

$$\begin{aligned} min\left( \frac{1}{m_k}\left[ \frac{R_k^2}{\varDelta _k} \right] \right) . \end{aligned}$$
(8)

The generalization analysis is based on the comparison between Eq. (7) for the global SVR model trained on the full dataset and Eq. (8) for a local SVR model learnt from a cluster (subset). According to Theorem 1 in [16] and Theorems 2 and 3 in [17], the inequality \(\varDelta _{X_k} \ge \varDelta _X\) holds. On the other hand, the use of a subset in the training task of kSVR leads to \(R_k \le R\) but also \(m_k \le m\). This illustrates that there is a compromise between the locality (the subset size and the radius of the sphere containing the data) and the generalization capacity (the margin size). Therefore, local SVR models trained by the kSVR algorithm can guarantee the prediction performance compared to the global SVR one.

The performance analysis in terms of the algorithmic complexity (5) and the generalization capacity (8) shows that the parameter k is used in the kSVR to give a trade-off between the generalization capacity and the computational cost. This can be understood as follows:

  • If k is large, then the kSVR algorithm significantly reduces the training time. However, the cluster size is then small; the locality is extreme and the generalization capacity is very low.

  • If k is small, then the kSVR algorithm yields only a small reduction in training time. However, the cluster size is large, which improves the generalization capacity.

This leads us to set k so that the cluster size is large enough (e.g. about 200 datapoints, as proposed by [14]).
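In practice this rule of thumb amounts to a one-line computation; the helper below is purely illustrative (its name is not from the paper).

# Choose k so that each cluster holds about 200 datapoints (rule of thumb following [14]).
def choose_k(m, target_cluster_size=200):
    return max(1, round(m / target_cluster_size))

# e.g. choose_k(400000) -> 2000 clusters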

Fig. 5. Training algorithm of T random kSVR models

Algorithm 2. Training T random kSVR models (krSVR)

4 Parallel Algorithm of T Random Local Support Vector Regression Models

The performance analysis of the kSVR algorithm shows that there is a trade-off between the algorithmic complexity and the generalization capacity: if the kSVR speeds up the training time over the learning algorithm of a global SVR model by increasing the number of clusters (the parameter k), then it reduces the generalization ability. To address this problem, we propose an ensemble-based learning algorithm of random local SVR models to improve the generalization capacity of the local one. The main idea is based on the bias-variance framework proposed by Breiman [19, 20]. The performance of a learning model depends on a bias term and a variance term. Bias is the systematic error, which does not depend on the learning sample. Variance is the error due to the variability of the learning model arising from the randomness of the learning sample. The main idea of ensemble-based learning algorithms [19,20,21] and [22] is to use randomization of the learning sample to reduce variance and/or bias, which improves the generalization capacity compared to using a single model. Therefore, we propose the ensemble learning algorithm krSVR to train T random kSVR models for reducing variance, so that krSVR improves the generalization ability of a single kSVR model.

4.1 Learning T Random Local SVR Models

The krSVR algorithm trains an ensemble of T random kSVR models using the kSVR algorithm (described in Algorithm 1 and Fig. 3). The \(t^{th}\) kSVR model is trained on the \(t^{th}\) bootstrap sample (drawn with replacement from the original dataset), using \(n'\) dimensions randomly sampled without replacement from the n original dimensions.

As described in Algorithm 2 and Fig. 5, the krSVR constructs the T random local SVR models independently, which allows the learning task to be parallelized with OpenMP on multi-core computers.
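A minimal sketch of this construction, continuing the illustrative Python setting used above (bootstrap sampling of datapoints, random sampling of \(n' \approx \sqrt{n}\) dimensions, then one kSVR model per replicate); in the paper the loop over the T replicates is the part parallelized with OpenMP, whereas here it is written sequentially for clarity.

# Sketch of krSVR training: T random kSVR models (Algorithm 2). Illustrative names only.
def train_krsvr(X, y, T, k, gamma, C, epsilon, n_jobs=4, seed=0):
    m, n = X.shape
    n_prime = max(1, int(round(np.sqrt(n))))                 # number of random dimensions
    rng = np.random.default_rng(seed)
    ensemble = []
    for t in range(T):
        sample_idx = rng.choice(m, size=m, replace=True)     # bootstrap sample
        dims = rng.choice(n, size=n_prime, replace=False)    # random dimensions
        kmeans, local_models = train_ksvr(X[np.ix_(sample_idx, dims)], y[sample_idx],
                                          k, gamma, C, epsilon, n_jobs)
        ensemble.append((dims, kmeans, local_models))
    return ensemble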

Thus, the complexity of parallel learning krSVR on a P-core processor is as follows:

$$\begin{aligned} O(\frac{T \cdot m^2}{k \cdot P}). \end{aligned}$$
(9)

4.2 Prediction of a New Datapoint x Using T Random Local SVR Models

The prediction for a new individual x is the average of the prediction results obtained by T random kSVR models.
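Under the same illustrative sketch, this averaging step is:

# Sketch of krSVR prediction: average of the T random kSVR model predictions.
def predict_krsvr(ensemble, X_new):
    preds = [predict_ksvr(kmeans, local_models, X_new[:, dims])
             for dims, kmeans, local_models in ensemble]
    return np.mean(preds, axis=0)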

5 Evaluation

We are interested in the performance of the new parallel algorithms of local SVR (kSVR, krSVR) for dealing with large datasets. Numerical test results in terms of training time and prediction correctness are needed to confirm the performance analysis of the algorithmic complexity and the generalization capacity in Sect. 3.

5.1 Software Programs

We have implemented the kSVR and krSVR algorithms in C/C++ with OpenMP [11], using the Automatically Tuned Linear Algebra Software (ATLAS) [23] and the highly efficient standard SVM library LibSVM [6].

For consistency with the performance analysis of the algorithmic complexity and the generalization capacity, our evaluation is reported in terms of training time and prediction correctness. We compare the regression results obtained by our proposed kSVR and krSVR (local SVR models) with those of LibSVM (global SVR models).

All experiments are run on a machine with Linux Fedora 20, an Intel(R) Core i7-4790 CPU (3.6 GHz, 4 cores) and 32 GB of main memory.

5.2 Datasets

All experiments are conducted on five datasets from the UCI repository [8]. Table 1 describes the datasets; the evaluation protocol is given in the last column. The datasets are already divided into a training set (Trn) and a test set (Tst). We used the training data to build the SVR models and then predicted the test set using the resulting models.

Table 1. Description of datasets

5.3 Tuning Parameters

We use the RBF kernel in the training tasks of kSVR, krSVR and LibSVM for building SVR models because it is general and efficient [24]. A 2-fold cross-validation protocol is used to tune the regression tolerance \(\varepsilon \), the hyper-parameter \(\gamma \) of the RBF kernel (for two individuals \(x_i\), \(x_j\): \(K[i,j] = exp(-\gamma \Vert x_i - x_j\Vert ^2)\)) and the cost C (a trade-off between the margin size and the errors) to obtain good correctness. For the two largest datasets (Buzz in social media Twitter and YearPredictionMSD), we used a random subset of about 5% of the training data for tuning the hyper-parameters due to the expensive computational cost.
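For illustration, this 2-fold tuning of (\(\gamma \), C, \(\varepsilon \)) can be sketched with scikit-learn's GridSearchCV; the grid values below are placeholders and are not the values used in the paper.

# Sketch of 2-fold cross-validation for tuning gamma, C and epsilon of an RBF SVR.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

param_grid = {"gamma": [0.1, 1.0, 10.0],
              "C": [1e3, 1e4, 1e5],
              "epsilon": [0.01, 0.05, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      cv=2, scoring="neg_mean_absolute_error")
# search.fit(X_sample, y_sample)   # e.g. a random 5% sample of the training set
# best_params = search.best_params_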

Our kSVR uses the parameter k, the number of local models (clusters). We propose to set k so that each cluster consists of about 200 individuals; this gives a trade-off between the generalization capacity [15] and the computational cost. Furthermore, the krSVR algorithm learns 20 random kSVR models (\(T = 20\)), with the number of random dimensions being the square root of the full set (\(n' = \sqrt{n}\), as recommended by [20]).

Table 2 presents the hyper-parameters of kSVR, krSVR and LibSVM in the regression.

Table 2. Hyper-parameters of kSVR, krSVR and LibSVM
Table 3. Regression results in terms of mean absolute error (MAE) and training time (minutes)

5.4 Regression Results

The regression results of LibSVM, kSVR and krSVR on the five datasets are given in Table 3, Figs. 6, 7 and 8.

Comparison in Training Time

As expected, our proposed algorithms of local SVR models (kSVR, krSVR) outperform LibSVM in terms of training time. The average training times of kSVR, krSVR and LibSVM are 8.45 min, 35.79 min and 1551.15 min, respectively. This means that kSVR is the fastest of the three learning algorithms: kSVR is 183.5 times faster than LibSVM, and krSVR is 43.34 times faster than LibSVM. The training time of krSVR is about 4.23 times longer than that of kSVR.

Fig. 6. Comparison of training time (minutes) for small datasets

Fig. 7. Comparison of training time (minutes) for large datasets

Fig. 8. Comparison of prediction results

For the three small datasets (Appliances energy prediction, Facebook comment volume, BlogFeedback), the training speed improvements of kSVR and krSVR versus LibSVM are 21.10 and 3.97 times, respectively.

For the two large datasets (Buzz in social media - Twitter and YearPredictionMSD), the training time improvements of kSVR and krSVR against LibSVM are more significant: 200.44 times and 48.63 times, respectively.

Comparison in Prediction Correctness

In terms of prediction correctness (measured by the mean absolute error, MAE), the average errors made by kSVR, krSVR and LibSVM are 23.50, 22.52 and 62.01, respectively. The comparison of prediction correctness, dataset by dataset, shows that:

  • kSVR has 4 wins and 1 defeat against LibSVM; kSVR is beaten only once (on the Appliances energy prediction dataset) by LibSVM;

  • krSVR is the most accurate, with 5/5 wins versus both LibSVM and kSVR.

The regression results show that our kSVR and krSVR are more accurate than LibSVM in prediction. It seems that local SVR models are well suited to large datasets: the data partition stage allows the local SVR algorithms to easily fit regression models to data subsets, which improves their prediction correctness. In particular, for a large dataset such as Buzz in social media - Twitter, the target domain is very wide ([0, 75724.5]), so the prediction error made by any algorithm is also high; our kSVR and krSVR reduce the prediction error by about a factor of 5 compared to LibSVM.

The numerical results demonstrate that kSVR and krSVR improve not only the training time but also the prediction correctness when dealing with large datasets. These regression results suggest that our proposed kSVR and krSVR are efficient for handling such large datasets.

6 Discussion on Related Works

Our proposal is related to large-scale SVM learning algorithms. Improvements of SVM training on very large datasets include effective heuristic methods for decomposing the original quadratic programming problem into a series of small problems [10, 25, 26] and [6]. Recent works [27, 28] proposed stochastic gradient descent methods for large-scale linear SVM solvers. Parallel and distributed algorithms [29,30,31] for linear classification improve the learning performance on large datasets by dividing the problem into sub-problems that execute on large numbers of networked PCs, grid computing or multi-core computers.

The review paper [32] provides a comprehensive survey of large-scale linear support vector classification. LIBLINEAR [33] and its extension [34] use the Newton method for the primal form of SVM and the coordinate descent approach for the dual form to deal with very large linear classification and regression. Parallel algorithms for logistic regression and linear SVM using Spark [35] are proposed in [36]. The distributed Newton algorithm for logistic regression [37] is implemented with the Message Passing Interface (MPI). The parallel dual coordinate descent method for linear classification [38] is implemented in multi-core environments using OpenMP. The incremental and decremental algorithms [39] aim at training linear classifiers on large data that cannot fit in memory. These algorithms efficiently handle large-scale linear classification tasks in very high-dimensional input spaces, but the computational cost of a non-linear SVM approach remains impractical. The work in [40] tries to automatically determine, for a given dataset, whether kernel classifiers perform strictly better than linear ones.

Our proposal is also related in some aspects to local SVM learning algorithms. A first approach classifies data with a hierarchical strategy: this kind of training algorithm performs the classification task in two main steps, first clustering the full dataset into homogeneous groups (clusters) and then learning local supervised classification models from the clusters. The paper [41] proposed to use the expectation-maximization (EM) clustering algorithm [42] for partitioning the training set into k clusters (the EM clustering algorithm makes a soft assignment based on the posterior probabilities [43]); for each cluster, a neural network (NN) is learnt to classify the individuals in the cluster. The parallel mixture of SVMs algorithm proposed by [44] constructs local SVM models instead of the NN ones in [41]. CSVM [45] uses the kmeans algorithm [7] to partition the full dataset into k disjoint clusters and then learns weighted local linear SVMs from the clusters. More recently, kSVM [46], krSVM [47] (a random ensemble of kSVM) and tSVM [16] train local non-linear SVMs in parallel instead of the weighted linear ones of CSVM. DTSVM [48, 49] uses a decision tree algorithm [50, 51] to split the full dataset into disjoint regions (tree leaves) and then builds local SVMs for classifying the individuals in the tree leaves. These algorithms aim at speeding up the learning time.

The second approach learns local supervised models from the k nearest neighbors (kNN) of a new testing individual. The first local learning algorithm, by Bottou & Vapnik [14], finds the kNN of a test individual, trains a neural network with only these k neighbors and applies the resulting network to the test individual. k-local hyperplane and convex distance nearest neighbor algorithms are also proposed in [52]. More recent local SVM algorithms use different methods for kNN retrieval, including SVM-kNN [53], which tries different metrics, ALH [54], which uses weighted distances and features, and FaLK-SVM [55], which speeds up the kNN retrieval with the cover tree [56].

A theoretical analysis of such local algorithms in [13] introduces the trade-off between the capacity of the learning system and the number of available individuals. The neighborhood size is used as an additional free parameter to control the generalization capacity against the locality of local learning algorithms.

7 Conclusion and Future Works

We presented new parallel algorithms of local SVR that achieve high performance for non-linear regression on large datasets. The training task of kSVR first splits the full training dataset into k clusters to reduce the data size, and then easily learns k non-linear SVR models in parallel on multi-core computers, with one SVR model trained on each cluster to predict the data locally. The krSVR learning algorithm improves the generalization capacity of a single kSVR by training an ensemble of random kSVR models. The performance analysis and the numerical test results on five datasets from the UCI repository showed that our proposed kSVR and krSVR are efficient in terms of training time and prediction correctness compared to the standard SVR. The training time improvements of kSVR and krSVR versus LibSVM are 183.5 and 43.3 times, and kSVR and krSVR improve the relative prediction correctness by 62.10% and 63.70%, respectively, compared to the standard SVR. An example of kSVR's effectiveness is the non-linear regression of the YearPredictionMSD dataset (400000 datapoints, 90 dimensions), trained in 6.33 min with a mean absolute error of 7.86 on the test set.

In the near future, we intend to provide more empirical tests on large benchmarks and comparisons with other algorithms. A promising avenue for future research is to automatically tune the hyper-parameters of kSVR and krSVR.