
1 Introduction

The Gaussian process (GP) is a powerful model that is widely used in machine learning and data mining [1,2,3]. However, it has two main limitations. Firstly, it cannot fit multi-modal datasets well because the GP model employs a global scale parameter [4]. Secondly, its parameter learning consumes \(O(N^{3})\) computational time [5, 6], where N is the number of training samples. In order to overcome these difficulties, Tresp [4] proposed the mixture of Gaussian processes (MGP) in 2000, which was developed from the mixture of experts. Since then, many kinds of MGP models have been proposed, and they can be classified into two main forms: the generative model [7,8,9,10] and the conditional model [4, 6, 11,12,13]. In comparison with the conditional model, the generative model has two main advantages: (1) missing features can be easily inferred from the outputs; (2) the influence of the inputs on the outputs is clearer [8]. Therefore, many scholars have studied the generative model [14,15,16,17,18,19,20].

Fig. 1. The sketch of the eLoad data.

Fig. 2. The sketch of the transformed eLoad data.

However, when we learn the generative model on a given dataset, we must set the probability density function (pdf) of the input in advance. In general, it is set as a Gaussian distribution [14,15,16,17,18,19,20]. For some real-world data such as time series, however, this assumption is neither reasonable nor effective. When learning an MGP model on such data, we usually need to utilize the ARMA model [14,15,16,17,18,19,20,21] to transform the data, and then fit the MGP model on the transformed data. Unfortunately, this transformation can destroy the correlation between samples, which is very important for the MGP model. Figure 1 shows the eLoad data [14], in which the samples in three different colors (blue, black, and red) represent three temporally sequential segments, respectively. Figure 2 shows the transformed eLoad data, in which the three temporally sequential segments are mixed together and cannot be separated effectively. In this paper, we propose a specialized pdf for the input of the MGP model to solve this problem. As shown in Fig. 3, this pdf consists of three components: the left and right parts are Gaussian distributions, while the middle part is a uniform distribution. For training the MGP model, we use the hard-cut EM algorithm [17] as the basic learning framework for parameter estimation. In practice, the hard-cut EM algorithm obtains better results than several popular learning algorithms.

The rest of the paper is organized as follows. Section 2 introduces the GP and MGP models. We describe the specialized probability density function in Sect. 3. We further propose the learning algorithm for the MGP model with the specialized pdfs in Sect. 4. The experimental results are presented in Sect. 5. Finally, we make a brief conclusion in Sect. 6.

Fig. 3. The sketch of the specialized input distribution.

2 GP and MGP Models

2.1 GP Model

We mathematically define the GP model as follows:

$$\begin{aligned} {{\varvec{Y}}}\sim N(m({{\varvec{X}}}),K({{\varvec{X}}},{{\varvec{X}}})) \end{aligned}$$
(1)

where D = {X,Y} = {(\({\varvec{x}}_{i}\), \(y_{i}\)): i = 1,2,...,N}, \({\varvec{x}}_{i}\) denotes a d-dimensional input vector, and \(y_{i}\) is the corresponding output. m(X) and K(X,X) denote the mean vector and the covariance matrix, respectively. Without loss of generality, we assume m(X) = 0. There are many choices for the covariance function, such as the linear, Gaussian noise, and squared exponential functions. Here, we adopt the squared exponential (SE) covariance function [10]:

$$\begin{aligned} K({{\varvec{{x}}}}_{i},{{\varvec{{x}}}}_{j};{{\varvec{\theta }}})=\sigma _{f}^2exp(-\frac{\sigma _{l}^2}{2}\Vert {{\varvec{{x}}}}_{i}-{{\varvec{{x}}}}_{j}\Vert ^2)+\sigma _{n}^2I _{(i=j)} \end{aligned}$$
(2)

where \(\varvec{\theta }\) = {\(\sigma _{f}^2\),\(\sigma _{l}^2\),\(\sigma _{n}^2\)} denotes the hyper-parameter vector. On the given sample dataset D, the log-likelihood function can be expressed as follows:

$$\begin{aligned} \log p({{\varvec{Y}}}|{{\varvec{X}}},\varvec{\theta })=\log N ({{\varvec{Y}}}|{{\varvec{0}}},K({{\varvec{X}}},{{\varvec{X}}})) \end{aligned}$$
(3)

In order to obtain the estimation of parameters \(\varvec{\theta }\), we perform the maximum likelihood estimation (MLE) procedure [10], that is, we get

$$\begin{aligned} \hat{\varvec{\theta }}= {\mathop {argmax}\nolimits _{\varvec{\theta }}}\log N ({{\varvec{Y}}}|{{\varvec{0}}},K({{\varvec{X}}},{{\varvec{X}}})) \end{aligned}$$
(4)
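As a concrete illustration, the following Python sketch fits the hyper-parameters of Eq. (2) by numerically maximizing the log-likelihood in Eq. (3); it assumes NumPy and SciPy are available, and the function names (se_kernel, fit_gp) are our own rather than part of any reference implementation.

```python
import numpy as np
from scipy.optimize import minimize

def se_kernel(X1, X2, sf2, sl2, sn2=0.0):
    """Squared exponential covariance of Eq. (2); sn2 is added on the diagonal when X1 is X2."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    K = sf2 * np.exp(-0.5 * sl2 * d2)
    if X1 is X2:
        K = K + sn2 * np.eye(X1.shape[0])
    return K

def neg_log_likelihood(log_theta, X, y):
    """Negative of Eq. (3); theta is parameterized on the log scale to keep it positive."""
    sf2, sl2, sn2 = np.exp(log_theta)
    K = se_kernel(X, X, sf2, sl2, sn2) + 1e-8 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2 * np.pi)

def fit_gp(X, y):
    """MLE of Eq. (4) by numerical optimization from a neutral starting point."""
    res = minimize(neg_log_likelihood, x0=np.zeros(3), args=(X, y), method="L-BFGS-B")
    return np.exp(res.x)  # (sigma_f^2, sigma_l^2, sigma_n^2)
```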

2.2 MGP Model

Denote C and N as the number of GP components and the number of training samples in the MGP model, respectively. On the basis of the GP model, we define the MGP model by the following steps:

Step 1. Partition the samples into the GP components according to a multinomial distribution:

$$\begin{aligned} p(z_{n}=c)=\pi _{c} \end{aligned}$$
(5)

where c = 1,...,C and n = 1,...,N.

Step 2. Accordingly, each input \({\varvec{x}}_{n}\) follows the distribution:

$$\begin{aligned} {{\varvec{x}}_{n}}\,|\, z_{n}= c\sim p({{\varvec{x}}}| \varvec{\psi }_{c}) \end{aligned}$$
(6)

where {\(\varvec{\psi }_{c}: c=1, ..., C \)} is the parameter set. In general, p(\({\varvec{x}}|\varvec{\psi }_{c}\)) is a Gaussian distribution.

Step 3. Denote \({\varvec{I}}_{c}\) = {\(n \vert z_{n}=c\)}, \({\varvec{X}}_{c}\) = {\({\varvec{x}}_{n} \vert z_{n}=c\)}, \({{\varvec{Y}}_{c}}=\{ y_{n} \vert z_{n}=c \}\) (c=1,...,C, n=1,...,N) as the sample indexes, inputs and outputs of the training samples in the c-th component, respectively. Given \({\varvec{X}}_{c}\), the corresponding c-th GP component can be mathematically defined as follows:

$$\begin{aligned} {{\varvec{Y}}_{c}}\sim {N}({{\varvec{0}}}, K({{\varvec{X}}_{c}},{{\varvec{X}}_{c}})) \end{aligned}$$
(7)

where K(\({\varvec{X}}_{c}\),\({\varvec{X}}_{c}\)) is given by Eq.(2) with the hyper-parameter \({\varvec{\theta }}_{c} =\{\sigma _{fc}^2, \sigma _{lc}^2,\sigma _{nc}^2\}\).

Eqs. (5), (6) and (7) together define the MGP model mathematically. Its log-likelihood function is derived as follows:

$$\begin{aligned} \log p ({{\varvec{Y}}}|{{\varvec{X}}}, \varvec{\varTheta }, \varvec{\varPsi }) =\sum _{c=1}^{C}\Big (\sum _{n\in {{\varvec{I}}_{c}}}\log \big ({\pi _{c}}\, {p}({{\varvec{x}}_{n}}|\varvec{\psi }_{c})\big ) +\log {p}({{\varvec{Y}}_{c}}|{{\varvec{X}}_{c}},{\varvec{\theta }}_{c})\Big ) \end{aligned}$$
(8)

where \(\varvec{\varTheta }=\{\varvec{\theta }_{c}:c=1, ..., C \}\) and \(\varvec{\varPsi }= \{ \varvec{\psi }_{c}, {\pi }_{c}: c =1, ..., C \}\) denote the hyper-parameters and parameters of the MGP model, respectively.
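To make the generative definition of Steps 1-3 concrete, the following sketch draws a synthetic dataset from an MGP with Gaussian inputs (the default assumption in Step 2). It is illustrative code of our own that reuses the se_kernel helper from the previous sketch; mus, Ss, and thetas hold the per-component input means, input covariances, and GP hyper-parameters.

```python
import numpy as np

def sample_mgp(N, pi, mus, Ss, thetas, seed=0):
    """Draw N samples from an MGP: Step 1 picks components (Eq. (5)), Step 2 draws the inputs
    (Eq. (6)), and Step 3 draws each component's outputs jointly from its GP prior (Eq. (7))."""
    rng = np.random.default_rng(seed)
    mus, Ss = np.asarray(mus), np.asarray(Ss)
    C = len(pi)
    z = rng.choice(C, size=N, p=pi)                                     # Step 1
    X, y = np.empty((N, mus.shape[1])), np.empty(N)
    for c in range(C):
        idx = np.where(z == c)[0]
        if idx.size == 0:
            continue
        X[idx] = rng.multivariate_normal(mus[c], Ss[c], size=idx.size)  # Step 2
        sf2, sl2, sn2 = thetas[c]
        K = se_kernel(X[idx], X[idx], sf2, sl2, sn2)                    # Eq. (2)
        y[idx] = rng.multivariate_normal(np.zeros(idx.size), K)         # Step 3
    return X, y, z
```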

3 Specialized Input Distribution and Its Learning Algorithm

For many real-world datasets, such as those in the UCI machine learning repository, a Gaussian distribution is not appropriate for modeling the input. In order to solve this problem, we propose a specialized distribution for this situation.

3.1 Specialized PDF

This specialized distribution is a piecewise-defined continuous density that consists of three parts: the middle part is a uniform density, while the two side parts are Gaussian densities, as shown in Fig. 3. We define the specialized distribution mathematically as follows:

$$\begin{aligned} P(\varvec{x};\varvec{\psi })={\left\{ \begin{array}{ll} \frac{\lambda _{1}}{\sqrt{2\pi }\tau _1}\exp \left( -\frac{(\varvec{x}-\varvec{a})^2}{2\tau _1^2}\right) &{}\varvec{x}<\varvec{a}\\ \lambda &{} \varvec{a} \le \varvec{x} \le \varvec{b} \\ \frac{\lambda _{2}}{\sqrt{2\pi }\tau _2}\exp \left( -\frac{(\varvec{x}-\varvec{b})^2}{2\tau _2^2}\right) &{}\varvec{x}>\varvec{b}\\ \end{array}\right. } \end{aligned}$$
(9)

where we redefine \(\varvec{\psi }\)\(=\){\(\lambda ,\lambda _{1},\lambda _ {2},\tau _1,\tau _2\),\(\varvec{a}\),\(\varvec{b}\)} as the parameter vector.
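For clarity, a minimal NumPy sketch of Eq. (9) for a one-dimensional input is given below; the argument names simply mirror \(\varvec{\psi }\) and are our own choice.

```python
import numpy as np

def specialized_pdf(x, lam, lam1, lam2, tau1, tau2, a, b):
    """Piecewise density of Eq. (9): Gaussian tails outside [a, b], uniform level lam inside."""
    x = np.asarray(x, dtype=float)
    left = lam1 / (np.sqrt(2 * np.pi) * tau1) * np.exp(-(x - a) ** 2 / (2 * tau1 ** 2))
    right = lam2 / (np.sqrt(2 * np.pi) * tau2) * np.exp(-(x - b) ** 2 / (2 * tau2 ** 2))
    return np.where(x < a, left, np.where(x > b, right, lam))
```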

3.2 Learning Algorithm for the Specialized PDF

In order to learn \(\varvec{\psi }\), we require that the input interval (\(\varvec{a}\),\(\varvec{b}\)) contain a fraction \(p_0\) of the samples. Denote X and N as the training sample set and the number of training samples, respectively. The algorithm framework can be summarized in the following steps:

Step 1. Learn a, b, and \(\lambda \):

$$\begin{aligned} {{\varvec{a}}}=X_{\frac{N(1-p_{0})}{2}};{{\varvec{b}}}=X_{\frac{N(1+p_{0})}{2}};\lambda =\frac{{p}_{0}}{({{\varvec{b}}}-{{\varvec{a}}})} \end{aligned}$$
(10)

where \(X_{\frac{N(1-p_{0})}{2}}\) denotes the corresponding empirical quantile of the training inputs, i.e., p(x< \(X_{\frac{N(1-p_{0})}{2}}\) \(| x \in \) \({\varvec{X}}\)) = \(\frac{(1-p_{0})}{2}\). Estimating \(\varvec{a}\) and \(\varvec{b}\) from these quantiles, as in Eq. (10), reduces the effect of misclassified (or outlier) points on the middle part.

Step 2. Estimate \(\lambda _{1}\), \(\lambda _{2}\), \(\tau _{1}\) and \(\tau _{2}\).

Denote \(p_{1}\) and \(p_{2}\) as the sample ratios on the left side and the right side, respectively. The probability density function is continuous and integrates to 1. In other words:

$$\begin{aligned} \int _{-\infty }^{\varvec{a}} {P(\varvec{x};\varvec{\psi })}d\varvec{x}={p}_{1},\quad \int _{\varvec{a}}^{\varvec{b}} {P(\varvec{x};\varvec{\psi })}d\varvec{x}={p}_{0},\quad \int _{\varvec{b}}^{+\infty } {P(\varvec{x};\varvec{\psi })}d\varvec{x}={p}_{2}; \qquad {p}_{0} +{p}_{1} + {p}_{2} =1 \end{aligned}$$
(11)

According to the continuity of the probability density function and the constraints in Eq. (11), we only need to do some simple calculations to obtain {\(\lambda _{1},\lambda _ {2},\tau _{1},\tau _{2}\)}:

$$\begin{aligned} \lambda _{1}=2 {p}_{1} ;\lambda _{2}=2 {p}_{2} ;\tau _{1}=\frac{\lambda _{1}}{\sqrt{2\pi }\lambda };\tau _{2}=\frac{\lambda _{2}}{\sqrt{2\pi }\lambda } \end{aligned}$$
(12)
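The two steps above reduce to taking empirical quantiles and then enforcing normalization and continuity. A sketch under these assumptions (one-dimensional inputs; the function name and the placeholder value p0 = 0.8 are ours) is:

```python
import numpy as np

def fit_specialized_pdf(x, p0=0.8):
    """Estimate psi = {lam, lam1, lam2, tau1, tau2, a, b} from samples x via Eqs. (10)-(12)."""
    x = np.asarray(x, dtype=float)
    a = np.quantile(x, (1 - p0) / 2)              # Step 1: Eq. (10)
    b = np.quantile(x, (1 + p0) / 2)
    lam = p0 / (b - a)
    p1, p2 = np.mean(x < a), np.mean(x > b)       # Step 2: sample ratios on the two sides
    lam1, lam2 = 2 * p1, 2 * p2                   # Eq. (12): each Gaussian tail integrates to p1, p2
    tau1 = lam1 / (np.sqrt(2 * np.pi) * lam)      # Eq. (12): continuity at a
    tau2 = lam2 / (np.sqrt(2 * np.pi) * lam)      # Eq. (12): continuity at b
    return dict(lam=lam, lam1=lam1, lam2=lam2, tau1=tau1, tau2=tau2, a=a, b=b)
```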

4 The MGP Model of the Specialized PDFs and Its Learning Algorithm

We now consider the MGP model with these specialized pdfs. For the parameter learning of the MGP model, there are three main kinds of learning algorithms: MCMC methods [22, 23], variational Bayesian inference [24, 25], and the EM algorithm [5, 9, 11]. However, the MCMC methods and variational Bayesian inference have their own limitations: the time complexity of the MCMC methods is very high, and variational Bayesian inference may deviate considerably from the true objective function. The EM algorithm is an important and effective iterative algorithm for maximum likelihood or maximum a posteriori (MAP) estimation of the parameters of a mixture model. However, for such a complex MGP model, the posteriors of the latent variables and the Q function are rather complicated. In order to overcome this difficulty, we adopt the hard-cut EM algorithm [17] to learn the parameters; it makes certain approximations in the E-step.

Let \({\varvec{z}}_{nc}\) denote the latent indicator variables, where \({\varvec{z}}_{nc}\) = 1 if the sample (\({\varvec{x}}_{n}\),\(y_{n}\)) belongs to the c-th GP component and \({\varvec{z}}_{nc}\) = 0 otherwise. Therefore, we can obtain the log-likelihood function of the complete data from Eq. (8) as follows:

$$\begin{aligned} {\begin{matrix} \log (p({{\varvec{Y}}},{{\varvec{Z}}}| {{\varvec{X}}}, \varvec{\varTheta }, \varvec{\varPsi } )) &{}=\sum _{c=1}^{C}(\sum _{n=1}^{N}( {\varvec{z}}_{nc} \log (\pi _{c} p({{\varvec{x}}_{n}}| \varvec{\psi } _{c})))\\ {} &{} +\log ( p ( {\varvec{Y}}_{c} | {\varvec{X}}_{c} , \varvec{\theta } _{c}))) \end{matrix}} \end{aligned}$$
(13)

The main idea of the hard-cut EM algorithm can be expressed in the following steps:

E-step. Assign the samples to the corresponding GP components according to the maximum a posteriori (MAP) criterion:

$$\begin{aligned} \widehat{k}_{n}={\mathop {argmax}\nolimits _{1 \le c \le C}} \{ \pi _{c} p({\varvec{x}}_{n}|\varvec{\psi } _{c}) p(y_{n}|\varvec{\theta }_{c}) \} \end{aligned}$$
(14)

that is, we set the latent variable \({\varvec{z}}_{n\widehat{k}_{n}}=1\).

M-step. With the known partition, we can estimate the parameters \(\varvec{\varPsi }\) and hyper-parameters \(\varvec{\varTheta }\) via the MLE procedure (a code sketch of one full iteration is given after the list below):

  (1) For learning the parameters \(\{\varvec{\psi }_{c}\}_{c=1}^{C}\), we perform the learning algorithm described in Sect. 3.2.

  (2) For estimating the hyper-parameters \(\varvec{\varTheta }\), we perform the MLE procedure of Eq. (4) on each c-th component to estimate \(\varvec{\theta }_{c}\).
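The sketch below shows one hard-cut EM iteration under our own simplifying assumptions (one-dimensional inputs, the specialized pdf of Sect. 3, and the fit_gp, fit_specialized_pdf, and specialized_pdf helpers sketched above); in particular, p(\(y_{n}|\varvec{\theta }_{c}\)) in Eq. (14) is approximated by the per-sample GP prior marginal \(N(0,\sigma _{fc}^2+\sigma _{nc}^2)\), so this is an illustration of Eqs. (13)-(14) rather than the exact reference procedure.

```python
import numpy as np
from scipy.stats import norm

def hard_cut_em_step(X, y, z, C):
    """One iteration: the M-step re-fits every component, the E-step reassigns samples by MAP."""
    N = len(y)
    # M-step: mixing proportions, input pdf parameters, and GP hyper-parameters per component
    pis, psis, thetas = [], [], []
    for c in range(C):
        idx = np.where(z == c)[0]
        pis.append(idx.size / N)
        psis.append(fit_specialized_pdf(X[idx, 0]))   # Sect. 3.2
        thetas.append(fit_gp(X[idx], y[idx]))         # MLE of Eq. (4) on the c-th component
    # E-step: MAP reassignment of every sample (Eq. (14)); p(y_n | theta_c) is replaced by the
    # simple GP prior marginal N(0, sigma_f^2 + sigma_n^2) to keep the sketch short.
    scores = np.zeros((N, C))
    for c in range(C):
        sf2, _, sn2 = thetas[c]
        p_x = specialized_pdf(X[:, 0], **psis[c])
        p_y = norm.pdf(y, loc=0.0, scale=np.sqrt(sf2 + sn2))
        scores[:, c] = pis[c] * p_x * p_y
    z_new = np.argmax(scores, axis=1)                 # new partition, i.e., z_{n k_hat_n} = 1
    return z_new, pis, psis, thetas
```

Iterating hard_cut_em_step until the partition z stops changing gives the full learning procedure.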

5 Experimental Results

In order to test the accuracy and effectiveness of the specialized pdf for the MGP model, we carry out several experiments on synthetic datasets and stock datasets. We employ the root mean squared error (RMSE) to measure the prediction accuracy, which is defined as follows:

$$\begin{aligned} RMSE=\sqrt{\frac{\sum _{n=1}^{N}({{\varvec{y}}_{n}}-\hat{{{\varvec{y}}_{n}}})^2}{N}} \end{aligned}$$
(15)

where \(\hat{{\varvec{y}}_{n}}\) and \({{\varvec{y}}_{n}}\) denote the predicted value and the true value, respectively. We also compare our algorithm with some classical machine learning methods (kernel, RBF, and SVM), and denote by ‘OURS’ our proposed model trained with the hard-cut EM algorithm.
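Eq. (15) is straightforward to compute; a short NumPy version (our own helper) is:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error of Eq. (15)."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```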

Fig. 4. The dataset with the least degree of overlap from the MGP with 4 components.

Fig. 5. The distribution of the probability density function of the input on each Gaussian process component.

5.1 Simulation Experiments

In the simulation experiments, we generate three groups of synthetic datasets from MGP models. These three MGP models contain 4, 6, and 8 GP components, respectively, and the numbers of samples in the three groups are 2600, 3900, and 5000, respectively. Each group contains three datasets, which are identical except for the degree of overlap between components. Figure 4 shows the dataset with the smallest degree of overlap generated from the MGP with 4 components. On each dataset, we run each algorithm 100 times, each time randomly extracting 1/3 of the samples for training and the other 2/3 for testing. The RMSE of each algorithm is listed in Table 1, from which we can see that our proposed algorithm obtains the best results. Figure 5 shows the specialized pdfs learned on the dataset of the first group with the smallest overlapping degree. We can see that the specialized pdf decays at both ends of the data in the form of Gaussian tails, while it is uniform over the middle of the data. This shape fits the nearly uniformly distributed inputs better than a Gaussian distribution. Moreover, the Gaussian attenuation at both ends ensures that the iterations of the hard-cut EM algorithm remain effective, so that the class labels of the samples can be updated according to the MAP criterion in each iteration. If we applied a uniform distribution only, the iterative steps of the hard-cut EM algorithm would be invalid, since a sample falling outside a component's interval would receive zero density and could not be assigned by the MAP criterion.

Table 1. The RMSEs of the four algorithms on the three groups of synthetic datasets.
Table 2. The RMSEs of the four algorithms on the three groups of the transformed stock datasets.

5.2 Prediction on Stock Data

In this section, we use the closing price data of three stocks from the Shanghai Stock Exchange, with stock IDs 300015, 002643, and 601058, respectively.

From Eq. (10), we can see that the specialized pdf is closely related to the length of the middle interval. In order to examine the effect of different input interval lengths on the prediction accuracy of the algorithm, we apply some transformations to the input. Moreover, since the range of the outputs is too large, we use a linear function to compress the outputs to the same range as the synthetic data. In summary, we transform the datasets as follows (a brief code sketch is given after the list):

  (i) Transform the input according to the following equation:

    $$\begin{aligned} {X}_{n} =\frac{n}{\delta } \end{aligned}$$
    (16)

    where n = 1,...,N, N is the number of samples, and \({\delta } \in \{101, 51, 23, 11, 7, 3, 1\}\).

  (ii) Compress the output linearly onto the interval [−4.5, 4.5]:

    $$\begin{aligned} \tilde{y}=\frac{9(y-m)}{M-m}-4.5 \end{aligned}$$
    (17)

where M and m denote the maximum and minimum values of the stock's closing price, respectively.
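As a sketch of these two transformations (our own code; delta is one of the values listed above and y holds the raw closing prices):

```python
import numpy as np

def transform_stock(y, delta):
    """Eq. (16): index-based inputs scaled by delta; Eq. (17): outputs compressed to [-4.5, 4.5]."""
    y = np.asarray(y, dtype=float)
    n = np.arange(1, len(y) + 1)
    X = (n / delta).reshape(-1, 1)           # Eq. (16)
    M, m = y.max(), y.min()
    y_tilde = 9.0 * (y - m) / (M - m) - 4.5  # Eq. (17)
    return X, y_tilde
```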

Through the above transformations, each stock produces 7 datasets. On each of these datasets, we repeat each regression algorithm 100 times, randomly extracting 1/3 of the samples for training and the other 2/3 for testing. The RMSEs of the algorithms on the three transformed stock datasets are listed in Table 2. From Table 2, we can see that our proposed algorithm achieves better prediction accuracy than the other classical regression algorithms, and it generally obtains better results with smaller \(\delta \), although this trend is not absolute.

6 Conclusion

We have designed a specialized pdf for the input of the MGP model which consists of three parts: the left and right side parts take the form of Gaussian distributions, while the middle part takes the form of a uniform distribution. This specialized pdf combines the advantages of the Gaussian distribution and the uniform distribution. That is, the Gaussian tails in the left and right side parts ensure that the hard-cut EM algorithm can perform effectively during each iteration, while the uniform distribution in the middle part is more reasonable for time series data. The experiments are conducted on three groups of synthetic datasets and on stock datasets. The experimental results demonstrate that the hard-cut EM algorithm for the MGPs with the specialized pdfs can obtain better prediction accuracy than the other classical regression algorithms, and that the specialized input pdf is particularly effective for time series data.