Abstract
We propose a model-based approach to the twofold problem of prediction and exploratory analysis of heterogeneous symbolic sequence collections. Our model is based on seeking low-entropy local representations joined together by a smooth nonlinear mixing process. Low-entropy components are desirable, as they tend to be both more interpretable and more predictable. The nonlinear mixing in turn acts as a regulariser and, in addition, creates a topographic ordering of the sequence histories, which is useful for exploratory purposes. The combination of these two modelling elements is performed within the generative probabilistic formalism, which ensures a flexible and technically sound predictive modelling framework. Unlike previous generative topographic modelling approaches for discrete data, the estimation algorithm associated with our model is designed to scale to large data sets by exploiting data sparsity. In addition, local convergence is guaranteed without the need for tuning optimisation parameters or making approximations to the non-Gaussian likelihood. These characteristics make it the first generative topographic model for discrete symbolic data with large-scale real-world applicability. We analyse and discuss the relationship of our approach to a number of models and methods. We empirically demonstrate robustness across varying sample sizes and significant improvements in predictive performance over the state of the art. Finally, we detail an application to the prediction and exploratory analysis of a large real-world web navigation sequence collection.
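To make the sparsity argument of the abstract concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm) of one EM iteration for a mixture of multinomial components of the kind a generative topographic model for symbolic data builds on. All names, sizes, and the random toy data are illustrative assumptions; the point is that the E-step likelihood reduces to a sum over nonzero symbol counts only, which is what makes sparse data cheap to handle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (assumed, not from the paper):
# K latent grid nodes, V symbols, N sequences summarised as count vectors.
K, V, N = 16, 20, 100
X = rng.poisson(0.3, size=(N, V))           # sparse symbol counts per sequence

# Each grid node k carries a multinomial over symbols; in the full model a
# smooth nonlinear mapping would generate these -- here they are random.
theta = rng.dirichlet(np.ones(V), size=K)   # (K, V), rows sum to 1

# E-step: responsibilities r[n, k] proportional to p(x_n | theta_k), computed
# in the log domain; the matrix product touches only nonzero counts of X,
# so cost scales with the number of nonzeros, not N * V.
log_theta = np.log(theta)                   # (K, V)
log_lik = X @ log_theta.T                   # (N, K): sum_v x_nv * log theta_kv
log_r = log_lik - log_lik.max(axis=1, keepdims=True)
r = np.exp(log_r)
r /= r.sum(axis=1, keepdims=True)           # each row is a distribution over nodes

# M-step (plain mixture-style update for illustration): re-estimate the
# multinomials from responsibility-weighted counts.
counts = r.T @ X                            # (K, V)
theta_new = (counts + 1e-12) / (counts + 1e-12).sum(axis=1, keepdims=True)
```

In the topographic model described by the abstract, the mixing additionally couples neighbouring grid nodes so that similar sequence histories land on nearby nodes; that coupling is omitted here for brevity.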
Editor: Zoubin Ghahramani.
Kabán, A. Predictive Modelling of Heterogeneous Sequence Collections by Topographic Ordering of Histories. Mach Learn 68, 63–95 (2007). https://doi.org/10.1007/s10994-007-5008-8