Abstract
In dyadic prediction, the input consists of a pair of items (a dyad), and the goal is to predict the value of an observation related to the dyad. Special cases of dyadic prediction include collaborative filtering, where the goal is to predict ratings associated with (user, movie) pairs, and link prediction, where the goal is to predict the presence or absence of an edge between two nodes in a graph. In this paper, we study the problem of predicting labels associated with dyad members. Special cases of this problem include predicting characteristics of users in a collaborative filtering scenario, and predicting the label of a node in a graph, which is a task sometimes called within-network classification or relational learning. This paper shows how to extend a recent dyadic prediction method to predict labels for nodes and labels for edges simultaneously. The new method learns latent features within a log-linear model in a supervised way, to maximize predictive accuracy for both dyad observations and item labels. We compare the new approach to existing methods for within-network classification, both experimentally and analytically. The experiments show, surprisingly, that learning latent features in an unsupervised way is superior for some applications to learning them in a supervised way.
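As a rough illustration of the unsupervised route mentioned above, the following sketch derives latent node features from a graph without looking at any labels, and then classifies unlabeled nodes from those features. It is not the method of this paper nor any of the compared systems: the toy graph, the use of leading eigenvectors of the adjacency matrix, the nearest-centroid classifier, and all function names are assumptions chosen only to keep the example small.

```python
import numpy as np


def latent_features(adjacency, k):
    """Unsupervised step: use the k leading eigenvectors of a symmetric
    adjacency matrix as latent node features (an illustrative choice)."""
    _, eigenvectors = np.linalg.eigh(adjacency)  # eigenvalues in ascending order
    return eigenvectors[:, -k:]


def nearest_centroid_predict(features, labeled_nodes, labels, query_nodes):
    """Supervised step: assign each query node the label of the closest
    class centroid in latent-feature space."""
    classes = sorted(set(labels))
    centroids = []
    for c in classes:
        members = [n for n, y in zip(labeled_nodes, labels) if y == c]
        centroids.append(features[members].mean(axis=0))
    centroids = np.array(centroids)
    predictions = []
    for node in query_nodes:
        distances = np.linalg.norm(centroids - features[node], axis=1)
        predictions.append(classes[int(np.argmin(distances))])
    return predictions


# Hypothetical toy graph: two 4-node cliques (nodes 0-3 and 4-7)
# joined by a single edge between nodes 3 and 4.
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

features = latent_features(A, k=2)

# Labels are observed for four nodes only; the rest are predicted.
predictions = nearest_centroid_predict(
    features,
    labeled_nodes=[0, 1, 6, 7],
    labels=[0, 0, 1, 1],
    query_nodes=[2, 3, 4, 5],
)
print(predictions)
```

In this sketch the features are fixed before any label is seen. A supervised variant would instead adjust the latent features while fitting the labels, which is the contrast the abstract draws.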
Acknowledgments
The authors thank Lei Tang for gracious help with running the code for SocDim and for answering several queries regarding the same. The authors also thank David Blei for providing the senator dataset.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Additional information
Responsible editors: José L Balcázar, Francesco Bonchi, Aristides Gionis, Michèle Sebag.
About this article
Cite this article
Menon, A.K., Elkan, C. Predicting labels for dyadic data. Data Min Knowl Disc 21, 327–343 (2010). https://doi.org/10.1007/s10618-010-0189-3