Abstract
In this article, I discuss three challenges facing data mining today: the transfer learning challenge, the social learning challenge, and the mobile context mining challenge. I chose these three because I think the time is ripe for each of them to be addressed in a major way in the near future, given the current technological and societal readiness to tackle them. I also believe that addressing each of them will help move the science and engineering of data mining forward and have a great impact on society.
Cite this article
Yang, Q. Three challenges in data mining. Front. Comput. Sci. China 4, 324–333 (2010). https://doi.org/10.1007/s11704-010-0102-7
DOI: https://doi.org/10.1007/s11704-010-0102-7