Abstract
Predicting future high-impact academic papers benefits a range of stakeholders, including governments, universities, academics, and investors. Being able to predict ‘the next big thing’ allows resources to be allocated to fields where rapid developments are occurring. This paper develops a new method for predicting a paper’s future impact using features of the paper’s neighbourhood in the citation network, including measures of interdisciplinarity. Predictors of high-impact papers include high early citation counts of the paper, high citation counts by the paper (that is, many references), citations of and by highly cited papers, and interdisciplinary citations of the paper and of the papers that cite it. The Scopus database, consisting of over 24 million publication records from 1996–2010 across a wide range of disciplines, is used to motivate and evaluate the methods presented.
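To make the feature families listed above more concrete, here is a minimal Python sketch (not the authors' implementation) that computes one plausible version of each for a single paper from a toy citation graph. Several details are hypothetical assumptions: the observation window, the single `field` label per paper, treating a citation as interdisciplinary when the citing paper's field differs from the cited paper's, and all names (`Paper`, `neighbourhood_features`). The paper's own feature definitions and its use of Scopus subject classifications may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Paper:
    pid: str
    year: int
    field: str  # hypothetical: one subject label per paper


def neighbourhood_features(papers, citations, target, window=2):
    """Illustrative citation-network features for the paper `target`.

    citations: iterable of (citing_pid, cited_pid) pairs.
    window:    hypothetical number of years after publication that are
               observable; citations made after target.year + window are
               ignored so that no information from the prediction period leaks in.
    """
    by_id = {p.pid: p for p in papers}
    cutoff = by_id[target].year + window

    cites = defaultdict(set)     # paper -> papers it cites (its references)
    cited_by = defaultdict(set)  # paper -> papers citing it, up to the cutoff
    for src, dst in citations:
        cites[src].add(dst)
        if by_id[src].year <= cutoff:
            cited_by[dst].add(src)

    def count(pid):
        # Citations received by pid within the observation window.
        return len(cited_by[pid])

    def cross_field_share(pid):
        # Fraction of pid's observed citers whose field differs from pid's field.
        citers = cited_by[pid]
        if not citers:
            return 0.0
        own = by_id[pid].field
        return sum(by_id[c].field != own for c in citers) / len(citers)

    refs = cites[target]
    citers = cited_by[target]
    return {
        "early_citations": count(target),
        "references": len(refs),
        "mean_citations_of_references": mean(map(count, refs)) if refs else 0.0,
        "mean_citations_of_citers": mean(map(count, citers)) if citers else 0.0,
        "cross_field_citation_share": cross_field_share(target),
        "mean_cross_field_share_of_citers": (
            mean(map(cross_field_share, citers)) if citers else 0.0
        ),
    }


# Toy data (invented for illustration only).
papers = [
    Paper("R", 1998, "physics"),
    Paper("A", 2000, "physics"),
    Paper("B", 2001, "biology"),
    Paper("C", 2001, "physics"),
    Paper("D", 2002, "biology"),
    Paper("E", 2005, "physics"),  # published after A's window closes
]
citations = [("A", "R"), ("C", "R"), ("B", "A"), ("C", "A"), ("E", "A"), ("D", "B")]

features = neighbourhood_features(papers, citations, "A", window=2)
print(features)
```

Here paper A has two early citations (B and C; E's citation falls outside the window), and half of them come from another field. In a full pipeline, feature vectors like this would be computed for many papers and passed to a classifier or regressor trained against later citation counts; that modelling step is not shown.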
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McNamara, D., Wong, P., Christen, P., Ng, K.S. (2013). Predicting High Impact Academic Papers Using Citation Network Features. In: Li, J., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_2
DOI: https://doi.org/10.1007/978-3-642-40319-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40318-7
Online ISBN: 978-3-642-40319-4
eBook Packages: Computer Science (R0)