Abstract
Author name ambiguity arises when multiple authors share the same name (homonyms) or when one author writes their name in several ways (synonyms). It degrades information retrieval and hinders correct attribution to authors in bibliographic databases. Despite much research in the past decade, the problem remains largely unsolved. Existing methods have outstanding limitations: they resolve only homonyms or only synonyms, require extra information (from the Web or user feedback), need the actual number of authors K in advance, or do not scale. This paper proposes GCLUSIM, a method that uses graph structural clustering and a proposed similarity measure to resolve ambiguous authors. GCLUSIM preprocesses the citation data set and constructs a co-author graph. Graph-based structural clustering is then applied to this graph to identify hub nodes, outliers, and clusters of nodes. GCLUSIM resolves homonyms by splitting these clusters when the feature vector similarity between them falls below a predefined threshold, and resolves synonyms by exploiting the proposed similarity measure. Finally, it disambiguates sole authors by comparing their name and feature vector similarities with the disambiguated clusters. Experiments on the Arnetminer and BDBComp data sets validate the performance of GCLUSIM. The results show that GCLUSIM is scalable, performs better overall than the baselines, and finds a number of clusters close to the ground truth.
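The graph construction and structural clustering steps can be illustrated with a minimal sketch. It assumes that the structural clustering uses the widely known SCAN-style structural similarity (shared closed neighbourhood divided by the geometric mean of the two neighbourhood sizes) and a core test controlled by two parameters, eps and mu. The function names, the toy papers, and the parameter values are hypothetical and are not taken from GCLUSIM itself.

```python
from itertools import combinations
from math import sqrt


def build_coauthor_graph(papers):
    """Map each author to the set of authors they share at least one paper with."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def structural_similarity(graph, u, v):
    """SCAN-style similarity: size of the shared closed neighbourhood,
    normalised by the geometric mean of both closed neighbourhood sizes."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def is_core(graph, v, eps, mu):
    """A node is a core if at least mu members of its closed neighbourhood
    (the node itself included) have similarity >= eps to it."""
    close = [w for w in graph[v] | {v}
             if structural_similarity(graph, v, w) >= eps]
    return len(close) >= mu


# Toy citation records (hypothetical): each inner list is one paper's authors.
papers = [
    ["A. Kumar", "B. Lee", "C. Diaz"],
    ["A. Kumar", "B. Lee"],
    ["A. Kumar", "D. Chen"],
]
graph = build_coauthor_graph(papers)

for name in sorted(graph):
    print(name, "core" if is_core(graph, name, eps=0.8, mu=3) else "non-core")
```

In a full structural clustering pass, clusters grow outward from core nodes, and the remaining nodes are labelled hubs if they connect to more than one cluster and outliers otherwise. The homonym and synonym steps described in the abstract would then operate on those clusters.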
Cite this article
Hussain, I., Asghar, S. Author Name Disambiguation by Exploiting Graph Structural Clustering and Hybrid Similarity. Arab J Sci Eng 43, 7421–7437 (2018). https://doi.org/10.1007/s13369-018-3099-0
DOI: https://doi.org/10.1007/s13369-018-3099-0