Introduction

A DOI (digital object identifier) is a persistent alphanumeric string that uniquely identifies an object. A DOI name is case-insensitive and consists of a prefix beginning with “10.” and a suffix, separated by a forward slash. DOIs are widely used to identify academic publications (Boudry and Chartron 2017; Gorraiz et al. 2016). Like an ID card number, a DOI name can be used to identify a specific publication within a huge database.
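The structure described above can be illustrated with a minimal sketch that splits a DOI name into its prefix and suffix and folds case before comparison. The function name `split_doi` and the list of resolver prefixes it strips are our own illustrative assumptions, not part of any DOI library.

```python
def split_doi(doi_name: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix), lower-cased for comparison."""
    name = doi_name.strip()
    # Drop a resolver prefix if present (assumption: only these common forms).
    for head in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if name.lower().startswith(head):
            name = name[len(head):]
            break
    # The prefix and suffix are separated by the first forward slash.
    prefix, sep, suffix = name.partition("/")
    if not prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"not a DOI name: {doi_name!r}")
    # DOI names are case-insensitive, so fold case before comparing.
    return prefix.lower(), suffix.lower()
```

Because the comparison is done on the case-folded pair, two spellings that differ only in letter case are treated as the same DOI name.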

Franceschini et al. (2015) revealed that quite a few single DOI names were incorrectly assigned to multiple papers indexed in the Scopus database. Yet it remains unknown whether this also holds for other leading standardized publication datasets (Tang and Shapira 2011). In this paper, we develop a special search query to probe experimentally the DOI name error problem in the Web of Science.

Method

Let us start by admitting that there is no silver bullet that can systematically identify and retrieve all records indexed in the Web of Science with wrong DOI names. Yet intuitively, and based on our past research experience, the number “0” is very often mistaken for the letter “O”. We therefore use a special string pattern (within the DOI name, the letter “O” appears between two digits from 0 to 9) as an illustrative case to explore whether such errors also occur in DOI names, an increasingly relied-upon tool for identifying research articles.

$$
\begin{aligned}
\text{DO} = (\, &\text{10.*0O0* OR 10.*0O1* OR 10.*0O2* OR 10.*0O3* OR 10.*0O4* OR 10.*0O5* OR 10.*0O6* OR 10.*0O7* OR 10.*0O8* OR 10.*0O9* OR} \\
&\text{10.*1O0* OR 10.*1O1* OR 10.*1O2* OR 10.*1O3* OR 10.*1O4* OR 10.*1O5* OR 10.*1O6* OR 10.*1O7* OR 10.*1O8* OR 10.*1O9* OR} \\
&\text{10.*2O0* OR 10.*2O1* OR 10.*2O2* OR 10.*2O3* OR 10.*2O4* OR 10.*2O5* OR 10.*2O6* OR 10.*2O7* OR 10.*2O8* OR 10.*2O9* OR} \\
&\text{10.*3O0* OR 10.*3O1* OR 10.*3O2* OR 10.*3O3* OR 10.*3O4* OR 10.*3O5* OR 10.*3O6* OR 10.*3O7* OR 10.*3O8* OR 10.*3O9* OR} \\
&\text{10.*4O0* OR 10.*4O1* OR 10.*4O2* OR 10.*4O3* OR 10.*4O4* OR 10.*4O5* OR 10.*4O6* OR 10.*4O7* OR 10.*4O8* OR 10.*4O9* OR} \\
&\text{10.*5O0* OR 10.*5O1* OR 10.*5O2* OR 10.*5O3* OR 10.*5O4* OR 10.*5O5* OR 10.*5O6* OR 10.*5O7* OR 10.*5O8* OR 10.*5O9* OR} \\
&\text{10.*6O0* OR 10.*6O1* OR 10.*6O2* OR 10.*6O3* OR 10.*6O4* OR 10.*6O5* OR 10.*6O6* OR 10.*6O7* OR 10.*6O8* OR 10.*6O9* OR} \\
&\text{10.*7O0* OR 10.*7O1* OR 10.*7O2* OR 10.*7O3* OR 10.*7O4* OR 10.*7O5* OR 10.*7O6* OR 10.*7O7* OR 10.*7O8* OR 10.*7O9* OR} \\
&\text{10.*8O0* OR 10.*8O1* OR 10.*8O2* OR 10.*8O3* OR 10.*8O4* OR 10.*8O5* OR 10.*8O6* OR 10.*8O7* OR 10.*8O8* OR 10.*8O9* OR} \\
&\text{10.*9O0* OR 10.*9O1* OR 10.*9O2* OR 10.*9O3* OR 10.*9O4* OR 10.*9O5* OR 10.*9O6* OR 10.*9O7* OR 10.*9O8* OR 10.*9O9*}\,) \\
&\text{Indexes} = \text{SCI-EXPANDED, SSCI, A\&HCI}; \quad \text{Timespan} = \text{1900-2017}
\end{aligned}
$$
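Since the query above simply enumerates every pair of digits around the letter “O”, it does not need to be typed by hand. The following sketch shows one way to generate the 100 terms; the function name `build_do_query` is our own, and the wildcard syntax is copied from the query as written rather than checked against the current Web of Science interface.

```python
from itertools import product

DIGITS = "0123456789"


def build_do_query() -> str:
    """Build the DO = (...) query: letter O between every pair of digits."""
    # One term per (left digit, right digit) pair, e.g. 10.*0O0*, 10.*0O1*, ...
    terms = [f"10.*{left}O{right}*" for left, right in product(DIGITS, repeat=2)]
    return "DO = (" + " OR ".join(terms) + ")"


query = build_do_query()
```

The same pattern could be adapted to other suspected confusions by replacing the letter in the middle of each term.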

We conducted the search on November 5th, 2018 from the library of Xi’an Jiao Tong University. The search strategy returned 319 records, of which the 310 published in English were selected for further analysis. The bibliographic information of these records was downloaded and imported into Excel for further processing. We used the DOI System (http://dx.doi.org/) as the gold standard for resolving DOI names.
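The resolution check could in principle be automated. The sketch below is not the procedure used in this study; it assumes the DOI proxy's REST endpoint at https://doi.org/api/handles/ and its convention that a `responseCode` of 1 means the name was found and 100 means it was not, both of which should be verified against the proxy documentation before use. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: REST endpoint of the DOI proxy server.
HANDLE_API = "https://doi.org/api/handles/"


def handle_url(doi_name: str) -> str:
    """Return the lookup URL for a DOI name, percent-encoding unsafe characters."""
    return HANDLE_API + urllib.parse.quote(doi_name, safe="/")


def is_registered(payload: dict) -> bool:
    """Interpret the JSON answer (assumption: responseCode 1 = found)."""
    return payload.get("responseCode") == 1


def fetch_payload(doi_name: str, timeout: float = 10.0) -> dict:
    """Query the proxy for one DOI name; requires network access."""
    try:
        with urllib.request.urlopen(handle_url(doi_name), timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: an unknown name is answered with HTTP 404.
        return {"responseCode": 100 if err.code == 404 else None}
```

Keeping the network call in `fetch_payload` separate from `is_registered` means the interpretation step can be checked without contacting the server.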

Analyses

Wrong DOI names

Among these 310 records, we find that 119 DOI names downloaded from the Web of Science (the third column of Table 1) cannot be found in the DOI System. These unresolved DOI names belong to records published from 2001 to 2017. For comparison, we also manually collected the DOI name given on the publisher’s webpage of each record (the fifth column of Table 1), assuming that the DOI names provided by the publishers are correct. All the publisher-provided DOI names can be found in the DOI System and resolve to the corresponding papers. That is to say, the 119 DOI names downloaded from the Web of Science are incorrect. Comparing the third and fifth columns of Table 1, we find that not only is the letter “O” easily confused with the number “0”, but other similar-looking characters, such as “b” and “6” or “O” and “Q”, are easily confused as well.
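The column-by-column comparison can be sketched as a character-level diff between two DOI names. This is only an illustration under the assumption that the two names have the same length, which holds when characters are substituted rather than dropped; the function name `confused_pairs` is our own.

```python
def confused_pairs(name_a: str, name_b: str) -> list[tuple[int, str, str]]:
    """List (position, char in name_a, char in name_b) where the names differ."""
    # DOI names are case-insensitive, so compare the lower-cased forms.
    a, b = name_a.lower(), name_b.lower()
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length DOI names")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


pairs = confused_pairs(
    "10.1106/6o6r-6mlh-6mqa-hpr2",
    "10.1106/606r-6mlh-6mqa-hpr2",
)
```

For the pair shown, the only difference is the letter “o” in one name against the number “0” in the other. Tallying such pairs over all records would show which character confusions occur most often.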

Table 1 A sample of incorrect DOI names in Web of Science

One paper with two different DOI names

Interestingly, for 73 of the 119 records, both the DOI name downloaded from the Web of Science and the one collected from the publisher can be used to retrieve the record in the Web of Science (last column of Table 1). For example, the query DO = (“https://doi.org/10.1106/606r-6mlh-6mqa-hpr2” and “https://doi.org/10.1106/6o6r-6mlh-6mqa-hpr2”) can be used to retrieve the first record in the Web of Science. In other words, for some records the Web of Science has stored both the wrong and the right DOI names. However, only the wrong DOI name is provided when we download the bibliographic data.

We also find one article with two correct DOI names. Both https://doi.org/10.1109/tgrs.20o4.826811 and https://doi.org/10.1109/tgrs.2004.826811 can be resolved to this article via the DOI System, but only the second DOI name works in the Web of Science. On closer inspection, we find that the first DOI name appears on the publisher’s webpage of this article, whereas the second appears in the PDF file of the article.

Discussion

As noted above, there is no simple way to identify, and thus to assess the extent of, DOI errors in the Web of Science dataset. Nevertheless, using a special search query, this paper demonstrates the existence of DOI name errors in the Web of Science, although we do not know how representative this common error is compared with other possible mistypes. For some records, the Web of Science may also contain two different DOI names for one record.

Based on our analysis, we argue that since wrong DOI names cannot be resolved via the DOI System, the Web of Science could cooperate with the DOI System to identify potentially wrong DOI names and then recollect the correct ones from the publishers. For articles stored with two different DOI names, the Web of Science should identify the correct DOI name and keep only that one. We hope this problem can be fixed in the near future.
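One conceivable first step toward such a correction is to propose candidate DOI names for a name that fails to resolve, by swapping one confusable character at a time. The sketch below is a hypothetical illustration, not a method used in this study: the table `CONFUSABLE` lists only the character pairs reported in the Analyses section, and only single-character swaps in the suffix are considered. Each candidate would still have to be checked against the DOI System and the publisher's record.

```python
# Confusable characters reported in the analysis: O/0, b/6 and O/Q (lower-cased).
CONFUSABLE = {"o": "0q", "0": "o", "q": "o", "b": "6", "6": "b"}


def single_swap_candidates(doi_name: str) -> list[str]:
    """Return DOI names that differ from the input by one confusable character."""
    name = doi_name.lower()
    # Only the suffix is varied; the prefix is assumed to be correct.
    prefix, sep, suffix = name.partition("/")
    candidates = []
    for i, ch in enumerate(suffix):
        for alt in CONFUSABLE.get(ch, ""):
            candidates.append(prefix + sep + suffix[:i] + alt + suffix[i + 1:])
    return candidates


candidates = single_swap_candidates("10.1106/6o6r-6mlh-6mqa-hpr2")
```

Restricting the search to single swaps keeps the candidate list short, at the cost of missing names with two or more confused characters.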

Both the Web of Science and Scopus are widely used in bibliometric analysis (Liao et al. 2019; Yu et al. 2018). However, these two databases are not free of errors and limitations (Franceschini et al. 2016; Hu et al. 2012; Liu 2017; Liu et al. 2018). For example, funding is a great concern for academia (Liu et al. 2019), yet the funding acknowledgement information collected by the Web of Science is not free of errors and limitations either (Tang et al. 2017). Users should be aware of these errors and limitations, and database providers should work to resolve them and improve data quality.