Introduction

Judit Bar-Ilan was a leading information scientist with a strong mathematical background and an ultimate focus on users’ behaviour: just the academic cocktail I was eager to find when I decided to direct my life towards academia. Her influence on my academic training is immeasurable.

Judit sadly passed away on July 16, 2019, and this work aims to pay tribute to her achievements and academic legacy.

Judit received a technical education, including a B.Sc. in Mathematics and Computer Science with distinction (1981), an M.Sc. in Mathematics with distinction (1983), and a PhD in Computer Science (1990), all at the Hebrew University of Jerusalem.

After an academic cycle including a postdoctoral position at the Weizmann Institute of Science (1989–1990), a visiting Lecturer position at the Department of Mathematics and Computer Science, University of Haifa (1990–1991), and responsibility for the seminars in Computer Science at The Open University of Israel (1990–1992), she moved back to the Hebrew University of Jerusalem in 1991–1992 to become a member of the School of Library and Information Studies. There her academic career, forevermore linked to the Social Sciences, started, first as External Teacher, and later as Teaching Fellow (1992–1994), Teacher (1994–1998) and Senior Teacher (1998–2002).

Later, Judit moved to the Department of Information Science at Bar-Ilan University in 2002, where she was head of Department from 2008 to 2012, and was promoted to Full Professor in 2010.

Judit’s outstanding oeuvre comprises over 300 academic publications, including journal articles, book chapters, conference papers, and book reviews, to which we must add her dedication to teaching and her active role in the community through numerous conference program committee memberships and journal editorial board positions. The impact of Judit’s work is fairly reflected in the nearly 4000 citations received to date according to Scopus (over 8000 according to her Google Scholar Citations profile).

Judit was active in different fields, such as informetrics and webometrics (search engine studies and link analysis), information retrieval and dynamics, internet research, information behaviour and usability, citation analysis (especially web citation search engines, such as Google Scholar), and altmetrics (Thelwall 2017).

In recognition of her career, Judit was honoured, among other awards, with the Derek de Solla Price Memorial Medal in 2017 [Footnote 1], awarded by the International Society for Scientometrics and Informetrics (ISSI) [Footnote 2], and with the Research in Information Science Award in 2018, awarded by the Association for Information Science and Technology (ASIST) [Footnote 3].

When I started figuring out the topic for this tribute, I was first tempted to perform a webometric analysis of Judit’s personal website [Footnote 4] or to carry out a content analysis of the results retrieved by Google for the query “judit bar-ilan”, following in Judit’s own footsteps in the magnificent tributes and festschrifts she herself had previously paid to Paul Erdos (Bar-Ilan 1998b), Peter Ingwersen (Bar-Ilan 2010) or Eugene Garfield (Bar-Ilan 2018). Then I considered performing a bibliometric analysis of Judit’s work through Google Scholar, or even carrying out an altmetric study. All of these were areas in which Judit left her academic mark, and any of them could faithfully reflect the multidisciplinary impact of her work.

However, while consulting her extensive bibliography, one of her first works published in the journal Scientometrics (Bar-Ilan 1998a), entitled «On the overlap, the precision and estimated recall of search engines. A case study of the query ‘Erdos’», came into my hands. This publication applies a large number of quantitative measures to several search engines with the purpose of establishing performance evaluation parameters from an ‘informetrics’ point of view. This work initiated one of Judit’s main lines of research and helped, along with the seminal works of Isidro Aguillo, Tomas Almind, Lennart Björneborn, Peter Ingwersen, Mike Thelwall and Liwen Vaughan, among others, to lay the foundations of so-called Webometrics (i.e., informetric analyses of the Web).

Search engine studies constitute a large research area, dominated by computer sciences. Scopus currently indexes 22,152 documents on the topic (from 1992 to 2019), of which 15,779 (71.2%) were published in sources totally or partially classified under computer sciences, while social sciences account for just 2815 contributions (12.7%). One of Judit’s main contributions was precisely to study search engines as carriers of information to users, either scholars or general citizens.

Following this line of thought, the first objective of this work is to provide a descriptive and systematic literature review of Judit’s contributions dedicated to search engine studies. The second objective is to determine the degree of interdisciplinarity of this specific body of literature, analysing both the cited references (those contributions cited by Judit’s work) and the citing documents (those contributions citing Judit’s work).

Methods

The first step consisted of identifying the bibliographic corpus dedicated to search engine studies. To do this, I accessed Judit Bar-Ilan’s public profile on Google Scholar Citations [Footnote 5] as of 25 December 2019, which included 230 items.

The selection process was carried out in two consecutive iterations. The first iteration gathered 52 contributions, after reading the title and abstract of each of the 230 items. The second iteration reduced the corpus to a final set of 47 contributions (33 journal articles, 11 conference papers, and 3 book chapters), after a cursory reading of the full text of each pre-selected contribution (See “Appendix 1”).

The second step consisted of conducting a systematic literature review. The 47 contributions were read in detail in order to extract some basic information: the search engines under study, the research method used, the queries (if any) performed, the number of results analysed (sample size), the date of the experiments, and, last but not least, the search engine parameters and variables studied.

The third step consisted of extracting the cited references from all these contributions. To do this, all cited references available in Scopus (42 out of the 47 contributions are indexed in this database) were automatically downloaded. The cited references for the remaining five contributions were extracted directly from the manuscripts’ full text.

The fourth step consisted of extracting the citing documents. The references of all works citing any of the 47 contributions were automatically downloaded from Scopus.

After this, a data cleansing step (fifth step) was carried out to fix and normalise both cited references and citing sources, due to the significant number of errors encountered.

Finally, the sixth step was dedicated to interdisciplinarity. In this case, only journal articles were considered, for the sake of clarity.

Each bibliographic reference, either cited reference or citing document, was categorized according to the category assigned to the journal where the article had been published. In order to maintain consistency and coherence, the 27 major thematic categories provided by the Scopus Subject Areas and Subject Categories were used. When a journal was categorised under more than one major thematic category, fractional counting (1/n) was applied. In this way, a weighted number of cited references and a weighted number of citations received were obtained.

This way, a score for each category and contribution was obtained, considering both the articles included in the set of cited references (influential articles for Judit) and the articles included in the set of citing references (articles influenced by Judit). These scores were all transformed into percentage values to minimize size effects.
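The fractional counting and percentage transformation described above can be illustrated with a minimal sketch. It is not the procedure actually used for this study, which was carried out on Scopus exports; the function name `fractional_category_shares` and the four example references are hypothetical and serve only to show how the 1/n weights add up.

```python
from collections import defaultdict


def fractional_category_shares(journal_categories):
    """Apply 1/n fractional counting to a list of references.

    journal_categories: one entry per reference, each entry being the list
    of major thematic categories assigned to the journal of that reference.
    Returns (weighted counts per category, percentage per category).
    """
    weights = defaultdict(float)
    for categories in journal_categories:
        unique = sorted(set(categories))  # ignore repeated labels
        if not unique:
            continue  # reference without a categorized journal
        share = 1.0 / len(unique)  # the 1/n fractional weight
        for category in unique:
            weights[category] += share

    total = sum(weights.values())
    percentages = {}
    if total > 0:
        # Percentages minimize size effects between contributions
        percentages = {c: 100.0 * w / total for c, w in weights.items()}
    return dict(weights), percentages


# Hypothetical example: four references, two of them in journals
# assigned to two major categories each.
example_refs = [
    ["Computer Science"],
    ["Computer Science", "Social Sciences"],
    ["Social Sciences", "Decision Sciences"],
    ["Social Sciences"],
]
weights, shares = fractional_category_shares(example_refs)
for category in sorted(shares):
    print(f"{category}: weight={weights[category]:.2f}, share={shares[category]:.1f}%")
```

In this made-up example the four references yield weights of 1.5 (Computer Science), 2.0 (Social Sciences) and 0.5 (Decision Sciences), so the weights always sum to the number of categorized references.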

The whole process was carried out during the last week of December 2019.

Results

Systematic literature review

Judit initially cultivated this field in a relatively solitary way. She was the sole author of 21 out of the 47 selected works. Later, Bluma Peritz (5), Maaya Zhitomirsky-Geffet (5) and especially Mark Levene (14) became her closest collaborators.

The 47 contributions that shape Judit’s oeuvre on search engine studies have received 914 citations according to Scopus. This number climbs to 1923 in her Google Scholar Citations profile [Footnote 6]. The article entitled «Search engine results over time: A case study on search engine stability» (Bar-Ilan 2003), published in the unfortunately defunct journal Cybermetrics (88 citations according to Scopus; 199 according to Google Scholar), and the article «Data collection methods on the Web for infometric purposes - A review and analysis» (Bar-Ilan 2001), published in Scientometrics (89 citations according to Scopus; 180 according to Google Scholar), stand out as Judit’s most cited contributions on the topic.

Setting aside descriptive and theoretically oriented documents, 38 out of the 47 contributions provide empirical results on search engines. “Appendix 2” contains detailed information about the search engines covered, parameters studied, methods employed, queries used, and sample sizes. Data collection dates, when available, have also been recorded.

Most of Judit’s contributions start by acknowledging the Internet as an emerging information medium, where users were experiencing a ‘Web document explosion’ (Bar-Ilan 1998b). Consequently, the Internet in general, and the Web in particular, might become a potential information and bibliographical source for scientists (Bar-Ilan 2000). Within this ecosystem, search engines appeared to constitute an essential part of the Web (Bar-Ilan 2002). However, Judit’s experiments demonstrated that the quality and the reliability of most of the available search tools were not satisfactory (Bar-Ilan 2001).

If “Appendix 2” (Search engine column) is analysed, one witnesses the evolution of the search engine market. Travelling through Judit’s work, we move from pioneering search engines like Altavista, Excite, Fast, Infoseek, Northern Light or Lycos to the current landscape dominated by Google, passing through local search engines (Walla, Morfix, Tapuz, Yandex, Rambler, Voila, Origo-Vizala, etc.) along the way.

In total, 43 different search engines were tested, with Google (including different market versions) and Altavista being the most widely employed (29 and 18 times, respectively).

The review of this body of literature also reveals some gems. The rise of Google was prophesied by Judit almost 20 years ago, when she pointed out that “for almost all purposes it will be enough to search Google to get good coverage of a topic on the Internet” (Bar-Ilan 2002). Likewise, Judit proposed the creation of “vertical search engines and directories per disciplines with high quality control” (Bar-Ilan 2001), foreshadowing the launch of Google Scholar. The ideal of a search engine serving the scientific community accompanied Judit across several contributions (Bar-Ilan 2005a, b), where even a name, ‘Webomet’, originally coined by Björneborn, was adopted.

The parameters, variables and indicators used by Judit to characterize and evaluate search engines constitute another essential contribution to the field. Adopting postulates from the Information Retrieval (IR) field, Judit calculated several variables: estimated recall, technical relevance, technical precision, overlap, self-overlap, coverage, relative coverage, and evolution over time. Special attention was paid to the analysis of stability and fluctuation over time, putting URLs at the heart of the analyses (lost URLs, dropped URLs, forgotten and totally forgotten URLs, recovered URLs, etc.).

Following Judit’s works on search engine studies chronologically, we can see a movement from pure informetric methods to content analyses first, and user studies later: from quantitative analyses aimed at discovering the response of search engines as information retrieval systems, to characterizing the results offered (content-centred studies) and the user responses (user-centred studies). Judit mixed quantitative and qualitative methods, and she gradually moved from technical precision to the ordering of results, providing empirical results to the emerging field of search engine optimization (SEO) through user studies and tailored experimental designs.

The evolution of Judit’s works on search engine studies can be observed in the co-occurrence map of keywords included in Fig. 1. ‘Search engines’ (29 occurrences), ‘World Wide Web’ (14) and ‘Information retrieval’ (12) stand out as the most used keywords.

Fig. 1 Co-occurrence overlay map of keywords (1998–2019). Map generated with VOSviewer (https://www.vosviewer.com). Terms extracted from the Scopus database. Total documents included: 42; total keywords included: 214

Interdisciplinarity

Interdisciplinarity remains a controversial concept in Scientometrics, as nuanced differences between interdisciplinary, multidisciplinary, and cross-disciplinary emerge but remain hard to handle, especially when measured at the journal level.

As the eminent Albert-László Barabási has recently pointed out in a Twitter thread, whereas ‘multidisciplinary’ refers to separate disciplines coming together in the same journal, yet remaining distinct [Footnote 7], ‘interdisciplinarity’ refers to the integration of disciplines in the same publication. Therefore, interdisciplinary impact is the diversity of disciplines that a discovery influences, defined by the disciplines that cited the paper [Footnote 8]. Cross-disciplinarity emerges when a disciplinary paper impacts other disciplines [Footnote 9].

Following this terminology, the overall goal of this section is to analyse the interdisciplinary degree of Judit’s work on search engine studies.

Across the 47 contributions, Judit cited a total of 1832 references, mainly journal articles (48.5%). However, the large number of references to online material (24.6%) really stands out. Judit was eclectic and heterodox in her citing profile. She frequently cited newspapers, search engines’ webpages with technical information and definitions, reports, dictionaries, encyclopaedias, working papers, discussion lists, conclusions from conference special interest groups, and, above all, posts from specialized blogs. Search Engine Watch [Footnote 10], a reputed blog devoted to search engine optimization, is cited up to 72 times. Search engine studies are a highly dynamic area, and the freshest and most up-to-date content is generally found in these online sources.

Conference papers (19.2%) are heavily cited as well, both from the computer sciences side (e.g., International World Wide Web Conference or International ACM SIGIR Conference) and the social sciences side (e.g., ASIS Annual Meeting or International Conference of the International Society for Scientometrics and Informetrics).

By contrast, the typology of citing sources is more restricted. A total of 916 citations have been computed, mainly from journal articles (75.4%) and conference papers (18.4%). Figure 2 shows the distribution of document types for both cited references and citing sources.

Fig. 2 Document types: cited references (top) and citing documents (bottom)

An author-level analysis has been carried out to reveal the authors most cited by Judit’s work on search engines (authors who influenced Judit) and, complementarily, the authors who most cited Judit (authors influenced by Judit). Table 1 includes the top 20 authors on both sides of the academic coin.

Table 1 Authors: cited references and citing sources

On the one hand, we can observe that Judit was influenced to a great extent by authors from computer science, such as Amanda Spink, Bernard Jansen, Clyde Lee Giles, Steve Lawrence or Andrei Broder. It is worth mentioning the appearance of David Sullivan (blogger at the Search Engine Watch blog) as the second most cited author, as well as the presence of Google as an institutional author.

On the other hand, we observe a strong influence of Judit’s work on authors who, regardless of their educational background, have published mainly in the social sciences in general, and webometrics in particular, such as Liwen Vaughan, Isidro Aguillo, Kaivan Kousha or Jose Luis Ortega. In addition, we find other important authors with a strong technical background, like Dirk Lewandowski and Han Woo Park. Finally, Mike Thelwall exhibits a great influence both on the citations received by Judit and on those provided by her.

As regards the publication sources, we can observe a similar pattern (Table 2). Setting aside the presence of specialized blogs and conference proceedings, cited references include interdisciplinary journals with a strong weight on technical aspects and pure computer sciences journals (e.g., Computer Networks, Lecture Notes in Computer Science). On the other side, the citing documents exhibit a greater presence of journals from library and information sciences. In any case, JASIST, an interdisciplinary journal, appears as the most important source for Judit’s works on search engine studies.

Table 2 Journals: cited references and citing sources

If we move towards the thematic categories (only journal articles considered), cited references (n = 888 references) are covered both by computer sciences (36.5% of all weighted references) and social sciences (36.3%), followed by decision sciences (11%).

Citing documents (n = 688 citations) are concentrated in social sciences (44.2% of all weighted citations received), followed by computer sciences (33.7%) and decision sciences (7.5%). That is, the same fields appear, with different percentages (Fig. 3). Within social sciences, impact comes mainly from library and information science (478 out of the 568 citations from journals totally or partially categorized under social sciences belong to this subcategory).

Fig. 3 Thematic categories (bibliographic corpus on search engine studies): cited references and citing sources

Leaving behind the overall behaviour, the performance of particular contributions exhibits interesting information about interdisciplinarity. Figure 4 includes the cited-references/citing-sources balance for two selected contributions (labelled P002 and P025 in “Appendix 1”).

Fig. 4 Thematic categories (specific contributions): cited references and citing sources. (top) P002: Search engine results over time: A case study on search engine stability (Cybermetrics). (bottom) P025: Methods for comparing rankings of search engine results (Computer Networks)

P002: This article, originally published in the journal Cybermetrics (Bar-Ilan 2003), was conceived mainly with references from computer science journals (48.1%), but it attracted citations mainly from articles published in social sciences (53.1%).

P025: This article, originally published in the journal Computer Networks (Bar-Ilan, Mat-Hassan and Levene 2006), was conceived with references both from social sciences (28.25%) and computer science journals (25.5%), but it attracted citations mainly from articles published in computer sciences (38.9%), ‘other disciplines’ (20.4%), especially Business, Management and Accounting, and Medicine, and to a lesser extent, social sciences (17. %).

To finalize the analysis, we obtained two-dimensional coordinates based on the interdisciplinarity of each contribution. To do this, we need to establish a thematic category to act as a baseline. In this case, the selected category was ‘social sciences’.

For each document, the percentage of cited references outside the social sciences (cited dimension) and the percentage of citing documents outside the social sciences (citing dimension) were estimated. Then we could plot the coordinates for each of the contributions (Fig. 5).
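As a hedged illustration of how these two coordinates can be derived from the weighted category scores, consider the following sketch. The function name `interdisciplinarity_coordinates` and the example scores are hypothetical; they are not the values obtained for any of the 47 contributions.

```python
def interdisciplinarity_coordinates(cited_scores, citing_scores,
                                    baseline="Social Sciences"):
    """Return (cited dimension, citing dimension) for one contribution.

    Each argument maps a thematic category to its weighted score
    (fractional counts or percentages). Each dimension is the percentage
    of the total score that falls outside the baseline category.
    A dimension is None when no categorized journal articles exist.
    """
    def percent_outside(scores):
        total = sum(scores.values())
        if total == 0:
            return None  # nothing to measure on this side
        inside = scores.get(baseline, 0.0)
        return 100.0 * (total - inside) / total

    return percent_outside(cited_scores), percent_outside(citing_scores)


# Hypothetical contribution: scores are made up for illustration only.
cited = {"Computer Science": 50.0, "Social Sciences": 30.0,
         "Decision Sciences": 20.0}
citing = {"Social Sciences": 60.0, "Computer Science": 40.0}

x, y = interdisciplinarity_coordinates(cited, citing)
print(f"cited dimension = {x:.1f}%, citing dimension = {y:.1f}%")
```

With these made-up scores the contribution would be plotted at (70.0, 40.0): most of its references come from outside the baseline category, while most of its citations come from within it.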

Fig. 5 Interdisciplinarity quadrant

As we can observe, the majority of contributions are located in quadrant 4 (high cited-references interdisciplinarity, high citing-documents interdisciplinarity), with the exception of document P020 (a journal article written in German with only 3 journal articles cited, and 4 citations received), and P018 (a conference paper, which receives just 1 citation from a journal categorized under Social Sciences).

Discussion and conclusions

This work reports on the contributions of Judit Bar-Ilan to search engine studies. To do this, two complementary approaches have been followed: first, a systematic literature review of 47 publications authored or co-authored by Judit and devoted to this topic; and second, an interdisciplinarity analysis based on the cited references (publications cited by Judit) and citing documents (publications that cite Judit’s documents).

The systematic literature review reveals the breadth and depth of Judit’s work on search engines, and the large number of search engines studied and indicators measured. In addition, an evolution over the years is detected towards empirical user studies and the ranking of search engine results, with a mixture of quantitative and qualitative methods.

The interdisciplinarity analysis shows Judit as a scientist who not only researched the Web but also used it to nurture her publications, with numerous mentions of online resources offering useful, necessary, updated and rigorous information. That is to say, Judit talked the talk and walked the walk. Moreover, the results show that Judit drew academically on computer sciences while managing to cross the ocean to social sciences, achieving a significant impact especially, but not exclusively, on library and information science.

Some limitations of this work should be noted. First, article categorization was performed at the journal level, which introduces insurmountable methodological problems, although recent article-level categorizations still do not solve them. In addition, some journal classification inconsistencies were found [Footnote 11] and treated manually. Expanding the analysis by taking the specific subject categories into account is also advisable. Second, only citing sources indexed in Scopus were considered. Including a wider spectrum of citations received (mainly from Google Scholar) might help to obtain a broader citation picture. Third, only journal articles were considered in the interdisciplinarity analysis. The inclusion of other document types (mainly book chapters and conference papers) might increase the weight of computer sciences, especially on the citing documents side.

In any event, this work evidences the richness, impact, and interdisciplinarity of Judit’s work, and her legacy to the field of search engine studies.