
1 Introduction

Artificial intelligence (AI) was first conceived in the 1950s, with Alan Turing's essays on the use of computers to simulate intelligent behavior. The term itself was coined at the Dartmouth College conference in 1956, where it was described as the science and engineering of making intelligent machines. Since then, AI has evolved dramatically over the last five decades, addressing multiple dimensions of informational and algorithmic issues. Current predictive models create opportunities for personalized medicine, being used for the diagnosis of diseases, the prediction of therapeutic responses, and, potentially, preventive medicine [1]. In addition, AI is considered the main driver of the 4th Industrial Revolution [2], which impacts our economies and societies [3].

Early on, healthcare was identified as one of the most promising applications of AI. The first AI systems were knowledge-based decision support systems that presented good performance but were never used routinely on actual patients, for two main reasons: i) they were stand-alone systems, not connected to empirical data such as electronic health records (EHR); ii) due to the subjectivity of the expertise encoded in their rules, the systems were not accepted, being more useful for teaching than for clinical practice [4]. Some of these limitations were only overcome with the advent of machine learning in the 2000s. Algorithms that learn from data offered a more practical approach than the previous expert systems, which crafted medical knowledge into decision rules.

Earlier prediction algorithms required a feature selection and engineering process to make the models useful. Deep learning advanced on this front, since it learns complex features from raw data rather than relying on manual feature engineering. Such algorithmic advancements raised the debate about the usefulness of data-driven instead of theory-driven models [17]. However, a famous counterexample of a purely data-driven AI application is Google Flu Trends, an AI-based tool that used aggregated search data to estimate flu activity in certain regions. In practice, Google's algorithm predicted more than double the proportion of doctor visits for influenza-like illness estimated by the Centers for Disease Control and Prevention (CDC), which bases its estimates on surveillance reports from laboratories across the United States [5]. As pointed out in [6], incorporating AI into clinical practice remains a challenge because of methodological flaws and underlying biases in study design.

In this work, we sought to understand how AI technologies and methods have evolved along with healthcare, to trace the paths taken so far, and to discuss possible trends for the area's future. We do so through a bibliometric analysis that helped us extract interesting patterns from the scientific literature of this field. In particular, we aimed to answer the following research questions: RQ1) how were different periods organized around AI, and how are different countries associated with key terms? This question is essential to understand waves of technological impact on the field; RQ2) how have different topics evolved over the years? This question is relevant to understand which bodies of knowledge were predominant at a specific time.

2 A Brief History of AI in Healthcare

As stated in the Introduction, AI was first introduced in the 1950s, and the earliest works in medicine were reported almost two decades later. Some early developments of AI, such as the first industrial robot arm (Unimate, in 1961), the first chatbot (Eliza, in 1964), and the first electronic person (Shakey, in 1966), were important milestones for AI but were not directly applied to medical informatics.

The first generation of AI systems spans the 1960s and 1970s, when the intention was to curate medical knowledge from experts and formulate robust decision rules [9]. Early AI-in-medicine researchers discovered the applicability of AI to the life sciences, especially in the Dendral experiments [8] of the late 1960s. This project gathered scientists from different areas in a collaborative effort that demonstrated the ability to represent and use expert knowledge in symbolic form. During the 1970s, there was growing interest in biomedical applications, using the ARPANET and the SUMEX-AIM [12] infrastructure to promote AI applications to biological and medical problems and to foster collaboration and resource sharing within a national community of health research projects. Projects such as CASNET [10] and MYCIN [11] were developed in this context. The late 1970s became known as the first "AI Winter", marked by reduced funding and interest in the field due to the perceived limitations of AI. In 1986, a decision support system - DXplain [14] - used symptoms to generate differential diagnoses over approximately 500 diseases. In 1991, the field was still consolidating amid a second "AI Winter", caused by the high cost of developing and maintaining expert systems and databases [1]. At this time, the popularization of personal computers and high-performance workstations enabled new types of AI in medical research and new models for technology dissemination [18]. Technological developments in the late 2000s and early 2010s, such as IBM Watson [19] and Apple's Siri, brought natural language processing and machine learning methods that analyze unstructured content to generate probable answers, and that were easier to build and maintain while supporting diverse applications [20, 21].
IBM Watson was an open-domain question-answering system that used natural language processing and various search strategies to analyze unstructured content and generate probable answers, in particular from patients' electronic medical records. It showed success in some medical areas (e.g., [13]) and failures in others [26]. With the availability of larger health datasets, cloud computing, and improved computing power in those years, there were important advances in deep learning, with relevant applications in medical image analysis. Arterys [22] was the first deep learning cloud-based application approved by the US Food and Drug Administration (FDA), in 2017. The application analyzed cardiac magnetic resonance images and was later expanded to liver, lung, chest, and musculoskeletal X-ray images.

3 Related Work

We list in this section some of the works that described the development of AI in medical informatics and healthcare, with different strategies and analytical tools.

In [9], the authors present a high-level overview of AI in medicine, dividing it into two significant successful periods: i) the early adoption, in the 1960s–70s, with the development of expert systems and the codification of medical knowledge into explicit conditional rules, and ii) the recent (from 2012 onward) development of machine learning and deep learning techniques, which showed a big improvement in image-based diagnosis, genome interpretation, biomarker discovery, patient monitoring, clinical outcome prediction, and robotic surgery.

In [24], the authors reviewed all the papers published in the AIME (Artificial Intelligence in MEdicine) conference from 1985 to 2013 and identified 30 research topics across 12 themes. The authors adopted a mixed-method approach, creating a taxonomy of themes using topic analysis and then counting citations to identify the most influential papers for the community. Knowledge engineering topics dominated in the first decade, and machine learning and data mining prevailed after that. Together, the two themes accounted for 51% of all papers produced in that period.

In [1], the authors presented a historical perspective and divided the adoption of AI technologies into three periods: i) 1950–1970, focusing on machines with the ability to make inferences that only a human could make; ii) 1970–2000, the 'AI Winter', a period of reduced funding and interest, in which some expert system prototypes were nevertheless successfully developed; and iii) the 1990s–2020s, when machine learning and deep learning gained momentum to provide personalized medicine, supported by infrastructural developments in data collection, storage, and processing power.

The authors in [18] listed topics and themes compiled from the Artificial Intelligence in Medicine Europe (AIME) proceedings over 16 years (1991–2007). Topics included clinical data mining, knowledge discovery from databases, ontologies, text and image processing, feature selection, workflow, and visualization.

Other systematic reviews were done for AI in medicine but were limited to specific niches, such as applications of deep learning in healthcare [28], surveillance in public health [29], consumer health [30], AI education for health professionals [25, 27], AI adoption in healthcare [31], and economic impact [32]. In our previous work [33], we mapped out all the production in medical informatics, emphasizing the importance of AI for the development of the area.

This work extends the literature by providing an exploratory and empirical view of AI applications in medical informatics, analyzing the scientific literature produced since the 1990s through bibliometric techniques over more than 15 thousand collected papers.

4 Methodology

We used the standard bibliometric workflow as defined in [23], consisting of five phases: study design, data collection, data analysis, data visualization, and interpretation. The study design consists of the research questions posed in Sect. 1 and the search strategy delimited in this section.

The data was collected from the Web of Science (WoS) Core Collection database, used here as a proxy for science production as a whole. First, we defined a search string limited to papers categorized as "medical informatics" in the database, restricting the search to works related to this area. Next, we filtered for papers written in English and published in journals, conferences, or reviews. Finally, we expanded the query string with terms related to AI, using similar terms from the MeSH (Medical Subject Headings) taxonomy and other terms extracted from conferences' calls for papers. The final query string is made explicit in Table 1, and it was executed in April 2021. The results were exported from the WoS platform and imported into the CorText (www.cortext.net) tool for further analysis.

Table 1. Query string used for the search.

The data analysis and visualization are carried out in Sect. 5. For RQ1, we ran a period detector over the authors' keywords of the papers in the dataset. The algorithm builds a bag-of-words with the frequencies of the top 500 keywords for each year. It then calculates the degree of similarity between each pair of yearly keyword-frequency vectors and determines cutting points between sequential years, generating clusters of years with similar keyword occurrences. With that, it is possible to visualize how topic shifts occur over time. We focused on keywords since they are considered the basic elements for representing knowledge concepts and have been used to reveal research domains [7]. The associations between countries and key terms were obtained by applying contingency matrices. A contingency matrix encodes the correlation between the elements of two dimensions, showing the joint distribution of two fields - in this case, the countries of the first authors and the most frequent keywords - and the degree of correlation between any pair of items drawn from each field. The chi-squared metric is used as the correlation measurement, and a p-value of 0.05 is used to filter out spurious relations. With that, it is possible to determine which elements of each dimension are more correlated than expected for that pair of items. Thus, we can determine which countries contributed more to which topics in the whole dataset.

To address RQ2, we explored the co-occurrence of keywords at two levels: divided by the detected periods and as an overall clustering of subjects. We applied a network mapping of keyword co-occurrences for each of the periods detected - a bibliometric approach to visualizing the knowledge structure of a research field [15]. For each paper containing a pair of keywords, an edge was created between the two keywords (the nodes), conditioned, in this case, on the period in which the co-occurrence appeared. We limited the networks to the top 100 keywords to keep the visualization clean. This leads to the definition of clusters around these pairs of keywords. The keyword clusters are formed using the Louvain algorithm [16] for community detection in graphs. Finally, we analyzed how each cluster evolved in each of the periods. As a result, we can identify how different topics are connected and their overall predominance across the periods detected in RQ1. As the last step of the methodology, the interpretation is performed in Sect. 6, where the results are discussed.
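The network construction described above can be sketched in a few lines: one weighted edge per keyword pair per paper, restricted to the most frequent keywords. The helper name and the toy data are ours; the resulting weighted edge list is what a community-detection routine such as Louvain would then partition.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(papers, top_n=100):
    """Build a weighted keyword co-occurrence edge list.

    papers: iterable of keyword lists, one list per paper.
    Only pairs among the top_n most frequent keywords are kept.
    """
    freq = Counter(k for kws in papers for k in kws)
    top = {k for k, _ in freq.most_common(top_n)}
    edges = Counter()
    for kws in papers:
        # sort so each undirected pair has one canonical (a, b) key
        for a, b in combinations(sorted(set(kws) & top), 2):
            edges[(a, b)] += 1
    return edges
```

The edge weights can then be loaded into a graph library (e.g., networkx's `louvain_communities`) to obtain the keyword communities, period by period.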

5 Results

The search strategy resulted in 15566 papers from 1973 to 2021. However, papers published until 1989 did not present keywords, reducing the usable set to 15484 papers from 1990 to 2021. This is a limitation of our study in offering a historical perspective before that date. Figure 1 shows how the number of papers published has evolved over the years under our search strategy. We can see consistent growth, particularly in the 2010s.

Fig. 1. The evolution of the number of papers published on AI in medical informatics.

The period detector was executed over the data from these years. Figure 2 shows the four periods detected - 1990–1996, 1997–2003, 2004–2015, and 2016–2021 - in the upper triangle of the matrix; the lower triangle shows the similarity between each pair of years.

Fig. 2. The four periods detected for our dataset.

The number of countries that produced works in this area also increased over the periods. In the first period, authors from 45 countries produced at least one paper; in the second period, 63; in the third, 95; the last period showed contributions from 115 countries. The five countries with the most contributions (USA, China, UK, Germany, and France) accounted for 53% of all authors.

Fig. 3. The contingency matrix between countries and top keywords. Statistically significant correlations are marked with 'X'.

Figure 3 presents the contingency matrix correlating countries' production with the most frequent keywords. The redder a cell, the more it deviates positively from its expected value; the bluer, the more it deviates negatively. We can notice the prevalence of some topics in the production of certain countries: for instance, there is a strong correlation between the USA and natural language processing, France and ontology, and China and deep learning.
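The cell coloring can be made concrete with a small sketch of the underlying computation. This is our own illustrative helper, not the tool's exact procedure: each cell's chi-squared contribution is signed by whether the observed count exceeds the expectation under independence ("redder") or falls below it ("bluer").

```python
def signed_chi2_cells(table):
    """For each cell of a country-by-keyword count table, return the
    signed chi-squared contribution: positive when the observed count
    exceeds the independence expectation, negative when below it."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    cells = []
    for i, row in enumerate(table):
        cells.append([])
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / total  # expected count
            dev = (obs - exp) ** 2 / exp
            cells[i].append(dev if obs >= exp else -dev)
    return cells
```

For a toy 2x2 table `[[30, 10], [10, 30]]`, each cell's expected count is 20, so the diagonal cells come out positive and the off-diagonal cells negative, which is exactly the over/under-representation the matrix visualizes.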

We created a keyword co-occurrence graph over the years. Figure 4 presents the networks that emerged for each period. In the first period (1990–1996), we can note the prevalence of expert systems and concepts such as knowledge acquisition, knowledge-based systems, and computer-aided diagnosis. The second period (1997–2003) includes knowledge representation, the Internet, decision support systems, fuzzy logic, and case-based reasoning. The third period (2004–2015) is marked by ontologies for interoperability, the Semantic Web, and natural language processing for information extraction from texts. Data mining techniques were also widely used to discover new knowledge from databases. In the last period (2016–2021), deep learning and machine learning dominated the research in AI for medical informatics, together with mobile health, the Internet of Things (IoT), EHR, and the big data collected from these sources.

In addition, the following clusters were identified across all the periods:

  i) natural language processing, related to information storage, extraction, and retrieval, electronic health records, UMLS, named entity recognition, and social media;

  ii) deep learning, related to computer vision, its model types (neural networks, convolutional neural networks), and their applications: image segmentation, feature extraction, transfer learning, visualization, lung cancer, Parkinson's disease;

  iii) machine learning, related to data mining, feature selection, medical diagnosis, ECG, EEG, epilepsy, CDSS, and classification/regression algorithms;

  iv) e-health/m-health: digital health, telemedicine, smartphone, IoT, physical activity, COVID-19, mental health, dementia, Alzheimer's disease, smart home;

  v) expert systems, related to knowledge representation, management, and acquisition, decision support systems, diagnosis, Bayesian networks;

  vi) knowledge representation, related to ontology, semantic Web, Internet, expert systems, data integration, decision making, diagnosis; and

  vii) security and privacy, related to authentication and smart cards.

Figure 5 presents how the identified clusters evolved over the four periods detected. On the one hand, we can see a decline in interest in expert systems and knowledge representation in the last period; on the other, the rise of deep learning for computer vision. Machine learning has been established as the main sub-field since the early 2000s.

Fig. 4. The keyword co-occurrence networks for each of the four periods detected in our dataset.

Fig. 5. The evolution of the cluster distribution over the four periods.

6 Discussion

Our RQ1 revealed four distinct periods since the 1990s: 1990–1996, 1997–2003, 2004–2015, and 2016–2021. In the first period (1990–1996), research was dedicated to expert systems to support physicians in clinical decision-making. In the second period (1997–2003), we observe the expansion of the Internet and the use of symbolic AI employing knowledge representation and case-based reasoning. The third period (2004–2015) is a transition in which research interest turned to interoperability and ontologies and machine learning techniques emerged. The last period (2016–2021) consolidates convolutional neural networks (machine learning and deep learning) associated with big data from EHR, IoT, mobile health, and social media.

In addition, we showed which countries contributed most to the development of which topics. We noticed three relevant cases: the large scientific production of natural language processing papers in the USA, mainly for information extraction from electronic health records in the 2010s; the deep learning production in China for image analysis using convolutional neural networks and feature extraction, since 2017; and ontology papers by French institutions for clinical decision support systems in the early 2010s. These are aggregations by country, which means that individual institutions may deviate from this distribution.

Our results for RQ2 identified seven clusters of related keywords, most of them directly related to sub-fields of artificial intelligence. To the best of our knowledge, there is no formal taxonomy of AI sub-fields and applications. Based on MeSH and the ACM Computing Classification System, we identified expert systems, computer vision, machine learning, knowledge representation, and natural language processing. However, some sub-fields in these taxonomies were not covered, such as robotics, planning and scheduling, and speech processing. These results suggest which applications were more successfully applied to medical informatics than others. The clusters are similar to those found in [24], which detected additional sub-fields such as distributed systems, uncertainty management, and bioinformatics; however, that study focused on a single journal and adopted a looser criterion (at least five papers mentioning a topic). Besides, two other clusters not directly related to AI sub-fields were found in our work: i) mobile health/e-health, mobile phones, social and public health; and ii) security, regarding privacy and other issues in data storage and processing.

In our previous study [34], we performed a forecast study with specialists in medical diagnosis. We found that most of the current developments in AI are likely to be incorporated within the next ten years. Moreover, two barriers to adopting AI were identified: the difficulty of incorporating AI into clinical practice and the regulation of AI technologies. We argue that these are challenges that must be researched and are likely to persist in the next decade. In another study [33], we identified digital health as a new trend in medical informatics, heavily based on new data sources (smartphones, wearables, IoT, social media) and AI technologies, suggesting a digital transformation for healthcare.

As pointed out by [18], AI in healthcare cannot be set apart from the rest of medical informatics, nor from the world of health planning and policy. Realistic expectations require that we draw upon AI as only one of the many methodological domains from which good and necessary ideas can be derived. It is the ultimate application in healthcare that must drive our work, oriented by policy and socio-cultural realities, to avoid a new 'AI Winter'.

7 Conclusion

This work presented a bibliometric analysis of AI development in the area of medical informatics. We sought to understand how the research field of AI has been supporting medical informatics over time. The results can be related to [1], in which the authors divide the history of AI in healthcare into three periods. The bibliometric analysis of this work fits into the last period (1990–2020) but extends that work by showing empirical results in fine-grained detail.

We identified four periods of AI in healthcare development, each associated with particular technologies and applications (1990–1996, 1997–2003, 2004–2015, and 2016–2021). We argue that technological innovations - the popularization of the Internet in the 1990s, the use of smartphones in the 2000s, the development of cloud computing in the 2010s for storage and computing power - and algorithmic innovations such as deep learning techniques enabled and drove these shifts. Moreover, the shift in data sources for algorithm learning is noteworthy. In the beginning, the data came from medical experts who were interviewed or observed to obtain explicit algorithmic rules. As more data was collected and made available - for instance, with the development of the Web and social media and the digitization of hospital records (mainly EHR) - more robust algorithms that learn from data were applied, showing promising results.

This paper presents some limitations. First, it lacks a qualitative analysis of specific papers that may represent the periods or clusters; this would require more space and is left for future work in a journal paper. Second, the search strategy comprised only papers categorized under medical informatics in WoS journals and conferences, which may have caused false negatives, although the category also overlaps with other research areas such as computer science, information science, and engineering. Third, keywords were used as the source of information for the analysis, but papers before 1990 did not present this information.

Future work could draw on the association between countries and topics to understand the social and geopolitical aspects of AI developments. Emergent topics - e.g., data quality issues such as missing and imbalanced data, and racial and gender biases - will be important for future research in healthcare. The regulation of AI will also play a key role in new applications, and additional research must be done to avoid inequality and harm.