1 Introduction

In today's fast-paced technological landscape, organizations seeking a competitive edge must secure patents to safeguard their innovations. To devise effective research and development (R&D) strategies, it is essential to comprehend the ever-changing scientific and technological domain and to keep apprised of the most recent developments. Patent analytics, which includes the analysis of scientific publications and patents, has emerged as a valuable resource for businesses. Its applications include business strategy formulation, portfolio analysis, competitor evaluation, technology assessment, and cost analysis.

Patent documents have long been recognized by academics and practitioners as a reliable and rigorous source of information for proxy measures of innovation activity (Candelin-Palmqvist et al. 2012; Maragakis et al. 2023; Puccetti et al. 2021; Shen et al. 2020; Shibata et al. 2008). Including technical, market, and legal information, the patent literature is a significant body of work in science and technology (Yanhui and Lixin 2024) that has been assessed as novel and progressive (Adams and Tate 2009; Jang and Yoon 2021). In the continuously changing global market, patents are valuable corporate intellectual property that supports research and development (R&D) and increases competitiveness by revealing technological trends and insights (Jo Kim et al. 2015; Jun et al. 2015; Kim and Bae 2017). Therefore, this research field is one of the critical issues in the forward-looking approaches that have become the focus of innovation management studies in recent years (Naeini et al. 2022). In this study, "patent analytics" is treated as synonymous with "patent landscaping," which is the act of searching for patents on a particular topic to gauge innovation and evaluate its challenges (Abood and Feltenberger 2018).

Innovation is greatly aided by patent analytics, which provides insightful information extracted from the massive amount of data found in patents. Organizations can obtain a thorough grasp of the state of the art in a particular technology field by examining patent data. This entails tracking patterns in the filing of patents, identifying cutting-edge technologies, and spotting gaps where innovation is needed. Businesses can use patent analytics to inform strategic choices about their R&D expenditures. By examining competitors' patents, they can learn about the technological orientations of their competitors and pinpoint areas where they can concentrate their R&D efforts to obtain a competitive advantage. By examining current patents, companies can also make sure their inventions respect the intellectual property rights of others, which lessens the risk of costly patent disputes and legal obstacles. Potential partners or collaborators working on complementary technology can be found through patent analysis. Furthermore, to facilitate access to new markets and technologies, it can identify patents that may be purchased or licensed. Figure 1 depicts an example of an innovation funnel (Ernst et al. 2015). It can be observed that patent analytics plays an important role in providing insights for all phases of the innovation funnel, which include market know-how, technology know-how, technology selection, market acceptance, and strategy for technology concept.

Fig. 1 Example of innovation funnel (Ernst et al. 2015)

There were about 3.4 million published patent applications in 2021, and this number is expected to increase over the next 15 years (Adel and Harrison 2024). Furthermore, patent documents encompass a variety of features, ranging from bibliographic information (e.g. patent registration office, assignee, inventor, dates, IPC, and citations) and textual components (e.g. title, abstract, description, and claims) to images (i.e. drawings) (Trippe 2015). The territorial protection of inventions results in patent families, which consider the geographic jurisdictions of the patent office (Fernandez 2022). Due to differences in patenting authority policies, the three most important dates regarding a patent are the priority date (i.e. filing date at the first patent office), the application date (i.e. filing date), and the publication date (i.e. generally 18 months after the filing date) (Trippe 2015). The classic "innovation funnel" theory outlines that the creation of successful industrial innovations needs to go through a series of pathways from global to micro levels, which transform ideas into inventions that can be commercialized (Dunphy et al. 1996). Hence, the significant increase in volume, the variety of available features, and the volatility of the legal context make patents a wealth of references for innovation stakeholders, offering insight for each funnel level from market landscape to technology detail. Nevertheless, in line with the 7 V’s definition of big data (Khan et al. 2014), this abundance of patents challenges innovation researchers and managers to view patent analytics no longer as individual document analysis but as big data analytics (Kim et al. 2021; Park et al. 2018). Therefore, the contemporary patent analytics question is how patent big data can be transformed into business value for companies (Chiu 2018; Jun 2021; Zhang et al. 2019).
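The bibliographic features and dates described above can be held in a small record type. The following is a minimal sketch, assuming a hypothetical `PatentRecord` layout (the field names are illustrative and do not come from any particular patent database) and using the 18-month rule of thumb stated in the text to estimate a publication date.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatentRecord:
    # Hypothetical record layout; real databases use their own field names.
    publication_number: str
    priority_date: date      # filing date at the first patent office
    application_date: date   # filing date at the office in question
    ipc_codes: tuple


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def expected_publication(record: PatentRecord, lag_months: int = 18) -> date:
    # Assumption: the lag is counted from the application date, following the
    # text; an office may count it from the priority date instead.
    return add_months(record.application_date, lag_months)


# Made-up example record, for illustration only.
example = PatentRecord("XX-0000001", date(2019, 6, 3), date(2020, 1, 15), ("H01M",))
print(expected_publication(example))
```

The lag is kept as a parameter because the text describes it only as a general rule, not a fixed legal term.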

With the digitization of patent data, the largest technical information resource in the world is now available at an affordable price (Aristodemou and Tietze 2018). This is possible because ad-hoc data analysis queries over large patent databases can be processed quickly and cheaply through cloud services (Antunes et al. 2018). The consolidation of AI, big data, and cloud computing can push the limits of prior expert-dependent patent analysis (Alderucci and Sicker 2019; Lee et al. 2016a, b; Park et al. 2018) by scaling up and accelerating strategic technology and innovation decision-making, so that innovation managers can base decisions objectively on massive patent data (Jun 2021; Lee et al. 2018). Nevertheless, there is a risk that these technological opportunities are under-adopted or even lost because of management ignorance influenced by dependency on traditional best practices (Wang et al. 2016), uncertainty about the technology’s use cases (Ameye et al. 2023), and unhandled information overload (Jackson and Farzaneh 2012; Raguseo 2018). As a result, it is unclear how to divide the white-collar and blue-collar responsibilities and tasks of big data analysis of patent databases for enterprise technology innovation. This division becomes increasingly blurred as the organization scales up and cross-departmental teams, from R&D to Information Technology (IT), become involved in deciding strategic innovation and technology infrastructure investment. Therefore, standardization of the patent analytics methodology framework is imperative to develop best practices (Grant et al. 2014).

Previous literature has reviewed a variety of patent analysis techniques and tools that are fully or partially facilitated by AI (Abbas et al. 2014; Aristodemou and Tietze 2018; Moehrle et al. 2010), as well as how to search patents based on semantic and user-oriented approaches (Bonino et al. 2010). However, apart from trends in patent databases remaining understudied (Kim and Lee 2015; Walter et al. 2022a), there is still a lack of specific user targets and use cases in the innovation process (Aristodemou and Tietze 2018). This leaves a knowledge gap for innovation-interest researchers and users in determining the structured workflow required to conduct patent analytics effectively by optimizing AI, big data, and cloud technology. In the era of artificial intelligence, data has become a main driver of the innovation management process (Tekic and Füller 2023). In particular, the creation of innovations by companies today highlights how capable they are in handling big data and business analytics (Yoshikuni et al. 2023). The utilization of big data encourages companies to explore hidden corporate values and exploit new opportunities (Raguseo 2018). Furthermore, cloud computing, as a complementary technology, facilitates big data by providing online repository systems that can reduce costs, improve system performance, agility, and flexibility, and enable data exchange across corporate boundaries (Bonner et al. 2017; Liu and Xu 2017). Since AI, big data, and cloud computing provide significant cost advantages for companies in data storage and information processing, the current way of innovating needs to be examined (Füller et al. 2022; Haefner et al. 2021; Y. Liu and Mai 2024).

Addressing this requirement, this review uses the Systematic Literature Review (SLR) methodology to identify and analyse the frameworks, techniques or algorithms, databases, tools, insights for strategic technology innovation, and big data and cloud computing applications used in AI-driven patent analytics research in scientific papers published between 2017 and 2022. In contrast to previous studies that classify user-oriented patent analysis tasks (Bonino et al. 2010) or model Patinformatics as an ARIS business process model (Moehrle et al. 2010), this study adapts the CRISP-DM process model (Ncr and Clinton 1999; Schröer et al. 2021), the de facto industry standard for data mining and one more suitable for the big data era, to investigate the extent of the standardization deficit in patent analytics frameworks. Given its time frame, our systematic review builds on and expands previous efforts (Aristodemou and Tietze 2018) and fills a gap with empirical evidence of patent database usage and practical know-how of patent analytics at a time when AI has evolved more rapidly than was predicted in 2017 (Brynjolfsson and Mcafee 2017; Panetta 2018).

Furthermore, this study contributes to the recent progress of patent analytics supported by AI, big data, and cloud technologies in the domain of enterprise technology innovation, especially R&D strategy and the technology analysis process. Setting the context from this holistic point of view minimizes the overlap between knowledge management and technology management found in previous patent analysis studies (Aristodemou and Tietze 2018), which provided longitudinal evidence of the influence of patent analytics on innovation management practices. This study offers managerial implications by compiling use cases that innovation practitioners can use as a guideline to turn public or commercial patent databases quickly into valuable technological knowledge that supports the innovation management process. It also offers a coherent future research agenda for researchers to develop more advanced AI, big data, and cloud computing-based patent analytics methodologies (Alderucci and Sicker 2019).

This paper is organized as follows. Section 2 describes the SLR methodology used in our research. The results and discussion of the selected papers are presented in Sect. 3. We outline our research limitations in Sect. 4 and conclude the study in Sect. 5.

2 Methodology

This study followed an established process to generate a systematic sample of scientific papers (Kitchenham et al. 2009). The SLR methodology makes it possible to answer a specific research question by identifying all empirical evidence that meets predetermined inclusion criteria (Snyder 2019). Bias in findings can be minimized because studies are selected according to strict and transparent procedures (Snyder 2019). The main objective of the SLR is to provide a comprehensive understanding that helps innovation-interest researchers and users determine the structured workflow required to conduct patent analytics effectively by optimizing AI, big data, and/or cloud technology. Based on this purpose, clear research questions are needed to define the scope and contribution to academia or practice (Hällgren 2012). Asking precise and expressive questions helps to highlight effective directions for future research. Table 1 shows the research questions that motivated this literature review.

Table 1 Research questions on literature review

To find primary studies, the following steps must be taken: selecting digital libraries, defining the search string, conducting a pilot search, refining the search string, and retrieving the initial list of primary studies. Sample studies were obtained from scientific publications indexed in the Web of Science (WoS) database. This database was chosen because it is the oldest, most widely used, and most authoritative database of research publications and citations in the world (Birkle et al. 2020). A search string was created based on a combination of alternative spellings or synonyms of the term "patent analytics". The following keywords were used in the SLR search:

“Patent analytics” OR “Patent analysis”

The search for scientific publications was conducted in April 2022. The search was limited to the abstract, title, and keywords of the papers since a large number of relevant papers were published in recent years. From the initial search results, the filtering feature on the WoS website was used to efficiently identify papers that were relevant to our research interests. To update the study by Aristodemou and Tietze (2018) and to examine the impact on patent analytics studies of the developments in artificial intelligence that have been heralded since 2017 (Brynjolfsson and Mcafee 2017; Panetta 2018), we restricted the publication years to 2017–2022. We set journal articles, review articles, proceeding papers, and book chapters as sample inclusion criteria to capture both theoretical and practical literature. To avoid misinterpretation, papers not written in English were excluded from the sample. Figure 2 outlines the selection process of papers for the systematic literature review using the PRISMA method (Moher et al. 2009).
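The search string and inclusion criteria described above can be expressed as a simple record filter. The following is a minimal sketch, assuming a hypothetical `Record` structure for exported bibliographic entries; the field names are illustrative and are not those of the WoS export format, and the actual filtering in this study was done with the filtering feature of the WoS website.

```python
from dataclasses import dataclass

# Criteria taken from the text; the data structure around them is an assumption.
SEARCH_TERMS = ("patent analytics", "patent analysis")
YEAR_RANGE = range(2017, 2023)  # 2017 to 2022 inclusive
ALLOWED_TYPES = {"journal article", "review article", "proceeding paper", "book chapter"}


@dataclass
class Record:
    title: str
    abstract: str
    keywords: str
    year: int
    doc_type: str
    language: str


def matches_search(rec: Record) -> bool:
    """True if either search term occurs in the title, abstract, or keywords."""
    text = " ".join((rec.title, rec.abstract, rec.keywords)).lower()
    return any(term in text for term in SEARCH_TERMS)


def is_included(rec: Record) -> bool:
    """Apply the search string, year range, document type, and language criteria."""
    return (
        matches_search(rec)
        and rec.year in YEAR_RANGE
        and rec.doc_type.lower() in ALLOWED_TYPES
        and rec.language.lower() == "english"
    )


# Made-up records, for illustration only.
records = [
    Record("Patent analysis of battery technology", "", "", 2019, "Journal article", "English"),
    Record("Patent analytics with deep learning", "", "", 2015, "Journal article", "English"),
    Record("Topic models for text", "We apply patent analysis.", "", 2021, "Editorial", "English"),
]
print([r.title for r in records if is_included(r)])
```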

Fig. 2 PRISMA flow diagram for SLR

After the filtering supported by the WoS website, paper selection was carried out based on the study content. One of the authors analyzed this list and excluded articles that were only partially related to our research and/or outside the scope of this special issue, which focuses on patent analysis for strategic technology and innovation leveraging AI technologies, big data, and/or cloud computing. The selection process was divided into two stages. The first stage was screening through the title and abstract. The second stage was a quality check through an in-depth reading of the full text to ensure eligibility. Papers that passed these two stages became the final sample for the systematic review and further analysis. Then, we meticulously extracted data from the selected papers, organizing it into the distinct categories detailed in Table 2. Organizing the papers in this extraction format allows for concept-centric synthesis (Kraus et al. 2020). The extracted data was subsequently catalogued and analyzed using Microsoft Excel and NVivo.

Table 2 Extraction form

3 SLR analysis and discussion

This section presents the analysis results of the SLR based on the six research questions outlined in the previous section. The analysis was done on the 169 selected papers using MS-Excel and NVivo. Figure 3 shows the number of papers by year and depicts the growing trend in patent analytics since 2017. Note that some of the included papers were published in 2017 but registered online in 2018.

Fig. 3 Patent analytics publication trends

3.1 RQ1-data science framework and process used for patent analytics

The cross industry standard process for data mining (CRISP-DM) is a process model that serves as the base for a data science process. As shown in Fig. 4, CRISP-DM was published in 1999 to standardize data mining processes across industries and has since become the most common methodology for data mining, analytics, and data science projects. Generally, it has six sequential phases, as follows:

  i. Business understanding: the main goal of this phase is to understand the needs and commercial goals of the data mining project. This includes determining the target audience, outlining the problem statement, and establishing the project's success criteria.

  ii. Data understanding: the main goal of this phase is to understand the data that will be used in the project. This entails examining the data, spotting problems with its quality, and recording its features.

  iii. Data preparation: the main goal of this phase is to prepare the data for modelling. This includes cleaning and converting the data and choosing the pertinent features for modelling.

  iv. Modelling: the main goal of this phase is to construct and train data mining models. This encompasses choosing suitable modelling methodologies, developing the models, and assessing their efficacy.

  v. Evaluation: this phase is devoted to assessing the models' performance and choosing the most suitable model for implementation. To do this, the models' recall, accuracy, and precision are evaluated and compared against the business goals.

  vi. Deployment: the main goal of this phase is to put the chosen model into production. This entails integrating it into an existing system, monitoring the model's performance, and updating it as necessary.
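The evaluation phase in the list above compares candidate models on measures such as accuracy, precision, and recall. The following is a minimal sketch of how these three measures are computed for a binary task, for example deciding whether a patent is relevant to a technology field. The labels are made up for illustration and do not come from the reviewed papers.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 1 = relevant patent, 0 = not relevant.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(evaluate(y_true, y_pred))
```

Which measure matters most depends on the business goal set in the first phase: a prior-art search may favour recall, whereas a monitoring alert may favour precision.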

Fig. 4 Cross industry standard process for data mining (IBM 2021)

We extracted related patent analytics processes using the CRISP-DM as a guideline, and the analysis results are shown in Fig. 5. We also used terms that are similar to those used in CRISP-DM, as some of the terms are sub-tasks in CRISP-DM. For example, data collection is a component of data understanding. Our research reveals a startling omission in the landscape of patent analytics: the absence of emphasis on “Business Understanding” and “Data Understanding”. Despite the fact that these terms are essential to CRISP-DM, their absence from the discourse of selected papers is striking. In the context of patent analytics, understanding specific business or strategic goals is not merely advantageous; it is essential. Without a clear understanding of the organization’s objectives and success criteria, any analytics project runs the risk of being ineffective.
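The mapping of process terms onto CRISP-DM phases described above can be sketched as a lookup followed by a tally. In the sketch below, only the rule that data collection belongs to data understanding comes from the text; the remaining entries of `TERM_TO_PHASE` and the sample papers are hypothetical and do not reproduce the coding scheme or the data used in this review.

```python
from collections import Counter

CRISP_DM_PHASES = [
    "Business understanding",
    "Data understanding",
    "Data preparation",
    "Modelling",
    "Evaluation",
    "Deployment",
]

# Illustrative mapping of sub-task terms to phases (an assumption, see lead-in).
TERM_TO_PHASE = {
    "business understanding": "Business understanding",
    "data understanding": "Data understanding",
    "data collection": "Data understanding",
    "data preprocessing": "Data preparation",
    "data cleaning": "Data preparation",
    "modelling": "Modelling",
    "evaluation": "Evaluation",
    "deployment": "Deployment",
}


def phase_coverage(papers):
    """Count, per phase, how many papers mention at least one mapped term."""
    counts = Counter({phase: 0 for phase in CRISP_DM_PHASES})
    for terms in papers:
        phases = {
            TERM_TO_PHASE[t.lower()] for t in terms if t.lower() in TERM_TO_PHASE
        }
        counts.update(phases)  # each paper counts once per phase
    return dict(counts)


# Made-up term lists, one per paper.
sample_papers = [
    ["Data collection", "Data cleaning", "Modelling"],
    ["Data preprocessing", "Modelling", "Evaluation"],
    ["Modelling"],
]
print(phase_coverage(sample_papers))
```

Phases with a count of zero are kept in the result so that omissions, such as the missing business understanding phase noted above, remain visible.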

Fig. 5 RQ1 patent analytics process

In addition, “Data Understanding” is similarly underrepresented. Understanding the nature, quality, and limitations of the data at hand is required for meaningful analysis in the era of big data. Ignoring this phase can result in unbalanced outcomes and lost opportunities. The underemphasis on preparatory phases such as data collection, preprocessing, and cleansing is also noteworthy. These are the unsung champions of data analytics, the processes that transform raw data into a usable format. In patent analytics, where data can be especially unstructured and chaotic, ignoring these preparatory phases can have disastrous effects on the caliber of insights obtained. What are the implications of these observations for the real world? First, practitioners of patent analytics must recognize that CRISP-DM is not a rigid doctrine. It should be modified to accommodate the specific requirements of patent-related initiatives, with a focus on “Business Understanding” and “Data Understanding”. In addition, there should be a renewed emphasis on the preparatory phases, recognizing their importance in determining the success of a project. In conclusion, this transition from description to critical analysis highlights the significance of combining well-established frameworks, such as CRISP-DM, with the particular demands of patent analytics. By doing so, we can unlock the full potential of data science to fuel innovation and strategic decision-making in the domain of intellectual property. Patent analytics can evolve into a more effective and strategic tool for organizations and researchers equally by evaluating the approaches critically and considering their practical implications.

3.2 RQ2-algorithms and techniques used for patent analytics

Patent-based measurements are quantitative indicators obtained by studying patent data. They are useful tools for evaluating many elements of innovation and technology. Technological novelty seeks to capture a patent's newness and inventiveness by assessing characteristics such as citations, technical keywords, and claim structure. Technological impact captures the importance and relevance of a patent by examining forward citations, which are citations received from other patents. Innovation activity captures the overall innovation output and trends within a certain area or organization by tracking patents, patent applications, or inventor activity. Many techniques from patent analytics can be used to calculate and analyze these measures. Table 3 shows examples of data extracted from patent files, the calculated measures, and the techniques used.
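Two of the measures described above reduce to simple counts. The following is a minimal sketch, assuming citation data is available as (citing, cited) pairs and filing data as a list of years. The patent identifiers are made up, and the reviewed papers typically use richer indicators than raw counts.

```python
from collections import Counter


def forward_citations(citation_pairs):
    """Count citations received per patent from (citing, cited) pairs.

    Duplicate pairs are counted once.
    """
    unique_pairs = set(citation_pairs)
    return Counter(cited for _citing, cited in unique_pairs)


def patents_per_year(filing_years):
    """Simple innovation-activity indicator: number of filings per year."""
    return dict(sorted(Counter(filing_years).items()))


# Hypothetical citation pairs: P3 cites P1, P4 cites P1 and P2.
pairs = [("P3", "P1"), ("P4", "P1"), ("P4", "P2"), ("P3", "P1")]
impact = forward_citations(pairs)
print(impact.most_common())
print(patents_per_year([2019, 2018, 2019, 2020, 2019]))
```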

Table 3 Common patent-based measurements

The realm of patent analytics is far from monolithic, characterized by a diverse toolbox of techniques and algorithms aimed at extracting meaningful insights from patent data. While K-Means Clustering and Latent Dirichlet Allocation (LDA) dominate the field, a comprehensive view necessitates a deeper understanding of the multifaceted approaches employed. We observed that most papers mentioned the use of data mining, machine learning and deep learning. Nevertheless, we know that text mining is an essential process of exploring and analyzing large amounts of unstructured text data using software that can identify concepts, patterns, topics, keywords, and other attributes in the data. Text mining is the basis of patent analytics, enabling us to deal with enormous quantities of unstructured textual data. An interpretive lens reveals the crucial function of natural language processing (NLP), a powerful tool that simulates human text comprehension. NLP combines computational linguistics with statistical, machine learning, and deep learning models to enable the extraction of nuanced insights. The incorporation of NLP into patent analytics not only improves the efficiency of data processing, but also reveals latent patterns and connections that would otherwise remain hidden using conventional methods. This in-depth examination of textual data increases the potential for innovation by revealing concealed opportunities and threats.
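To make the text-mining step concrete, the sketch below turns short patent-like texts into TF-IDF vectors and compares them with cosine similarity, using only the Python standard library. It illustrates one common way of representing text, not the method of any particular reviewed paper. The smoothed IDF formula is one of several variants, and the three abstracts are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented abstracts, for illustration only.
abstracts = [
    "battery electrode lithium cell",
    "lithium battery anode electrode",
    "wireless antenna signal receiver",
]
vecs = tfidf_vectors(abstracts)
print(round(cosine(vecs[0], vecs[1]), 3), cosine(vecs[0], vecs[2]))
```

Vectors of this kind are the usual input to the clustering and topic-modelling techniques discussed in this section.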

In structuring patent data, clustering techniques such as Self-Organizing Map (SOM) and the Louvain algorithm play a crucial role. These algorithms provide a nuanced view of technological trends and competitive landscapes by categorizing patents according to their inherent similarities. It is crucial to comprehend the strengths and limitations of clustering algorithms. Effective application of the SOM or Louvain algorithm can enlighten strategic decisions, such as portfolio management or the identification of prospective partners. However, improper application can result in erroneous conclusions, highlighting the significance of algorithm selection and parameter tuning.
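The sensitivity of clustering to algorithm choice and parameters can be seen even in a very small example. The sketch below implements plain k-means (Lloyd's algorithm), which is named earlier in this section, rather than SOM or Louvain, because it fits in a few lines. The two-dimensional points stand in for hypothetical patent feature vectors, and taking the first k points as initial centroids is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster points with Lloyd's algorithm.

    Initial centroids are the first k points (a simplification for this sketch).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D feature vectors for six patents.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Changing `k` or the initial centroids can change the resulting groups, which is the parameter-tuning concern raised above.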

Association Rule Mining (ARM) arises as a potent technique for patent analytics, revealing complex relationships between patents. This technique surpasses basic clustering by disclosing hidden patterns of co-occurrence and interdependence among patent filings. The strategic value of ARM's ability to unearth concealed associations is significant. It can aid patent examiners in more effectively identifying prior art and help companies identify potential infringement risks. To distinguish between false associations and meaningful patterns, however, a nuanced approach is required. Word Embedding and Multi-Layer Perceptron (MLP) are gaining traction as the field of patent analytics advances. Word Embedding facilitates a deeper comprehension of the semantic relationships between terms, whereas MLP can model complex, nonlinear interactions in patent data. Exploration of these emerging techniques can provide a competitive advantage. Word Embedding can enhance patent search and classification, thereby improving the accuracy of patents that are retrieved. MLP, on the other hand, enables more sophisticated patent valuation and forecasting predictive models. The remaining analytics techniques and algorithms used in patent analytics are as listed in Table 4.
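The core quantities of association rule mining, support and confidence, can be illustrated with rules between pairs of items. The sketch below treats each patent as a set of classification codes and is restricted to one-to-one rules; full ARM algorithms such as Apriori also handle larger itemsets. The codes, thresholds, and transactions are chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return sorted (lhs, rhs, support, confidence) rules between single items.

    support(A, B)     = share of transactions containing both A and B
    confidence(A → B) = count(A and B) / count(A)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical patents, each described by a set of classification codes.
patents = [
    {"H01M", "B60L"},
    {"H01M", "B60L", "H02J"},
    {"H01M", "H02J"},
    {"G06F"},
    {"H01M", "B60L"},
]
for rule in pair_rules(patents):
    print(rule)
```

The confidence threshold is what separates frequent but uninformative co-occurrences from rules that are worth inspecting, which relates to the caution about false associations above.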

Table 4 RQ2 patent analytics techniques

A critical examination of patent analytics techniques and algorithms exposes a dynamic landscape with profound practical implications. Integration of NLP, comprehension of clustering nuances, exploitation of ARM's potential, and exploration of emerging methods are crucial next stages. Patent analytics can become a transformative instrument for innovation, intellectual property strategy, and informed decision-making in an ever-changing technological landscape by navigating this complex terrain.

3.3 RQ3-databases used in patent analytics

Patent analytics is based on databases, which provide access to an extensive repository of intellectual property data. However, a simple count of their popularity is insufficient for elucidating their significant impact. We delve deeper to comprehend not only which databases are frequently utilized but also why they matter in the ecosystem of patent analytics. Figure 6 depicts the most used databases in patent analytics. Based on our SLR analysis, USPTO was ranked highest, followed by DII/DWPI, EPO and WIPO. Only databases that appeared at least three times are displayed in Fig. 6. Other databases used in patent analytics are CNIPA, ESPACENET, INNOGRAPHY, OECD, SIPO, Acclaim, CCMT Database, CITES, CrunchBase, CureVac, DPMA, EC, FIPS, FoundationIP, GPI, Inteum, IPD, IPTECH, Ipzen, IWT, Nanosats Database, NewSpace Hub, PatBase, Patentinspiration, Patseer, Questel-Orbit, SciVal, Symphony Innovate, ThomsonIP Manager, WO-PCT and PCT.

Fig. 6 RQ3 most used databases in patent analytics

The United States Patent and Trademark Office (USPTO) is the national patent office and trademark registration authority for the United States, and it is part of the Department of Commerce. According to the most recent report issued by the USPTO in May 2021, a total of 388,900 patents were granted in 2020, which underscores its significance. It was also reported that over 7.8 million US patents have been issued since the first was granted on July 31, 1790, including over 7.2 million utility patents ("patents for inventions"). To make this vast body of technology easier to use, these documents have been "classified" (categorized) into roughly 470 broad technological categories (called classes) and approximately 159,000 specific technological categories (called subclasses) that comprise the United States Patent Classification System (https://www.uspto.gov/). The USPTO's role in patent analytics extends beyond borders. Its extensive collection empowers researchers worldwide to access, analyze, and draw insights from American innovation. For businesses, it serves as a valuable resource for competitive intelligence and innovation strategy.

Derwent Innovation offers more than just a repository of patents. It has two main features: the Derwent World Patents Index (DWPI) and the smart search capability. After specialists have studied the complete official patent disclosure materials, DWPI translates them, rewrites the important abstracts, corrects the content, and normalizes the names of patent holders. Novelty, use, benefit, technical focus, full description, drawing description, activity, and mechanism are among the components revised by DWPI (Walter et al. 2022b). It is also claimed that DWPI improved keyword search results by 79% when compared to patent searches performed on other patent search platforms without DWPI (https://clarivate.com/derwent/). This could be one of the reasons why DWPI is one of the top five databases used in patent analytics. The data refinement provided by DWPI transforms patent analytics into a precise instrument. Researchers can effectively navigate the complexities of patent jargon, while businesses gain a competitive advantage in discovering pertinent prior art and identifying emerging trends.

The European Patent Office (EPO) examines European patent applications, allowing inventors, researchers, and businesses from all over the world to obtain protection for their inventions in up to 44 countries via a centralized and uniform procedure requiring only one application. According to a report issued in May 2022, the EPO has published over four million patent applications (https://epo.org). Its unified procedure facilitates the acquisition of European patents. The EPO provides a streamlined entry point to European markets for businesses pursuing international expansion. It simplifies the complex process of obtaining patent protection in multiple jurisdictions, thereby facilitating effective portfolio management.

The World Intellectual Property Organization (WIPO) boasts a unique global reach, serving 193 member nations. It is dedicated to intellectual property (IP) services, policy, information, and collaboration. The PATENTSCOPE database of WIPO provides access to international Patent Cooperation Treaty (PCT) applications in full text format on the day of publication, as well as to patent documents of participating national and regional patent offices. There are more than 104 million patent documents, including international patent applications submitted under the PCT (https://www.wipo.int/patentscope/en/). The WIPO database exemplifies global cooperation in patent analytics. It enables researchers and businesses to navigate a maze of international patent data, fostering innovation and facilitating global decision-making.

3.4 RQ4-tools used for patent analytics

This section guides us in understanding not just the tools frequently employed but also how they shape patent analysis, innovation, and intellectual property strategies. Tools are the core component of patent analytics, bridging the chasm between data deluge and actionable insights. However, an in-depth comprehension of the tools necessitates a critical evaluation of their impact on patent analysis and its practical implications, in addition to a listing of their popularity. Figure 7 shows the most used tools in patent analytics. Our study reveals that although commercial software dominates (e.g. PatentSight, VantagePoint, UCINET, etc.), the top three most widely used tools in patent analytics studies are open-source software. Other tools that are not displayed in Fig. 7 are Bibliometrix, Chinese Knowledge Information Processing (CKIP), CiteNet, Core NLP, Cytoscape, Derwent Smart Search (SSTO), Doc2vec, EconSight, Essential Science Indicators, ggplot2, Hive, IPTECH software, Leximancer, Mathematica, NetMiner, NVivo, Open NLP, Patent iNSIGHT Pro, PatentsView, spaCy, Tableau, TextBlob, Vensim® PLE, Weka and WordStat.

Fig. 7 RQ4 most used tools in patent analytics

Gephi is the industry standard for graph and network visualization and exploration software (Bastian et al., n.d.). Gephi was chosen as a tool for building, visualizing, and analyzing patent networks because it is open-source software with a user-friendly interface that allows for agile and accurate analyses based on Social Network Analysis (SNA) theory (de Paulo et al. 2018). Miao et al. (2022) used Gephi to build a multilayer network of Technology Relationship Technology (TRT) and then thoroughly analyzed the structural relationships between the dimensions to extract the composition, functions, means, space, time, and advantages of technical components. Gephi was chosen by Zhang et al. (2021) for its ability to calculate and visualize large and complex networks. The network analysis prowess of Gephi enhances the capacity to extract vital insights from patent data. Researchers and innovators can decipher complex relationships, identify emergent trends, and make decisions based on data to promote innovation and competitiveness.
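The kind of network measure that tools such as Gephi compute can be illustrated without the tool itself. The sketch below builds an undirected patent network from an edge list and computes degree centrality, a basic SNA measure, in plain Python. It does not use or reproduce Gephi's own interface, and the edges between patents A to D are invented.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs, ignoring self-loops."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbours) / (n - 1) for node, neighbours in adjacency.items()}


# Hypothetical links between patents, e.g. citation or co-classification ties.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = degree_centrality(build_graph(edges))
print(sorted(centrality.items(), key=lambda item: -item[1]))
```

In this toy network, patent A is linked to every other patent and therefore has the highest centrality, which is the sort of structural signal used to identify key technologies or actors.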

R is a language and environment for statistical computing and graphics. It is highly extensible and offers a wide range of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, and clustering) and graphical techniques. R is available as Free Software under the Free Software Foundation's GNU General Public License in source code form, and it compiles and runs on a wide range of UNIX platforms and similar systems (including FreeBSD and Linux), Windows, and macOS. Yu et al. (2020) performed preprocessing in R, which included tokenizing and removing stop-words. These steps were conducted to increase data reliability. Yalcin and Daim (2021) used R for cleaning, compiling, and analyzing data.
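The tokenizing and stop-word removal steps mentioned above can be illustrated with a minimal sketch. It is written in Python rather than R, and it is not the procedure used by Yu et al. (2020). The stop-word list, the `preprocess` function name, and the sample text are all hypothetical and chosen only for illustration.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# real studies typically rely on a much longer, curated list.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "is", "on", "with"}


def preprocess(text: str) -> List[str]:
    """Lowercase the text, split it into alphanumeric tokens, and drop stop-words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


# Hypothetical abstract fragment, not taken from any reviewed study.
sample = "A method for charging the battery of an electric vehicle"
print(preprocess(sample))
```

In practice the stop-word list would usually be extended with patent-specific boilerplate terms, since words that are common to nearly all patents carry little analytical signal.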

VOSviewer is a tool for creating and visualizing bibliometric networks. These networks can be built using citation, bibliographic coupling, co-citation, or co-authorship relationships, and can include journals, researchers, or individual publications. VOSviewer also includes text mining functionality for creating and visualizing co-occurrence networks of key terms extracted from scientific literature. Dehghani et al. (2021) used VOSviewer to construct and visualize patent keyword networks. With the aid of domain experts, LDA, and VOSviewer, Feng et al. (2021) performed data preprocessing to extract 26 topics with a few other keywords from 255 documents. The text mining functionality of VOSviewer equips patent analysts with a potent instrument for discovering hidden patterns in patent data. This results in more accurate technology landscapes, which aids in identifying strategic directions and innovation opportunities.
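The idea behind a keyword co-occurrence network can be shown with a short Python sketch that counts how often two keywords appear in the same document. This is only an illustration of the counting step under simple assumptions. It is not VOSviewer's implementation, which additionally applies its own normalization and layout. The `cooccurrence` function and the keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how often each unordered keyword pair appears in the same document."""
    counts = Counter()
    for keywords in docs:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists, one per patent document (illustrative data only).
docs = [
    ["battery", "electrode", "lithium"],
    ["battery", "lithium", "charging"],
    ["charging", "vehicle"],
]

edges = cooccurrence(docs)
for (term_a, term_b), weight in sorted(edges.items()):
    print(term_a, term_b, weight)
```

Each counted pair corresponds to a weighted edge, so the resulting table could be exported as an edge list and loaded into a network visualization tool.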

UCINET is a comprehensive package for the analysis of social network data. It can read and write a multitude of differently formatted text files, as well as Excel files. Shi and Zhang (2018) used UCINET for data processing, while Yuan and Li (2021) used it to partition countries with the CONCOR algorithm. UCINET can also be used for visualization (Nordensvard et al. 2018). Its ability to read and write multiple data formats supports researchers at multiple phases of the analysis procedure. UCINET offers a comprehensive perspective on patent networks, enabling the identification of key actors, communities, and structural insights. This understanding guides strategic partnerships, analysis of the competitive landscape, and innovation strategies.

Another popular tool is VantagePoint (VP), a professional-grade desktop text mining application offering analysts a broad suite of powerful refining, analyzing, and reporting tools for scientific, technical, market, and patent information. Alvarez-Meaza et al. (2020) utilized VP for data cleaning, and VP can also be used for visualization (Zhao 2018). The capabilities of VP refine patent information, transforming it into a dependable source of scientific, technical, market, and patent intelligence. VP's role in data cleaning ensures that patent analytics rest on a solid foundation, thereby reducing the number of inaccurate insights. Researchers and analysts can have confidence in the reliability of their findings, which facilitates more informed innovation and IP strategies.

In conclusion, patent analytics tools are catalysts for innovation and strategic decision-making rather than mere utilities. Each tool contributes a distinct set of capabilities that shape the insights and pathways derived from patent data. However, the selection of tools is a complex decision that is influenced by data type, analysis requirements, and desired outcomes. Recognizing the interplay between these tools and other criteria is essential for maximizing the potential of patent analytics in a technological landscape that is swiftly evolving. By critically scrutinizing their roles, we shed light on the path to more efficient and consequential patent analysis practices.

3.5 RQ5-visualization and insights in patent analytics

In the domain of patent analytics, innovators, businesses, and strategists seek insights as prized assets. However, the true essence of these insights goes beyond enumeration; it beckons us to discover their transformative effect on innovation, competition, and intellectual property strategies. Figure 8 depicts the major insights generated by the patent analytics. According to our analysis, descriptive insights serve as the cornerstone of patent analytics. Among these, insights on patents developed by country and assignee occupy the center of research papers and practical implementations. The prevalence of descriptive insights demonstrates their usefulness. Businesses can use this data to assess the global innovation landscape, identify potential collaborators or competitors, and make informed decisions regarding geographic expansion or partnership opportunities.

Fig. 8 RQ5 Insights in patent analytics
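The descriptive insights discussed above, such as patent counts by country and by assignee, amount to simple frequency tabulations. The following Python sketch shows one way such a tabulation might be computed. The record layout, the field names (`country`, `assignee`), the `top_counts` helper, and all data values are hypothetical and do not come from any reviewed study.

```python
from collections import Counter

# Hypothetical patent records; field names and values are assumptions for illustration.
patents = [
    {"id": "P1", "country": "US", "assignee": "Acme Corp"},
    {"id": "P2", "country": "US", "assignee": "Acme Corp"},
    {"id": "P3", "country": "CN", "assignee": "Beta Ltd"},
    {"id": "P4", "country": "CN", "assignee": "Gamma Inc"},
    {"id": "P5", "country": "US", "assignee": "Delta LLC"},
    {"id": "P6", "country": "KR", "assignee": "Epsilon Co"},
]


def top_counts(records, field, n=3):
    """Return the n most frequent values of `field` with their counts."""
    return Counter(record[field] for record in records).most_common(n)


print(top_counts(patents, "country"))
print(top_counts(patents, "assignee", n=1))
```

Real assignee names usually need harmonization before counting, because the same organization often appears under several spellings.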

In addition to descriptive insights, patent analytics has evolved to uncover deeper insights, such as technological trends. These insights delve deep into innovation, providing a nuanced understanding of emerging technologies and their trajectories. Understanding technological trends equips organizations with the foresight necessary to align their strategies with emerging markets and technologies. This enables them to remain ahead of the curve, make wise investments, and seize opportunities in industries that are rapidly evolving. Patent analytics now includes analysis of the competitive landscape, a crucial aspect for businesses attempting to maintain a competitive advantage. This involves evaluating the patent portfolios of competitors, identifying vacant spaces for innovation, and estimating the likelihood of patent infringement. It guides IP strategies, assisting businesses in protecting their innovations and averting legal pitfalls, thereby securing their market positions. The mapping of the innovation ecosystem is a further extension of patent analytics. This involves identifying key actors, innovators, and influential entities in specific technological domains, thereby shedding light on potential collaboration or acquisition opportunities. The data-driven insights derived from this mapping can assist businesses in fostering collaborations, forming partnerships, and exploring acquisition opportunities.

In conclusion, patent analytics yield insights that transcend mere data elements. Businesses and strategists use them as a compass to navigate the complex terrain of innovation and intellectual property. A foundation of descriptive insights provides a panoramic view of patent landscapes. However, it is the deeper insights—technological trends, competitive analysis, and mapping innovation ecosystems—that provide the transformative power to shape innovation strategies, enhance competitiveness, and drive success. We reveal the true potential of patent analytics as a catalyst for informed decision-making, innovation, and intellectual property strategies in an ever-changing technological landscape by critically evaluating the nature and practical implications of these insights.

3.6 RQ6-big data and cloud used in patent analytics

Patent data is uniquely suited to big data tools and techniques because of its high volume, high variety (including related information), and high velocity of change. Surprisingly, however, our analysis of RQ6 shows that few papers clearly mentioned the use of big data tools or cloud platforms. As shown in Fig. 9, only 18% specifically mentioned their use. For businesses and researchers, the underrepresentation of big data and cloud computing in patent analytics is a missed opportunity. These technologies have the potential to revolutionize patent data analysis by facilitating real-time processing, scalability, and deeper insights. The limited mention emphasizes the need for increased awareness and implementation.

Fig. 9 RQ6 Application of big data and cloud computing in patent analytics

Big data and cloud computing can transform the velocity of patent analytics. They facilitate real-time data processing, ensuring that businesses and researchers have access to the most recent data, which is essential for making prompt decisions. With real-time insights, businesses can monitor competitive landscapes, identify emergent trends, and respond rapidly to shifts in the patent landscape. This adaptability can be a game-changer in innovation and intellectual property strategy. The volume and diversity of patent data call for the inherent scalability of cloud computing platforms. This scalability accommodates the exponential growth in patent applications by enabling patent analysts to manage enormous datasets, so that patent analytics can keep up with the continually expanding volume of patent data. This is crucial for organizations seeking to maintain complete and accurate patent landscapes, which facilitates strategic decision making and innovation.

Cloud computing reduces costs by eliminating the need for substantial infrastructure investments. Researchers and businesses have on-demand access to powerful computational resources, paying only for what they consume. This cost-effectiveness makes patent analytics accessible to a broader spectrum of organizations. Smaller businesses and entrepreneurs can utilize patent analytics without incurring significant up-front costs. In patent analytics, the potential of big data and cloud computing remains largely untapped. Although these technologies promise real-time insights, scalability, and cost-effectiveness, their limited mention in research and practice demonstrates the need for greater awareness and adoption. By recognizing the transformative power of these tools, patent analytics can become a more agile, comprehensive, and accessible discipline, empowering businesses and researchers to navigate the complex intellectual property and innovation landscape of the twenty-first century.

4 Limitation

The systematic literature review (SLR) of papers published between 2018 and 2022 sheds light on the evolving patent analytics landscape. However, as we interpret the findings, it is essential to recognize the limitations imposed by our research's scope and methodology. Our findings are derived from a subset of the available patent analytics literature due to our reliance on the Web of Science database contained within UTM's subscriptions. To encompass the entirety of this dynamic field, a broader investigation is required, as innovations may occur outside of this confined area.

The 169 selected papers in our SLR constitute a diverse tapestry of research on patent analytics. Nonetheless, while the quantity is substantial, the crucial issue is whether this body of work has explored the field's complexities and depths. The overwhelming number of chosen papers is evidence of the expanding interest in patent analytics. It also encourages researchers and practitioners to delve deeper, nurturing a culture of inquiry that seeks to unearth hidden insights and propel innovation in the field.

Our research scope, which spans 2018 to 2022, provides a temporal context for the evolution of patent analytics research. It provides a summary of the state of the art, encompassing the most recent developments, methodologies, and applications in the field. The context of time emphasizes the dynamic character of patent analytics. Businesses and researchers must acknowledge that the field is in a constant state of change, necessitating adaptability and a forward-thinking approach to maximize its potential.

While our SLR provides vital insights, it is inherently limited. The limited focus on a particular database and subscription journals may omit innovative work published in other repositories and outlets. This restriction necessitates a more expansive approach to patent analytics research. It is essential to investigate a variety of sources and platforms in order to comprehend the breadth of insights and methodologies influencing the field. Our SLR summarizes both the vitality and limitations of patent analytics research. By acknowledging the limits of our study, we open the door to broader horizons and unrealized possibilities. We chart a course for the field to evolve, adapt, and drive intellectual property strategies and innovation in the ever-changing landscape of the twenty-first century by embracing these limitations as innovation catalysts.

5 Conclusion and future work

This article significantly contributes to the development of patent analytics research. Through a systematic literature review (SLR) of 169 articles published between 2018 and 2022, we intend to redefine the direction of the discipline. Our investigation into the methodologies employed in patent analytics has yielded crucial insights, resulting in a deeper comprehension of its practice and practical implications. As we reach the conclusion of our SLR, we find ourselves at the intersection of description and interpretation, where the richness of patent analytics becomes apparent. Our primary objective was to discover the essence of patent analytics, and our journey has revealed both clarity and complexity. The lack of a standardized data science framework emphasizes the inherent diversity and adaptability of patent analytics. This realization compels us to consider a more adaptable strategy, one that can accommodate the specific requirements of innovation and intellectual property strategies.

While a standard framework for data science remains elusive, our analysis reveals the essence of patent analytics. The majority of papers presented their own process flow or framework, with a focus on the fundamental data science processes of data acquisition, data cleaning, data analysis, and data visualization. This decentralization suggests that patent analytics is a dynamic field that is continuously evolving to meet specific challenges and needs. Patent analytics are dynamic, as evidenced by the prevalence of customized frameworks. Businesses and researchers should embrace this adaptability by customizing their approaches to meet their objectives and effectively navigating the complexity of patent data.

Our SLR provides a panoramic view of the landscape of patent analytics, illuminating the most popular databases, analysis techniques, and tools. We summarize all our SLR findings, aligned with the CRISP-DM framework, in Table 5 in the Appendix. This comprehensive understanding equips businesses and researchers with the knowledge necessary to effectively leverage the power of patent data. The insights into frequently utilized resources enable stakeholders to make informed decisions regarding database subscriptions, technology adoption, and tool selection. These choices can have a substantial effect on the quality and breadth of patent analytics. The scant mention of big data and cloud platforms in patent analytics is a striking finding of our research. This absence raises concerns about unrealized potential and the need for a broader understanding of these transformative technologies. This gap functions as a call to action. Businesses and researchers should investigate the potential of big data and cloud computing to improve patent analytics in terms of real-time insights, scalability, and cost-effectiveness.

In conclusion, this SLR bridges the gap between patent analytics descriptions and crucial insights. Our journey reveals a field that challenges a one-size-fits-all framework, emphasizing its adaptable and dynamic nature. From adopting customized approaches to leveraging the power of big data and the cloud, the implications are vast. As we move forward with these findings in hand, our vision transcends the confines of this investigation. We aim to devise a comprehensive framework and set of guidelines for patent analytics, not just for innovators but for all stakeholders. This framework will serve as a beacon, directing the field towards deeper insights, more informed strategies, and greater innovation in the constantly evolving intellectual property and technology landscape.