Introduction

Rapid changes in science and technology shorten the cycle of technological innovation, which creates tremendous opportunities and competitive advantages for enterprises. The ability to explore future technological opportunities in depth and to judge them precisely has become vital for enterprises (Ren & Zhao, 2021). Detecting and predicting future technology trends, based on a better understanding of how technologies emerge, is also a key concern for both countries and enterprises.

As an important tool for tracking the development and evolution of technology, technology evolution paths can improve the accuracy of technology opportunity identification. At present, there is no unified definition of technology evolution (Liu et al., 2022a, 2022b). Most researchers hold that technology evolution refers mainly to the changes in a specific technology over time. For a targeted technology area, technology evolution generally encompasses all technological activities and the accumulation of innovations in the field from its inception to its current stage (Liu et al., 2020). Specifically, technology evolution pathways not only reflect the nature of technological innovation but also effectively predict problem-oriented technology opportunities (Huang et al., 2021). Therefore, clarifying technology evolution paths is crucial for stimulating the technological innovation of enterprises through technology forecasting.

In the study of technology evolution paths, discovering and linking technological topic information is a crucial problem. Diverse approaches have been proposed to identify technology evolution paths, including International Patent Classification (IPC) analysis, citation network analysis, and co-word analysis. Although these analysis methods are mature, they still have limitations: (1) Structured information such as patent codes and citation information is used as the analysis data, ignoring the key technical terms contained in unstructured information, which makes it difficult to obtain fine-grained technology evolution results (Han et al., 2021; Lee et al., 2021). (2) The relationships between technical terms are usually derived from word frequency analysis rather than semantic analysis, which may not be sufficient to reveal changes in technology trends (Li et al., 2019).

To overcome these shortcomings, we propose an integrated approach that combines subject-action-object (SAO) semantic analysis with latent Dirichlet allocation (LDA) topic modeling to process data in the targeted technological field. The SAO structure can effectively capture the inner connections between independent words and thereby yield semantic linkages between technical terms. This approach makes up for the shortcomings of existing studies in the semantic analysis of technical terms and provides complete connectivity information for the evolution paths.

Science and technology are the two foremost forces that jointly determine the development of technological innovation. The relationship between them is complex. On the one hand, there is a one-way flow from basic scientific research to applied research and products (Xu et al., 2020). On the other hand, a non-linear relationship in which technological progress drives scientific progress may also exist (Ba & Liang, 2021). Because scientific papers and patents differ in purpose, statement, and quality, they share certain commonalities while each retains its own specialties (Xu et al., 2021). Analyzing and comparing scientific papers and patents makes it possible to discover technological opportunities and construct high-quality roadmaps (Shen et al., 2020). Currently, some studies adopt patents or scientific papers alone as the data source for technology opportunity analysis, whereas few studies combine these two sources for evolution analysis (Liu et al., 2022a, 2022b). To some extent, analyzing only scientific papers or only patents is not enough to deeply understand the future trend of technology (Hajikhani & Suominen, 2022). Therefore, we explore and map high-quality technology evolution paths based on both scientific papers and patents. The final technology trends and opportunities are obtained from a gap analysis and a potential topic analysis between scientific papers and patents, which provides a more comprehensive perspective on evolution research.

The key contributions of this paper are as follows:

  1. In identifying technology evolution pathways, this paper proposes an integrated method that combines the LDA topic clustering model with SAO semantic analysis to mine specific technical terms and technology topics. Compared with the traditional LDA topic clustering method, this method can clearly represent technology topic information and its changes across periods in a semantic context, and can also effectively identify the “problems” and corresponding “solutions” in technical information. This paper takes the field of all-solid-state lithium batteries as an example and conducts an empirical study to verify the effectiveness of the proposed method. The study also points out directions for future technological innovation in this field.

  2. This paper uses both patents and scientific papers as data sources and analyses the evolutionary pathways and clustering topics of the two sources. A gap analysis and a potential topic analysis are also designed to predict future technology trends and technology opportunities, examining the technology trend through the relevance and difference of multiple data sources.

The rest of this paper is organized as follows. The “Related work” section briefly reviews the related work. The “Methodology” section presents the overall research process and introduces the methods. The “Empirical study and results” section conducts a case study in the field of all-solid-state lithium-ion battery research. The “Discussion and Conclusions” section lays out our key findings and future work.

Related work

Technological opportunity analysis based on multiple data sources

Most previous studies on technological opportunity analysis revolve around patent data, which represent the contributions of technological development and innovation. In knowledge-intensive technologies, however, scientific research is the basis for technological development and is considered the origin of technology and innovation (Lee et al., 2015). As the main medium for disseminating and communicating scientific knowledge, scientific papers reflect the process of scientific development and the results achieved (La et al., 2020). It is thus clear that science and technology are inextricably linked. At present, many scholars also conduct technology opportunity analysis from a science-driven perspective and apply the results of scientific research to technology development.

Previous scholars have conducted technological opportunity analysis by mixing two kinds of data or using two kinds of data separately. The related studies can be broadly classified into three categories.

The first type of study uses scientific papers and patents as a mixed data source without distinguishing between them. For example, Viet and Kravets (2022) analyze the technological trends in asset performance management in the smart energy sector using appropriate data mining methods through extensive experiments on scientific papers and patent data. Balland and Boschma (2022) identify regions at the frontiers of science and technology based on patents and scientific literature and, based on the identification results, analyze in depth whether scientific capabilities within a region can be translated into technological leadership. Shi et al. (2022) draw on scientific papers and patents related to onboard hydrogen storage systems (OHSS) and develop a novel methodology for investigating the past, present, and future development trends in OHSS.

The second type of study identifies technological opportunities using scientific paper data and patent data separately and then analyzes the combined identification results. For example, Wang et al. (2015) apply text mining and a dimensionality-reduction clustering algorithm to cluster papers and patents in the field of micro-algal biofuels, and then explore the potential technological opportunities in both theory and practice. Jiao et al. (2023) use the RAKE algorithm to extract keywords from scientific papers and patents and calculate the cosine similarity between cluster technology domains; the potential technology opportunities are then identified by calculating the semantic similarity of the cluster technology domain vectors. Leon-Silva et al. (2020) build co-author network arrays and carry out patent research and commercial tracking by analyzing silver nanoparticle-related scientific papers and patents with bibliometric methods. Their paper shows the technology transfer pathway of silver nanoparticles from academia to industrial application, which provides opportunities for research, development, application, and academic collaboration in the field of nanomaterials.

The third type of study forms scientific and technological topics from scientific papers and patents separately and then compares them. This type of study overcomes the shortcoming of the first two types, which do not take full advantage of the relationship between science and technology. For example, Li et al. (2023) propose a new technology opportunity identification method that uses scientific papers and patents as data resources and integrates SAO semantic mining with an outlier detection method. The method can effectively and comprehensively identify technology opportunities at the two levels of technical problems and technical solutions. Feng et al. (2023) apply the LDA model to identify topics in paper and patent data. All topics are divided into several innovation dimensions according to their attributes and the TEMPEST model in order to identify specific technology opportunities for proton exchange membrane fuel cells. Takano and Kajikawa (2019) construct the scientific paper and patent citation networks separately and calculate the cosine similarity between the paper and patent clusters. Then, the emerging clusters with low similarity between them are selected for technology opportunity analysis. Hernandez-Quintanar and Rodriguez-Salvador (2019) apply the competitive technology intelligence methodology to analyze the scientific papers and patents in the 3D printing field. The main technology trends regarding materials and uses are determined according to the analysis results, which opens a new window of opportunity for exploring the use of 3D bio-printing in a new area.

To sum up, there is a close relationship between scientific research and technology development, and scientific research is a driving force for technology development. Compared with using scientific papers or patents alone, combining scientific paper data with patent data facilitates a more comprehensive technology opportunity analysis. However, although many studies use both data sources and thus cover more scientific and technical information, they still analyze the two sources separately. This may result in a lower degree of correspondence between the clusters derived from the two data sources and a coarser technical granularity in the final result, which is not conducive to mining more technological opportunities.

Technology evolution path analysis

Technology evolution refers to the evolutionary changes in a specific technology field over time, which can predict future trends in technology. For the specific technological domain, the technology evolution generally encompasses all technological activities and their accumulation of innovations from the initial to the current stage of the field (Liu et al., 2020). As the visualization of technology evolution results, technology evolution paths can not only map the nature of technological innovation but also effectively predict problem-oriented technology innovation opportunities (Coccia, 2019). Therefore, clarifying the technology evolution path is crucial to stimulate the technological innovation of enterprises.

Currently, numerous scholars have studied technology evolution paths using diverse methods, including methods based on patent classification codes, main path analysis, and text mining. Firstly, research based on International Patent Classification (IPC) analysis mainly explores technology evolution pathways through the technology classification among different IPC codes (van der Pol & Rameshkoumar, 2017). For instance, Fernandes et al. (2020) discussed the solid oxide fuel cell technology paths in the United States and Japan based on the analysis of relevant IPC codes. Secondly, main path analysis mainly tracks the knowledge trajectories and correlations in complex citation networks to find the main technology evolution pathways (Lai et al., 2021). For instance, Kumar et al. (2020) focused on mobile payment system innovation and identified the main evolution process using main path analysis. Kim et al. (2021) developed an improved knowledge persistence-based main path approach to identify the technology evolution paths of blockchain technology. Thirdly, text mining methods mainly analyze unstructured data efficiently in large datasets to identify areas in the wider innovation research landscape (Chen & Ho, 2021). For instance, Li et al. (2020) designed a hierarchical visual analysis system combined with advanced text mining technologies, which helps to comprehend the evolution of a discipline. Miao et al. (2020) used a novel text mining-based approach to identify technology evolution pathways in the 3D printing technology field.

However, there are still some limitations in using the above methods for technology evolution path analysis. In terms of data information extraction, only using structured information (such as IPC, citation information) ignores the important elements contained in unstructured information (such as text abstract, semantic information), which makes it difficult to obtain more detailed technology opportunities. In terms of semantic analysis, text mining-based methods mostly focus on the single word rather than semantic relationships, which may lead to inaccurate linkage analysis. Therefore, in this study, we use SAO semantic analysis to process data information in the target technology domain (He et al., 2019). A more complete semantic understanding can be obtained by deeply analyzing the inner connections between independent phrases.

Subject-action-object (SAO) semantic analysis

SAO semantic analysis is developed based on TRIZ theory and can effectively reflect the semantic relationships among different technology elements (Wang et al., 2017). Furthermore, SAO semantic analysis can form a problem–solution model for identifying the core components in the technology innovation process (Yang et al., 2017). In this model, the S structure represents the new solution, and the A–O structure represents the old technology point or the problem to be solved (Kim & Yoon, 2021). Therefore, SAO semantic analysis is able not only to show the semantic relationships between technical elements but also to mine the actual technical problems and corresponding solutions in a more complete way.

SAO structures can also be visualized to depict technology evolution paths. This method effectively makes up for the shortcomings of the above methods and facilitates the tracking of technological elements and technology trends. It is now widely used to identify technology opportunities and technology components and to calculate patent similarity. For instance, Kim et al. (2018) developed a new method, SAO-x, which enabled an in-depth examination of the purpose and effect of a technology. Choi et al. (2011) presented a procedure that formulates an SAO network from SAO models extracted from patent documents and demonstrated its effectiveness. Wang and Yang (2019) calculated the weight of the SAO structures in patents in the robotics area and measured the similarity between patents by introducing the calculation index DWSAO.

In this paper, a hybrid method for constructing technology evolution maps is proposed that identifies specific SAO relationships. SAO semantic analysis is used to determine the semantic associations between technical elements in the specific technological domain. Identifying meaningful links between two SAO structures yields a more accurate identification of technology opportunities.

Research methodology

In this section, the whole research process is summarized. The process includes the use of SAO semantic analysis based on the LDA topic model to capture the semantic relationships among topics and the construction of technology evolution pathways based on scientific papers and patents separately. A gap analysis and a potential technical topic analysis are also introduced to identify technology trends.

The framework of the integrated method for technology evolution is shown in Fig. 1, which includes four phases.

Fig. 1 The framework of mapping the technology evolution paths

Data collection and preparation

The field of all-solid-state lithium batteries (ASSLIBs) is selected for an empirical study. In recent years, with the rapid development of electric vehicles, ASSLIBs have come to be considered attractive and promising technologies for next-generation clean energy storage (Ke et al., 2020). Therefore, analyzing the evolution trends of ASSLIBs technology is of great importance for R&D personnel in chemical industry enterprises and for policymakers.

The Web of Science (WOS) and Derwent Innovations Index (DII) databases are selected as the data sources. Based on a literature survey and expert knowledge, the search query “((all-solid-state lithium batteries) or (all-solid-state lithium-ion batteries) or (all-solid-state Li-ion batteries))” is used to search for scientific papers in Web of Science. After manual elimination of scientific papers not related to all-solid-state lithium batteries, 3274 scientific papers published from 2002 to 2021 are retrieved (the search was run in January 2022). The document types include Articles, Proceedings Papers, and Reviews. The search query “((ASSLIBs) or ((all-solid-state) and (lithium-ion batteries)) or (all-solid-state Li-ion batteries))” is used to search for patents in the Derwent Innovations Index database, and 1876 patents from 2002 to 2021 are retrieved (the search was run in January 2022).

During data preprocessing, the scientific paper and patent datasets are first divided by year, and, following the time interval proposed by Liu et al. (2020), the overall time span is divided into five periods, each covering four consecutive years. The final period setting and the corresponding numbers of papers and patents are shown in Table 1. Second, the word_tokenize function in the NLTK natural language toolkit for Python is used for word segmentation, and POS tagging is then performed on the divided datasets. Meanwhile, unifying singular and plural forms, merging synonyms, converting between full names and abbreviations, and removing irrelevant data are performed in turn. In particular, we remove publisher information (such as “(C) 2018 Elsevier B.V. All rights reserved.”), punctuation (such as “%”, “(”, “<”), and other irrelevant words (such as “this” and “paper”) from the data based on the constructed stop word list, which ensures the objectivity and reasonableness of the data.
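The period split and the clean-up step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the short stop word list, and the publisher-notice pattern are hypothetical, and a regular expression stands in for NLTK's tokenizer so that the sketch has no external dependencies.

```python
import re

# Period layout taken from the text: 2002-2021 in five four-year periods.
FIRST_YEAR, LAST_YEAR, SPAN = 2002, 2021, 4

def period_of(year: int) -> str:
    """Map a publication year to its period label (T1..T5)."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year {year} is outside {FIRST_YEAR}-{LAST_YEAR}")
    return f"T{(year - FIRST_YEAR) // SPAN + 1}"

# Hypothetical stop word list and publisher-notice pattern (assumptions).
STOP_WORDS = {"this", "paper", "the", "of", "a", "an", "is", "in"}
PUBLISHER_NOTICE = re.compile(
    r"\(C\)\s*\d{4}.*?All rights reserved\.?", re.IGNORECASE
)

def clean_tokens(text: str) -> list[str]:
    """Strip publisher notices, punctuation, and stop words from one abstract."""
    text = PUBLISHER_NOTICE.sub(" ", text)
    # Keep alphanumeric tokens, allowing internal hyphens ("all-solid-state").
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

In a full pipeline, the regular expression would be replaced by the NLTK tokenizer and followed by POS tagging and synonym merging, which are omitted here.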

Table 1 Number of documents in each period

As shown in Table 1, as the technology develops and social demand enters a phase of sharp growth, the number of scientific papers and patents grows rapidly each year. In particular, the number of scientific papers and patents increased sharply in the T5 period (a 333.62% increase for papers and a 144.17% increase for patents compared with the T4 period). These growth rates show that the research and development of ASSLIBs has been very active in the last four years. Besides, the number of scientific papers has exceeded the number of patents since 2002. This indicates that the process of acquiring scientific results has been faster than the process of applying the related technologies.

SAO semantic analysis based on LDA topic model

In this phase, the LDA topic model is first used to determine the number of topics and the corresponding technical terms of the scientific papers and patents. Then these technical terms are tagged in the raw datasets, and the SAO semantic analysis method is used to extract the semantic structures corresponding to each technical term. Finally, through semantic similarity calculation, the SAO structures with high similarity are clustered to obtain the final technical topics and technical elements, which provide the data basis for constructing the technology evolution pathways. This phase consists of three tasks that are applied one after the other, as detailed below.

Extracting technical terms based on LDA topic model

Based on the data acquired for each time slice, the appropriate number of clusters is first determined by calculating the perplexity (Sugimoto et al., 2011). Perplexity is a popular indicator for evaluating a language model and is applied here to automatically evaluate the quality of the LDA topics. This paper uses the Gensim library in Python to calculate the perplexity for the different periods and thereby determine the optimal number of topics, where the smallest perplexity value indicates the best performance of the model. The perplexity is calculated as follows.

$$Perplexity(D) = \exp \left( - \frac{{\sum\limits_{d = 1}^{M} {\log p(w_{d} )} }}{{\sum\limits_{d = 1}^{M} {N_{d} } }} \right)$$
(1)

where D denotes the whole data set, M denotes the number of texts contained in it, \(N_{d}\) denotes the number of words in text d, so that \(\sum\nolimits_{d = 1}^{M} {N_{d} }\) is the total number of words in the data, and \(p(w_{d} )\) denotes the probability that the model assigns to the words of text d.
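As a minimal sketch of Eq. (1), the perplexity can be computed from per-document log-likelihoods and document lengths, and the topic number with the smallest perplexity can then be selected. The function names and the toy values in the usage note are hypothetical; in practice the log-likelihoods would come from the fitted topic model.

```python
import math

def perplexity(doc_log_likelihoods, doc_lengths):
    """Eq. (1): exp(-(sum of document log-likelihoods) / (total word count))."""
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("need one log-likelihood and one length per document")
    total_words = sum(doc_lengths)
    if total_words == 0:
        raise ValueError("corpus contains no words")
    return math.exp(-sum(doc_log_likelihoods) / total_words)

def best_topic_number(perplexity_by_k):
    """Return the candidate topic number with the smallest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)
```

A quick sanity check: a model that assigns every word a probability of 1/8 has a perplexity of 8, whatever the document lengths are.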

Next, the Dirichlet hyperparameters α and β are set, which govern the per-document topic distribution and the per-topic word distribution, respectively. Finally, the technical topics in the different periods are obtained after multiple rounds of topic training. The core of the LDA topic model is the use of the Dirichlet distribution to reduce the dimension of documents and achieve efficient clustering of documents and terms. Specifically, the document-topic matrix is first constructed by generating the topics of each document, and the topic-word matrix is then constructed by generating the word distribution of each topic. This process is repeated until the topics and topic words of all documents are generated. The core formula is as follows.

$$p(\theta ,z,w\left| {\alpha ,\beta } \right.) = p(\theta \left| \alpha \right.)\prod\nolimits_{n = 1}^{N} {p(z_{n} \left| \theta \right.)} p(w_{n} \left| {z_{n} ,\beta } \right.)$$
(2)

where d indexes the current document and k indexes the generated topics. The topic distribution of document d obeys the Dirichlet prior distribution \(\theta_{d} \sim Dir(\alpha )\), \(z_{di} \sim Multinomial(\theta_{d} )\) represents the topic of word i in document d, \(\phi_{k} \sim Dir(\beta )\) represents the distribution of lexical items generated for each topic k, and \(w_{di} \sim Multinomial(\phi_{z_{di}} )\) represents the lexical item of word i in document d. For a detailed explanation of LDA, readers can refer to Jelodar et al. (2018).
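The generative process behind these distributions can be illustrated with a short simulation. This is only a sketch of the sampling steps named above (a Dirichlet draw per topic and per document, then a topic and a word per position), not a training procedure; the function names are hypothetical, and the Dirichlet draws are obtained by normalising Gamma samples, which is a standard construction.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalising Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:
        # Underflow guard for very small concentrations: fall back to a vertex.
        draws[rng.randrange(size)] = 1.0
        total = 1.0
    return [x / total for x in draws]

def generate_corpus(n_docs, doc_len, n_topics, vocab, alpha, beta, seed=0):
    """Simulate documents from the LDA generative process."""
    rng = random.Random(seed)
    # phi_k ~ Dir(beta): one word distribution per topic.
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        theta = sample_dirichlet(alpha, n_topics, rng)  # theta_d ~ Dir(alpha)
        doc = []
        for _ in range(doc_len):
            # z_di ~ Multinomial(theta_d): pick a topic for this position.
            z = rng.choices(range(n_topics), weights=theta)[0]
            # w_di ~ Multinomial(phi_z): pick a word from that topic.
            doc.append(rng.choices(vocab, weights=phi[z])[0])
        corpus.append(doc)
    return corpus
```

Fitting an LDA model inverts this process: given only the words, it estimates the topic mixtures and the topic-word distributions.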

In this process, the most appropriate number of term clusters is first obtained by calculating the perplexity of the scientific paper and patent datasets. Secondly, the LDA topic model is run to obtain the technical term clustering results for scientific papers and patents. However, some problems remain when the LDA topic model results are used for analysis: simple noun words or phrases contain limited semantic information and may belong to several different technical topics. Without understanding the semantic context in which the technical terms are embedded, the researcher is unable to interpret the technical topics accurately at sufficient depth and breadth, which results in inaccurate topic recognition. Therefore, in this paper, we mark the technical terms obtained from the LDA topic model in the raw datasets and then use the SAO semantic analysis method to extract the corresponding SAO structures. This reduces the ambiguity of determining technical topics from the LDA topic model alone and improves the semantic relevance and accuracy of technical topic identification.

Extracting the SAO structure and topic clustering

Based on the identified technical terms, we use the Stanford Parser software to mine the SAO structure of the sentence in which the technical terms are located in the raw datasets. When the characteristics of SAO are applied to technical topic analysis, the relationships between technical terms are specified as the technology “problem” and corresponding “solution”. This allows researchers to mine information not easily identified by common literature analysis methods such as bibliometrics and citation analysis, and provides researchers with more detailed and accurate technical topic analysis and trend prediction. In this paper, the specific SAO structure extraction process is roughly divided into two steps to meet the special requirements for natural language processing. The more detailed steps are shown in Table 2.

Table 2 SAO structure extraction steps

The process is as follows: (a) the technical term lists are expanded on the basis of morphemes to obtain the noun, verb, participle, and other forms of each technical term, which ensures the completeness of SAO structure screening; (b) a Python program is used to tag the sentences containing technical terms; (c) SAO structures are extracted from the tagged sentences with the Stanford Parser tool; (d) the SAO structures are cleaned based on the expanded technical term and stop word lists and then integrated by fuzzy matching. It is worth noting that only valid SAO structures should be retained: an extracted SAO structure should be directly related to the studied technical terms, meaning that a technical term appears as the S structure or the O structure. In this way, the SAO structures finally obtained preserve their semantic features while emphasizing technical terms.
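The filtering and fuzzy-matching step can be sketched as follows, assuming the SAO triples have already been extracted by a parser. The function names, the similarity threshold, and the use of Python's standard `difflib` for fuzzy matching are assumptions made for illustration; the source does not specify the matching technique.

```python
from difflib import SequenceMatcher

def keep_valid(triples, terms):
    """Keep (S, A, O) triples whose subject or object mentions a technical term."""
    terms = [t.lower() for t in terms]
    return [
        (s, a, o)
        for s, a, o in triples
        if any(t in s.lower() or t in o.lower() for t in terms)
    ]

def merge_similar(triples, threshold=0.9):
    """Collapse near-duplicate triples using a character-level similarity ratio."""
    merged = []
    for triple in triples:
        text = " ".join(triple).lower()
        is_duplicate = any(
            SequenceMatcher(None, text, " ".join(kept).lower()).ratio() >= threshold
            for kept in merged
        )
        if not is_duplicate:
            merged.append(triple)  # first occurrence represents the group
    return merged
```

With this sketch, a triple such as ("authors", "thank", "reviewers") is dropped because neither its subject nor its object contains a technical term, and singular and plural variants of the same statement are merged into one structure.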

Scientific papers and patents are usually written to solve problems in manufacturing or research and to provide corresponding solutions. A scientific paper or patent can be composed of multiple topics, each of which can be described by a series of “problems” and their “solutions” related to that topic. Compared with single words and phrases, the SAO structure takes into account specific semantic information, i.e., the “problem” of a technology and the corresponding “solution”, which facilitates topic identification and interpretation.

Therefore, in this paper, we construct the SAO structure matrix from the extracted SAO structures and introduce it into the LDA topic model as input data. Then, the SAO structures are clustered again based on the operating principle of the LDA topic model and Eqs. (1) and (2). Finally, we finalize the technical topics based on the clustering results. The detailed LDA topic model based on the SAO structure is shown in Fig. 2.

Fig. 2 The LDA topic model based on SAO

Generating the technological subsystems

After obtaining the technical topics and the corresponding SAO structures, the technology systems are constructed to provide data information for the construction of subsequent technology evolution pathways. The technical topics system based on the SAO structure is shown in Fig. 3.

Fig. 3 The technical topics system

The specific steps are as follows. First, based on the technical topics and the structure of the battery components, all SAO structures are divided into three categories: electrodes, electrolytes, and interfacial layers. These three categories constitute the first level of the technology system and define the content of the vertical axis. Second, the technical topics are used as the technology evolution topics, and the technology elements S or O in the SAO structures are used as the development nodes of each topic. The association among technology elements is judged from the A structure in the SAO sentences to highlight the inheritance and improvement relationships between the elements. The nodes come from the S and O structures, and the A structure serves as the linkage between nodes; together they constitute the second layer of the technology subsystem.

Technology evolution map construction

The combination of the topic clustering method and the SAO semantic analysis method is used to map the evolution paths of science and technology topics. This hybrid method not only shows the evolution relationships between terms clearly but is also more systematic and comprehensive than analysis with a single method. In this paper, all technological topics and terms from the SAO structures are divided into different dimensions to generate the subsystems of the whole ASSLIBs technology. In the evolution map, the vertical axis indicates the division into technological subsystems, and the horizontal axis indicates successive slices of time. The detailed steps to construct the map are as follows.

Firstly, based on the subsystem of the ASSLIBs technologies, we classify the SAO structures into appropriate subsystem layers, and locate the topics and related terms on the map. Secondly, the topics or related terms with semantic relations are connected and the relationship between them is marked on the line. Finally, we construct the evolution pathway mappings of ASSLIBs technologies based on the scientific papers and patents separately.
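One possible data structure for such a map is sketched below. It is an illustration under stated assumptions rather than the authors' implementation: the function name and the record layout are hypothetical, while the subsystem names and the five periods follow the text. Each SAO structure contributes two nodes (S and O), placed in a cell of the subsystem-by-period grid, and one edge labelled with the action A.

```python
from collections import defaultdict

SUBSYSTEMS = ("electrodes", "electrolytes", "interfacial layers")  # vertical axis
PERIODS = ("T1", "T2", "T3", "T4", "T5")                           # horizontal axis

def build_map(records):
    """Build map cells and labelled links.

    records: iterable of (period, subsystem, subject, action, object).
    """
    nodes = defaultdict(set)  # (subsystem, period) -> terms placed in that cell
    edges = []                # (source term, action label, target term, period)
    for period, subsystem, s, a, o in records:
        if subsystem not in SUBSYSTEMS or period not in PERIODS:
            raise ValueError(f"unknown cell: {subsystem!r}, {period!r}")
        nodes[(subsystem, period)].update((s, o))
        edges.append((s, a, o, period))
    return nodes, edges
```

Running the function once on the paper-derived SAO structures and once on the patent-derived ones gives the two separate maps that are later compared.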

Technology evolution trends analysis

After constructing the technology evolution pathway maps based on scientific papers and patents separately, we first carry out a difference analysis of the technological topics in the most recent period so that potential technology opportunities can be forecast. Second, the status of each topic in the most recent period is determined in order to find potential topics, which serve as technology opportunities for future development. In this paper, the potential topics are determined by the growth rate indicator and the topic intensity proportion at this stage.

To be specific, the definitions of the two parameters are listed as follows:

  1. Growth Rate Indicator (GRI): This indicator reflects the growth rate of a topic. The life-cycle state of a topic can be judged by comparing the number of documents on the topic in the current period with the sum of documents on the topic in five years. If the value increases over successive years, the topic is in the newborn or growth stage. Following Li et al. (2019), the GRI is calculated as:

    $$GRI_{i} = \frac{{a_{t}^{i} }}{{\sum\nolimits_{k = t - 4}^{t} {a_{k}^{i} } }},\quad (t \ge 4)$$
    (3)

    where \(a_{t}^{i}\) represents the number of documents on topic i in period t.

  2. Topic Intensity Proportion Indicator (TIP): This indicator reflects the popularity (heat) of a topic and can be used to reveal its future development potential. Specifically, if in period t + 1 the heat of a topic exceeds the average heat of all topics in the same period, i.e., its intensity is higher than the average topic intensity, then after period t + 1 the topic can be regarded as a topic with development potential that deserves attention. Following Li et al. (2019), the Average Topic Intensity (ATI) is first calculated with the following equation:

    $$ATI_{t} = \frac{{\sum\nolimits_{z = 1}^{N} {TI_{t}^{z} } }}{N}$$
    (4)

In Eq. (4), \(TI_{t}^{z}\) represents the intensity of topic z in period t, which can be calculated by the topic probability model. N represents the number of research topics detected in period t. \(ATI_{t}\) represents the average intensity of all topics in the analysis data source within period t. If \(TI_{t}^{z}\) > \(ATI_{t}\), the topic is above the average topic intensity and has good development potential in the next stage. To better represent the evolution and development of a topic at different stages, the intensity is normalized and the topic intensity proportion is introduced in Eq. (5).

$$TIP_{t} = \frac{TI_{t}^{z}}{ATI_{t}}$$
(5)

In Eq. (5), the Topic Intensity Proportion \(TIP_{t}\) is the topic intensity normalized by the average topic intensity. If \(TIP_{t}\) > 1, the topic is above the average topic level and has better development potential in the future.
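As an illustration of Eqs. (3) to (5), the two indicators can be sketched in a few lines of Python. This is only a minimal sketch: the function names and all numeric values below are hypothetical and are not taken from the case study.

```python
def growth_rate_indicator(doc_counts, t):
    """GRI of one topic: documents in period t divided by documents in periods t-4..t (Eq. 3)."""
    if t < 4:
        raise ValueError("Eq. (3) requires t >= 4")
    window_total = sum(doc_counts[t - 4 : t + 1])
    return doc_counts[t] / window_total if window_total else 0.0


def topic_intensity_proportion(intensities):
    """TIP of every topic in one period: TI divided by the period average ATI (Eqs. 4 and 5)."""
    ati = sum(intensities.values()) / len(intensities)
    return {topic: ti / ati for topic, ti in intensities.items()}


# Hypothetical document counts of one topic, indexed by period 0..4.
counts = [2, 3, 5, 8, 12]
gri = growth_rate_indicator(counts, t=4)

# Hypothetical topic intensities in one period.
ti = {"topic_a": 0.30, "topic_b": 0.10, "topic_c": 0.15}
tip = topic_intensity_proportion(ti)

# Topics above the average intensity level (TIP > 1).
promising = [topic for topic, value in tip.items() if value > 1]
```

With these invented numbers, only `topic_a` exceeds the average intensity, so it is the only topic flagged as having development potential.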

Results and discussions

SAO semantic analysis based on topic extraction

The topic extraction based on SAO semantic analysis consists of three parts: extracting technical terms with the LDA topic model, extracting and analyzing SAO structures, and clustering the topics to construct the technological topic system.

Technical terms extraction based on LDA topic model

The corresponding titles and abstracts are extracted from the scientific papers and patents of each period. The preprocessed data are converted into CSV format as input datasets for LDA topic modeling. First, the perplexity of the model is calculated for different numbers of topics, and the optimal number of topics in each period is obtained. This optimal number is then set in the LDA topic model. According to the Gibbs sampling method, the experimental parameters of the model are set as follows: alpha = 0.01, beta = 0.01, minimum_probability = 0.001, chunksize = 100 and max_iterations = 10,000. The results are shown in Table 3.
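The perplexity-based choice of the topic number can be sketched as follows. This is a minimal pure-Python collapsed Gibbs sampler written for illustration, not the program used in this study; the toy corpus, the function names, the candidate topic numbers and the reduced iteration count are all assumptions. Perplexity is computed here on the training documents themselves, which is the simplest variant.

```python
import math
import random


def lda_gibbs(docs, num_topics, alpha=0.01, beta=0.01, iterations=100, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi, word_id)."""
    rng = random.Random(seed)
    word_id = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    vocab_size = len(word_id)
    doc_topic = [[0] * num_topics for _ in docs]                # counts n_dk
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # counts n_kw
    topic_total = [0] * num_topics                              # counts n_k
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = word_id[w], assignments[d][n]
                # Remove the token, then resample its topic from the conditional.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][j] + alpha) * (topic_word[j][v] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    theta = [[(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
             for row, doc in zip(doc_topic, docs)]
    phi = [[(c + beta) / (topic_total[k] + vocab_size * beta) for c in row]
           for k, row in enumerate(topic_word)]
    return theta, phi, word_id


def perplexity(docs, theta, phi, word_id):
    """Exponential of the negative mean log-likelihood per token."""
    log_lik, tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            v = word_id[w]
            log_lik += math.log(sum(theta[d][k] * phi[k][v] for k in range(len(phi))))
            tokens += 1
    return math.exp(-log_lik / tokens)


# Hypothetical tokenized abstracts standing in for one period of data.
docs = [
    ["sulfide", "electrolyte", "conductivity"],
    ["oxide", "electrolyte", "garnet"],
    ["anode", "lithium", "dendrite"],
    ["cathode", "coating", "lithium"],
]
scores = {}
for k in (2, 3, 4):
    theta, phi, word_id = lda_gibbs(docs, num_topics=k)
    scores[k] = perplexity(docs, theta, phi, word_id)
best_k = min(scores, key=scores.get)  # topic number with the lowest perplexity
```

On a real corpus the candidate range would be wider and the iteration count much larger; the sketch only shows how the lowest-perplexity topic number is picked for a period.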

Table 3 Perplexity calculation results in different time domains

Then, the LDA topic model is used to cluster the scientific papers datasets and the patent datasets separately. The results of the technical terms clustering are shown in Tables 4 and 5, which list the technical terms and their probabilities under each technical topic in the different periods.

Table 4 Scientific papers clustering results in ASSLIBs (partial)
Table 5 Patents clustering results in ASSLIBs (partial)

Table 5 shows the patent technological terms with their probabilities for each period from T2 to T5.

In each period, each topic contains the 7–10 technical terms with the highest probability. These technical terms are then indexed in the raw data by a Python program.

SAO structure extraction and topic extraction

The top related technological terms from Tables 4 and 5 above are indexed in the original data sources by a Python program. The indexing process is shown in Table 2. We then use the Stanford Parser software to identify SAO structures from the flagged data, and we clean and merge the SAO structures using fuzzy matching and subject headings. The SAO structures are extracted from the scientific papers and the patents separately, yielding 378 and 289 groups of SAO structures, respectively. After extracting and cleaning the SAO structures, we use the processed data as input to the LDA topic model and run it again to identify the technical topics.
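The fuzzy-matching part of the cleaning step can be sketched with the Python standard library. This is a hedged illustration only: the similarity threshold, the function names and the example triples are assumptions, and the subject-heading step is not covered.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """True when two phrases are close enough to be treated as the same term."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_sao(triples, threshold=0.8):
    """Group (subject, action, object) triples whose three parts all match fuzzily."""
    merged = []  # each entry is [representative_triple, count]
    for triple in triples:
        for entry in merged:
            if all(similar(x, y, threshold) for x, y in zip(triple, entry[0])):
                entry[1] += 1
                break
        else:
            merged.append([triple, 1])
    return merged


# Hypothetical SAO triples; the first two are spelling variants of one structure.
raw = [
    ("LiNbO3 coating", "reduce", "interface resistance"),
    ("LiNbO3 coatings", "reduces", "interfacial resistance"),
    ("metal doping", "improve", "cycle stability"),
]
groups = merge_sao(raw)
```

Here the first two triples collapse into one group with a count of 2, while the third stays separate. The counts give the frequency of similar SAO structures.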

In this process, we count the similar SAO structures to distinguish and identify key technical elements. In an SAO structure, the verb part (“Action”) expresses the relationship among topics, for example “increase”, “improve” and “reduce”. We invited four researchers from Lomon Billions Group Co., Ltd in Henan, China to screen the technological topics and the SAO structures. These four researchers have worked in the ASSLIBs field for over 5 years and have rich theoretical and practical experience, which helps identify the technology topics obtained with the LDA topic model. Synthesizing the experts’ opinions, we identify the high-frequency technical elements and the core evolutionary relationships among the scientific topics and patent topics from all SAO structures. Tables 6 and 7 summarize several important semantic relationships, with examples, based on scientific papers text mining and patent text mining.

Table 6 Important topics semantic relationships with examples based on scientific papers
Table 7 Important topics semantic relationships with examples based on patents

Technological subsystems generation

Firstly, the experts in the ASSLIBs technology domain preview the topic clustering results. They divide the technology topics into three categories based on their research experience and the battery components. The three categories, which also reflect the main development trends of ASSLIBs technology, are “Electrode”, “All-solid Electrolyte” and “Electrode–Electrolyte interface”. “Electrode” addresses the positive and negative electrode materials and advanced processes. “All-solid Electrolyte” addresses all types of solid electrolytes and their optimization. “Electrode–Electrolyte interface” addresses all technologies that can solve the poor conductivity in electrolyte layers. Then, the experts assign all topics and terms to the corresponding categories and layers. Finally, the technical subsystems and corresponding topics of the ASSLIBs are generated, as shown in Table 8.

Table 8 Partial topics division in the technological subsystem

Table 8 shows the division of partial scientific topics and patent topics for each period. This division is useful for generating the technology evolution map, as it directly defines the vertical structure of the map of technology evolution.

Technology evolution map construction and analysis

A full understanding of the technology evolution path is significant for forecasting technology development trends. In this step, one technology evolution map is constructed from the scientific papers and another from the patents.

Firstly, according to the classification of the topic subsystems in Table 8, we construct the vertical axis of the evolution map. The vertical axis represents the technical subsystems obtained above, and the horizontal axis represents the different time periods. Then, according to the relationships among topics in Tables 6 and 7, we complete the evolution maps of ASSLIBs technology shown in Figs. 4 and 5, in which the technical topics of each period are distinguished by color. We also use yellow five-pointed stars in Figs. 4 and 5 to label the high-frequency technical elements from the SAO results above. By analyzing how these topics and their top related terms vary in each layer over time, we can understand the development process of ASSLIBs technology.
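The construction described above can be sketched as a small data structure: topics are placed on a grid of subsystem layer by time period, and labelled links carry the SAO action. This is a minimal sketch under assumptions; the function name, the placeholder topic names and the example relation are hypothetical, and only the three layer names come from this paper.

```python
from collections import defaultdict

# Vertical axis: the three technical subsystems.
LAYERS = ["Electrode", "All-solid Electrolyte", "Electrode–Electrolyte interface"]


def build_evolution_map(topic_layers, topic_periods, relations):
    """Place topics on a (layer, period) grid and keep labelled links between them."""
    grid = defaultdict(list)
    for topic, layer in topic_layers.items():
        grid[(layer, topic_periods[topic])].append(topic)
    # Keep only links whose two endpoints were both placed on the map.
    edges = [(s, a, o) for s, a, o in relations
             if s in topic_layers and o in topic_layers]
    return dict(grid), edges


# Hypothetical placements and one hypothetical SAO relation.
topic_layers = {"topic A": "Electrode", "topic B": "All-solid Electrolyte"}
topic_periods = {"topic A": "T2", "topic B": "T3"}
relations = [("topic B", "improve", "topic A"), ("topic B", "reduce", "topic C")]

grid, edges = build_evolution_map(topic_layers, topic_periods, relations)
```

The relation pointing to `topic C` is dropped because that topic has no position on the map, so only one labelled link remains.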

Fig. 4 Evolution map of ASSLIBs technology based on scientific papers

Fig. 5 Evolution map of ASSLIBs technology based on patents

In Fig. 4, the electrode layer mainly contains three technical topics: the negative electrode active material, the positive electrode, and new electrode materials. For the negative electrode, top related technical terms such as metal-doped, FeS additive, nanocomposite material and nanostructured Si/C fibers have appeared over time. This shows that the negative electrode active materials mainly follow two evolution paths: adding metal and designing nanostructures. Adding metal to form a protective layer can reduce the generation of dendrites and dead lithium during battery cycling. As one of the most mainstream modification methods for negative electrode materials, it has been widely studied. The design of nanostructures developed later than the addition of metals, with ongoing research beginning around 2010. Designing and manufacturing nano/micro structures such as nanopore arrays can change the surface of the electrode and thereby improve the overall performance. For the positive electrode, top related technical terms such as LiMn2O4, NiS-VGCF composite, LiCoO2 and Ni-rich cathode material have appeared over time, which shows that the positive electrode materials in use have changed from LiFePO4 in 2010 to Ni-rich, Li-rich, sulfur and other materials. Compared with LiFePO4, the Ni-rich, Li-rich and sulfur materials have higher energy density. Therefore, combining lithium metal with a high-load, high-specific-energy cathode material is considered a reliable way to improve the energy density of ASSLIBs. For new electrode materials, top related technical terms such as metal polysulfides, titanium hydride, MgH2 and TiFe-hydride have appeared over time, showing that metal hydrides have gradually become a new electrode material for ASSLIBs. MgH2 with a tetragonal TiO2 structure has attracted much attention since 2008 because of its good dynamic performance at high-rate discharge. We predict that clean, pollution-free and safe materials may become future development opportunities, and that electrode materials will be multi-doped and more economical. Electrode microstructure designs such as 3D structures and metal protective films will receive more attention in the future.

In the all-solid electrolyte layer, there are eight types of solid electrolytes related to ASSLIBs. The polymer electrolytes appeared in 2002, the sulfide-based electrolytes appeared later, and the oxide-based solid electrolytes, including the garnet-type, NASICON-type and perovskite-type, appeared around 2010. Currently, there is a variety of all-solid electrolytes, and the coexistence of multiple structures will continue to be a trend of the technology in the short term. Among oxide-based electrolytes, LLZO is considered an ideal electrolyte material due to its high electrical conductivity and stability. Sulfide solid electrolytes have also attracted attention since 2006 for their high ionic conductivity, low grain boundary resistance and high oxidation potential. In recent years, more attention has been paid to organic–inorganic composite solid electrolytes, which combine the advantages of inorganic electrolytes and polymer electrolytes. This shows that the current research trend has shifted from polymer and oxide electrolytes to organic–inorganic composite solid electrolytes. We predict that the development of organic–inorganic composite solid electrolytes will become a research focus in the future. At the same time, interface manufacturing methods for LLZO-type electrolytes will also receive continuous attention.

In the Electrode–electrolyte interface layer, the methods to improve the electrode–electrolyte interface mainly include adding a coating layer and adding conductive additives. Since 2010, researchers have modified the interface by introducing additives or modifiers to effectively reduce the interface resistance. In recent years, researchers have begun to focus on inhibiting lithium dendrite growth by coating electrolytes. Top related terms such as single thin polymer coat, LiNbO3-coated and Li2ZrO3 nanolayers have appeared over time, which shows that researchers are paying more attention to reducing the interface resistance with a protective coating layer.

In Fig. 5, the research on the application of ASSLIBs technology has been widespread since 2006, a period later than the theoretical research of ASSLIBs. From the emergence of the new topics and top related terms in different layers and different periods, we can see that the applied research moves from electrode materials to the all-solid electrolytes and then to the electrode–electrolyte interface design. From this, we are able to understand the development process of ASSLIBs by analyzing the variation of technical terms in each layer over time.

In Fig. 5, the electrode layer mainly concerns the positive electrode materials, the negative electrode active materials and the electrode collector layer. Top related terms such as oxide-based positive electrode, core–shell structure and metal siloxane skeleton/silazane skeleton have appeared over time, which shows that research on positive electrode materials has gradually developed from oxide coating in 2010 to metal oxide particle doping in 2014, and then to the application of nanocomposite electrodes in 2018. Using oxides as cathode coating targets has great potential to improve battery cycle performance. Moreover, carbon-based nanocomposites can improve not only the electrochemical performance but also the stability of the battery structure. From the shift in the related terms, we can see that the design and development of stable and safe cathode materials with high ionic conductivity are the future research trends, and that nanocomposite electrodes will attract more researchers’ attention. Top related terms such as compound material and silicon-type alloy have appeared over time, which shows that the negative electrode material has gradually changed from lithium metal to composite materials and lithium alloys. For example, the lithium silicon alloys that appeared in 2016 are considered an alternative to lithium anodes. The shift in electrode materials indicates that materials with high stability and high conductivity are the future development trend. Organic–inorganic composite materials and alloy materials are one direction for future development.

In the all-solid electrolyte layer, there are seven types of structures related to the electrolytes. The polymer electrolyte and the sulfide-based electrolyte have developed since 2006. Since 2014, oxide-based electrolytes such as the LLZO-type and NASICON-type have been widely researched. These results show that polymer, sulfide and oxide electrolytes are the three most important electrolytes in current research. New polymer electrolytes appeared in 2018, indicating that researchers began to focus on coating and modifying the traditional polymer electrolytes. The variety of all-solid-state electrolyte structures indicates that the coexistence of multiple structures will continue to be the trend of the technology in the short term.

In the Electrode–electrolyte interface layer, additives and modifiers are the main means of improving the conductivity of the electrode–electrolyte interface. The solid electrolyte film appeared in 2010, and the additive and binder appeared later. This development indicates that researchers began to focus on improving the overall performance of the battery by adding functional additives. Compared with adding a film coating, applying additives is more convenient and has comprehensive advantages. Therefore, the development trend for the electrode–electrolyte interface will involve composite additives with comprehensive advantages in capacity, stability and conductivity.

Technology evolution trends analysis

Gaps analysis between scientific papers and patents

As shown in Figs. 4 and 5, there are evident differences between the evolution paths based on scientific papers and on patents. This indicates that the research and development of ASSLIBs technology also differs between scientific papers and patents. Based on the comprehensive research on the similarities and differences between scientific research and technology (Qi et al., 2018; Wang et al., 2015), we compare the technical topic clusters in the T5 period, as shown in Table 9. In Table 9, the appearance time in papers and the appearance time in patents refer to the earliest appearance of the technical topics in the T5 phase, which can be obtained from the patent publication data.

Because the evolution paths span a long period, the relationships among technological topics have become increasingly complex. To analyze the latest development trend of ASSLIBs technology, this paper selects the topic clusters in the T5 period (2018–2021), which are the newest and the most comprehensive. In Table 9, the time lag is based on the time when the top related terms in the clusters first appear. The visualization results are shown in Fig. 6.

Table 9 Comparison of technology trends analysis based on scientific papers and patents
Fig. 6 Technology trends analysis based on the gaps between scientific papers and patents

In Fig. 6, every topic cluster, represented by a dotted circle, includes related topics and the top related terms, and the top related terms are shown in solid circles. Corresponding topic clusters are connected by dotted lines. In particular, red dotted lines represent potential technological opportunities.

As shown in Table 9 and Fig. 6, from 2018 to 2021 the topic clusters based on scientific papers were always ahead of the clusters based on patents, but the time lag was generally about one year or less. Therefore, topics that appeared only in the scientific-paper topic clusters or only in the patent topic clusters can also be considered potential technological opportunities in the future. Specifically, the clusters Current collectors and Electrode collector layer are fields in which technological development has appeared but scientific research is lacking, which also presents a gap. This shows that these two topic clusters are widely used in practical technology R&D but lack supporting scientific research. Therefore, these two topics can be regarded as technical opportunities in the field of scientific research after 2020. The topic cluster Electrolytes modifier appeared in scientific papers in 2020, when the Additive with 3D microstructure was first put forward to improve interface conductivity. However, no patents on the 3D-microstructure additive emerged in 2020, so it can also be regarded as a technological opportunity after 2020. In 2021, new topic clusters appeared in scientific papers, such as metal hydride-based, sheet-style anodes and modified Li3PS4, while none of them appeared in patents. Therefore, these three topics can be regarded as technological opportunities after 2021.
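The comparison logic behind this gap analysis can be sketched as follows, assuming that corresponding clusters from the two sources have already been aligned under a common label. The function name, cluster labels and years below are hypothetical placeholders, not values from Table 9.

```python
def gap_analysis(paper_first_year, patent_first_year):
    """Compare first-appearance years of topic clusters in papers and in patents."""
    shared = paper_first_year.keys() & patent_first_year.keys()
    # Positive lag: the cluster appeared in papers before it appeared in patents.
    time_lag = {c: patent_first_year[c] - paper_first_year[c] for c in sorted(shared)}
    papers_only = sorted(paper_first_year.keys() - patent_first_year.keys())
    patents_only = sorted(patent_first_year.keys() - paper_first_year.keys())
    return time_lag, papers_only, patents_only


# Hypothetical first-appearance years within the most recent period.
papers = {"cluster X": 2018, "cluster Y": 2021}
patents = {"cluster X": 2019, "cluster Z": 2020}

time_lag, papers_only, patents_only = gap_analysis(papers, patents)
```

Clusters found in only one source (`papers_only`, `patents_only`) are the candidates treated as potential technological opportunities, while `time_lag` gives the lead of papers over patents for the shared clusters.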

Potential technical topics identification

In this stage, we calculate the growth rates and the intensity proportions of the technical topics obtained from the scientific papers and patents, and try to find the topics with development potential.

Similarly, we select the technical topics in the T5 period (2018–2021), including 13 topics based on scientific papers and 12 topics based on patents. The growth rate and the intensity proportion are computed for these 25 topics, as shown in Table 10. A two-dimensional evaluation system is then established with the growth rate and intensity proportion values, in which the 25 topics are divided into four quadrants, as shown in Fig. 7.
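The two-dimensional division can be sketched as below. The intensity cut-off of 1 follows the TIP > 1 criterion defined earlier. The growth-rate cut-off is not stated in this section, so splitting at the mean growth rate is an assumption, as are the function name and the example values.

```python
def classify_topics(topics, intensity_cut=1.0, growth_cut=None):
    """Assign each topic to one of four quadrants by growth rate and intensity proportion."""
    if growth_cut is None:
        # Assumption: split the growth axis at the mean growth rate of all topics.
        growth_cut = sum(g for g, _ in topics.values()) / len(topics)
    labels = {}
    for name, (growth, intensity) in topics.items():
        g = "high growth" if growth > growth_cut else "low growth"
        i = "high intensity" if intensity > intensity_cut else "low intensity"
        labels[name] = f"{g}, {i}"
    return labels


# Hypothetical (growth rate, intensity proportion) pairs.
topics = {
    "topic A": (0.40, 1.5),
    "topic B": (0.10, 1.2),
    "topic C": (0.35, 0.6),
    "topic D": (0.15, 0.7),
}
labels = classify_topics(topics)
```

Descriptive labels are used instead of quadrant numbers because the assignment of indicators to axes is a plotting choice.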

Table 10 The growth rate and the intensity proportion of the 25 topics
Fig. 7 The scatter diagram of the topics of ASSLIBs

As shown in Table 10 and Fig. 7, conductive additive, metal hydride-based, perovskite-type, new polymer electrolytes and interfacial coatings are the most important topics in the scientific papers, with the highest growth rates. Likewise, the perovskite-type and the new polymer electrolytes in patents show both a high intensity proportion and a high growth rate. Based on these results, we can see that the corresponding topics in the scientific papers and the patents follow the same trend. Another seven topics with a relatively high intensity proportion fall in Quadrant II, reflecting the future development opportunities of ASSLIBs to a degree.

Based on the above analysis, we can predict the development trends of ASSLIBs technology by combining the evolution maps and the scatter diagram based on the scientific papers and the patents. Drawing on Figs. 4, 5, 6, 7 and Tables 9, 10, the final technology trends and opportunities are as follows.

  1. (1)

    Perovskite-type electrolytes have emerged since 2010 and have been widely used in practical production. This topic shows a high growth rate and high topic intensity in both scientific papers and patents. In particular, the non-metallic modification of LLTO will receive more attention in the future and can be an innovation opportunity.

  2. (2)

    The new polymer electrolyte came into public view after 2018. This topic also shows a high growth rate and high intensity in both scientific papers and patents, with higher values in patents. According to the above analysis, we predict that the new polymers mainly composed of PEO derivatives will have higher growth potential after 2021.

  3. (3)

    Interfacial coating technology first appeared in 2018 and shows the highest topic growth rate and topic intensity in scientific papers. In recent years, the method of reducing the resistance between battery and electrolyte by interfacial coating has been widely recognized. Therefore, we believe that this topic will have higher growth potential in scientific papers after 2021.

  4. (4)

    Glass–ceramic electrolytes, as one of the garnet-type electrolytes, are widely used in the preparation of solid electrolytes and show the highest growth rate and the highest intensity in patents. In recent years, the modification technologies for glass–ceramic electrolytes have been gradually enriched. Therefore, we predict that this topic will have a higher growth rate and topic intensity after 2021.

Conclusion

Exploring technological evolution paths and identifying technology innovation opportunities can be crucial for stimulating the technology innovation of enterprises. However, existing studies of technological evolution paths seldom consider the semantic linkages among technical terms. At the same time, current research on technology evolution lacks a more in-depth and complete exploration of technology development trends. In this paper, an integrated method combining LDA topic modeling, SAO semantic analysis and technology evolution mapping has been introduced. These methods are combined to quantify the path information, which improves the accuracy of technical term descriptions and provides a new perspective for technology evolution research. The gap analysis between science and technology is used to forecast technology development trends, and the growth rate and intensity proportion indicators are calculated to obtain potential technology topics. All-solid-state lithium batteries technology is selected as a case study, through which the proposed framework is shown to be valid and flexible. This study contributes to the technology forecasting methodology and sheds light on the emergence and future trends of technology studies.

However, some limitations remain to be addressed in the future. Firstly, for topic clustering, we use only LDA topic modeling to cluster technical topics. Although LDA modeling can cluster documents that have a similar topic, it is not enough to capture all valuable topics. The existence of meaningless topics not only increases the computational workload but also affects the accuracy of the technology evolution paths. In the future, algorithm improvement and parameter tuning will be applied to further improve the accuracy of technical topic extraction. Secondly, comparing the gaps between topics based on scientific papers and patents is a meaningful method to predict technology trends. Although the gap analysis in this paper can be used to analyze the time lag of similar topics, it is not enough to account for differences in other aspects of the topics, because differences in the content and context of similar topics may also affect future trends. In the future, more quantitative analysis methods will be applied to calculate the deeper relationships between similar topics, which may make technology trend forecasting more comprehensive. In addition, considering only scientific papers and patents as research data is still not sufficient, and these need to be supplemented by multiple data sources.