Knowledge organization of node enterprises’ technological innovation under supply chain environment

Zhang, Qianqian; Liu, Shifeng; Tu, Qun

doi:10.1007/s40747-021-00388-9

Original Article
Open access
Published: 12 May 2021

Volume 9, pages 2459–2473, (2023)

Complex & Intelligent Systems

Abstract

This paper proposes an improved text classification method based on domain ontology to organize the mass of information that records node enterprises’ innovation activities under the supply chain environment. The method can classify the documents of node enterprises under the supply chain without a training set. It achieves a mean precision of over 80% for document classification, which outperforms the baseline method. In addition, the paper constructs a domain ontology of enterprises’ technological innovation under the supply chain that effectively enhances the semantic relationship between words. The method can therefore summarize and classify the textual information generated by node enterprises in product design, production, storage, logistics, and sales.

Introduction

With the development of technology and the economy, a new supply chain management model has formed in the global business community. As a result, the technological innovation model of enterprises has changed from independent innovation by a single enterprise to collaborative innovation among upstream and downstream enterprises in the supply chain. The supply chain involves multiple entities such as suppliers, manufacturers, retailers, and customers. The innovation activities and processes of all these entities together constitute enterprises’ technological innovation in a supply chain. Therefore, the supply chain entities should innovate collaboratively to improve the competitiveness of the entire supply chain. Figure 1 shows the main entities in a supply chain and the process of technological innovation. Node enterprises and upstream enterprises in the supply chain need to exchange market supply and cost information. Node enterprises in the supply chain need to share and integrate knowledge. Node enterprises and downstream enterprises or customers need to exchange market demand and product information. Therefore, information collection, classification, and knowledge system reconstruction across the entire supply chain are the key to promoting an enterprise’s technological innovation, which helps enhance the competitiveness of its products and even of the whole supply chain.

Because innovation under the supply chain produces complex information in huge amounts, a text classification method that can automatically process, organize, and mine textual data is in high demand. However, most existing studies explore the influencing factors and cooperation modes of innovation under the supply chain through empirical study [1,2,3,4]. To develop such a text classification method, a complete knowledge system should be built to search, organize, and analyze each node enterprise’s knowledge. Such a system also makes information sharing, synchronized planning, and process coordination possible between members across different regions and industries.

Therefore, this paper constructs an ontology of the field of enterprises’ technological innovation under the supply chain environment and uses it to classify and summarize textual information. The method can dynamically generate the influential factors of innovation under the supply chain, which helps researchers and managers follow these factors and understand the production knowledge in this field as it changes. In addition, knowledge organization and sharing among node enterprises can support enterprises’ continuous innovation and enhance the competitiveness of the entire supply chain. The remainder of this paper is organized as follows. The second section reviews the literature on existing semantic text classification algorithms and ontology-based semantic similarity methods. The third section describes the implementation of the proposed methodology. The fourth section presents the experiments, result analysis, and performance evaluation. The final section summarizes the work and contributions of this paper and discusses its limitations and future work.

Literature review

Supply chain management and enterprises’ technological innovation

The supply chain is a functional network built around the core enterprise [5, 6]. It connects suppliers, manufacturers, distributors, retailers, and end-users through information flow, logistics, and capital flow. Supply chain management spans all activities from raw materials to final products. The synergy of demand and supply brings enterprises competitive advantages in terms of value and cost. Technological innovation in supply chain management helps enterprises reduce procurement and production costs.

Most researchers have explored the association between efficient supply chain management and enterprises’ innovation through empirical inquiry or survey methods [1,2,3,4,5,6,7]. Recently, an increasing number of scholars have recognized the importance of data analysis and text mining for supply chain management. Schniederjans et al. [8] enhanced the supply chain digital research paradigm through a large-scale literature review and a textual analysis of digitization technologies and topics. Kim et al. [9] explored sustainable supply chain management trends and firms’ strategic positioning and execution based on news articles and sustainability reports with text-mining algorithms. Chu et al. [10] proposed a text-mining-based global supply chain risk management framework to identify region-specific supply chain risks. Chircu et al. [11] examined the use of business analytics, big data, and business intelligence methods in operations and supply chain management by analyzing 625 published papers with text mining. Rozados et al. [12] summarized the trends and related research on big data analysis in supply chain management.

Semantic text classification algorithms

Text classification is a text-mining algorithm that automatically assigns the analyzed document to one or more pre-defined categories based on its content [14]. Traditional supervised text classification methods such as Support Vector Machines (SVM), Naïve Bayes, decision trees, Latent Semantic Analysis (LSA), and K-Nearest Neighbor (KNN) generally represent a document by its terms and their feature weights, which is known as the “Bag of Words” (BOW) representation model. The number of words in the vocabulary determines the dimension of the word vector, which usually results in a very high-dimensional and sparse document vector [15,16,17,18,19,20,21].

Ontology is a conceptual, structured, and standardized knowledge representation and organization method that can describe semantics and hidden knowledge from enormous amounts of information. Using domain ontology for knowledge representation can explore similar topics or events in the documents. Hence, it can construct a text representation model with the pre-defined semantic relationships between recognized entities and knowledge from the ontology and augment it with important background facts that are not directly present in the document. With this knowledge, the system can distinguish which terms or concepts are more important and focus on categorizing more precise information.

1. Some researchers utilized the domain ontology to enrich the semantic feature vector representation and improve text classification accuracy. For example, Elhadad et al. [22] proposed building the feature vector for web text document classification based on the WordNet ontology. Abdollahi et al. [23] utilized the UMLS domain ontology to extract the key features and classify medical text documents.
2. Some researchers utilized the hierarchical taxonomy of domain ontology in the text classification task. For example, Cerri et al. [24] classified proteins in functions organized according to the Gene Ontology hierarchical taxonomy. Liu et al. [25] proposed a text classification method based on the ontology graph and structure.
3. Some researchers proposed methods based on the semantic similarity of concepts in the ontology for text classification. For example, Albitar et al. [26] proposed new text-to-text semantic similarity measures to replace classical similarity measures for text classification.

The above literature survey found no research that uses big data techniques for the knowledge organization of enterprises’ technological innovation under the supply chain environment. Traditional text classification methods usually represent documents with BOW, which ignores the semantic relationship between terms, and they usually require a large number of labeled training texts, which increases the manual annotation workload. Using the hierarchy of knowledge from a domain ontology directly in the text classification process captures the semantic relations between terms and skips the classifier training steps, so no pre-categorized training set is needed.

Therefore, this paper has two research points. First, it uses big data techniques to automatically process, organize, and mine the large amounts of textual data generated by node enterprises’ technological innovation and to realize the knowledge service among node enterprises in the supply chain. Second, it proposes an improved text classification method that does not require a large amount of training text, and uses it to automatically organize and analyze these textual data and to realize the knowledge classification of enterprises’ technological innovation.

Methodology

This paper uses a semantic concept model based on the domain ontology of enterprises’ technological innovation under the supply chain to improve text classification, and proposes an enhanced text classification method based on the semantic similarity and relatedness between keywords and categories. The target categories and the keyword sets extracted from the collected textual documents are mapped to the concepts of the constructed domain ontology, which yields the mapped target category-concept set and keywords-concept set. The domain ontology-based semantic similarity calculation and the concept distribution-based relatedness calculation are then used to obtain the weight matrix of semantic similarity and relatedness between keywords and categories. The weighted values of semantic similarity and relatedness between keywords and categories are compared across the matrix’s transverse space, and the category corresponding to the maximum value is taken as the document category. The framework of the improved text classification method based on the semantic conceptual model is shown in Fig. 2. According to the framework, the improved methodology has four main steps, which are detailed below. The main parameters used in the following equations are shown in Table 1.

Table 1 The main parameters in equations’ definition

Text preprocessing

The text preprocessing module mainly includes word segmentation, part-of-speech tagging, and stop word removal. First, the Python package Jieba is used to segment the collected textual documents. Chinese word segmentation can incorrectly divide a Chinese phrase into multiple words; for example, the phrase “enterprises technological innovation” is divided into three fine-grained words, “enterprises,” “technology,” and “innovation.” Hence, a custom dictionary is used to define particular terms in the field, such as “enterprise technological innovation,” “product innovation,” and “mechanism innovation.” Next, the text is tagged with parts of speech (POS). Nouns are the most representative of the source document’s semantic information, so nouns, gerundial phrases, and adjective-noun collocations are selected as the research objects. Finally, useless words, such as “a, the, we, us, they” and other high-frequency terms without meaning, are filtered out with a stop word dictionary. Stop word removal significantly reduces the size of the index structure and yields the keyword sets. The general process of text preprocessing is shown in Fig. 3.
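
As an illustration only, the three preprocessing steps can be sketched in plain Python. This is a toy stand-in and not Jieba itself: the whitespace tokenizer, the `CUSTOM_PHRASES` dictionary, the `TOY_POS` tag lookup, and the function names are all assumptions made for this example, and English tokens are used in place of Chinese text.

```python
# Toy stand-in for segmentation + custom dictionary + POS filter + stop words.
# All names and data below are hypothetical; the paper uses Jieba on Chinese text.
CUSTOM_PHRASES = {
    ("enterprise", "technological", "innovation"),
    ("product", "innovation"),
    ("mechanism", "innovation"),
}
STOP_WORDS = {"a", "the", "we", "us", "they"}
# Hypothetical POS lookup: "n" = noun, "v" = verb, "c" = conjunction, "p" = preposition.
TOY_POS = {"support": "v", "and": "c", "of": "p", "suppliers": "n"}


def merge_custom_phrases(tokens, phrases):
    """Greedily merge the longest run of tokens that is a custom-dictionary phrase."""
    max_len = max(len(p) for p in phrases)
    merged, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                merged.append(" ".join(candidate))
                i += n
                break
        else:
            # No multi-word phrase starts here; keep the single token.
            merged.append(tokens[i])
            i += 1
    return merged


def preprocess(text):
    """Return noun-like keywords after phrase merging and stop word removal."""
    tokens = merge_custom_phrases(text.lower().split(), CUSTOM_PHRASES)
    keywords = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        # Custom-dictionary phrases are treated as nouns in this sketch.
        pos = "n" if " " in token else TOY_POS.get(token, "x")
        if pos == "n":
            keywords.append(token)
    return keywords


print(preprocess("We support enterprise technological innovation "
                 "and the product innovation of suppliers"))
```

Without the custom dictionary, "enterprise technological innovation" would be split into three separate tokens, which is the segmentation problem described above.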

Domain ontology-based concept mapping

The key to constructing an improved semantic conceptual vector representation model based on domain ontology is the concept mapping from text keywords to ontology. The concepts of domain ontology are usually defined by attributes, keywords, or synonyms in the texts. Hence, there are four situations when mapping text keywords to domain ontology as follows.

1. No mapping. When a keyword in the dataset cannot be directly mapped to any concept in the domain ontology, the keyword is retained as an unregistered word if its frequency is high. The frequency TF of the keyword is calculated; if $TF > \mu$, the keyword is kept in the unregistered word set $w\{ w_{1} ,w_{2} ,w_{3} ,...,w_{l} \}$, otherwise it is deleted.
2. 1:1 mapping. When a keyword in the text can be matched directly with an attribute in the domain ontology, the keyword is directly replaced by the ontology concept.
3. 1:n mapping. When the keyword $t_{j}$ corresponds to multiple concept attributes $c_{i}$ in the domain ontology, the mapping concept is determined by the matching degree between the keyword and each concept attribute in the domain ontology, as shown in formula (1). The concept with the maximum value of $S_{tc}$ is selected to replace the keyword $t_{j}$, where $nc_{i}$ represents the number of concepts $c_{i}$ in the ontology that match the keyword $t_{j}$.
$$ S_{tc} = \frac{{\sum\nolimits_{i}^{n} {\sum\nolimits_{j}^{m} {s_{tc} (t_{j} ,c_{i} )} } }}{{|nc_{i} |}}. $$
(1)
4. n:1 and n:m mapping. Since the concepts in the domain ontology are usually composed of professional compound words, it is not easy to find concepts that directly and exactly match the keywords. Therefore, the maximum matching method is used to map multiple feature items to the same concept. There are two situations when keywords map to multiple concepts. First, when one or more keywords are cross-mapped to multiple concepts, all of the mapped concepts are kept. For example, if keywords t₁, t₂ map to concept c₁ while keywords t₁, t₂, t₃ map to concept c₂, both concepts c₁ and c₂ are kept. Second, when one or more keywords are mapped to multiple concepts without cross-over, the keywords are unique in the text and the concepts are retained directly.
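
The first three mapping situations can be sketched as follows. This is a minimal sketch under stated assumptions: the ontology is modeled as a dictionary from concept name to a set of attribute strings, and the matching degree $s_{tc}$, which the text does not define in detail, is replaced here by a hypothetical word-overlap (Jaccard) score. The function names `token_overlap` and `map_keyword` and the toy ontology are illustrative only.

```python
def token_overlap(a, b):
    """Hypothetical matching degree: Jaccard overlap of the two strings' word sets."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)


def map_keyword(keyword, tf, ontology, mu):
    """Map one keyword to an ontology concept.

    ontology: dict of concept name -> set of attribute strings (assumed layout).
    Returns ("concept", name), ("unregistered", keyword), or None if dropped.
    """
    candidates = [c for c, attrs in ontology.items() if keyword in attrs]
    if not candidates:
        # No mapping: keep only high-frequency keywords as unregistered words.
        return ("unregistered", keyword) if tf > mu else None
    if len(candidates) == 1:
        # 1:1 mapping: replace the keyword with the concept.
        return ("concept", candidates[0])

    # 1:n mapping: average matching degree over each candidate's attributes,
    # in the spirit of formula (1), then keep the best-scoring concept.
    def score(concept):
        attrs = ontology[concept]
        return sum(token_overlap(keyword, a) for a in attrs) / len(attrs)

    return ("concept", max(candidates, key=score))


toy_ontology = {
    "innovation output": {"patent", "new product"},
    "protection measures": {"patent", "trademark", "copyright"},
    "innovation resource": {"R&D staff"},
}
print(map_keyword("patent", tf=5, ontology=toy_ontology, mu=3))
print(map_keyword("blockchain", tf=5, ontology=toy_ontology, mu=3))
```

The n:1 and n:m situations depend on the maximum matching method over compound words and are left out of this sketch.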

Semantic similarity and relatedness calculation based on domain ontology

Building on the ontology-based semantic similarity measures reviewed above, this paper proposes a calculation method that combines domain ontology-based semantic similarity with concept distribution-based relatedness. The method first obtains the semantic similarity matrix between concepts by calculating the semantic distance of concepts in the domain ontology, then calculates the relatedness matrix between concepts from their co-occurrence frequency in the text, and finally fuses the semantic similarity matrix and the relatedness matrix to obtain the final weight matrix.

First, a weight is assigned to each path between nodes in the ontology, and the semantic distance between concepts is calculated from these weights. The path weight is given by the following formula:

$$ w[{\text{sub}}(c_{i} ,c_{j} )] = \frac{1}{{2K^{{{\text{depth}}(c_{j} )}} }} + 1, $$

(2)

where K represents the rate at which the weight value decreases with the ontology hierarchy, ${\text{depth}}(c_{j} )$ represents the depth of $c_{j}$ from the root of the ontology, and ${\text{depth}}({\text{root}}) = 0$. The semantic distance ${\text{Dist}}$ between two concepts can therefore be defined from the path weights between them as follows:

$$ {\text{Dist}}(c_{i} ,c_{j} ) = \begin{cases} 0, & c_{i} \equiv c_{j} ; \\ w[{\text{sub}}(c_{i} ,c_{j} )], & c_{i} \to c_{j} ; \\ \sum\nolimits_{c \in s{\text{Path}}(c_{i} ,c_{j} )} w_{c} [{\text{sub}}(c_{i} ,c_{j} )], & {\text{others}}, \end{cases} $$

(3)

When the concept nodes $c_{i}$ and $c_{j}$ are the same concept, the semantic distance is 0; when there is a direct path between the concept nodes $c_{i}$ and $c_{j}$, the semantic distance is the path weight between the two concepts; when an indirect path connects the two concept nodes, the semantic distance is the sum of the path weights along that path. The path weight assignment formula proposed above has the following properties.

1. The semantic distance between concepts at an upper level of the domain ontology is larger than that between concepts at a lower level, because more abstract concepts in the ontology hierarchy are less similar and more specific concepts are more similar.
2. The semantic distance between a parent class and its subclass is smaller than that between sibling concepts, which indicates that different types of concept relations have different weights.
3. The distribution of path weights between concepts is symmetric.
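
Formulas (2) and (3) and the three properties can be checked on a small example. The sketch below assumes the ontology is a tree stored as a child-to-parent dictionary, that the weight of an edge is computed from the depth of its child node, and that the indirect path runs through the lowest common ancestor. The toy ontology, `K = 2`, and all function names are assumptions made for illustration.

```python
def path_weight(child_depth, K=2.0):
    """Formula (2): weight of the edge leading to a node at the given depth."""
    return 1.0 / (2.0 * K ** child_depth) + 1.0


def ancestors(node, parent):
    """Return the chain [node, parent(node), ..., root]."""
    chain = [node]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def depth(node, parent):
    """Depth of a node, with depth(root) = 0."""
    return len(ancestors(node, parent)) - 1


def semantic_distance(a, b, parent, K=2.0):
    """Formula (3): 0 for identical concepts, else the sum of edge weights on the path."""
    if a == b:
        return 0.0
    up_a, up_b = ancestors(a, parent), ancestors(b, parent)
    in_b = set(up_b)
    common = next(n for n in up_a if n in in_b)  # lowest common ancestor
    total = 0.0
    for chain in (up_a, up_b):
        for node in chain:
            if node == common:
                break
            total += path_weight(depth(node, parent), K)
    return total


# Hypothetical toy ontology: child -> parent (the root maps to None).
toy_parent = {
    "innovation": None,
    "resource": "innovation",
    "output": "innovation",
    "funds": "resource",
    "staff": "resource",
}
print(semantic_distance("resource", "funds", toy_parent))   # parent and subclass
print(semantic_distance("funds", "staff", toy_parent))      # siblings, lower level
print(semantic_distance("resource", "output", toy_parent))  # siblings, upper level
```

With these assumptions, the parent-subclass distance is smaller than the sibling distance, the upper-level sibling distance is larger than the lower-level one, and swapping the two arguments gives the same value.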

Semantic similarity decreases as semantic distance increases. Hence, the semantic similarity ${\text{Sim}}(c_{i} ,c_{j} )$ can be calculated from the semantic distance between concepts. The semantic similarity generally has the following properties.

${0} \le {\text{sim}}(c_{i} ,c_{j} ) \le 1$ defines the range of the semantic similarity. When $c_{i}$ and $c_{j}$ are the same concept, the semantic similarity is 1; when the concepts $c_{i}$ and $c_{j}$ have nothing in common, the semantic similarity is 0.
$\forall c_{i} :{\text{sim}}(c_{i} ,c_{i} ) = 1$ defines the semantic similarity between $c_{i}$ and itself as 1.
$\forall c_{i} ,c_{j} ,c_{k} :{\text{if dist}}(c_{i} ,c_{j} ) > {\text{dist}}(c_{i} ,c_{k} ),$ ${\text{then sim}}(c_{i} ,c_{j} ) < {\text{sim}}(c_{i} ,c_{k} )$ defines the relationship between semantic distance and semantic similarity. If the semantic distance between concepts $c_{i}$ and $c_{j}$ is greater than the semantic distance between concepts $c_{i}$ and $c_{k}$, the semantic similarity between $c_{i}$ and $c_{j}$ is less than that between $c_{i}$ and $c_{k}$. Therefore, the semantic similarity is calculated by the following formula:
$$ {\text{Sim}}(c_{i} ,c_{j} ) = \frac{1}{{1 + \lambda {\text{dist}}(c_{i} ,c_{j} )}}, $$
(4)
where $\lambda$ is the influence factor of semantic distance on semantic similarity, $0 < \lambda \le 1$.
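
Formula (4) is a one-line computation, and a short sketch makes it easy to confirm the properties listed above. The function name `semantic_similarity` and the default `lam=1.0` are assumptions for this example; the distance is passed in as a plain number.

```python
def semantic_similarity(dist, lam=1.0):
    """Formula (4): similarity from semantic distance, with 0 < lam <= 1."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must satisfy 0 < lam <= 1")
    if dist < 0.0:
        raise ValueError("semantic distance must be non-negative")
    return 1.0 / (1.0 + lam * dist)


# Identical concepts have distance 0, so their similarity is 1.
print(semantic_similarity(0.0))
# A larger distance gives a smaller similarity.
print(semantic_similarity(1.125), semantic_similarity(2.25))
```

Because the distance is never negative, the result always lies in the interval (0, 1], and it only approaches 0 as the distance grows without bound.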

After the texts are preprocessed, the most representative keywords are selected as the keyword set. To calculate the relatedness of a given keyword pair, an $i \times j$ co-occurrence matrix $E_{ij}$ is generated for the terms within a window of size k over the corpus, as shown in the following formula:

$$ E_{ij} = f^{k} (c_{i} ,c_{j} ), $$

(5)

where $f^{k}$ represents the number of times that concepts $c_{i}$ and $c_{j}$ appear together in a window of k words across the entire corpus. The co-occurrence matrix $E_{ij}$ is further processed by the mutual information method based on word distribution to obtain the relatedness matrix of concepts $c_{i}$ and $c_{j}$, as shown in the following formula:

$$ {\text{rel}}(c_{i} ,c_{j} ) = \begin{cases} 1, & c_{i} \equiv c_{j} ; \\ \log_{2} \frac{{f^{k} (c_{i} ,c_{j} )}}{{f^{c} (c_{i} ) \times f^{c} (c_{j} )}}, & {\text{others}}, \end{cases} $$

(6)

where $f^{k}$ is defined as in formula (5), and $f^{c} (c_{i} )$ and $f^{c} (c_{j} )$ represent the frequencies of the concepts $c_{i}$ and $c_{j}$ in the entire corpus.

The co-occurrence frequency information represents the strength of the content relatedness between concepts in the corpus. The similarity of concepts in the domain ontology represents the strength of the semantic relationship between concepts. Combining the semantic similarity and the relatedness between concepts can represent documents more accurately. The following formula is used to normalize and fuse the similarity matrix and the co-occurrence matrix of concepts, where $\alpha$ represents the weight of semantic similarity:

$$ \begin{aligned} {\text{Sim}}\_{\text{Rel}}(c_{i} ,c_{j} ) &= \alpha \times {\text{sim}}(c_{i} ,c_{j} ) + (1 - \alpha ) \times {\text{rel}}(c_{i} ,c_{j} ) \\ &= \alpha \times \frac{1}{{1 + \lambda {\text{dist}}(c_{i} ,c_{j} )}} + (1 - \alpha ) \times \log_{2} \frac{{f^{k} (c_{i} ,c_{j} )}}{{f^{c} (c_{i} ) \times f^{c} (c_{j} )}}. \end{aligned} $$

(7)
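
Formulas (5) to (7) can be sketched on a toy token sequence. Several choices below are assumptions rather than details given in the text: co-occurrence is counted once for every pair of distinct concepts whose positions are fewer than k tokens apart, a pair that never co-occurs returns `None` because the logarithm is undefined for it, and the normalization step mentioned above is omitted because its form is not specified. All names and the sample tokens are illustrative.

```python
import math
from collections import Counter


def cooccurrence(tokens, k):
    """Formula (5), one possible convention: count pairs of distinct concepts
    whose positions are fewer than k tokens apart."""
    counts = Counter()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + k]:
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts


def relatedness(a, b, pair_counts, freq):
    """Formula (6): 1 for identical concepts, else log2 of the co-occurrence ratio."""
    if a == b:
        return 1.0
    joint = pair_counts[frozenset((a, b))]
    if joint == 0:
        return None  # assumption: this case is not defined in the source
    return math.log2(joint / (freq[a] * freq[b]))


def sim_rel(sim, rel, alpha):
    """Formula (7): weighted fusion of similarity and relatedness (no normalization)."""
    return alpha * sim + (1.0 - alpha) * rel


tokens = ["patent", "trademark", "funds", "patent", "trademark"]
pairs = cooccurrence(tokens, k=2)
freq = Counter(tokens)
rel = relatedness("patent", "trademark", pairs, freq)
print(rel, sim_rel(0.5, rel, alpha=0.5))
```

With raw counts the logarithm is at most 0 for distinct concepts, so in practice the relatedness values would need to be rescaled before they are fused with similarities in (0, 1]; that rescaling is the unspecified normalization step.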

Improved text classification algorithm

In this paper, the concept model based on the domain ontology proposed above is applied to text categorization, and an improved text classification method based on the semantic similarity and relatedness between keywords and categories is proposed. The corpus D contains j documents and is denoted as $D = \{ d_{1} ,d_{2} ,...,d_{j} \}$. First, a vector space model is constructed for each text: the keywords whose TF is greater than the threshold $\mu$ are extracted and sorted by TF weight, and the top 20 most representative keywords are selected, so that the document $d_{j}$ can be represented as $d_{j} = \{ (t_{1} ,tf_{1} ),(t_{2} ,tf_{2} ),...,(t_{20} ,tf_{20} )\}$. The keyword set ${\text{Dic}} = \{ t_{1} ,t_{2} ,t_{3} ,...t_{{|{\text{Dic}}|}} \}$ is obtained by deleting repeated words, and the pre-defined categories are denoted as $C = \{ C_{1} ,C_{2} ,...,C_{m} \}$. The keyword set ${\text{Dic}}$ and the target category set $C$ are then mapped to the constructed domain ontology O. Next, the similarity and relatedness matrix is constructed between each keyword $t_{i}$ in the keyword set $T_{j} = \{ t_{1} ,t_{2} ,...,t_{i} \}$ of $d_{j}$ and each category $C_{m}$, so that $t_{i}$ can be expressed as an m-dimensional vector $\{ w_{i1} ,w_{i2} ,...,w_{im} \}$. The semantic similarity matrix M between the keyword $t_{i}$ and the category $C_{m}$ is calculated by formula (4). The relatedness matrix Q of the keyword $t_{i}$ and the category $C_{m}$ based on the word distribution is calculated by formula (6). The matrix W is obtained by fusing the matrices M and Q through formula (7). The element $W_{im}$ in the matrix W represents the semantic similarity and relatedness of the keyword $t_{i}$ to the category $C_{m}$. The matrix W generated by the keywords $t_{i}$ and the categories $C_{m}$ in the text $d_{j}$ is shown in Table 2.

Table 2 Matrix W generated by keyword t_i and category C_m in document d_j

Finally, the weighted sum of each transverse dimension vector in $d_{j}$ is obtained by formula (8), and the category $C_{m}$ corresponding to the maximum value is taken as the text category. The improved text classification method based on the semantic similarity and relatedness of keywords and categories is described as follows, and Table 3 defines the main elements:

$$ H_{j} = \sum\limits_{i = 1}^{m} {w_{im} } . $$

(8)

Table 3 The main element definition
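
The final decision step can be sketched as follows, under one reading of formula (8): the fused weights of all keywords of a document are summed per category, and the category with the largest sum is returned. The layout of `W` as a list of keyword rows, the function name `classify`, and the sample weights are assumptions for illustration and are not taken from Table 2 or Table 3.

```python
def classify(W, categories):
    """Sum the keyword-to-category weights per category and pick the largest.

    W: list of rows, one per keyword t_i; row[m] is the fused weight w_im
       of that keyword for category C_m (assumed layout).
    """
    scores = [sum(row[m] for row in W) for m in range(len(categories))]
    best = max(range(len(categories)), key=scores.__getitem__)
    return categories[best], scores


# Hypothetical weights for three keywords and two categories.
categories = ["manufacturing capability", "innovation resource"]
W = [
    [0.5, 0.25],    # e.g. "quality control"
    [0.75, 0.125],
    [0.25, 0.5],
]
label, scores = classify(W, categories)
print(label, scores)
```

No labeled training documents are involved at this point: the decision depends only on the ontology-derived weights in W.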

Compared with traditional text classification methods based on machine learning, the improved text classification method based on the semantic similarity of keywords and categories has the following advantages. First, it does not require enormous amounts of labeled training text, so it can be applied to unlabeled textual data. Second, it uses the domain ontology to map concepts, which converts text into low-dimensional space vectors and reduces space complexity. Third, it calculates the semantic similarity and relatedness between keywords and categories through the domain ontology, which overcomes the defect of the traditional vector space representation method of ignoring the semantic relationship between concepts.

Experiment and analysis

This paper presents a methodology for classifying texts on enterprises’ technological innovation under the supply chain without a training set. However, to compare its performance with other text classification methods, the collected textual data were labeled with pre-defined categories. The result analysis and performance evaluation are as follows. The structure of the domain ontology of enterprises’ technological innovation under the supply chain environment is shown in Fig. 4.

Dataset

This paper’s experimental data mainly consist of enterprises’ application forms for technical center certification, provided by the Beijing Municipal Commission of Economic Informatization. The textual data cover 400 enterprises in Beijing; after data cleaning and selection, there are 867 valid texts, and the overall data size is about 20 M. Table 4 briefly summarizes the data collection results. The experiments were run on a Windows 10 system with a 2.70 GHz core processor and 8.0 GB of memory, and Python 3.6.2 was used for programming. There are seven pre-defined categories: manufacturing capability, innovation resource, mechanism innovation, innovation output, market innovation, protection measures, and innovation strategy. The labeled textual document set was divided into a 70% training set and a 30% test set. Part of the labeled textual dataset is shown in Table 5.

Table 4 Summaries of data collection

Table 5 Labeled textual dataset of node enterprises

Performance comparison with KNN

Depending on the application background, scholars have proposed various indicators for evaluating the performance of text classification systems, including Accuracy, Precision, Recall, F-measure, and Macro-averaging. The most commonly used indicators are precision, recall, and F-measure. Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. Recall is the ratio of correctly predicted positive observations to all observations in the actual class. The F-measure combines the two as the harmonic mean of precision and recall. The three indicators are defined by the following formulas.

$$ {\text{Precision}} = \frac{{\text{true positive}}}{{{\text{true positive}} + {\text{false positive}}}}, $$

(9)

$$ {\text{Recall}} = \frac{{\text{true positive}}}{{{\text{true positive}} + {\text{false negative}}}}, $$

(10)

$$ F{\text{-measure}} = \frac{{2 \times {\text{Precision}} \times {\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}}. $$

(11)
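
Formulas (9) to (11) can be computed directly from paired true and predicted labels. The sketch below treats one category as the positive class; the function name, the zero-division convention (returning 0.0), and the sample labels are assumptions for illustration and are not results from the experiment.

```python
def precision_recall_f(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one category treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical labels: "A" is the positive category.
y_true = ["A", "A", "A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "B", "A", "B"]
print(precision_recall_f(y_true, y_pred, positive="A"))
```

Here there are 3 true positives, 1 false positive, and 3 false negatives, so precision is 0.75 and recall is 0.5. For a multi-category task such as the seven categories used here, the same function can be applied to each category in turn and the results averaged.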

This paper used Precision, Recall, and F-measure indicators to compare the proposed text classification method’s performance based on the semantic similarity of keywords and categories and the KNN classification method based on TF*IDF.
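
For readers unfamiliar with the baseline, a KNN classifier over TF*IDF vectors can be sketched in plain Python. This is a generic illustration and not the configuration used in the experiment: the TF and IDF variants, cosine similarity as the neighbor measure, majority voting, `k = 3`, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(train_docs, train_labels, query, k=3):
    """Label a tokenized query by majority vote among its k nearest training docs."""
    n = len(train_docs)
    df = Counter(term for doc in train_docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}

    def vectorize(doc):
        tf = Counter(doc)
        # Terms unseen in training have no IDF and are ignored.
        return {t: (c / len(doc)) * idf[t] for t, c in tf.items() if t in idf}

    train_vecs = [vectorize(doc) for doc in train_docs]
    q = vectorize(query)
    ranked = sorted(range(n), key=lambda i: cosine(q, train_vecs[i]), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled documents, already segmented into keywords.
train_docs = [
    ["patent", "trademark", "copyright"],
    ["patent", "license"],
    ["sales", "promotion", "market"],
    ["market", "channel", "sales"],
]
train_labels = ["protection measures", "protection measures",
                "market innovation", "market innovation"]
print(knn_predict(train_docs, train_labels, ["trademark", "patent"]))
```

Unlike the ontology-based method, this baseline needs the labeled training documents at prediction time and treats each term as an independent dimension.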

The performance comparison shows that the Recall, Precision, and F-measure of the improved text classification method based on semantic similarity and relatedness are higher than those of the KNN method based on TF*IDF (Table 6; Figs. 5, 6, 7). The mean precision of the improved text classification method proposed in this paper is over 80%. Therefore, compared with the KNN classification method based on TF*IDF, the proposed text classification method based on semantic similarity and relatedness between keywords and categories has better classification performance on texts related to enterprise technology innovation under the supply chain environment.

Table 6 The comparison of experimental performance results

The improved semantic text classification method proposed in this paper used domain ontology concept sets instead of keywords as each element of the feature vector, enhancing the semantic relationship between words, highlighting the semantic expression, and improving the classification precision rate. The text representation based on domain ontology reduces the space vector's dimension and saves the calculation time. Furthermore, the improved method can also realize the text classification in node enterprise’s technological innovation under the supply chain environment without a labeled training set and has a better classification effect. To some extent, this method solves the problem of text classification that lacks a training set due to the enormous workload of manual labeling in reality.

Result analysis

The semantic similarity and relatedness between keywords and categories are calculated based on the domain ontology of node enterprises’ technological innovation under the supply chain environment. The results are shown in Figs. 8, 9, 10, and 11. Owing to space limitations, only part of the semantic similarity and relatedness matrix between concepts and categories is shown.

The improved semantic text classification method proposed in this paper can effectively classify the information collected from node enterprises in the supply chain and organize the concepts of enterprises’ technological innovation in the supply chain based on semantic similarity and relatedness. The concepts here are the key influencing factors of node enterprises’ technological innovation within the supply chain. The classification system for key influencing factors can be obtained through the above experimental analysis of semantic similarity and relatedness between keywords and categories. There are seven types of influencing factors of node enterprises’ technological innovation under the supply chain: manufacturing capability, innovation resources, mechanism innovation, innovation output, market innovation, protection measures, and innovation strategy. According to the semantic text classification, the seven first-class factors can be divided into 20 second-class factors, as shown in Table 7.

Table 7 The influencing factors system of node enterprise’s technological innovation in supply chain

The influence of manufacturing capability on enterprises’ technological innovation is mainly reflected in transforming R&D results into manufacturing production. The term “quality control” has a high value of similarity and relatedness with manufacturing capability and reflects the content of product quality management; hence, the item belongs to the category of manufacturing capability. Innovation resources mainly refer to enterprises’ investment in technological innovation resources, for example, the investment in staff, funds, and equipment in R&D. Mechanism innovation is innovation activity in various operating mechanisms to enhance the whole enterprise’s competitiveness. Innovation output reflects the results and benefits of enterprises’ innovation. Market innovation refers to innovation in product sales and promotion that enterprises undertake to meet market demands. Protection measures reflect the measures taken to protect intellectual property. Technical knowledge protection can promote technology diffusion and attract foreign capital and technology introduction. Innovation strategy refers to integrating and arranging the enterprise’s internal and external innovation resources and technologies at the level of the overall system together with enterprise operation.

Conclusions

The knowledge of enterprises’ technological innovation under the supply chain environment comes from information sources such as databases or documents collected from the supply chain’s node enterprises. Knowledge organization is the process of classifying and analyzing messy, complex, and massive information. This paper introduces domain ontology to make the knowledge organization system semantic and knowledge-based, and constructs an ontology of enterprises’ technological innovation under the supply chain. It uses the relationships between domain ontology concepts to describe the semantic information of enterprises’ existing knowledge management systems. An improved semantic text classification method is proposed, which obtains a document’s category by calculating the weighted maximum semantic similarity and relatedness between the text’s key feature words and the categories. This method enhances the semantic relationship between words, reduces the dimension of the vector space, and saves calculation time. Furthermore, the improved method can classify documents based on the domain ontology hierarchy without a labeled training set. The mean precision of the improved text classification method is over 80%.
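One possible reading of the document-level step can be sketched in Python as follows. The sketch assumes that each key feature word carries a weight, that each category is represented by a list of ontology concepts, and that a scoring function returns a single semantic score for a word and a concept. The function name `classify_document`, the toy lookup table, the example concepts, and all numbers are hypothetical; the paper’s actual formula may differ.

```python
from typing import Callable, Dict, List, Tuple


def classify_document(keywords: Dict[str, float],
                      categories: Dict[str, List[str]],
                      score: Callable[[str, str], float]) -> str:
    """Pick the category with the largest weighted sum of maximum scores.

    For every keyword, only its best-matching concept in a category
    counts; that maximum is multiplied by the keyword weight.
    """
    totals: Dict[str, float] = {}
    for category, concepts in categories.items():
        totals[category] = sum(
            weight * max((score(word, c) for c in concepts), default=0.0)
            for word, weight in keywords.items()
        )
    return max(totals, key=lambda name: totals[name])


# Toy scores standing in for ontology-based similarity and relatedness.
# All values and concept names are illustrative assumptions.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("quality control", "quality"): 0.9,
    ("quality control", "production"): 0.7,
    ("patent", "intellectual property"): 0.8,
}


def toy_score(word: str, concept: str) -> float:
    return TOY_SCORES.get((word, concept), 0.0)


document_keywords = {"quality control": 0.6, "patent": 0.4}
category_concepts = {
    "manufacturing capability": ["production", "quality"],
    "protection measures": ["intellectual property"],
}

label = classify_document(document_keywords, category_concepts, toy_score)
print(label)
```

No labeled training documents appear in the sketch: the category comes only from the keyword weights and the concept-level scores, which is the property the method relies on.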

The contributions of this study are twofold. From an academic perspective, the improved text classification method proposed in this paper performs better than the KNN classification method based on TF*IDF. From a practical standpoint, this paper constructs a domain ontology for enterprises’ technological innovation under the supply chain. It helps to summarize and classify innovation information under the supply chain, provides researchers and managers with the influential factors of innovation under the supply chain, and helps them understand production knowledge in this field dynamically.

However, this paper still has some limitations that future research should address. First, the proposed method requires a domain ontology to provide background knowledge and concept mapping. Future researchers may consider using a general ontology that can be applied to more fields. Second, future researchers can consider more factors that influence the similarity and relatedness between concepts to strengthen word association and improve text classification accuracy.

Acknowledgements

This work was supported by Beijing Social Science Foundation under Grant 18JDGLA018, 19JDGLA002, MOE (Ministry of Education in China) Project of Humanities and Social Sciences under Grant 19YJC630043, and was partially supported by Beijing Logistics Informatics Research Base. We appreciate their support very much.

Author information

Authors and Affiliations

School of Economics and Management, Beijing Jiaotong University, No.3 Shangyuancun, Haidian, Beijing, 100044, China
Qianqian Zhang, Shifeng Liu & Qun Tu

Corresponding author

Correspondence to Qun Tu.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhang, Q., Liu, S. & Tu, Q. Knowledge organization of node enterprises’ technological innovation under supply chain environment. Complex Intell. Syst. 9, 2459–2473 (2023). https://doi.org/10.1007/s40747-021-00388-9

Received: 20 November 2020
Accepted: 24 April 2021
Published: 12 May 2021
Issue Date: June 2023
DOI: https://doi.org/10.1007/s40747-021-00388-9
