1 Introduction

With the development of technologies for creating, publishing, storing and processing data, data is growing rapidly in all areas of society. Of all the data available to human civilization, 90% was produced in the past two years; the big data era has arrived [16]. Knowledge is the awareness and understanding of people or things in the objective world. It is generated through perception, communication and logical inference in the course of practice and education, and may take the form of facts, information or skills. The information chain, “fact \(\rightarrow \) data \(\rightarrow \) information \(\rightarrow \) knowledge \(\rightarrow \) wisdom”, indicates that big data contains a huge amount of information from which a large body of knowledge can be extracted. Big data has therefore given rise to large-scale knowledge bases. Well-known knowledge base research projects, e.g. DBpedia, KnowItAll, NELL and YAGO, use information extraction techniques to acquire knowledge from high-quality Web data sources (e.g. Wikipedia) and to construct and manage their knowledge bases automatically [22]. Meanwhile, big data also brings information overload and pollution, and the knowledge it contains is heterogeneous, diverse and independent. In the era of data, with information and knowledge increasing rapidly, knowledge discovery has become a research focus in various disciplines, including data science and information science [25]. Therefore, in order to improve the efficiency and quality of knowledge service, several issues have become important in knowledge management research: analysing and utilizing the knowledge contained in big data, eliminating inconsistencies between different knowledge sources, and extracting, discovering and inducing potentially valuable content.

The continuous formation and evolution of knowledge have made it autonomous, heterogeneous and multi-source. Knowledge Fusion (KF) is a process of acquiring and utilizing knowledge that addresses the problems of knowledge service. Through KF activities, implicit and previously undiscovered valuable knowledge is mined from various distributed and heterogeneous data sources. KF converts autonomous knowledge into new knowledge with higher levels of intension and reliability, helps users to find potential associations between knowledge and facts, and improves decision-making by supporting more efficient, objective and scientific judgments. KF has become a new growth point for knowledge service [23].

As an important part of knowledge management and engineering, KF has received wide attention from scholars in many fields, such as computer science, knowledge engineering and information science. Smirnov et al. [21] investigate patterns for context-based KF in decision support systems. Dong et al. [7] analyse the differences and relations between data fusion and KF, and realize KF processes by combining knowledge extraction with traditional data fusion methods. Tang and Wei [23] discuss the requirements of big data KF and its basic framework. Liu et al. [15] define a structure of multi-domain ontology and provide dynamic ontologies based on KF demands through mappings between different domain ontologies. Xu et al. [24] design an ontology-based KF framework, which consists of several parts, such as constructing a meta-knowledge set, determining knowledge measurement indicators, designing the fusion algorithm and applying the fused knowledge. Qiu and Yu [20] summarize KF implementation paths into four types, based respectively on semantic rules, Bayesian networks, D-S theory and knowledge mining, and Zhou et al. [26] discuss various KF processing algorithms along these lines. Guo et al. [9] review and evaluate the research trends and theoretical developments of KF, and indicate that there is not yet a general framework for KF systems, nor are there directly applicable KF algorithms or standardized KF procedures. The existing research mainly focuses on specific KF frameworks, algorithms and practical theories.

Judging by the time distribution of the related literature, KF is a new research topic that has emerged with changing knowledge service requirements and the development of knowledge management research. In order to implement KF in practice, it is necessary to understand the connotation of KF correctly by analysing the relations and differences among related notions, i.e. knowledge fusion, knowledge integration, information fusion and data fusion, and to analyse KF implementation patterns and process models.

2 Knowledge Fusion Connotation

2.1 Conception of Knowledge Fusion

KF is a new concept developed on the basis of information fusion, and the two research areas overlap in many respects. An early definition of KF, given by Preece in the KRAFT project [19], refers to the process of locating and extracting knowledge from multiple, heterogeneous on-line sources and transforming it so that the union of the knowledge can be applied in problem-solving. The KF system in the KRAFT project includes three layers of services: knowledge retrieval, transformation and fusion. Here, fusion is defined as associating, linking and simplifying the transformed distributed knowledge with a unified model, and providing solutions to the problem under specific conditions.

Smirnov et al. [21] propose that the aim of KF is to integrate multi-source information and knowledge into a unified knowledge structure model, so that decision-makers can understand and gain insight into the decision-making environment and obtain the knowledge needed to solve problems. Hou et al. [11] and Xu et al. [24] believe that KF is the process of intelligently processing distributed databases, knowledge bases and data warehouses, and acquiring new knowledge through transformation and integration procedures. It aims to realize sharing and cooperation between different knowledge resource systems, and to apply knowledge mining across knowledge bases. These definitions inherit and develop Preece’s KF concept, and emphasize that fusion results in the production of new knowledge.

Guo et al. [9] and Tang and Wei [23] propose that KF mainly studies the transformation, integration and aggregation processes in distributed knowledge base systems in order to generate new knowledge, and investigates the optimization of knowledge structures and contents in order to provide knowledge service. This definition covers both knowledge innovation and knowledge optimization, identifies the provision of knowledge services as the aim of KF, and extends the object of KF from traditional resources (such as databases, knowledge bases and fact parameters acquired by sensors) to rules, models, methods, and even experiences and ideas. In other words, the object of KF includes not only explicit knowledge but also tacit knowledge.

Dong and Srivastava [8] consider KF as the problem of assessing and measuring the accuracy of extracted knowledge. Building a knowledge base requires extracting knowledge from distributed data sources and integrating it into the base. A number of different knowledge extractors may be used during extraction, each generating its own results. It is therefore necessary to evaluate the accuracy of each extracted result in order to improve the correctness of the knowledge base.

Hu and Cao [10] extract sentences from Web page texts and transform them into semantic nets of triples for representing knowledge. They define KF as the process of eliminating contradictions among the extracted knowledge and integrating its structures in accordance with user constraints and rules, which addresses the incomplete, fuzzy, redundant and inconsistent knowledge contained in Web page texts.

Kampis and Lukowicz [12] propose the notion of Collaborative KF. They indicate that traditional KF assumes informational completeness, while collaborative KF is a version of KF where traditional fusion events are local, e.g. happen upon the meetings of individual knowledge providers, and global fusion happens due to the collective (hence “collaborative”) interaction dynamics. In collaborative KF, there is no guarantee that the different knowledge sources remain unchanged and available at all times.

To sum up, the concept of KF differs across periods and research fields. In computer science and database research, KF emphasizes the representation, transformation, cleansing and integration of explicit knowledge, and focuses on eliminating inconsistency, incompleteness, redundancy and uncertainty among different knowledge sources. Research in this field mainly investigates the design and implementation of KF algorithms so as to improve the standardization and credibility of fused knowledge. In library and information science, knowledge refers to the sum of cognition and experience gained in the practice of changing the world, and both explicit and tacit knowledge are of concern. KF research in this field aims to construct systems of theory and method, with an emphasis on the integration of tacit knowledge and its impact.

2.2 Knowledge Fusion and Knowledge Integration

KF and knowledge integration both take knowledge as their object and both deal with multi-source knowledge of different structures, so they are related but also distinct. Literally, “integration” is the process of aggregating multiple individual objects to form a whole, while “fusion” is the process of splitting up multiple individual objects and recombining them into a complete new one. Integration emphasizes aggregation and combination, while fusion emphasizes merging and reorganizing. After the fusion process, knowledge objects are expected to have newly emerging features relative to the original ones.

Scholars have defined knowledge integration from various perspectives. In the fields of management and of library and information science, Liu and An [13] indicate that knowledge integration is the process of dynamically enhancing the core competitiveness of an organization through merging at different levels, between knowledge and knowledge, knowledge and people, and knowledge and procedures, with the aim of realizing knowledge innovation. Cai and Chen [6] review knowledge integration research and propose that knowledge integration is a comprehensive process of technology organization and human resource management, in which the initiative and creativity of the integrated entity need to be emphasized. Knowledge integration is an essential step in the dynamic process of knowledge innovation.

In the fields of computer science and automatic control, knowledge integration research emphasizes the handling of explicit knowledge that can be organized and expressed. Liu and Ma [14] indicate that knowledge integration mainly identifies, processes, evaluates and reforms new knowledge, realizes interactions between new and original knowledge, and provides users with a unified knowledge access interface and intelligent knowledge service by integrating different knowledge structures. Bohlouli et al. [4] investigate a knowledge integration framework based on a big data analysis platform and divide knowledge integration into the acquisition, representation, evaluation, transformation, aggregation and matching of knowledge, with the aim of providing services for intelligent knowledge retrieval.

In the field of library and information science, related research is gradually shifting from resource integration to resource aggregation. Resource integration refers to combining relatively independent resources into a new organic whole by reorganizing, coordinating, recombining and optimizing the existing resource portfolio. It aims to solve the problems of information redundancy, content duplication and inconsistency between primary and secondary documents. Resource aggregation, a concept borrowed from organic chemistry, refers to fusing knowledge elements to generate new ones by using artificial intelligence technologies. It aims to discover internal semantic associations among resources. Resource aggregation constructs a multidimensional and multi-level resource system with content correlation, and forms a solid knowledge network combining concept themes, subject contents and research objects as a whole [5]. At the conceptual level, KF and resource aggregation have similar connotations.

Therefore, this paper argues that KF is the advanced stage of knowledge integration. KF applies fusion algorithms and matching rules to the result of knowledge integration in order to implement the deduction, discovery and innovation of knowledge. Furthermore, KF also differs from knowledge aggregation: KF does not need to retain all knowledge concepts, relationships and instances from the original sources, but needs to construct the objects required to meet knowledge service demands.

2.3 Fusion of Data, Information and Knowledge

In practice, the terms “data”, “information” and “knowledge” are not strictly distinguished, and are even used interchangeably. However, there is a general consensus on how the three concepts differ. A commonly held view, with minor variants, is that data is raw numbers and facts without processing, information is processed data, and knowledge is the result of learning and reasoning [1].

The concept of data fusion is mostly used in computer science and engineering. Bleiholder and Naumann [2] indicate that data fusion is the last step in a data integration process, where schemata have been matched and duplicate records have been identified. Data fusion merges duplicate records into a single representation and, at the same time, resolves existing data conflicts. Dong and Gabrilovich [7] also indicate that data fusion aims at resolving conflicts in data and increasing correctness for data integration.

Information fusion is a multidisciplinary research field that attracts wide attention from both academia and industry, and in much of the literature the terms information/data fusion and information/data integration are used interchangeably. Typically, information fusion refers to the study of efficient methods for automatically or semi-automatically transforming information from different sources and different points in time into a representation that provides effective support for human or automated decision making [3].

Thus, generalized information fusion lies at the intersection of multiple disciplines and processes different kinds of information objects. According to application scenarios and processing objects, data, information and knowledge fusion can be regarded as different levels of abstraction of generalized information fusion. Data fusion is the process of removing noise and redundancy, reducing uncertainty and improving the accuracy and reliability of original data at the signal and pixel levels. Information fusion is the process of extracting features from multi-source raw data and eliminating contradictions between data contents, which improves the consistency and reliability of the fused information and provides local support for decision-makers. Data fusion handles raw data at the signal level, while information fusion operates at the feature level. Both belong to low-level fusion, while high-level KF operates at the decision level and involves situation awareness and assessment, influence degree evaluation, fusion optimization, mining of implicit information, reasoning and judgment of decision conditions, and so on.

3 Knowledge Representation Based on Ontology

Knowledge representation is the process of symbolizing, formalizing and modeling knowledge. It is the foundation of knowledge organization and a prerequisite for knowledge management. Traditional knowledge representation technologies include state-space, predicate logic, production rule and frame methods. As disciplines intersect and knowledge grows more complex, methods based on neural networks, fuzzy sets, object orientation and ontologies have been developed for knowledge representation. Different knowledge representation methods lead to heterogeneous knowledge, which is an emerging issue in research on KF systems.

Although the expressive power and reasoning ability of ontologies are weaker than those of traditional formal methods, many studies use ontologies to represent knowledge and construct knowledge bases in order to solve the problem of heterogeneous knowledge [9]. As a structured knowledge representation method, an ontology abstractly expresses a domain as a set of concepts and relationships between the concepts. It unifies the domain concepts into a shared formal specification of the conceptual model, so that knowledge can be exchanged and reused between humans and computers. In the Web Ontology Language OWL 2, recommended by W3C, the basic modeling elements of an ontology are Classes, Properties and Individuals. All entity objects are represented as individuals, types of entities as classes, and entity relationships as properties. Properties can be further refined into sub-properties, such as object relationships, object features, object value ranges, and so on. Pérez and Benjamins [18] classify five ontology modeling primitives: Concepts, Relations, Functions, Axioms and Instances. A concept can be anything including the description of a task, function, action, strategy, reasoning process, etc.; Relations represent a type of interaction between concepts of the domain; Functions are a special case of relations in which the n-th element of the relationship is unique for the n-1 preceding elements; Axioms are used to model sentences that are always true; and Instances are used to represent elements.

Based on the OWL 2 definition and Pérez’s five modeling primitives, we define a knowledge ontology as a five-tuple: \(ontology(O) = \langle C, A, R, D, I\rangle \), where C is a set of concepts or classes with a hierarchical structure; A is a set of attributes describing features of concepts, usually defined as attributes of classes; R is a set of relationships, including functions, axioms and other constraints, representing effective associations between concepts, such as father, son and equality relationships, functional relationships and True assertions; D is a set of attribute domains, describing the fields or value ranges of attributes; and I is a set of instances, containing the entity objects of concept classes.

For example, if \(\langle C_H, A_H, R_H, D_H, I_H\rangle \) is defined as an ontology for describing hypertension, set \(C_H\) may contain concepts such as \({\langle \!\langle }\mathsf {HBP}{\rangle \!\rangle }\), \({\langle \!\langle }\mathsf {Cause}{\rangle \!\rangle }\), \({\langle \!\langle }\mathsf {Symptom}{\rangle \!\rangle }\), \({\langle \!\langle }\mathsf {Therapy}{\rangle \!\rangle }\), \({\langle \!\langle }\mathsf {Patient}{\rangle \!\rangle }\), etc.; set \(A_H\) contains attributes of the concepts such as \({\langle \!\langle }\mathsf {HBP},\mathsf {type}{\rangle \!\rangle }\), \({\langle \!\langle }\mathsf {HBP},\mathsf {level}{\rangle \!\rangle }\), \({\langle \!\langle }\mathsf {Cause},\mathsf {humoral}{\rangle \!\rangle }\), \({\langle \!\langle }\mathsf {Cause},\mathsf {nervous}{\rangle \!\rangle }\), etc.; set \(R_H\) indicates relationships between concepts, e.g. \(father({\langle \!\langle }\mathsf {HBP}{\rangle \!\rangle },{\langle \!\langle }\mathsf {PrimaryHBP}{\rangle \!\rangle })\) means that \({\langle \!\langle }\mathsf {HBP}{\rangle \!\rangle }\) is the father class of \({\langle \!\langle }\mathsf {PrimaryHBP}{\rangle \!\rangle }\); and \(D_H\) and \(I_H\), if present, contain the value ranges of the attributes and the instances of the concepts.
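As an illustration only, the five-tuple can be sketched as a small Python data structure. The class name `Ontology`, the layout chosen for each field, the value range given for the `level` attribute and the instance `case-001` are assumptions made for this sketch; they are not part of the formal definition.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Five-tuple <C, A, R, D, I>; the field layouts are illustrative assumptions."""
    concepts: set = field(default_factory=set)        # C: concept (class) names
    attributes: set = field(default_factory=set)      # A: (concept, attribute) pairs
    relationships: set = field(default_factory=set)   # R: (relation, source, target) triples
    domains: dict = field(default_factory=dict)       # D: (concept, attribute) -> allowed values
    instances: dict = field(default_factory=dict)     # I: instance id -> (concept, attribute values)

    def subclasses_of(self, concept):
        """Return every concept c for which father(concept, c) is in R."""
        return {t for (rel, s, t) in self.relationships
                if rel == "father" and s == concept}


hbp = Ontology(
    concepts={"HBP", "PrimaryHBP", "Cause", "Symptom", "Therapy", "Patient"},
    attributes={("HBP", "type"), ("HBP", "level"),
                ("Cause", "humoral"), ("Cause", "nervous")},
    relationships={("father", "HBP", "PrimaryHBP")},
    domains={("HBP", "level"): {1, 2, 3}},                   # hypothetical value range
    instances={"case-001": ("PrimaryHBP", {"level": 2})},    # hypothetical instance
)

print(hbp.subclasses_of("HBP"))
```

Keeping the five components as separate fields mirrors the tuple directly, so that each component can later be processed on its own.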

The five-tuple form reflects the process of hierarchically modeling knowledge from entities to concepts. If only knowledge entities or only concepts are merged in isolation, the KF process is neither comprehensive nor complete. In other words, all elements of the knowledge ontology need to be handled in KF processes, which are discussed in the next section as KF patterns.

4 Patterns of Knowledge Fusion

So far, there is little literature on KF patterns. Xu et al. [24] classify KF into active and passive types. Qiu and Yu [20] and Zhou et al. [26] discuss several kinds of KF processing algorithms. Smirnov et al. [21] propose seven context-based KF patterns, i.e. Simple, Extension, Configured, Instantiated, Flat, Historical and Adaptation Fusion, which are classified according to the problem solved by each KF process in satisfying the requirements of the decision support system.

In this section, we classify KF patterns, from the perspective of knowledge representation, according to the five-tuple ontology form.

Instance Fusion is the process of removing redundancy, reducing noise, correcting errors and merging content for entity objects to produce a new set of instances. The knowledge sources usually have the same modeling structure, or can be converted into the same one. After Instance Fusion, the modeling structure of the source knowledge is totally or partly inherited by the fused target in accordance with user definitions and requirements, and the pertinence, consistency and correctness of the knowledge entities are improved. There is a substantial overlap between Instance Fusion and traditional information fusion, so the former can be implemented with reference to the fusion methods of the latter.
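A minimal sketch of Instance Fusion is given below, assuming that all sources share one schema, that records with the same key denote the same entity, and that conflicts are resolved by majority vote. The function name `fuse_instances`, the voting rule and the sample records are assumptions of this sketch; other conflict-resolution strategies from traditional information fusion could be substituted.

```python
from collections import Counter, defaultdict


def fuse_instances(sources, key="id"):
    """Merge instance records from several sources that share one schema.

    Records with the same key are treated as the same entity. For each
    attribute, the most frequent non-missing value wins; ties go to the
    value seen first.
    """
    grouped = defaultdict(list)
    for records in sources:
        for record in records:
            grouped[record[key]].append(record)

    fused = []
    for entity_id, records in grouped.items():
        merged = {key: entity_id}
        attrs = [a for r in records for a in r if a != key]
        for attr in dict.fromkeys(attrs):  # unique attributes, first-seen order
            values = [r[attr] for r in records if r.get(attr) is not None]
            if values:
                merged[attr] = Counter(values).most_common(1)[0][0]
        fused.append(merged)
    return fused


# Hypothetical patient records from three sources.
source_a = [{"id": "p1", "level": 2, "type": "primary"}, {"id": "p2", "level": 1}]
source_b = [{"id": "p1", "level": 2, "type": None}, {"id": "p3", "level": 3}]
source_c = [{"id": "p1", "level": 3}]

fused_records = fuse_instances([source_a, source_b, source_c])
print(fused_records)
```

In this example the redundant copies of `p1` collapse into one record, the missing `type` value is filled from another source, and the conflicting `level` is settled by the majority.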

Domain Fusion is the process of applying set operations such as UNION, INTERSECT and MINUS (EXCEPT) to the attribute fields or value ranges of source knowledge entities, resulting in the attribute definitions of the fused knowledge entities. When Instance Fusion is applied, the knowledge sources might have the same modeling structure but different domains, in which case the attribute domains of the fused knowledge must be redefined. Domain Fusion retains the modeling structure of the source knowledge but changes its attribute fields or value ranges, and is thus an extension of Instance Fusion.
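The set operations named above can be sketched as follows, assuming that each attribute domain is a finite set of allowed values. The function name `fuse_domains`, the `mode` parameter and the sample domains are assumptions of this sketch; continuous value ranges would need an interval representation instead.

```python
def fuse_domains(domain_a, domain_b, mode="union"):
    """Combine the attribute domains of two sources with one set operation.

    Each argument maps an attribute name to its set of allowed values.
    An attribute missing from one source is treated as an empty domain.
    """
    operations = {
        "union": lambda a, b: a | b,
        "intersect": lambda a, b: a & b,
        "minus": lambda a, b: a - b,
    }
    combine = operations[mode]
    fused = {}
    for attr in domain_a.keys() | domain_b.keys():
        fused[attr] = combine(domain_a.get(attr, set()), domain_b.get(attr, set()))
    return fused


# Hypothetical attribute domains of two hypertension knowledge sources.
domains_a = {"level": {1, 2, 3}}
domains_b = {"level": {2, 3, 4}, "type": {"primary", "secondary"}}

print(fuse_domains(domains_a, domains_b, "union"))
print(fuse_domains(domains_a, domains_b, "intersect"))
print(fuse_domains(domains_a, domains_b, "minus"))
```

Only the value sets change here; the attribute names, and hence the modeling structure, are carried over unchanged.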

Relationship Fusion is the process of merging the relationships in source knowledge by removing redundancy and combining structures, and of applying inductive and deductive reasoning over relationships to infer and mine new ones. Relationships in a knowledge ontology include interactions between concepts, affiliations between concepts and attributes, functions defining particular mappings, and axioms representing true assertions. Relationship Fusion explores and derives new relationships from the original ones in the sources. The modeling structures of the sources might differ from each other, and from that of the fused result in which the new knowledge is generated.
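A minimal sketch of Relationship Fusion follows, assuming that relationships are stored as `(relation, source, target)` triples. Merging is a set union, and the only deductive rule applied is that chains of `father` relationships imply an `ancestor` relationship. The function name, the `ancestor` relation, and the concept `Disease` and relation `causes` in the sample data are assumptions of this sketch.

```python
def fuse_relationships(*relation_sets):
    """Merge relationship triples and derive ancestor links from father links."""
    fused = set().union(*relation_sets)  # set union drops duplicate triples

    fathers = {(a, b) for (rel, a, b) in fused if rel == "father"}
    ancestors = set(fathers)
    changed = True
    while changed:
        # ancestor(a, d) holds if ancestor(a, b) and father(b, d) hold.
        new_pairs = {(a, d) for (a, b) in ancestors for (c, d) in fathers if b == c}
        changed = not new_pairs <= ancestors
        ancestors |= new_pairs

    return fused | {("ancestor", a, b) for (a, b) in ancestors}


# Hypothetical relationship sets from two sources.
relations_1 = {("father", "Disease", "HBP"), ("father", "HBP", "PrimaryHBP")}
relations_2 = {("father", "HBP", "PrimaryHBP"), ("causes", "Cause", "HBP")}

fused_relations = fuse_relationships(relations_1, relations_2)
print(sorted(fused_relations))
```

The triple `("ancestor", "Disease", "PrimaryHBP")` appears in neither source; it is derived from the merged relationships, which is the sense in which this pattern can produce new knowledge.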

Attribute Fusion is the process of comparing, analysing, transforming and merging the attributes of knowledge concepts by classifying, selecting and reorganizing object features according to user requirements. In Attribute Fusion, the modeling structures of the knowledge sources usually differ, in particular through complementary, contradictory and homographic attribute definitions. After Attribute Fusion, new attributes appear in the fused knowledge, and new relationships are required to correspond with them. Thus, Attribute Fusion and Relationship Fusion are two complementary and alternately iterated processes, and both are important parts of knowledge discovery and innovation.
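The differences in attribute definitions mentioned above can be illustrated with a simplified sketch. It assumes that a user or domain expert supplies the synonym mapping and the list of homographs, and that a contradictory definition is resolved by keeping the first source's version. The function name `fuse_attributes`, the `a.`/`b.` prefixes and the sample schemas are assumptions of this sketch.

```python
def fuse_attributes(schema_a, schema_b, synonyms=None, homographs=()):
    """Merge two attribute schemas (attribute name -> definition).

    synonyms:   maps a name used in schema_b to the equivalent name in schema_a
    homographs: names shared by both schemas that carry different meanings
    """
    synonyms = synonyms or {}
    fused = dict(schema_a)
    for name, definition in schema_b.items():
        if name in homographs and name in fused:
            # Same name, different meaning: keep both under qualified names.
            fused[f"a.{name}"] = fused.pop(name)
            fused[f"b.{name}"] = definition
        else:
            # Synonyms collapse onto one name; complementary attributes are added.
            # On a contradictory definition, schema_a's version is kept.
            fused.setdefault(synonyms.get(name, name), definition)
    return fused


# Hypothetical schemas of the HBP concept in two sources.
schema_a = {"level": "grade 1-3", "type": "primary or secondary"}
schema_b = {"grade": "grade 1-3", "type": "drug class", "onset_age": "years"}

fused_schema = fuse_attributes(
    schema_a, schema_b, synonyms={"grade": "level"}, homographs={"type"}
)
print(fused_schema)
```

The attributes `a.type`, `b.type` and `onset_age` are new relative to either source alone, so the relationships that refer to them would have to be revised in a subsequent Relationship Fusion step.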

Concept Fusion is the process of constructing new knowledge concepts, which might bring about new attributes and new relationships as well. Concept Fusion therefore cannot be carried out separately from the other KF patterns: it has to be based on Instance Fusion, with Domain, Relationship and Attribute Fusion applied iteratively and incrementally to achieve a whole fusion process. Concept Fusion is considered the high level of the KF hierarchy, with Domain, Relationship and Attribute Fusion as middle levels between low-level Instance Fusion and high-level Concept Fusion. It is difficult to apply traditional information fusion methods directly to Concept Fusion in order to generate new knowledge. New KF approaches therefore need to be developed, and the participation of domain experts is also required to complete knowledge innovation.

5 Process Model of Knowledge Fusion

As discussed above, different KF patterns meet different requirements and produce different fusion results. This section proposes two types of process models to analyse the operational mechanism of KF patterns.

5.1 One-Dimension KF Process Model

Relationship, Attribute and Concept Fusion are, to a certain extent, processes of knowledge innovation, since they change the original knowledge models and generate a new one. Instance Fusion changes knowledge objects in terms of consistency, correctness, validity and quantity, and is a process of manifesting and discovering knowledge. Domain Fusion is the transitional phase from knowledge discovery to knowledge innovation: it does not change the original knowledge model, only the value ranges of the concepts.

Fig. 1. One-dimension KF process model

Figure 1 gives the one-dimension KF process model, which illustrates the relationships among the five KF patterns. The need for Domain Fusion arises from Instance Fusion. In different knowledge sources, the value ranges of concepts might differ from each other and must be adjusted, merged and redefined, i.e. through Domain Fusion, to meet the demands of Instance Fusion. After concept domains change, the relationships between the concepts may also need to change, which affects the inference results of Relationship Fusion. For example, enlarging or shrinking the value range of a concept is likely to affect whether an equality relationship between concepts holds. At the same time, Relationship Fusion and Attribute Fusion are two interactive and complementary processes. The production of new attributes might lead to the generation of new relationships, and vice versa.

Therefore, the three KF patterns, i.e. Domain Fusion, Relationship Fusion and Attribute Fusion, are performed iteratively in a loop. Each iteration makes a further step towards generating new knowledge, until Concept Fusion is eventually achieved. Thus, a KF process cannot be completed by a single fusion pattern, nor by a stepwise linear procedure. All fusion patterns need to be considered together, and KF is realized by loop iteration, incremental progression and spiral development.
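The loop iteration described above can be sketched as a fixed-point computation: the fusion steps are applied repeatedly until a full round changes nothing. The function `iterate_fusion`, the three toy step functions and the string-encoded facts are assumptions of this sketch; real fusion steps would operate on ontology structures and may require expert input before they converge.

```python
def iterate_fusion(state, steps, max_rounds=10):
    """Apply the fusion steps in rounds until one full round changes nothing.

    Each step must return a new state instead of mutating its argument.
    Returns the final state and the number of rounds executed.
    """
    for round_no in range(1, max_rounds + 1):
        before = state
        for step in steps:
            state = step(state)
        if state == before:
            return state, round_no
    return state, max_rounds


# Toy steps over a frozenset of facts; each depends on another step's output.
def domain_step(facts):
    return facts | {"domain:level=1..4"}


def relationship_step(facts):
    return facts | {"rel:equal(level,grade)"} if "domain:level=1..4" in facts else facts


def attribute_step(facts):
    return facts | {"attr:severity"} if "rel:equal(level,grade)" in facts else facts


final_state, rounds = iterate_fusion(
    frozenset(), [attribute_step, relationship_step, domain_step]
)
print(rounds, sorted(final_state))
```

Because each toy step depends on a result produced by another, a single linear pass in this order adds only the domain fact; the remaining facts appear in later rounds, which is the behaviour the loop is meant to illustrate.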

5.2 Two-Dimension KF Process Model

As mentioned above, KF generates new knowledge and produces knowledge innovation, and the aim of knowledge innovation is to provide better knowledge service. Nonaka et al. [17] summarize knowledge innovation processes into four stages: Socialization, Externalization, Combination and Internalization, known as the SECI model, which describes the transformations between tacit and explicit knowledge. Socialization is the process of converting new tacit knowledge through shared experiences; Externalization is the process of articulating tacit knowledge into explicit knowledge; Combination is the process of converting explicit knowledge into more complex and systematic sets; Internalization is the process of embodying explicit knowledge into tacit knowledge.

Fig. 2. Two-dimension KF process model

In the SECI model, knowledge is created through a spiral, by applying the four processes in a circular loop rather than a stepwise linear procedure, which is similar to the implementation of KF patterns. Although the KF patterns cannot be mapped directly onto the SECI stages, this common characteristic makes it possible to combine the two processes organically, as shown in Fig. 2, in order to achieve accurate, personalized and effective knowledge service in accordance with user requirements. In particular, during the stages of Socialization and Externalization, methods for fusing instances and domains can be used to discover tacit knowledge objects, and methods for fusing relationships and attributes can be used to articulate them into explicit ones. During the stages of Combination and Internalization, the fusion patterns are naturally involved, since both stages handle explicit knowledge.

The two-dimension KF process model shows the relationships between the innovation stages and the fusion patterns. It indicates that, although the KF patterns proposed in this paper are based on the ontology representation of explicit knowledge, they have the potential to be extended to tacit KF, which is one of the research issues for our future work.

6 Conclusion and Future Work

The big data era brings distributed, heterogeneous and autonomous knowledge, which KF integrates, discovers and exploits in order to achieve high-quality knowledge service. This paper discusses the connotation of KF by giving its definition and analysing the relations and differences between KF and related notions, such as knowledge integration, information fusion and data fusion. We then introduce five KF patterns, i.e. Instance, Domain, Relationship, Attribute and Concept Fusion, and indicate that the KF process is implemented by loop iteration, incremental progression and spiral development, rather than by a single step or a stepwise linear procedure. Finally, one-dimension and two-dimension KF process models are proposed to illustrate the relationships among the KF patterns and between the KF patterns and the knowledge innovation stages. In future, we will implement the KF patterns in a specific application domain, e.g. the chronic disease domain, and extend them to handle tacit knowledge.