
1 Introduction

Linked data technologies enable the integration of bibliographic data into the web and allow web users to navigate through the bibliographic universe. Library linked data initiatives have already been launched in various countries all over the world. Each initiative was developed within the framework of a different project aiming to address different needs. Therefore, entities of the bibliographic universe are perceived, defined and described in different manners. Definitions of these entities may be found either in bibliographic conceptual models (e.g. FRBR, BIBFRAME, etc.) or in the local schemata used by the projects (e.g. Linked Open BNB/British Library, data.deichman.no/Oslo Public Library).

Navigating through the bibliographic universe is often an intricate process due to the relationships, explicitly or implicitly defined, that interlink bibliographic entities. Content relationships may explicitly or implicitly exist between bibliographic entities generating bibliographic families. The term bibliographic family has been coined by Professor Smiraglia to describe ‘a set of related bibliographic works that are somehow derived from a common progenitor’ [1]. Works or Expressions within the same bibliographic family may share the same intellectual content and be related to the progenitor through different types of relationships. The identification of bibliographic families and the clustering of all related entities are extremely important and one of the main functions that library catalogs need to deliver [2,3,4].

Library data conceptual models include constructs that enable the description of such content relationships. A key requirement for successful mappings between different conceptual models is that content relationships are preserved after the mapping and the data transformation; the degree to which they are preserved indicates how compatible the models are with respect to bibliographic families [5,6,7,8]. Preservation of bibliographic families, based on the Smiraglia definition [1, 4], means the preservation of the information that two or more Works originate from a common progenitor. This study investigates whether and how content relationships, and the bibliographic families they generate, can be preserved when transforming data from FRBR to BIBFRAME 2.0 (hereafter referred to as BIBFRAME). We focus on these two data models because FRBR is a major milestone in the evolution of bibliographic data conceptualization, while BIBFRAME is being developed by the Library of Congress and is expected to supersede the MARC21 standard.

Due to the models’ different conceptualizations, mappings should be refined by revealing content relationships and bibliographic families. A content relationship or a bibliographic family is instantiated within the semantics of a library data conceptual model by following representation patterns. Therefore, in order to evaluate whether content relationships and bibliographic families are preserved after their transformation from a source to a target model, their representation patterns in the source and the target models have to be defined. Then, the target representation pattern should be compared with the representation pattern resulting from the transformation mappings. Representation patterns have been studied by other scholars in terms of identifying good practices for the representation of specific bibliographic cases using a model’s semantics [9, 10]. It should be clarified that a representation pattern does not uniquely express a bibliographic description case, because the same semantics can be expressed in alternative ways using the terms of a model.

The next section gives some definitions and the background of our research. In Sect. 3, mappings for selected content relationships and bibliographic families using their representation patterns are presented. Conversions from FRBR to BIBFRAME are studied following the proposed methodology. Key findings are presented in the discussion and conclusions section. It must be noted that, for clarity, the names of the models’ classes/entities and properties are written in italics in the text.

2 Background

Given the different conceptual models for library data and the volumes of data that have been published on the World Wide Web, automated mechanisms for transforming and interlinking these data require mappings between the conceptual models. Mappings are one way of tackling interoperability problems: they enable either the transformation of instances of a source model to instances of a target model or the integration of data that are expressed in the terms of different models.

Successful mappings preserve the semantics of the source model in the target model. Bibliographic relationships are important for navigation in the bibliographic universe, and both the FRBR and BIBFRAME models include constructs to describe bibliographic entities and the relationships between them. Bibliographic relationships between works have been studied by Tillett [11], who created a taxonomy of bibliographic relationships and identified seven types: equivalent, derivative, descriptive, whole-part, accompanying, sequential and shared characteristic [11]. The equivalent, derivative and descriptive relationships have been characterized by Tillett [12] as “close content relationships that can be viewed as a continuum starting from an original work”. The derivative bibliographic relationship is “broad ranging” [13]. Therefore, Smiraglia [14] focused on derivation and identified eight types of derivative bibliographic relationships. He also coined the term bibliographic family to express Works that somehow derive from a common original Work, also known as the progenitor. Smiraglia also found that older and/or popular Works tend to have large and complex families [1, 14]. Such families form information networks consisting of nodes, which are instances of bibliographic entities, and arcs that interconnect the instances and denote their relationships. Therefore, Smiraglia has extended the concept of bibliographic families using the new term instantiation network [7].

The preservation of content relationships and bibliographic families in the mapping and data transformation process between two library data conceptual models is not straightforward, owing to semantic and structural heterogeneities between the models [15]. Therefore, representation patterns for both FRBR and BIBFRAME need to be identified, so that the semantics of the content relationships and bibliographic families are described in the terms of each conceptual model. We use the term representation pattern for the representation of each relationship/bibliographic family in each conceptual model. We define the representation pattern for a bibliographic family $F$ as a graph $G_{fm}(C_{fm}, P_{fm})$, where $C_{fm}$ is a subset of the set $C$ of the classes of a conceptual model $M$ and $P_{fm}$ is a subset of the set $P$ of the properties of $M$, such that for every triple $(C_{fmd}, P_{fmi}, C_{fmr})$ in $G_{fm}$, $C_{fmd}$ is the domain class and $C_{fmr}$ is the range class of the property $P_{fmi}$.
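The definition above can be read as a simple well-formedness check on a graph. The following Python fragment is a minimal sketch of that reading, not part of the original study: the `Property` type and the function name are hypothetical, and the FRBR property names used as sample data are the primary Group 1 relationships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """A model property with its domain and range classes (hypothetical helper type)."""
    name: str
    domain: str
    range: str


def is_representation_pattern(classes, properties, model_classes, model_properties):
    """Check the conditions of the definition for a candidate pattern.

    classes / properties play the role of C_fm / P_fm;
    model_classes / model_properties play the role of C / P of the model M.
    """
    if not set(classes) <= set(model_classes):
        return False
    if not set(properties) <= set(model_properties):
        return False
    # Every property must connect a domain class and a range class of the pattern.
    return all(p.domain in classes and p.range in classes for p in properties)


# Sample data: the FRBR Group 1 entities and their primary relationships.
FRBR_CLASSES = {"Work", "Expression", "Manifestation", "Item"}
FRBR_PROPERTIES = {
    Property("is realized through", "Work", "Expression"),
    Property("is embodied in", "Expression", "Manifestation"),
    Property("is exemplified by", "Manifestation", "Item"),
}

pattern_ok = is_representation_pattern(
    FRBR_CLASSES, FRBR_PROPERTIES, FRBR_CLASSES, FRBR_PROPERTIES
)
```

A candidate that lists a property without also listing its range class (for example, only the class Work together with all three properties) fails the check, which is the situation the domain/range condition is meant to exclude.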

The methodology followed in this paper for developing mappings between library data conceptual models is presented below:

  1. Description of the bibliographic relationships and family (e.g. translation).
  2. Definition of their representation pattern(s) in each model.
  3. Mapping between the Source representation pattern and the Target representation pattern. Due to the semantic and structural heterogeneities of FRBR and BIBFRAME, it is important in particular cases to define the conditions that enable proper mapping, e.g. the existence of a specific attribute of a class or a specific value of an attribute.

The mappings are tested using a real example, the bibliographic family of Homer’s ‘Odyssey’ and some of its members, for the cases where FRBR is the source model and BIBFRAME is the target model.

3 Mapping Content Relationships and Bibliographic Families

The paper proceeds gradually from simple to more complex bibliographic families. The cases studied are Works with a single Expression, Works with multiple Expressions, and derivations, namely translations and adaptations. When representation patterns are depicted, the nodes symbolize the corresponding classes, while the edges illustrate the properties between the classes of each model. Each node is divided into two smaller boxes: the upper one denotes the class, while the lower one provides its instance. For readability, the lower box includes a short description of the instance rather than its full title, related details, or complete URI.

3.1 Work with a Single Expression

The simplest and most frequent bibliographic case [16] is a Work with a single Expression and a single Manifestation, e.g. a monograph (book) in one language. In FRBR the Work entity is an abstract entity that delimits a distinct intellectual creation, as initially intended by its author(s). The Work is realized through an Expression, a realization of the Work in a specific form and set of signs. It must be noted that due to the abstract nature of the Work entity, a Work is mainly recognized through its various Expressions. These Expressions are embodied in Manifestation entity instances. A single exemplar of the identical copies that make up a Manifestation is represented by the Item entity. The representation pattern of this bibliographic description case in the terms of the FRBR model is presented in Fig. 1.

Fig. 1. Representation pattern for a Work with a single Expression in FRBR.

BIBFRAME defines different conceptualizations. A Creative Work instance represents both the idea of an intellectual creation and its form of realization. The material embodiment of the Creative Work (bf:Work) is expressed with the bf:Instance class. A copy of the bf:Instance held at a library is represented by a bf:Item class instance. BIBFRAME does not define different classes for differentiating between the abstract idea of an intellectual creation and its realization, and therefore Creative Work ‘seems to be semantically closer to the (union of the) FRBR Work and Expression entities’ [17, 18]. This difference in conceptualizing basic bibliographic entities is likely to prove crucial to prospective transformations of bibliographic data between the two models. The representation pattern of this bibliographic description case in terms of BIBFRAME is presented in Fig. 2.

Fig. 2. Representation pattern for a Work with a single Expression in BIBFRAME.

Mapping FRBR entities to BIBFRAME shall ensure preservation of semantics. FRBR uses two entities, namely Work and Expression, to represent intellectual creation and the signs used for its realization, while BIBFRAME uses only one class, Creative Work. Physical embodiment is represented in both models in the same way. FRBR represents embodiments with the Manifestation entity and Manifestation exemplars with the Item entity. Likewise, BIBFRAME defines the bf:Instance class for embodiments and bf:Item class for bf:Instance exemplifications. This mapping is depicted in Fig. 3, which is actually a generalization of the mapping for the Work with a single Expression example presented in Figs. 1 and 2. The instances of two FRBR classes, namely the Work and Expression instances, are semantically subsumed by instances of the class bf:Work in BIBFRAME. The Manifestation entity is mapped to the bf:Instance class and the Item entity to the bf:Item class. Moreover, in Fig. 3 the mapping rules between the core classes of FRBR and BIBFRAME are presented. These rules also refer to the “inherent relationships” [19] among FRBR Group 1 entities.

Fig. 3. Mapping from FRBR to BIBFRAME 2.0 representation pattern for a Work with a single Expression.
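The generic class-level mapping described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the conversion rules of the study: triples are plain tuples of strings, the merged node identifier (`work+expression`) is a hypothetical naming convention, and bf:hasItem is the BIBFRAME 2.0 property assumed here for linking a bf:Instance to a bf:Item.

```python
def map_core(frbr_triples):
    """Map FRBR Group 1 triples to BIBFRAME-style triples.

    Each Work/Expression pair joined by 'is realized through' is merged
    into a single bf:Work node; Manifestation and Item map one-to-one.
    Assumes every embodied Expression also appears in a realization triple.
    """
    bf = []
    work_node = {}  # Expression id -> id of the merged bf:Work node
    for s, p, o in frbr_triples:
        if p == "is realized through":
            work_node[o] = f"{s}+{o}"
            bf.append((work_node[o], "rdf:type", "bf:Work"))
    for s, p, o in frbr_triples:
        if p == "is embodied in":        # Expression -> Manifestation
            bf.append((work_node[s], "bf:hasInstance", o))
            bf.append((o, "rdf:type", "bf:Instance"))
        elif p == "is exemplified by":   # Manifestation -> Item
            bf.append((s, "bf:hasItem", o))
            bf.append((o, "rdf:type", "bf:Item"))
    return bf


# Hypothetical identifiers for a Work with a single Expression.
frbr = [
    ("w1", "is realized through", "e1"),
    ("e1", "is embodied in", "m1"),
    ("m1", "is exemplified by", "i1"),
]
bibframe = map_core(frbr)
```

The sketch makes the subsumption visible: the two FRBR nodes `w1` and `e1` survive only as the single node `w1+e1` on the BIBFRAME side.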

Specialization by attributes.

While BIBFRAME uses the Creative Work class to represent both the intellectual content and its realization, it specializes its semantics by a set of 10 subclasses. Likewise, the bf:Instance class has 5 subclasses. The mapping of the representation patterns presented in Fig. 3 is generic and involves the high-level classes of the target model. Hence, in order to achieve the closest similarity between the source and the target classes and properties, more detailed representation patterns regarding the FRBR triple Work - is realized through - Expression and the Manifestation entity should be generated. Such patterns are generated by exploiting information lying in the attributes of the FRBR Expression and Manifestation entities. Moreover, controlled vocabularies from the Library of Congress Linked Data Service (http://id.loc.gov/) should be used for the values of the attributes, so that the mapping rules can be precisely expressed.

Regarding the mapping of the FRBR triple Work - is realized through - Expression to the bf:Work class and subclasses, we have identified the form of expression attribute. This attribute of the Expression entity describes the way a Work has been realized, e.g. text, still image, notated music, etc. The LC Content Types Scheme (http://id.loc.gov/vocabulary/contentTypes) may be used for the values of the attribute form of expression. Depending on these values, the FRBR triple Work - is realized through - Expression shall be mapped to a different bf:Work subclass. The attribute form of expression, along with the values of the LC Content Types Scheme, enables more precise mappings for all bf:Work subclasses. In some cases these values may even determine the mapping to a bf:Instance subclass. As an example, some mapping rules triggered by this attribute’s values are exhibited in Table 1.

Table 1. The values of the FRBR attribute form of expression trigger the mapping of the FRBR ‘Work – is realized through – Expression’ triple to different bf:Work subclasses.
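A rule of this kind amounts to a lookup keyed on the attribute value. The sketch below is only an illustration and does not reproduce the rows of Table 1: the three value/subclass pairs are assumptions built from the content types named in the text (text, still image, notated music) and the BIBFRAME 2.0 subclasses with matching names, and the function name is hypothetical.

```python
# Assumed, illustrative pairs: content type label -> bf:Work subclass.
CONTENT_TYPE_TO_WORK_CLASS = {
    "text": "bf:Text",
    "still image": "bf:StillImage",
    "notated music": "bf:NotatedMusic",
}


def work_class_for(form_of_expression):
    """Pick a bf:Work subclass from the FRBR 'form of expression' value.

    Falls back to the generic bf:Work class when no rule is defined.
    """
    key = form_of_expression.strip().lower()
    return CONTENT_TYPE_TO_WORK_CLASS.get(key, "bf:Work")


chosen = work_class_for("Notated music")
```

Falling back to bf:Work keeps the generic mapping of Fig. 3 valid for values that no rule covers.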

Concerning the mapping of the FRBR Manifestation entity to the bf:Instance class and subclasses, the attribute form of carrier has been identified. This attribute of the Manifestation class describes the physical carrier in which an Expression of a Work is embodied. The Carriers Scheme (http://id.loc.gov/vocabulary/carriers), already used in RDA cataloging, may also be used as the vocabulary for the values of the form of carrier attribute. These values adjust the mapping of a Manifestation instance to a bf:Instance subclass. Some examples of mapping rules that are triggered by the form of carrier attribute values are presented in Table 2. It must be noted, though, that the form of carrier, along with the values of the Carriers Scheme, enables mappings to some but not all bf:Instance subclasses; the bf:Manuscript subclass, for example, cannot be reached this way.

Table 2. The values of the FRBR attribute form of carrier trigger the mapping of the FRBR Manifestation to different bf:Instance subclasses.
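The carrier-driven rules can be sketched in the same lookup style, with the limitation noted above made explicit. The pairs below are assumptions chosen for illustration and are not taken from Table 2; the deliberately conservative choice here is to leave any carrier value without a rule on the generic bf:Instance class, so the sketch never yields bf:Manuscript.

```python
# Assumed, illustrative pairs: carrier label -> bf:Instance subclass.
CARRIER_TO_INSTANCE_CLASS = {
    "online resource": "bf:Electronic",
    "computer disc": "bf:Electronic",
}


def instance_class_for(form_of_carrier):
    """Pick a bf:Instance subclass from the FRBR 'form of carrier' value.

    Values without a rule stay on the generic bf:Instance class; in this
    sketch the carrier value alone never selects bf:Manuscript.
    """
    key = form_of_carrier.strip().lower()
    return CARRIER_TO_INSTANCE_CLASS.get(key, "bf:Instance")


electronic = instance_class_for("Online resource")
generic = instance_class_for("volume")
```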

3.2 Work with Multiple Expressions

The mapping rules of the previous section preserve information when transforming ‘Work with single Expression’ data from FRBR to BIBFRAME. In FRBR the classes Work and Expression are correlated by the relationship is realized through, which has a one-to-many cardinality, meaning that several Expressions might exist for a Work. Indeed, classical works tend to have large bibliographic families. For instance, there are different editions of Homer’s ‘Odyssey’ and many translations in a variety of languages. In Fig. 4 two Expressions of The Essential Homer by Stanley Lombardo are represented: the English text and the audio narration of the text (sound recording).

Fig. 4. Mapping from FRBR to BIBFRAME 2.0 representation pattern for a Work with more than one Expression.

Using the rules in Fig. 3, each of the two triples Work-is realized through-Expression depicted in the upper side of Fig. 4 will be mapped to an instance of the bf:Work class in BIBFRAME. It is worth noting that the same instance of the FRBR Work entity ‘Odyssey’ participates in two different mappings. However, when the FRBR representation pattern for the Work with multiple Expressions is transformed to BIBFRAME following the aforementioned rules, the information that the two instances of the bf:Work class originate from the same Work (intellectual idea) is lost. BIBFRAME provides the property bf:hasExpression to correlate the two Expressions, as depicted in the BIBFRAME side of Fig. 4 for the two ‘The Essential Homer’ editions. In this case, in order to indicate in the target representation that the bf:Work instances originate from the same intellectual idea, the rules must be extended to connect every pair of these bf:Work instances with an instance of the bf:hasExpression property. The additional semantics incorporated by the bf:hasExpression property in the target pattern preserve the content relationship. Yet the information that the bf:Work instances have the same progenitor (Work) is not preserved.
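The extended rule can be illustrated with a short Python sketch. It assumes, as a reading of the text rather than a statement of the study's actual rules, that "all pairs" means one bf:hasExpression link in each direction between sibling bf:Work instances; the identifiers and the `work+expression` node naming are hypothetical.

```python
from itertools import permutations


def link_sibling_works(frbr_triples):
    """Connect bf:Work nodes that stem from the same FRBR Work.

    Returns one bf:hasExpression triple per ordered pair of sibling nodes.
    """
    by_work = {}
    for s, p, o in frbr_triples:
        if p == "is realized through":
            by_work.setdefault(s, []).append(f"{s}+{o}")
    links = []
    for nodes in by_work.values():
        for a, b in permutations(nodes, 2):
            links.append((a, "bf:hasExpression", b))
    return links


# Hypothetical identifiers for the two Expressions of Fig. 4.
frbr = [
    ("odyssey", "is realized through", "english_text"),
    ("odyssey", "is realized through", "audio_narration"),
]
links = link_sibling_works(frbr)
```

For two Expressions this yields two links, one per direction. Note that the FRBR Work `odyssey` itself does not appear as a node in the output, which mirrors the loss of the progenitor described above.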

3.3 Derivation Patterns: Translation and Adaptation

The bibliographic family of ‘Odyssey’ has grown very large due to derivatives; there are many translations, as well as adaptations, dramatizations, imitations, etc. There are many types of derivation, as described in [15]. In this paper, the case of literal translation is studied. In Fig. 5 an example of the literal translation case is represented using the well-known translation of ‘Odyssey’ by Alexander Pope. Literal translation is represented at the Expression level in FRBR (Fig. 5). Two Expression instances of the same Work are related to each other with the has translation property, where one instance of the Expression entity (ancient text edited by D.Chalcocondylis) has a translation in another language represented by an instance of a second Expression entity (English translation by A.Pope). In BIBFRAME, translation is represented as a relationship between two Creative Work instances, as depicted in the BIBFRAME side of Fig. 5.

Fig. 5. Mapping from FRBR to BIBFRAME 2.0 representation pattern for the translation case. The bf:Work with the long dash-dot outline has been added in the mapping to preserve the progenitor bf:Work of the Odyssey bibliographic family.

As in the case of the Work with multiple Expressions, the same FRBR Work entity instance of ‘Odyssey’ participates in two mappings (Fig. 5). Moreover, to transform the FRBR translation representation pattern to BIBFRAME, the FRBR has translation property has to be utilized in order to correlate the two different Expressions. Then, the property will be mapped to the bf:translation property. Thus, in the derivation-translation case, information regarding the content relationship between the two Expressions is preserved in the two bf:Works. However, following this mapping, the information that the bf:Work instances have the same progenitor (Work) is not preserved. In order to preserve information about the common progenitor, the mappings should be changed. More specifically, an additional bf:Work instance will be created (the bf:Work with the long dash-dot outline in Fig. 5). Then this additional bf:Work instance will be linked with the other bf:Work instances using the bf:hasExpression property (also depicted with a long dash-dot line).
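The changed mapping for the translation case can be sketched as follows. This is a minimal illustration of the pattern in Fig. 5 under assumed conventions: triples are tuples of strings, the identifiers are hypothetical, and the Expression-agnostic progenitor is represented simply by reusing the FRBR Work identifier as a bf:Work node.

```python
def map_translation(work, source_expr, translated_expr):
    """Map an FRBR 'has translation' pattern to BIBFRAME-style triples.

    Besides the bf:translation link between the two merged bf:Work nodes,
    an extra Expression-agnostic bf:Work is emitted for the progenitor
    and tied to both nodes with bf:hasExpression.
    """
    source_node = f"{work}+{source_expr}"
    translated_node = f"{work}+{translated_expr}"
    progenitor = work  # additional bf:Work (dash-dot outline in Fig. 5)
    return [
        (progenitor, "rdf:type", "bf:Work"),
        (source_node, "bf:translation", translated_node),
        (progenitor, "bf:hasExpression", source_node),
        (progenitor, "bf:hasExpression", translated_node),
    ]


# Hypothetical identifiers for the example of Fig. 5.
triples = map_translation("odyssey", "greek_chalcocondylis", "english_pope")
```

Dropping the first, third and fourth triples gives the simpler mapping, in which the translation link survives but nothing states that both bf:Work nodes share a progenitor.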

When the Expression from which a translation derives is not known, the Work will simply have Expressions in different languages. These Expressions will not be related with a has translation property, but the translation could be implied from the different values of the language of expression attribute of each Expression instance. Since there is no explicit representation of the translation relationship, the mapping of this representation would be similar to Fig. 4. Ideally, the mapping would be similar to the adaptation case depicted in Fig. 7, where the representation is made with an Expression-agnostic Creative Work instance related to another bf:Work instance through a bf:translation property. In order to achieve such a mapping, new rules must be implemented that take into account the existence of differing values for the language of expression attributes. Differences between the entity Person/Family/Corporate Body that created the Work instance and the Person/Family/Corporate Body that realized an Expression of the same Work instance must also be considered.
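One possible shape for such a rule is sketched below. It is a hypothetical heuristic, not a rule proposed in the study: it assumes each Expression record carries a language value and the name of the agent who realized it, and it flags an Expression as a likely translation when the Work has Expressions in more than one language and the realizing agent differs from the creator of the Work. All field names and sample values are illustrative.

```python
def likely_translations(work_creator, expressions):
    """Return ids of Expressions that are plausibly implicit translations.

    expressions: list of dicts with the assumed keys
    'id', 'language' and 'realizer'.
    """
    languages = {e["language"] for e in expressions}
    if len(languages) < 2:
        # A single language gives no evidence of translation.
        return []
    return [e["id"] for e in expressions if e["realizer"] != work_creator]


# Hypothetical records for one Work with two Expressions.
expressions = [
    {"id": "e1", "language": "grc", "realizer": "Homer"},
    {"id": "e2", "language": "eng", "realizer": "Translator A"},
]
candidates = likely_translations("Homer", expressions)
```

Each flagged Expression could then be mapped as a bf:Work linked by bf:translation from an Expression-agnostic bf:Work, in the manner the text describes as ideal. A heuristic of this kind would need checking against real data, since editors and narrators also differ from the creator of the Work.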

A derivation that results in a new Work is represented in FRBR at the Work level with various properties, namely has adaptation, has a transformation, has an imitation, has a paraphrase, has a dramatization. By contrast, BIBFRAME utilizes only the bf:hasDerivative property at the bf:Work level. Hence, all these FRBR properties are mapped to a single property in BIBFRAME.
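The many-to-one nature of this mapping is easy to see in code. The following sketch uses the FRBR property names listed above; the function name and sample identifiers are hypothetical.

```python
# FRBR Work-level derivation properties named in the text.
FRBR_DERIVATION_PROPERTIES = {
    "has adaptation",
    "has a transformation",
    "has an imitation",
    "has a paraphrase",
    "has a dramatization",
}


def map_derivation(triple):
    """Map an FRBR derivation triple to bf:hasDerivative.

    Returns None for triples whose property is not a derivation property.
    """
    s, p, o = triple
    if p in FRBR_DERIVATION_PROPERTIES:
        return (s, "bf:hasDerivative", o)
    return None


adaptation = map_derivation(("odyssey", "has adaptation", "adventures_of_ulysses"))
dramatization = map_derivation(("odyssey", "has a dramatization", "stage_play"))
```

Both results carry the same BIBFRAME property, so the specific type of derivation cannot be recovered from the target data alone.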

In FRBR adaptation may be represented by the has adaptation property at either the Work or Expression level. When information regarding which Expression has been used for creating an adaptation is not known, then the representation of adaptation is preserved at the Work level and hence it is Expression-agnostic. The has adaptation property is used at the Expression level, when there is information about the particular Expression used to create both the Work and the Expression of the new adaptation.

In Fig. 6 an adaptation of ‘Odyssey’ for children is represented. Charles Lamb used the English translation of George Chapman and then “turned… [Odyssey] into prose, simplified the order of the narrative, abbreviated or combined episodes, and deleted descriptions and whole books in order to … eliminate anything inappropriate for young readers” [20]. As depicted in Fig. 6, the progenitor Work ‘Odyssey’ along with one of its Expression instances (English translation by G.Chapman) is mapped to one bf:Work instance, while its derivative Work ‘Adventures of Ulysses’ with its Expression instance is mapped to a second bf:Work instance. The has adaptation relationship at the Expression level is mapped to the bf:hasDerivative property instance that relates the two bf:Work instances. In this case both content relationships and the bibliographic family are preserved.

Fig. 6. Mapping from FRBR to BIBFRAME 2.0 representation pattern for the adaptation case.

In Fig. 7 an Expression-agnostic adaptation at the Work level is depicted. The exact Expression of ‘Odyssey’ used by Anne Terry White to create her adaptation for children is not known. Therefore, the progenitor Work ‘Odyssey’ is mapped to a bf:Work instance that lacks Expression-related information (e.g. language), while the derivative Work “Odysseus comes home from the sea” along with its Expression is mapped to a second bf:Work instance. The bf:Work on the left side of the bf:hasDerivative property may serve as an abstract bf:Work and it cannot have any bf:Instances because its Expression-related information is not known. In this case both content relationships and the bibliographic family are preserved.

Fig. 7. Mapping from FRBR to BIBFRAME 2.0 representation pattern for the derivation-adaptation case. The exact Expression used to produce the adaptation Expression is not known.
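The two adaptation cases (Figs. 6 and 7) differ only in whether the source Expression is known, which can be captured by one optional argument. The sketch below is illustrative: identifiers and the `work+expression` node naming are hypothetical conventions, not the study's notation.

```python
def map_adaptation(source_work, derived_work, derived_expr, source_expr=None):
    """Map an FRBR 'has adaptation' pattern to a bf:hasDerivative triple.

    With a known source Expression (Fig. 6), the progenitor node merges
    Work and Expression. Without it (Fig. 7), the progenitor is an
    Expression-agnostic bf:Work built from the Work alone.
    """
    if source_expr is not None:
        source_node = f"{source_work}+{source_expr}"
    else:
        source_node = source_work
    derived_node = f"{derived_work}+{derived_expr}"
    return (source_node, "bf:hasDerivative", derived_node)


# Fig. 6 style: the Expression used for the adaptation is known.
known = map_adaptation("odyssey", "adventures_of_ulysses", "english_prose",
                       source_expr="english_chapman")
# Fig. 7 style: the Expression used for the adaptation is not known.
agnostic = map_adaptation("odyssey", "odysseus_comes_home", "english_text")
```

In the second call the progenitor node carries no Expression-related information, so no bf:Instance should be attached to it.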

4 Discussion and Conclusions

Navigation in an ever-changing, overloaded bibliographic universe, in a way that preserves the contextual semantics of the bibliographic descriptions, largely depends on the control of content relationships and bibliographic families. Library conceptual models include constructs to describe and control bibliographic families. This paper examines if and how information about content relationships and bibliographic families may be preserved under mappings. It focuses on the FRBR and BIBFRAME models, and on mappings where FRBR is the source model and BIBFRAME is the target one. The case of a Work with a single Expression, as well as bibliographic family cases (e.g. Work with multiple Expressions, Works with derivative relationships), were studied and some interesting findings were derived.

The generic mapping of the simplest case of a Work with a single Expression may be considered straightforward (Fig. 3). Additionally, more precise mapping rules may be applied combining FRBR attributes and values from controlled vocabularies (Tables 1 and 2). The utilization of controlled vocabularies for mapping purposes and automated exchange of library data demands a shift in working culture and an adoption of new cataloging rules and policies. From now on librarians shall perform cataloging having in mind collaboration and reuse of data, not just indexing their library’s collection for local purposes. This may affect cataloging systems, as well as workflows.

An interesting finding of this study is that the relationships between members of a bibliographic family may be preserved in BIBFRAME only when FRBR Expressions are related by a particular property (has translation, has adaptation, etc.). In the case of mapping an FRBR Work with multiple Expressions to BIBFRAME there is no relationship between the FRBR Expression instances. Therefore, the information regarding the common progenitor is lost in BIBFRAME. The mapping has been extended with the insertion of two bf:hasExpression property instances (Fig. 4), to preserve the content relationship. Still the common progenitor is not explicitly represented. Information about the progenitor Work may be preserved in BIBFRAME following the practice shown in Figs. 5 and 7, where a new bf:Work is generated to hold the information of the Work entity as progenitor. In both cases an Expression-agnostic bf:Work instance has been used as the progenitor. Then, this progenitor bf:Work is related to the other members of the family with bf:hasExpression property instances (Fig. 5) or with another property (bf:hasDerivative in Fig. 7), if such exists based on the mappings.

This Expression-agnostic bf:Work is similar to the superwork described by Svenonius in [21] and may be used to group all bf:Works that are somehow derived from it. Expression-agnostic bf:Works are not expected to have any bf:Instances. At this point, it must be noted that BIBFRAME does not impose cardinalities regarding the triple bf:Work-bf:hasInstance-bf:Instance. This may provide flexibility in some implementations of BIBFRAME, but at the same time may cause ambiguity. Totally different mapping rules can be defined when different cardinality constraints exist, for example if a bf:Work must or may have one or more bf:Instances.

This study uses a limited set of cases and data. More bibliographic relationships need to be studied, and the findings should be checked using larger and more complicated datasets. The mappings produced in this study need to be converted into conversion rules by means of a mapping language. A follow-up study shall compare the transformation based on these rules with the MARCXML to BIBFRAME Transformation software [22]. Moreover, existing software tools should be selected and adapted to evaluate the degree of preservation of bibliographic relationships after mappings. Interesting findings are also anticipated from testing the opposite mappings, where BIBFRAME is the source model and FRBR is the target one. Updates of the two models are likely to cause changes in mappings. The consolidated FRBR-LRM is expected to be announced in 2017. The BIBFRAME model is regularly updated and its second version has already included FRBR conceptualizations to enable mappings, e.g. the bf:Item class. It is possible that prospective BIBFRAME versions will include more changes for interoperability reasons.