
1 Introduction

Data traceability is crucial for optimizing the quality of a product or service. For example, in the food industry, product traceability helps identify the processes that each product undergoes. Meanwhile, the agricultural industry requires crop traceability to ensure optimal quality in large batches of crops. Similarly, service and tourism industries need data traceability to ensure proper transactions and the security of customers' data (Nyaletey, et al. 2019). However, because a whole process can involve many layers, data traceability can be very difficult to manage. As a result, many large industries with high production capacity struggle to trace each of their products for quality checks.

Traditional traceability methods, such as barcodes and quick response (QR) codes, are cost-effective and easy to implement (Qian et al. 2012; Tarjan, et al. 2014). However, they lack real-time monitoring capabilities and do not provide enough information for detailed analysis or for optimizing the traceability process.

Artificial intelligence (AI) has emerged as a potential way to enhance data traceability. AI can provide early warnings of potential errors in the traceability process, allowing organizations to address issues proactively. For example, AI algorithms can analyze data from multiple sources, including sensors and tracking systems, to identify anomalies and deviations from expected patterns. However, AI technology is complex and requires highly skilled workers to operate and maintain (Zhang and Lu 2021).

The Internet of Things (IoT) is another technology that has the potential to enhance data traceability. IoT involves connecting devices and sensors to the internet, allowing for remote monitoring and real-time data collection. When integrated with traceability systems, IoT can provide valuable insights into the journey of a product or service, helping organizations to optimize their processes and improve the quality of their offerings. However, implementing IoT can be costly and the devices can be vulnerable to hacking, which could compromise the traceability of the product (Cui, Chen, et al. 2019).

Unlike IoT, blockchain (BC) technology is less susceptible to external attacks because of its decentralized nature. In general, a blockchain is a chain of data blocks in which each block stores its information together with the hash of the preceding block. When new information is recorded, a new block is generated and linked to the previous one, creating a chain of blocks that can easily be accessed and traced. As more blocks are added to the chain, the system is considered more stable and less susceptible to data attacks (Jiang et al. 2022), because altering an earlier block would break the hash links of every block that follows it.
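The hash-linking described above can be illustrated with a minimal Python sketch. It assumes SHA-256 over a JSON serialization of each block, and the names `Block`, `append`, and `is_valid` are illustrative only; consensus, signatures, and networking are deliberately left out.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    data: dict
    prev_hash: str  # hash of the preceding block; this is the "link" in the chain

    def digest(self) -> str:
        # Hash the block's contents together with the link to its predecessor.
        payload = json.dumps(
            {"index": self.index, "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


def append(chain: list, data: dict) -> None:
    # A new block records the hash of the block it follows (all zeros for the first block).
    prev_hash = chain[-1].digest() if chain else "0" * 64
    chain.append(Block(len(chain), data, prev_hash))


def is_valid(chain: list) -> bool:
    # Every block must still point at the current hash of its predecessor.
    return all(chain[i].prev_hash == chain[i - 1].digest() for i in range(1, len(chain)))


chain: list = []
append(chain, {"product": "batch-001", "step": "harvested"})
append(chain, {"product": "batch-001", "step": "packaged"})
append(chain, {"product": "batch-001", "step": "shipped"})
print(is_valid(chain))  # True

chain[0].data["step"] = "spoiled"  # tamper with an early record
print(is_valid(chain))  # False
```

Because link checks only protect blocks that already have a successor, real networks also rely on consensus among nodes to protect the newest block.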

Since blockchain technology has emerged as a promising solution to address the data traceability challenges, this review paper analyzes existing methods and frameworks for establishing traceability, covering industries such as food, pharmaceuticals, luxury goods, and electronics. It evaluates the benefits and challenges associated with implementing blockchain-based traceability systems, addressing scalability, interoperability, and regulatory considerations. Additionally, the paper identifies emerging trends and future research directions in the field, emphasizing the need for interdisciplinary collaboration, standardization efforts, and advanced analytics. Overall, this review serves as a valuable resource for researchers, practitioners, and policymakers interested in harnessing blockchain technology for transparent and accountable traceability.

2 Blockchain Traceability Methods

Applications of BC in traceability are still at an early stage and under continuous improvement. Several methods proposed in previous studies are summarized in Table 1; each was applied to data traceability for particular products or industries.

Data traceability in IoT involves tracing data from the many devices connected to the internet. Data of all kinds are continuously generated across fields and industries, and such large volumes require adequate security measures and handling. Because of the enormous amount of data originating from many devices, however, data traceability is difficult to achieve with conventional methods. Qiao et al. (2018) therefore proposed data traceability using a consortium blockchain. A consortium blockchain is accessible only to appointed members of an organization, who can set read-write access permissions and assign additional rules to the blockchain. In the study, the blockchain mechanism maintains a list of verification nodes formulated from a set of boundary conditions. Each node has its own credibility score, which reflects its behavior, specifically how well it serves other nodes in data verification. A node's credibility decreases if it is inactive or denies service for a period of time, and nodes with low credibility are removed from the list by the governing agency.
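A credibility-scored verification list of the kind described above might be sketched as follows. The reward, penalty, initial score, and removal threshold are hypothetical values chosen for illustration; Qiao et al. (2018) define their own boundary conditions, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field

# Hypothetical parameters, not those of Qiao et al. (2018).
REWARD = 1.0           # credit for serving a verification request
PENALTY = 2.0          # debit for being inactive or denying service
MIN_CREDIBILITY = 5.0  # nodes below this are dropped from the list


@dataclass
class VerificationList:
    credibility: dict[str, float] = field(default_factory=dict)

    def register(self, node: str, initial: float = 10.0) -> None:
        self.credibility[node] = initial

    def record(self, node: str, served: bool) -> None:
        # Adjust a node's credibility according to its behavior in one verification round.
        if node not in self.credibility:
            return
        self.credibility[node] += REWARD if served else -PENALTY
        # The governing agency removes nodes whose credibility falls below the boundary.
        if self.credibility[node] < MIN_CREDIBILITY:
            del self.credibility[node]

    def members(self) -> list[str]:
        return sorted(self.credibility)


vlist = VerificationList()
for name in ("A", "B", "C"):
    vlist.register(name)
for _ in range(3):
    vlist.record("A", served=True)   # A keeps serving verifications
    vlist.record("B", served=False)  # B is repeatedly inactive
print(vlist.members())  # ['A', 'C']
```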

The method proposed by Qiao et al. (2018) allowed the formation of a high-quality data chain in terms of node trustworthiness and security. This is achievable because of the boundary conditions applied to the consensus nodes, which are formulated to admit only nodes with high credibility to the main data chain. Foreign, dishonest, or attacking nodes have lower credibility scores and are unlikely to be accepted into the main chain. However, the boundary conditions do not eliminate the possibility of attacking nodes gaining acceptance into the main chain, so the method remains at risk of infiltration by dishonest nodes. In addition, node verification can lead to long system run times, and standardizing data with mixed parameters can be difficult.

In the food industry, George et al. (2019) proposed data traceability with blockchain technology supported by a Food Quality Index (FQI). Data traceability in the food industry has been implemented with different technologies over the years. Tracking information in food production, such as product tracking, raw material information, and logistics, allows continuous improvement of overall operations. Barcodes and QR codes have been used for product and material traceability. However, these technologies can be difficult to implement, especially in large-scale production, because they require carefully structured algorithms and long design times, particularly when data standardization is required. Blockchain, by contrast, can be implemented across an industry for data traceability because of its long-term effectiveness and scalability. Information stored in a blockchain can also be accessed more efficiently than information in centralized storage.

The study by Hao et al. (2020) was similar to that of Qiao et al. (2018) in terms of the boundary conditions implemented. Its model is centered on parameters such as the shelf life, nutritional value, and storage time of the product. These data are collected by sensors or entered manually by farmers, then recorded and processed within the blockchain to determine food safety. Food safety is assessed with a food index constructed from standards prescribed by the regulatory authority. A curve of weighting versus shelf life is estimated, and the extremum points of this curve define the range of values within which the food is considered safe for consumption.
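As a rough illustration, the safety decision can be expressed as a weighted food-quality index checked against a permitted range. The weights, the normalization of each parameter to [0, 1], and the range bounds below are hypothetical; in the study these would come from regulatory standards and the extremum points of the fitted curve, which this sketch does not estimate.

```python
# Hypothetical weights and safe range, standing in for regulatory standards
# and the extremum points of the fitted shelf-life curve.
WEIGHTS = {"shelf_life": 0.5, "nutrition": 0.3, "storage": 0.2}
SAFE_RANGE = (0.6, 1.0)


def food_quality_index(scores: dict[str, float]) -> float:
    """Weighted sum of parameter scores, each already normalized to [0, 1]."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


def is_safe(scores: dict[str, float]) -> bool:
    # The product is deemed safe only if its index falls inside the permitted range.
    low, high = SAFE_RANGE
    return low <= food_quality_index(scores) <= high


reading = {"shelf_life": 0.8, "nutrition": 0.7, "storage": 0.5}
print(round(food_quality_index(reading), 2), is_safe(reading))  # 0.71 True
```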

The blockchain traceability method proposed by George et al. (2019), based on the food index, has potential use in the food and beverage (F&B) industries. Boundary conditions derived from the shelf life of food allow the food index to be constructed. Since the food index can differ for each type of food, the method allows additional food indexes to be added to the blockchain boundary conditions. However, the method is limited to food products; its applicability to other kinds of production, such as crops, electrical components, or data services, is not explained. Other industries would require different boundary conditions and algorithms.

Blockchain technology is constantly being improved and combined with other technologies for data traceability. For example, a combination of blockchain and visualization technology has been proposed. This traceability method was built on Hyperledger Fabric, which offers flexibility in the number of parties involved in developing a blockchain platform. The main components of Hyperledger Fabric are as follows (Hao et al. 2020):

  a) Orderer – Receives endorsed transactions from across the network, orders them, and packages them into blocks.

  b) Client – A user who accesses the Hyperledger network through a software development kit (SDK) to request transactions.

  c) Endorser – These nodes execute and endorse the transaction proposals submitted by clients. A client must obtain enough endorsements from endorser nodes before the transaction can proceed.

  d) Committer – Receives the packaged blocks from the orderer and verifies the validity of each block's transactions before updating the ledger.
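The interplay of these four roles can be approximated by a toy simulation of Fabric's execute-order-validate flow. This is plain Python, not the Fabric SDK; the class and function names are hypothetical, and the endorsement policy is simplified to "at least `required` endorsements".

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    tx_id: str
    payload: dict
    endorsements: set[str] = field(default_factory=set)


def endorse(tx: Transaction, endorsers: list[str], approving: set[str]) -> None:
    # Endorser peers simulate the proposal; only approving peers sign it.
    for peer in endorsers:
        if peer in approving:
            tx.endorsements.add(peer)


def order(txs: list[Transaction], block_size: int) -> list[list[Transaction]]:
    # The orderer sequences transactions and packages them into blocks.
    return [txs[i:i + block_size] for i in range(0, len(txs), block_size)]


def commit(blocks: list[list[Transaction]], committed: list[str], required: int) -> None:
    # Committer peers check each transaction against the (simplified) endorsement
    # policy; only valid transactions update the committed state.
    for block in blocks:
        for tx in block:
            if len(tx.endorsements) >= required:
                committed.append(tx.tx_id)


peers = ["peer0", "peer1", "peer2"]
t1 = Transaction("tx1", {"lot": "L-17", "temp_c": 4})
t2 = Transaction("tx2", {"lot": "L-18", "temp_c": 9})
endorse(t1, peers, approving={"peer0", "peer1"})
endorse(t2, peers, approving={"peer2"})
committed: list[str] = []
commit(order([t1, t2], block_size=10), committed, required=2)
print(committed)  # ['tx1']
```

In Fabric itself, invalid transactions are still written into the block but flagged as invalid; this sketch simply skips them when updating the committed list.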

The framework in that study uses several layers. The business layer serves as the contact point between human users and the computer; its primary purpose is uploading data and visualizing it on the display. Through this layer, users can also write or update their smart contracts via an application programming interface (API). The communications layer houses the main protocols and structures of the P2P network, and data are synchronized between all nodes in this layer via the Gossip data dissemination protocol. The communications layer acts as a bridge between the business layer and the database layer, where the ledgers are stored as data blocks.

This Hyperledger-based framework supports visual mapping of data, which is useful for mapping food quality data to help predict shelf life or identify defects. However, the framework suffers from slow data processing, long program run times, and poor scaling of block generation, and the visualization experiences growing delays as the number of data blocks increases.
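The synchronization role of gossip can be illustrated with a generic push-gossip simulation, in which every node that already holds a new block forwards it to a few randomly chosen peers each round. This is a simplification, not Fabric's actual gossip implementation; `fanout` and the seeded random choice are assumptions for illustration.

```python
import random


def gossip_rounds(n_nodes: int, fanout: int, seed: int = 0) -> int:
    """Rounds of push gossip until every node holds the new block (node 0 starts)."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        rounds += 1
        # Each node that already has the block pushes it to `fanout` random peers.
        for node in list(informed):
            peers = [p for p in range(n_nodes) if p != node]
            informed.update(rng.sample(peers, min(fanout, len(peers))))
    return rounds


for n in (10, 100, 1000):
    print(n, gossip_rounds(n, fanout=3))
```

The number of rounds grows roughly logarithmically with the number of nodes, which is why gossip scales better than having a single node send every block to every peer.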

The use of blockchain in the pharmaceutical and medical industries can be beneficial. Global traceability standards require that every medicinal supply item be assigned a unique identification number, which helps trace the movement of pharmaceutical products throughout the supply chain. Zhu et al. (2020) proposed a traceability method built on blockchain infrastructure. Each pharmaceutical product is assigned its ID number after manufacturing, and this number is recorded in the blockchain. Wholesalers and retailers serve as intermediate nodes that can update the status of the medication on the blockchain (its location within the supply chain) via a peer-to-peer (P2P) network.
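The ID-based tracking flow can be sketched as an append-only event log keyed by each product's unique ID. The dictionary stands in for on-chain records, and the IDs and party names are hypothetical; hashing, signatures, and the P2P network are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DrugLedger:
    # Append-only event log per unique product ID (a stand-in for on-chain records).
    events: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def manufacture(self, product_id: str, manufacturer: str) -> None:
        # The manufacturer registers the ID once, right after production.
        if product_id in self.events:
            raise ValueError(f"duplicate ID {product_id}")
        self.events[product_id] = [(manufacturer, "manufactured")]

    def update(self, product_id: str, party: str, status: str) -> None:
        # Wholesalers and retailers (intermediate nodes) append location/status updates.
        if product_id not in self.events:
            raise KeyError(f"unknown ID {product_id}")
        self.events[product_id].append((party, status))

    def trace(self, product_id: str) -> list[tuple[str, str]]:
        # Return a copy so callers cannot rewrite the recorded history.
        return list(self.events[product_id])


ledger = DrugLedger()
ledger.manufacture("MED-0001", "PharmaCo")
ledger.update("MED-0001", "WholesalerX", "received at warehouse")
ledger.update("MED-0001", "PharmacyY", "on shelf")
print(ledger.trace("MED-0001"))
# [('PharmaCo', 'manufactured'), ('WholesalerX', 'received at warehouse'), ('PharmacyY', 'on shelf')]
```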

The system proposed by Zhu et al. (2020) consists of a smart contract, a blockchain network, and a web client for the user. The smart contract, coded in Python, implements a practical Byzantine fault tolerance (PBFT) mechanism for establishing consensus among participating nodes. In PBFT, network traffic can increase rapidly as nodes exchange messages to reach consensus, which slows information processing and transfer. Zhu et al. address this with a score-keeping system that helps the nodes reach consensus quickly, before the network becomes congested. Each node can be rewarded or penalized depending on its behavior during the consensus process. A dishonest node (one carrying modified, tampered, or forged information) causes the set of nodes it belongs to to lose score points, and a set of nodes with low scores is less likely to be selected for the consensus process. As a result, the node sets with higher scores are those that can be trusted most.
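The score-keeping idea can be sketched on its own, separate from PBFT's message phases, which are not modeled here. The reward and penalty values, the score-proportional selection rule, and the node-set names are assumptions for illustration rather than the scheme of Zhu et al. (2020).

```python
import random

# Hypothetical reward/penalty values; Zhu et al. (2020) define their own scoring rules.
REWARD, PENALTY = 1, 5


def select_set(scores: dict, rng: random.Random) -> str:
    # Higher-scoring node sets are proportionally more likely to run the consensus round.
    names = list(scores)
    weights = [max(scores[n], 0) + 1 for n in names]  # +1 keeps every set selectable
    return rng.choices(names, weights=weights, k=1)[0]


def settle(scores: dict, chosen: str, had_dishonest_node: bool) -> None:
    # A set that contained a node with tampered data loses points; an honest set gains.
    scores[chosen] += -PENALTY if had_dishonest_node else REWARD


rng = random.Random(7)
scores = {"set-A": 10, "set-B": 10}  # hypothetical: set-B contains a dishonest node
picks = {"set-A": 0, "set-B": 0}
for _ in range(50):
    chosen = select_set(scores, rng)
    picks[chosen] += 1
    settle(scores, chosen, had_dishonest_node=(chosen == "set-B"))
print(scores["set-A"] > scores["set-B"])  # True
```

Because every round either raises the honest set's score or lowers the dishonest set's score, the honest set always ends ahead, and the dishonest set is selected less and less often.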

Network traffic is then reduced, since low-scoring nodes that could hinder the consensus process are less likely to participate. The system shows potential as an organized BC-based data traceability system; however, it handles only one form of data at a time. In the study, only the medication ID is hashed before being propagated through the nodes for validation. The scoring system would break down if different forms of data or parameters had to be validated, because nodes could no longer be scored or penalized consistently, leading to a data standardization issue (Zhu, et al. 2020).

Besides building their own smart contracts and blockchain systems, organizations can also use public blockchain networks for data traceability. An established, well-known network may be preferred for its stability. Marbouh et al. (2020) proposed a data traceability method in response to the COVID-19 pandemic, during which resource deployment, especially the mass mobilization of vaccines and medical equipment, was a major challenge. The proposed method uses the Ethereum blockchain network. Because validation is performed by the public network, the need for internal intermediary nodes is eliminated, so data traceability and validation can be achieved at lower cost. In addition, Ethereum allows multiple smart contracts to be deployed for specific purposes.

Marbouh et al. (2020) created three Ethereum smart contracts for tracking COVID-19-related data: a registration contract that handles information on participating stakeholders and web sources, a reputation contract that assigns scores based on the quality and trustworthiness of web sources, and an aggregator contract that retrieves the latest information updates from nodes. The system allows multiple types of information and parameters to be tracked within the blockchain network. Unlike privately developed blockchain networks, public networks such as Ethereum charge gas fees for each successful transaction and smart contract execution, so more complex smart contracts with additional code may increase costs. Furthermore, because gas fees are paid in ether, fluctuations in the price of ether make operational costs difficult to calculate.
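The cost-estimation problem is a matter of simple arithmetic: the fee for a transaction is the gas it consumes multiplied by the effective gas price (since EIP-1559, the base fee plus a priority tip), paid in ether. The gas amount, gas price, and exchange rates below are hypothetical figures, not market data.

```python
def tx_cost_usd(gas_used: int, gas_price_gwei: float, eth_usd: float) -> float:
    """Fiat cost of one transaction: gas used x effective gas price, converted to USD."""
    fee_eth = gas_used * gas_price_gwei * 1e-9  # 1 gwei = 1e-9 ETH
    return fee_eth * eth_usd


# Hypothetical figures: the same contract call priced at two different ETH/USD rates.
GAS_USED = 50_000
GAS_PRICE_GWEI = 20
for eth_usd in (1_500, 3_000):
    print(f"ETH at ${eth_usd:,}: ${tx_cost_usd(GAS_USED, GAS_PRICE_GWEI, eth_usd):.2f}")
# ETH at $1,500: $1.50
# ETH at $3,000: $3.00
```

Even with identical gas usage and gas price, the fiat cost doubles when the ether price doubles, and in practice the gas price itself also varies with network demand.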

Jiang et al. (2020) proposed a method for tracing sensitive information. The system is centered on a service provider, a certification center, and the user whose information is to be transacted. For added security, users are not required to upload their personal information directly to the certification center (which can be a government body, an employer, etc.). Instead, the user registers and generates an identity in the system, after which the certification center provides the new user with a unique identification number. The user then encrypts their personal information with this identification number before sending the data to the service provider. The service provider generates a random symmetric record key and encrypts the information package (which includes all of the sensitive personal information).

The certification center (the original information requestor) decrypts the record key attached to the information package and sends the same record key to the user for safekeeping. The certification center then checks the record key and the identification number. If both match and none of the backup copies of the package transactions shows signs of tampering, the certification center releases the data key to the service provider. This technique demonstrates a secure method for tracing sensitive data. However, it can be difficult to scale because of the number of encryption operations involved. It also does not give each node easy access to the data, which can be a problem for organizations with multiple intermediary bodies serving as nodes.
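The certification center's final check can be sketched as follows. Python's standard library has no symmetric cipher, so the encryption steps are omitted and the package is treated as opaque bytes; the function name, the 32-byte key size, and the use of matching SHA-256 digests as the "no tampering" test are assumptions, not details from Jiang et al. (2020).

```python
import hashlib
import hmac
import secrets
from typing import Optional


def digest(package: bytes) -> str:
    return hashlib.sha256(package).hexdigest()


def release_data_key(record_key: bytes, expected_key: bytes,
                     user_id: str, expected_id: str,
                     backups: list) -> Optional[bytes]:
    """Hypothetical certification-center check before releasing the data key."""
    # Constant-time comparisons avoid leaking how much of a key or ID matched.
    keys_match = hmac.compare_digest(record_key, expected_key)
    ids_match = hmac.compare_digest(user_id.encode(), expected_id.encode())
    # No tampering: every backup copy of the package transaction must hash identically.
    untampered = len(backups) > 0 and len({digest(b) for b in backups}) == 1
    if keys_match and ids_match and untampered:
        return secrets.token_bytes(32)  # fresh data key for the service provider
    return None


record_key = secrets.token_bytes(32)  # generated by the service provider
package = b"<encrypted information package>"
ok = release_data_key(record_key, record_key, "UID-001", "UID-001", [package, package])
bad = release_data_key(record_key, record_key, "UID-001", "UID-001", [package, package + b"x"])
print(ok is not None, bad)  # True None
```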

With proper mathematical modeling, a batch of information can be traced within a blockchain network. Within an overall production process, various items and raw materials need to be tracked at once (within one transaction), which is difficult to achieve with only a few smart contracts and basic mathematical models of data traceability. Specific mathematical models must be tailored to the number of steps or phases involved in processing several raw materials into their respective end products; many products undergo an average of four to six major processes before a raw material reaches its commercial form.

Maity et al. (2021) proposed a stochastic batch dispersion model to achieve data traceability for food processing optimization. Batch processing makes production suitable for mathematical models that minimize the number of links between each packaged product and its raw materials. In this way, data inconsistency caused by finished products from external sources can be minimized. The stochastic batch dispersion model considers five levels of processing within the production supply chain. Each level represents a state of the product: level 1 is the raw material and level 5 is the finished product. The model has two stages. The first stage determines fixed proportions of raw materials in the finished product. The second stage determines which raw materials must be outsourced to fulfil any remaining customer demand; it acts as a corrective measure that ensures product traceability and the correct proportion of raw materials used.
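The batch dispersion objective can be illustrated with a deterministic toy example that counts the links between finished lots and raw-material batches. This is not the stochastic two-stage model of Maity et al. (2021); the lot and batch names are hypothetical, and the point is only to show why fewer links make tracing cheaper.

```python
def batch_dispersion(recipes: dict) -> int:
    """Number of links between finished lots and the raw-material batches they use."""
    return sum(len(raw_batches) for raw_batches in recipes.values())


def affected_lots(recipes: dict, raw_batch: str) -> list:
    # Finished lots that must be traced (or recalled) if one raw batch is defective.
    return sorted(lot for lot, raws in recipes.items() if raw_batch in raws)


# Two hypothetical allocations of raw batches R1-R3 to finished lots F1 and F2.
mixed = {"F1": {"R1", "R2", "R3"}, "F2": {"R1", "R2", "R3"}}
segregated = {"F1": {"R1", "R2"}, "F2": {"R3"}}

print(batch_dispersion(mixed), batch_dispersion(segregated))  # 6 3
print(affected_lots(mixed, "R3"), affected_lots(segregated, "R3"))  # ['F1', 'F2'] ['F2']
```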

Blockchain systems can also be integrated with sensors for quality inspection and data traceability, as proposed by Liao and Xu (2019). In their system, the Ethereum blockchain serves as the underlying architecture and the data entry point. Product information is stored on the public blockchain and can be retrieved via a QR code. The architecture resembles that of other studies (Marbouh et al. 2020) in separating the traceability system into a business layer, a data layer, and a user interface layer. Although using QR codes for information retrieval helps standardize data, QR codes are less efficient than newer tracking technologies such as VR/AR, IoT, big data, and AI.
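One way the QR link could work is sketched below: the QR code encodes a product ID and the hash of the record stored on-chain, and a scanner re-hashes the fetched record to confirm it matches. The URL, parameter names, and record fields are hypothetical, and rendering the QR image itself (which needs a third-party library) is omitted.

```python
import hashlib
import json
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://trace.example.com/product"  # hypothetical lookup endpoint


def record_hash(record: dict) -> str:
    # Deterministic hash of a product record (keys sorted so field order does not matter).
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def qr_payload(product_id: str, record: dict) -> str:
    # Text to encode in the QR code: product ID plus the hash stored on-chain.
    return f"{BASE_URL}?{urlencode({'id': product_id, 'h': record_hash(record)})}"


def verify_scan(payload: str, fetched_record: dict) -> bool:
    # After scanning, re-hash the record fetched from the chain and compare.
    params = parse_qs(urlparse(payload).query)
    return params["h"][0] == record_hash(fetched_record)


record = {"id": "P-42", "sensor_temp_c": 4.5, "inspected": True}
payload = qr_payload("P-42", record)
print(verify_scan(payload, record))                         # True
print(verify_scan(payload, {**record, "inspected": False}))  # False
```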

Table 1. Blockchain traceability methods.

3 Category of Blockchain Traceability

Based on the literature, there are at least three different categories of BC traceability methods. These methods vary greatly in terms of implementation, algorithm, and system structures.

3.1 Consortium Centric Blockchains

A consortium blockchain is one of the methods used in BC-based traceability, as in the work of Qiao et al. (2018). This technique uses several specialized or trusted nodes to verify data chains. The network of nodes is governed by personnel who are members of an organization (alliance) and access the nodes through the agency gateway. Read and write permissions are granted according to the preferences of the alliance, and data in the blockchain can be made either private or public to all alliance members. The system can be reliable and flexible because alliance members have full control and transparency. However, consortium-based blockchains can be slow because nodes must wait for the approval of other nodes. Moreover, untrustworthy nodes can still be accepted into the main chain by other nodes, so consortium-centric blockchains on their own may suffer from reliability issues.
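The alliance-controlled read and write permissions described above amount to an access-control policy that the agency gateway enforces. A minimal sketch, assuming a hypothetical three-member alliance and permissions expressed as Python `enum.Flag` values:

```python
from enum import Flag, auto


class Perm(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()


# Hypothetical alliance policy, set by the consortium members themselves.
POLICY = {
    "farm_coop": Perm.READ | Perm.WRITE,
    "processor": Perm.READ | Perm.WRITE,
    "regulator": Perm.READ,
}


def can(member: str, perm: Perm) -> bool:
    # Organizations outside the alliance get no access through the gateway.
    return perm in POLICY.get(member, Perm.NONE)


print(can("regulator", Perm.READ), can("regulator", Perm.WRITE), can("outsider", Perm.READ))
# True False False
```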

3.2 Smart Contracts

Blockchain networks such as Ethereum rely on proof of stake for node verification. Ethereum also supports smart contracts, which can be used for data traceability. Smart contracts are software agents deployed on a blockchain network; Figure 1 shows an example of how a smart contract-based blockchain system operates. These agents verify transactions automatically without third-party involvement, which improves approval efficiency and reduces transaction processing time. Marbouh et al. (2020) used Ethereum smart contracts for data traceability of COVID-19 cases, aiming to solve traceability issues in medical supply chains and COVID-19 case reporting. The volume of such data can overwhelm standard centralized storage, which also offers weaker security. The proposed work therefore used three smart contracts within the Ethereum ecosystem to support data traceability. Smart contracts are beneficial because they are flexible: users can add parameters to suit an industry's traceability needs, and smart contracts are easy to deploy on existing public networks such as Ethereum. However, gas prices (fees) for transaction processing are inconsistent because the value of ether fluctuates with the cryptocurrency market, which makes cost estimation challenging.

Fig. 1.

A smart contract-assisted blockchain data traceability process for COVID-19 tracking (Marbouh et al. 2020).
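The division of labor among the three contracts of Marbouh et al. (2020), registration, reputation, and aggregation, can be mimicked in plain Python. This is a simulation, not Solidity and not the authors' code; the starting score of 0.5, the update rate of 0.2, and the reputation cutoff are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrackingSystem:
    """Python stand-in for three cooperating contracts (hypothetical, not the original code)."""
    registered: set[str] = field(default_factory=set)                   # registration
    reputation: dict[str, float] = field(default_factory=dict)          # reputation
    reports: dict[str, tuple[int, dict]] = field(default_factory=dict)  # latest report per source

    def register(self, source: str) -> None:
        self.registered.add(source)
        self.reputation.setdefault(source, 0.5)  # hypothetical neutral starting score

    def rate(self, source: str, accurate: bool) -> None:
        # Move the score toward 1 for accurate reports and toward 0 otherwise.
        target = 1.0 if accurate else 0.0
        self.reputation[source] += 0.2 * (target - self.reputation[source])

    def submit(self, source: str, timestamp: int, data: dict) -> None:
        if source not in self.registered:
            raise PermissionError(f"{source} is not registered")
        self.reports[source] = (timestamp, data)

    def aggregate(self, min_reputation: float = 0.5) -> dict:
        # Aggregator: newest report among sufficiently reputable sources.
        trusted = [(ts, d) for s, (ts, d) in self.reports.items()
                   if self.reputation[s] >= min_reputation]
        return max(trusted, key=lambda r: r[0])[1] if trusted else {}


system = TrackingSystem()
system.register("health_ministry")
system.register("rumor_site")
system.rate("health_ministry", accurate=True)  # score rises to about 0.6
system.rate("rumor_site", accurate=False)      # score falls to about 0.4
system.submit("health_ministry", timestamp=1, data={"cases": 120})
system.submit("rumor_site", timestamp=2, data={"cases": 9000})
print(system.aggregate())  # {'cases': 120}
```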

3.3 Custom Algorithms and Mathematical Modeling

Another approach to BC-based data traceability is to modify or develop custom algorithms. This requires detailed programming, with a strong emphasis on a mathematical model built around the data traceability process and its elements. For instance, the practical Byzantine fault tolerance (PBFT) algorithm can help the nodes of a blockchain reach consensus (Zhu et al. 2020), which means that, if well coded, such algorithms can enhance other blockchain methods. However, strict and complex rules must be specified for each item or type of information, and as more types of information enter the blockchain, the algorithm becomes more prone to errors. Hence, multiple types of items, transactions, and information lead to data standardization issues that limit the potential of algorithm-based blockchain traceability.
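The strict rules that BFT-style algorithms impose can be made concrete with the standard PBFT bound: n replicas tolerate at most f faulty ones when n ≥ 3f + 1, and each decision needs matching messages from a quorum of 2f + 1 replicas. A small helper (hypothetical name `pbft_limits`) computes both values.

```python
def pbft_limits(n: int) -> tuple[int, int]:
    """Return (max tolerated faulty replicas f, quorum size 2f + 1) for n replicas."""
    if n < 4:
        raise ValueError("PBFT needs at least 4 replicas to tolerate one fault")
    f = (n - 1) // 3  # largest f satisfying n >= 3f + 1
    return f, 2 * f + 1


for n in (4, 7, 10):
    f, quorum = pbft_limits(n)
    print(f"n={n}: tolerates f={f}, quorum={quorum}")
# n=4: tolerates f=1, quorum=3
# n=7: tolerates f=2, quorum=5
# n=10: tolerates f=3, quorum=7
```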

4 Discussion

The blockchain traceability methods reviewed here modify different parts of a system. The consortium-centric method focuses on the roles of different nodes and is therefore implemented within the blockchain network itself. It is suited to traceability scenarios in which information transactions are uniform and trustworthiness is required, such as tracing sensitive employee data or financial information. Although it can be integrated into various systems, it may not be "plug and play" from one system to another, and it may need to be combined with another blockchain method to broaden its usability.

The use of a public blockchain with smart contracts is another viable method for agricultural traceability. Because additional conditions can be set for nodes, the system can be adapted to trace various products and the movement of raw materials within a supply chain. Furthermore, since a public network is used, the system is more stable than a custom-built blockchain because more nodes participate in proof of stake or proof of work. Examples of public blockchains include Ethereum, Polygon (MATIC), and Tron. Such a system requires a transaction fee, the "gas fee", which is usually paid in a particular cryptocurrency (Marchesi et al. 2020). Since cryptocurrency prices fluctuate greatly over time, estimating system operational costs can be challenging.

A third option is to modify the algorithm of the blockchain itself. For this type of traceability method, a proper mathematical model must be constructed from scratch for most use cases. It requires highly trained personnel to develop but offers a high degree of customization. Data integration can become extremely complex as the model grows increasingly convoluted. For instance, even for a single type of crop, the model must account not only for seeds and fertilizers in general but also for the specific fertilizer used to maintain the crop. Other parameters, such as humidity-adjusting water sprinklers, soil pH maintenance, and crop movement through harvesting and logistics, add up to an extremely complex system in which bugs and errors are very difficult to fix. Therefore, such systems may be difficult to market to industries for their data traceability needs.

A gap can be identified in the literature on data traceability across usage scenarios. Blockchain is still implemented for data traceability less often than better-understood methods such as RFID, IoT, and AI. Within blockchain itself, several methods have been proposed for various usage scenarios, but each has weaknesses. The most common weaknesses of blockchain-based data traceability are scalability, implementation, and data standardization issues, which indicate room for a more refined blockchain data traceability method.

5 Conclusions

This review paper presents an overview of the use of blockchain technology for data traceability in various industries. Through a thorough analysis of traditional traceability methods, the limitations of existing approaches are identified, emphasizing the need for more robust and efficient solutions. The review highlights the potential of artificial intelligence and the Internet of Things as complementary technologies to enhance data traceability. However, challenges such as the complexity of AI and the vulnerabilities of IoT devices are acknowledged. In this context, blockchain technology emerges as a promising solution due to its decentralized and transparent nature. Exploring different traceability methods provides a holistic understanding of the benefits and challenges associated with blockchain-based traceability systems. The evaluation of research studies showcases the diverse approaches taken, emphasizing the importance of factors such as data standardization, processing speed, and cost. The paper also underscores the significance of interdisciplinary collaboration, standardization efforts, and advanced analytics in further advancing blockchain-based traceability systems. By leveraging blockchain-based traceability, organizations can drive positive transformations in achieving transparency, accountability, and efficiency.