Keywords

1 Introduction

In recent years, the growth of IoT services and other data collection and utilization practices has greatly enhanced our daily lives and brought many benefits in fields such as healthcare and medical services. The continuous production of personal health data (PHD) from wearable devices and sensors has led to its widespread use for personalized healthcare analysis and the promotion of health and well-being [1]. However, there are growing concerns about privacy-preserving and how to make the proper protection and management of collected and stored data. Therefore, individuals who utilize such information face the challenge of securing and maximizing the benefits of their data while also addressing privacy concerns.

To address these challenges, various privacy regulations have been established globally, including the European Union General Data Protection Regulation (GDPR) [2], the California Consumer Privacy Act (CCPA) [3], and Singapore's Personal Data Protection Commission (PDPC) [4], among others. These regulations highlight the increasing importance of privacy protection in data management. The GDPR, in particular, imposes significant legal penalties. Therefore, for sensitive data sharing such as PHD, it is necessary to require a secure way to manage and use data with the consent of each individual. However, there are still challenges to be overcome, including concerns with ensuring data ownership and enhancing privacy protection.

To solve these problems, many solutions using blockchain and privacy computing have been proposed [5, 6]. In our previous work [7,8,9], we proposed a novel model of Individual-Initiated Auditable Access Control (IIAAC) for privacy-preserved data sharing based on blockchain, Ciphertext-Policy Attribute-Based Encryption (CP-ABE) and InterPlanetary File System (IPFS). We further implemented the secure interoperation of blockchain and IPFS through a client application.

In this paper, we propose a trustworthy system enabled by decentralized federated learning in IIAAC. Federated learning is a method for collaborative machine learning, which only shares training models while maintaining all the target data on the decentralized nodes without centralized data collection [10]. By coordinating multiple nodes to execute machine learning, federated learning is suitable for health data sharing and utilization while enhancing privacy protection [11].

The remainder of this paper is organized as follows. In Sect. 2, related work on blockchain and federated learning are overviewed, and privacy protection issues on data sharing are identified. In Sect. 3, we introduce the functions and features of our proposed system and present the basic system architecture using decentralized federated learning. In Sect. 4, our simulated experiment is described, and two scenarios for health-related data sharing and utilization are assumed. Thereafter, a set of individualized visualization results are shown. Finally, this paper is summarized, and future directions are highlighted in Sect. 5.

2 Related Work

In this section, we briefly introduce the issues of health data sharing and utilization. Then, we present federated learning, blockchain, and previous related works conducted in data integration and sharing based on federated learning and blockchain platforms.

With the rapid growth of the Internet of Health Things (IoHT) in health infrastructures, large amounts of health-related data are being collected and processed in storage data centers [12]. However, securing a vast amount of health data presents a significant challenge, and it is necessary to implement innovation with privacy-preserving health data solutions.

In recent years, privacy computing has emerged as a promising privacy-preserving technology. Gartner has recognized privacy-enhancing computation as a top strategic technology trend for consecutive years 2021 and 2022 [13]. Privacy computing encompasses techniques, such as multi-party computation, differential privacy, and federated learning. Especially federated learning enables machine learning analysis without data aggregation and public transmission. In a federated learning platform, data remains stored locally while models are trained and shared in multiple nodes, preserving the privacy of the data owner [10]. Therefore, federated learning enables data processing and analysis locally while reducing the transmission of original data, allowing for secure and privacy-protected sharing of models.

However, federated learning may also bring privacy risks, such as malicious users can obtain sensitive information through the inference of data during the training process [14], and data leakage could occur when the central server or client is compromised [15].

To solve these problems, blockchain technology has been used as a complement to federated learning, which allows for trustworthy identity authentication, tamper-proof data storage, and decentralization. To counter global aggregation attacks and distributed poisoning attacks in federated learning, many frameworks and approaches using blockchain have been proposed. Heiss et al. [16] proposed a blockchain-based federated learning, which uses zero-knowledge proofs to verify off-chain computations and prove the correctness of parameters. It remains the federated learning framework with a central server. To address these several key issues, such as ensuring the reliability and quality of distributed data and considering how to motivate data owners to share data with others by using an incentive mechanism. In our previous work [8, 9], we proposed a novel model of Individual-Initiated Auditable Access Control (IIAAC) in a consortium blockchain-based system incorporating CP-ABE and IPFS. We further implemented secure interoperation of blockchain and IPFS through a client application. Based on our previous work, this paper focuses on data integration and sharing in a trustworthy way enabled by decentralized federated learning in IIAAC.

3 A Trustworthy Decentralized System for Health Data Integration and Sharing

In this section, we first introduce the functions and features of our proposed system. Then, we describe the prototype system using decentralized federated learning in IIAAC.

3.1 System Requirements

To integrate and share health data effectively and securely, a trustworthy decentralized system enabled by blockchain and decentralized federated learning is designed. The major functions and features of our proposed system are summarized as follows.

  1. 1)

    Health data integration

    Various types of health-related data are collected and integrated into the system, including health features collected via a wearable device related to bio indicators (e.g., heart rate and blood oxygen), sleep indicators (e.g., sleep score, deep sleep continuity, wake-up counts, and breathing quality) and activity indicators (e.g., step number, step distance, activity calories consumption, and moderate to high-intensity activity duration). In addition, health data in terms of Traditional Chinese Medicine (TCM) are obtained and used, which is important for predictive health risk analysis [17, 18]. These data are collected and integrated into the system for sharing and utilization in a trustworthy and secure way.

  2. 2)

    Doubling data ownership

    Users in the system are classified into two types: the data owner, who generates the data, and the data requester, who uses the data. In certain situations, health data can be owned by plural users. For instance, an electronic medical record (e.g., TCM health data) is usually created by a medical staff or a medical device as the data owner. However, such a kind of health data is important for a person to manage his/her health through health data analysis. In our previous work, we allowed the system to set double ownership for the person, which is implemented by smart contracts in blockchain [19].

  3. 3)

    Data analysis via decentralized federated learning

    By using the decentralized federated learning mechanism, machine learning algorithms are applied to data analysis while maintaining data decentralization. The local model and the global model are used for further personalized analysis.

  4. 4)

    Incentive mechanism

    To motivate data owners to participate in data integration and sharing positively, we incorporate an incentive mechanism that provides individualized feedback to a data owner who shares the data. The individualized feedback includes comparative analysis results with a peer user or a group of users.

3.2 System Architecture

In this section, we describe the basic system architecture in IIAAC by using Hyperledger Fabric, a consortium blockchain, an IPFS distributed file system, a CP-ABE encryption mechanism, and a decentralized federated learning platform to interoperate with each other.

In a decentralized federated learning mechanism, we use the local model and the global model for personal health data analysis and feedback on the analysis results to data owners. The system architecture designed in our proposed model mainly consists of two categories of users, which are data owner and data requester. In our proposed architecture, personal health data is kept on local devices (decentralized nodes) instead of being centralized on a server or transferred data to otherwhere. In addition, important information related to the training process and model parameters in the federated learning process can be securely stored on on-chain storage and off-chain storage via blockchain. The basic architecture of the prototype system is shown in Fig. 1.

Fig. 1.
figure 1

Basic System Architecture

4 Experiment and Application Scenarios

In this section, we first describe the simulated experiment. Then, we assume two application scenarios for sharing and utilization of personal health data and visualization feedback to users.

4.1 Experiment Overview

Wearable devices were used to monitor and record the daily health data of 22 recruited participants for 50 days and collecting data every day. Each participant was requested to record Self-assessed Subjective Health Score (SSHS) every day [20]. This study was conducted under the approval of the Ethics Review Committee on Research with Human Subjects of Waseda University, Japan (No. 2018-092), and all subjects for this experiment signed the informed consent.

In this paper, we design and conduct a simulated experiment based on the proposed system for health data integration and sharing using a part of the collected data, as mentioned above. The simulated experiment environment was built using a Blockchain-powered Verifiable PPC (Privacy-Preserving Computation) network, namely Delta FrameworkFootnote 1, which integrates blockchain and ZPK (Zero Knowledge Proof) to ensure it is verifiable that the computation is actually performed as designed on the required data in a privacy-preserving manner. Delta transforms the tasks into horizontal/vertical federated learning, or federated analytics task and executes it on the network.

For the experiment environment, we also implemented a dashboard interface on the Delta Framework for usability and use Jupyter Lab to construct these tasks, which are written in Python. In the experiment, we simulated three nodes on a computer with the Ubuntu OS. The specifications of the experiment computer are given in Table 1, and the versions of experiment platforms and tools are shown in Table 2.

In the experiment, we took three nodes as three users to simulate the integration and sharing of data in privacy-preserving computation. We put three datasets collected from three subjects into the three decentralized nodes to simulate the data integration and sharing, in which the data structure is the same. The result showed the feasibility of health data integration and sharing in the simulated experiment environment of decentralized federated learning with three nodes.

Table 1. Specifications of experiment computer
Table 2. Versions of experiment platforms and tools

4.2 Application Scenarios

In our previous work [21], we proposed two types of comparisons for visualized feedback of personal health data analysis results, namely temporal comparison, and horizontal comparison. Temporal comparison is designed to show the current health indicators versus the features obtained from the data of the past. On the other hand, the horizontal comparison provides a measure to let users know where they stand in relation with others, e.g., a peer of the same gender, or a group of the same age, which is also considered to be a good way for us to know ourselves. We describe two application scenarios by showing the visualization feedback which our proposed system aims to provide as follows.

Scenario for Temporal Comparison.

A female student wants to know her health indicators for the past week from January 2 (Monday) to 8 (Sunday), 2023, comparing with the averages in the last two weeks up to the day before. She selected five health indicators, i.e., resting heart rate, total sleep duration, activity calorie consumption, stress score, and SSHS (the last two as one pair). The results of these temporal comparisons are shown in Fig. 2, in which the reference range is highlighted in light green color. The user can observe from the graphs how her health features changed, to what extent they are different from the averages, and whether the selected resting heart rate, total sleep duration, and stress score are within the reference range or not.

In Fig. 2, we can see that the daily indicators in the last one week have significant fluctuations compared to the averages in the last two weeks up to the day before. Moreover, the resting heart rate and the total sleep duration on Tuesday are greatly out of the reference range and the other one is less than the reference range, which resulting a higher stress score. And the SSHS is also very low, which is kept low for continuous four days. By continuously observing the bio indicators, such as the resting heart rate, or the sleep indicators, such as the total sleep duration, the user can understand their physical strength improvement, accumulated fatigue, or stress, and their relation to her health status.

Fig. 2.
figure 2

Temporal Comparison

Fig. 3.
figure 3

Horizontal Comparison

Scenario for Horizontal Comparison.

The female student wants to know her selected five health indicators during the period of January 2 (Monday) to 8 (Sunday), 2023, compared in terms of the averages in the last two weeks up to the day before with one of her good friends as well as a female group whose members all agree to share their data. The results of these horizontal comparisons of five selected health indicators are shown in Fig. 3, in which the reference range is highlighted in light green color. From the figures, she can observe the trends and the changes to find similarities or differences between herself and her friend or a group of chosen female users.

In Fig. 3, we can see that the trends of the average of the user, the average of a peer, and the average of a group are similar and stable. Among them, her resting heart rate and activity calorie consumption (represented in black lines) are lower than the peer and the group, while their total sleep duration is almost the same, about 7.5 h. Less activity calorie consumption of the user may imply that she spent less time doing exercises, which may result in her higher stress scores than the peer and the group, although it is still within the reference range. From Fig. 3(d), we can also see that the average SSHS of the user has a declining trend from weekdays to weekends.

5 Conclusion

In this study, we proposed a trustworthy decentralized system with the blockchain and federated learning for privacy-preserving data integration and sharing. The system introduced in this paper can be expected to realize the processes of data life cycle management and utilization with trustworthiness.

In this paper, we described the functions and features of our proposed system in terms of health data integration, doubling data ownership, data analysis via decentralized federated learning, and incentive mechanism. Then, we explained the system architecture in IIAAC. We further described the simulated experiment and assumed two scenarios for sharing and utilization of personal health data. The individualized feedback was given to a user in a visualized way, in comparison with the averages calculated from the data of a peer or a group, which can also be used as an incentive for positive data sharing.

For our future work, we will implement the proposed trustworthy decentralized system. We will conduct the validation and performance evaluation experiment on the proposed system using decentralized federated learning models to analyze the tasks in [20]. We further plan to compare the proposed system with other related works for benchmark analysis.