1 Introduction

In recent years, Deep Learning (DL) based methods have achieved considerable success in medical image segmentation [1]. However, their progress is often constrained by the need for large annotated datasets. Active Learning (AL) has the potential to significantly enhance the efficiency of intelligent diagnostic systems such as [2] by mitigating the need for extensive annotation efforts, as evidenced in previous studies [3, 4]. For example, ophthalmologists use the segmentation of ocular Optical Coherence Tomography (OCT) images for the diagnosis and treatment of eye diseases such as Diabetic Retinopathy (DR) and Diabetic Macular Edema (DME) [5, 6]. Labeling medical image data is a time-consuming and expensive process, as domain experts are required to annotate it manually. In this study, we primarily rely on data derived from OCT, a technology that has gained significant popularity in ophthalmology imaging due to its effectiveness. OCT employs light waves to generate high-definition, cross-sectional visuals of the internal structures of the eye.

In ophthalmology, OCT is used to diagnose and monitor conditions such as macular degeneration, diabetic retinopathy, and glaucoma. OCT images provide detailed information about the thickness and integrity of retinal layers, the presence of fluid or swelling, and the size and shape of optic nerve structures. To observe how these structures develop and change during the treatment phase, doctors annotate the images, which allows them to easily track how the retinal layers change over the course of treatment.

AL can serve as a beneficial tool in medical image segmentation. It can alleviate the extensive annotation effort by letting the model provide annotations for image regions where it exhibits high confidence, while experts contribute ground truth for regions where the model is less confident [3]. In practice, expert annotation of large-scale medical image databases is highly laborious, resource-intensive, and often infeasible.

In response to this challenge, we propose PIAA, a region-based AL technique. Trained incrementally on a minimal amount of annotated regions, it aids ophthalmologists by generating OCT segmentations. The PIAA framework capitalizes on the prediction uncertainty across the boundaries of the semantic regions of input images, informing the end user which segmentation areas it is confident about and which it is not. The end user accepts segmentation output only from the confident areas and provides feedback to the model on the less confident ones. The model learns from this feedback, and its performance improves over time. Edge information is one of the most salient features of an image, and it can boost segmentation accuracy when integrated into neural model training [7]. We formulate a novel acquisition function that leverages the variance of the predicted score across the gradient surface of the input to measure uncertainty.

Empirical results show that PIAA outperforms other state-of-the-art AL methods on three OCT image datasets. This paper is an extended version of EdgeAL [8], offering additional experimental results, illustrations, and practical use cases to provide a more comprehensive elaboration of the methodology, experiments, and results, and to enhance the understanding of the algorithm's capabilities and use cases.

2 Related work

Active learning is applied to a variety of tasks, including natural language processing, computer vision, and reinforcement learning, and it is expected to play a major role in the development of interactive machine learning methods. It is a cost-effective approach that selects the most informative samples for annotation to improve model performance, based on uncertainty [9], data distribution [10], expected model change [11], and other criteria [12]. A simple way to define uncertainty is via the posterior probability of the predictions, e.g., selecting the instance with the least confident posterior probability [9, 13] or the smallest margin between posterior probabilities of different predicted classes [14, 15]. Some methods [16, 17] use the entropy of the class posterior as an uncertainty measure. These measures are often combined with sampling-based strategies that estimate model uncertainty from the inconsistency of predictions [3, 12, 18].

In the context of active learning, when dealing with a pool of unlabeled data, there are primarily three major strategies that can be employed to select the next batch of data that needs to be labeled. These strategies include uncertainty-based approaches, distribution-based approaches, and methods based on expected model change [19].

In the uncertainty-based approach, the learning algorithm seeks out samples that carry the highest degree of prediction uncertainty. This can be gauged by measuring the posterior probability of a predicted class [13, 20]. The fundamental belief underpinning this approach is that these samples, once they have been labeled, hold the potential to provide the most critical information that can enhance the learning capacity of the model.

The distribution-based approach in active learning is centered on the selection of data points that embody the entire distribution of an unlabeled data pool. The underlying premise of this approach is that learning from a representative subset of the data can yield results that are as competitive as learning from the entire data pool. This approach can be implemented in several ways. For instance, Nguyen and Smeulders [21] employ a clustering algorithm to partition the data pool, thereby facilitating the identification of representative data points. Alternatively, Yang et al. [22], Guo [23] and Elhamifar et al. [24] formulate the selection of a representative subset as a discrete optimization problem, thus ensuring the selection of the most informative data points. Another approach, as proposed by [25, 26], involves evaluating the proximity of a data point to its surrounding data points, thereby selecting data points that can effectively propagate knowledge across the dataset. These methods, therefore, ensure the selection of the most representative and informative data points for model training, thereby optimizing the active learning process.

The technique of the expected model change is a more advanced and decision-theoretic strategy for model enhancement. This technique utilizes the existing model to predict the expected length of the gradient [27], anticipated future errors [28], or predicted changes in output [29] for all potential labels. These strategies, initially designed for use with smaller models and datasets, can be assessed for their efficiency when applied to larger deep networks [30, 31] and extensive datasets [32].

The uncertainty-based approach [13, 33] has shown robust results for classification tasks. However, it requires a task-specific design for other tasks as it leverages network outputs. In a more general approach, Gal et al. [34] achieve uncertainty estimates via multiple forward passes using Monte Carlo dropout. Although this method has been validated on small-scale classification tasks, it is computationally demanding for recent large-scale learning due to the need for dense dropout layers, which significantly slow down the convergence rate. Beluch et al. [35] propose an ensemble of five deep networks that measures uncertainty through disagreement. While it has demonstrated cutting-edge classification performance, it is not memory- or compute-efficient for large-scale challenges.

Many AL methods have been adapted for segmentation tasks [14, 36, 37]. Gorriz et al. [36] propose an AL framework for melanoma segmentation by extending the Cost-Effective Active Learning (CEAL) [38] algorithm, where complementary samples of both high and low confidence are selected for annotation. Mackowiak et al. [37] use a region-based selection approach and estimate model uncertainty using MC dropout to reduce human annotation cost. Nath et al. [14] propose an ensemble-based method where multiple AL frameworks are jointly optimized and a query-by-committee approach is adopted for sample selection. These methods overlook prior information, such as image structure, edge details, and morphological data, in their uncertainty estimation. The authors of [39] propose an AL framework for segmentation on multi-view datasets [40], where model uncertainty is estimated from the Kullback-Leibler divergence (KL-divergence) between posterior probability distributions for a disjoint subset of prior features such as depth and camera position.

However, this viewpoint information is not always readily available in medical imaging, and even when it is, it may not make a significant difference. This is largely due to the static positioning of most medical imaging devices, which limits the variability and potential impact of different viewpoints. We leverage edge information as a prior for AL sampling, motivated by previous studies in which edge information improved segmentation performance [7]. To the best of our knowledge, while numerous classical computer vision studies have demonstrated that edge detection methods can be utilized for segmentation [41], there has not yet been any exploration of using image edges as a prior in active learning.

Moreover, there is little research other than [42] on active learning for OCT segmentation. The approach proposed in [42] requires foundation models [31] pre-trained on large-scale datasets in similar domains, which can be infeasible to collect due to data privacy. In contrast, our method requires only a few samples (\(\sim\)2% of the usual subset) for initial training, overcoming the need for a large dataset.

3 Methodology

Fig. 1: Workflow of our algorithm. Given OCT images, our algorithm first uses Monte Carlo simulation to compute the edge entropy (EE) and edge divergence (ED) maps of the segmentation model's outputs. Based on these maps, it calculates the overlaps between superpixels and recommends regions for annotation. The recommended regions then guide the annotation process, and the resulting annotations are stored for future training

Figure 1 illustrates our active learning method, which comprises four key phases: (1) we initiate network training using a subset of labeled images, typically a small fraction of the entire dataset (e.g., 2%); (2) we calculate uncertainty metrics for both individual input instances and specific input regions; (3) using this information, we choose which superpixels to annotate; and (4) we acquire annotations through a simulated oracle.

3.1 Segmentation Network

At first, we train our OCT semantic segmentation model on a small, random subset of labeled data \(D_s\), which is used as the seed set. The remaining labeled data is used to simulate an oracle. For our primary architecture, we use Y-net-gen-ffc (YN*) without pre-trained weights, owing to its documented superior performance in this setting [5].

Y-Net is composed of two distinct encoder branches: a spatial encoder built from convolutional blocks and a spectral encoder that integrates fast Fourier convolution (FFC) blocks [43]. The decoding is handled by a single decoder, which takes the spatial and spectral features extracted by the encoder networks as input and generates the segmentation map. Similar to the architectural design of U-Net [44], Y-Net follows an autoencoder-based structure, incorporating skip connections that link spatial encoder blocks with decoder blocks. The spectral encoder identifies and handles global features originating from the frequency domain, which could be overlooked when relying exclusively on spatial convolutions.

Moreover, we also train DeepLabv3 and U-net models with ResNet and MobileNetv2 encoder backbones for ablation experiments. For these models, we conduct experiments with both ImageNet [32] and Kaiming [45] weight initialization.
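For illustration, the following is a minimal sketch of how such ablation models might be instantiated; the `segmentation_models_pytorch` package and all parameter values here are our assumptions, not the paper's implementation:

```python
# Sketch only: one possible way to build the ablation models.
# segmentation_models_pytorch (smp) is an assumption, not the authors' code;
# classes=9 matches the Duke dataset, in_channels=1 assumes grayscale OCT.
import segmentation_models_pytorch as smp

# U-net with a ResNet backbone, ImageNet weight initialization
unet_imagenet = smp.Unet("resnet34", encoder_weights="imagenet",
                         in_channels=1, classes=9)

# U-net with a MobileNetV2 backbone, random (Kaiming-style) initialization
unet_random = smp.Unet("mobilenet_v2", encoder_weights=None,
                       in_channels=1, classes=9)

# DeepLabv3 with a ResNet backbone
deeplab = smp.DeepLabV3("resnet34", encoder_weights="imagenet",
                        in_channels=1, classes=9)
```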

3.2 Computing Prediction Uncertainty

PIAA aims to enhance the model’s performance by actively querying uncertain regions within unlabeled data \(D_u\) following its training on an initial dataset \(D_s\). These uncertain regions are believed to be particularly valuable for further training. To achieve this goal, we introduce a novel edge-based uncertainty measurement strategy. This approach involves the computation of two key metrics: the edge entropy score and the edge divergence score. They are utilized to assess the prediction ambiguity associated with the edges between layers in the OCT images. Figure 2 provides visual examples of input OCT data along with the measured edge entropy and edge KL-divergence corresponding to the input.

3.2.1 Edge Entropy Score

Analyzing the edges of raw OCT inputs can provide crucial insights into image features and texture. While these edges may appear noisy at first glance, they serve as a concise representation of all the intensity transitions present in an image. The Sobel operator, as detailed in [7], is a suitable tool for detecting edges in the input image. Let us define the normalized absolute value of edges in an image \(I_i\) of size \((M, N)\) as \(S_i\). In this context, \(|\nabla I_i|\) represents the absolute gradient, and \(S_i\) is calculated using the following equation:

$$\begin{aligned} S^{(m,n)}_{i} = \frac{|\nabla I^{(m,n)}_{i}| - \min (|\nabla I_i|)}{\max (|\nabla I_i|) - \min (|\nabla I_i|)} \quad \text {for } (m,n) \in M \times N \end{aligned}$$

\(\min (|\nabla I_i|)\) and \(\max (|\nabla I_i|)\) represent the minimum and maximum values, respectively, within the absolute gradient matrix \(|\nabla I_i|\).
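As a concrete reference, a minimal sketch of this computation (assuming a single-channel OCT slice provided as a float NumPy array) could look as follows:

```python
# Sketch: normalized absolute Sobel gradient S_i (the edge map defined above).
import numpy as np
from scipy import ndimage

def edge_map(image: np.ndarray) -> np.ndarray:
    """Normalized absolute Sobel gradient of a single-channel (M, N) slice."""
    gx = ndimage.sobel(image.astype(float), axis=0)
    gy = ndimage.sobel(image.astype(float), axis=1)
    grad = np.hypot(gx, gy)                     # gradient magnitude |∇I_i|
    span = grad.max() - grad.min()
    return (grad - grad.min()) / (span + 1e-8)  # min-max normalization
```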

Additionally, to assess the probability that each pixel in an image belongs to a specific class c, we rely on the network's output, represented as \(P_{i}^{(m,n)}(c)\). To estimate the uncertainty of these predictions, we adopt the Monte Carlo (MC) dropout method outlined in [34], averaging predictions over \(\vert D \vert\) stochastic forward passes. The resulting MC probability distribution indicates the likelihood that the pixel at position \((m,n)\) in image \(I_i\) belongs to class c, where C represents the set of segmentation classes. During the neural network evaluation phase, we run MC dropout \(\vert D \vert\) times and measure \(P_{i}^{(m,n)}(c)\) using Eq. (1):

$$\begin{aligned} P_{i}^{(m,n)}(c) = \frac{1}{\vert D \vert } \sum _{d=1}^D P_{i,d}^{(m,n)}(c) \end{aligned}$$
(1)
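A minimal sketch of this MC dropout estimate, assuming a PyTorch model whose dropout layers are re-enabled at inference, might read:

```python
# Sketch: MC dropout estimate of P_i (Eq. 1). Dropout stays active at
# inference; `model` and `image` are placeholders for the YN* setup.
import torch

def mc_probability(model: torch.nn.Module, image: torch.Tensor,
                   passes: int = 10) -> torch.Tensor:
    model.eval()
    for m in model.modules():                 # re-enable only dropout layers
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()
    with torch.no_grad():
        probs = [torch.softmax(model(image), dim=1) for _ in range(passes)]
    return torch.stack(probs).mean(dim=0)     # average over |D| passes
```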

Following [46], we apply contextual calibration to \(P_{i}^{(m,n)}(c)\) using \(S_i\) to prioritize significant variations in the input surface. The edge map \(S_i\) is thereby tied to a probability distribution \(\phi _{i}^{(m,n)}(c)\) that carries information about the edges of the input. This formulation distinguishes our implementation from other active learning methods in image segmentation.

$$\begin{aligned} \phi _{i}^{(m,n)} (c)= \frac{e^ {P_{i}^{(m,n)}(c) \cdot S_i^{(m,n)}}}{\sum _{k \in C} e^{ P_{i}^{(m,n)} (k) \cdot S_i^{(m,n)}}} \end{aligned}$$
(2)

We name \(\phi _{i}^{(m,n)} (c)\) the contextual probability and define our edge entropy following the entropy formula of [17].

$$\begin{aligned} EE_{i}^{(m,n)} = - \sum _{c \in C} \phi _{i}^{(m,n)} (c) \log \big ( \phi _{i}^{(m,n)} (c) \big ) \end{aligned}$$
(3)
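Equations (2) and (3) translate directly into code; a sketch operating on the MC probability tensor `p` of shape (C, M, N) and the edge map `s` of shape (M, N) could be:

```python
# Sketch: contextual probability (Eq. 2) and edge entropy (Eq. 3).
# `p`: MC-averaged class probabilities, (C, M, N); `s`: edge map, (M, N).
import torch

def edge_entropy(p: torch.Tensor, s: torch.Tensor,
                 eps: float = 1e-12) -> torch.Tensor:
    phi = torch.softmax(p * s.unsqueeze(0), dim=0)     # Eq. (2), over classes
    return -(phi * torch.log(phi + eps)).sum(dim=0)    # Eq. (3), (M, N) map
```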

3.2.2 Edge Divergence Score

In regions with pronounced edges or gradients, the edge entropy metric signifies the extent of inconsistency in the network's predictions for each individual pixel of the input. It remains, however, to quantify how far these predictions deviate from their edge-calibrated counterparts. To achieve this, we employ KL-divergence to quantify the dissimilarity between \(P_{i}^{(m,n)}\) and \(\phi _{i}^{(m,n)}\) for a specific pixel located at coordinates \((m,n)\) within an input image. This approach is based on the concept of self-knowledge distillation within the context of \(I_i\) [47]. The edge divergence score, denoted as \(ED_{i}^{(m,n)}\), can be formally defined using Eqs. 1 and 2 as follows:

$$\begin{aligned} ED_{i}^{(m,n)} = D_{KL} \big ( P_{i}^{(m,n)} || \phi _{i}^{(m,n)} \big ) \end{aligned}$$

Here \(D_{KL} \big ( P_{i}^{(m,n)} || \phi _{i}^{(m,n)} \big )\) quantifies the distinction between the model’s predictive probability and the contextual probability for pixels belonging to the edges of the input (Fig. 2).
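A per-pixel sketch of this score, reusing the tensors from the edge-entropy sketch above, could be:

```python
# Sketch: per-pixel edge divergence ED_i = KL(P || phi), with `p` and `phi`
# as class-probability tensors of shape (C, M, N).
import torch

def edge_divergence(p: torch.Tensor, phi: torch.Tensor,
                    eps: float = 1e-12) -> torch.Tensor:
    return (p * torch.log((p + eps) / (phi + eps))).sum(dim=0)  # (M, N) map
```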

3.3 Superpixel Selection

Clinical images often have a sparse representation, and the critical or relevant information is localized in a small portion of the image. This characteristic is particularly advantageous for active learning-based annotation, allowing experts to concentrate on the most informative areas [37]. We use a traditional segmentation technique, SEEDS [48], to leverage the local structure of images for finding superpixels. Annotating superpixels and regions for active learning can be more beneficial to the user than annotating the entire picture [37].

We calculate the mean edge entropy \(EE_{i}^{r}\) and mean edge divergence \(ED_{i}^{r}\) for a superpixel region r. These can be expressed as follows:

$$\begin{aligned} EE_{i}^{r} = \frac{1}{|r|} \sum _{(m,n) \in r} EE_{i}^{(m,n)} \end{aligned}$$
(4)
$$\begin{aligned} ED_{i}^{r} = \frac{1}{|r|} \sum _{(m,n) \in r} ED_{i}^{(m,n)} \end{aligned}$$
(5)

Here, |r| represents the number of pixels within the superpixel region.
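As an illustration, superpixel extraction with SEEDS and the regional averaging of Eqs. (4) and (5) might be sketched as follows; the OpenCV contrib module and all parameter values are assumptions:

```python
# Sketch: SEEDS superpixels (opencv-contrib) and regional means (Eqs. 4-5).
# `image`: 8-bit grayscale (h, w); parameter values are illustrative only.
import cv2
import numpy as np

def regional_scores(image, ee_map, ed_map, n_superpixels=200):
    h, w = image.shape[:2]
    seeds = cv2.ximgproc.createSuperpixelSEEDS(w, h, 1, n_superpixels,
                                               num_levels=4)
    seeds.iterate(image, num_iterations=10)
    labels = seeds.getLabels()                  # (h, w) map of region ids
    scores = {}
    for r in np.unique(labels):
        mask = labels == r                      # pixels of superpixel r
        scores[r] = (ee_map[mask].mean(), ed_map[mask].mean())  # (EE_r, ED_r)
    return labels, scores
```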

Fig. 2: Illustration of (a) an OCT slice along with its associated (b) edge entropy map, (c) edge divergence map, and (d) the query region designated by our PIAA method. These visual representations highlight a notable observation: the right side of the OCT slice exhibits reduced clarity in the retinal layer separation lines, which could account for the model's heightened uncertainty within that region

We use regional entropy to identify the optimal superpixel for our selection strategy, selecting the one with the highest value based on [39].

$$\begin{aligned} (i,r) = \underset{(j,s)}{\arg \max } \; EE_{j}^{s} \end{aligned}$$
(6)

Following [39], we identify a subset of superpixels in the dataset with a 50% overlap, forming a set R. We choose the superpixels with the largest edge divergence to determine the final query (sample) for annotation:

$$\begin{aligned} (p,q) = \underset{(j,s) \in R}{\arg \max } \; \big \{ ED_{j}^{s} \;\big \vert \; (j,s) \cap (i,r) \ne \emptyset , \; (i,r) \in D_u \big \} \end{aligned}$$
(7)

After each selection, we remove the chosen superpixels from set R. This selection process continues until we have selected a total of K superpixels from set R.
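A simplified sketch of this two-stage selection (Eqs. 6 and 7) is given below; the bookkeeping structures `candidates` and `overlaps` are our assumptions about the implementation, not the paper's exact data layout:

```python
# Sketch: two-stage query selection. `candidates` maps (image_id, region_id)
# -> (EE, ED); `overlaps(i, r)` yields region keys overlapping (i, r) by 50%.
def select_queries(candidates, overlaps, k):
    selected = []
    while candidates and len(selected) < k:
        # Eq. (6): region with the highest mean edge entropy
        i, r = max(candidates, key=lambda key: candidates[key][0])
        # Eq. (7): among overlapping candidates, take the largest ED
        pool = [key for key in overlaps(i, r) if key in candidates] or [(i, r)]
        p, q = max(pool, key=lambda key: candidates[key][1])
        selected.append((p, q))
        del candidates[(p, q)]          # remove the chosen superpixel from R
    return selected
```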

In the edge case where gradients are absent from the image, the contextual probability (\(\phi _{i}^{(m,n)}\)) coincides with the original probability (\(P_{i}^{(m,n)}\)), so the edge divergence (\(ED_{i}^{(m,n)}\)) is zero for every pixel. The edge entropy (\(EE_{i}^{(m,n)}\)) is still computed from the calibrated probability (\(\phi _{i}^{(m,n)}\)), which in this case reduces to \(P_{i}^{(m,n)}\).

During query selection, we first rank superpixels by edge entropy (\(EE_{j}^{s}\)) and, from this ranked set, select the one with the highest edge divergence (\(ED_{j}^{s}\)). When all divergence values are identical (zero), divergence cannot discriminate between candidates, and the initial entropy ranking becomes the decisive factor in the query selection criteria. Thus, if edges are present in the image, the query criterion is governed by edge divergence; in their absence, it falls back to the entropy of the model's MC output, \(P_{i}^{(m,n)}\).

3.4 Simulated Labeling (Oracle)

A simulated annotator is used to label the ground truth for our active learning system. This virtual annotator, known as the oracle, has access to all of the ground truth label information, as illustrated in Fig. 1. Upon obtaining the selected superpixel maps described in Sect. 3.3, we acquire the corresponding ground truth for those regions from the oracle. At each active learning iteration, the labeled dataset expands with the newly annotated data while the unlabeled dataset shrinks, and the model is trained from scratch on the updated dataset.
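Putting the pieces together, the outer loop with a simulated oracle might be sketched as follows; all callables and pool objects are illustrative placeholders rather than the paper's API:

```python
# Sketch: outer active learning loop with a simulated oracle that looks up
# held-out ground truth for queried regions. All names are placeholders.
def active_learning_loop(make_model, train, labeled, unlabeled, oracle,
                         iterations, k):
    model = None
    for _ in range(iterations):
        model = make_model()                      # fresh model each round
        train(model, labeled)
        # select_regions stands in for the scoring of Sects. 3.2-3.3
        queries = select_regions(model, unlabeled, k)
        for image_id, region in queries:
            mask = oracle.ground_truth(image_id, region)  # simulated annotator
            labeled.add(image_id, region, mask)
            unlabeled.remove(image_id, region)
    return model
```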

4 Experiments and Results

In this section, we give a comprehensive overview of the datasets and architectures utilized in our experiments. We then present our extensive experimental results and compare them with those of other state-of-the-art methods to illustrate the effectiveness of our approach. We compare our AL method with nine other well-established active learning strategies: softmax margin (MAR) [15], softmax confidence (CONF) [38], MC dropout entropy (MCDR) [34], softmax entropy (ENT) [17], cost-effective active learning (CEAL), core-set selection (CORESET) [49, 36], regional MC dropout entropy (RMCDR) [37], maximum representations (MAXRPR) [50], and random selection (Random).

4.1 Datasets and Networks

To evaluate the performance of our method, PIAA, we conduct experiments on three OCT segmentation datasets: Duke [51], AROI [52], and UMN [53]. The Duke dataset consists of 100 B-scans obtained from 10 different patients, the AROI dataset contains 1136 B-scans from 24 patients, and the UMN dataset comprises 725 OCT B-scans from 29 patients. Notably, the segmentation task in these datasets involves classifying into nine, eight, and two distinct segmentation classes in Duke, AROI, and UMN, respectively, encompassing various fluid and retinal layers. In accordance with established conventions and dataset guidelines, we adhere to a 60:20:20 train-test-validation split for the experiments, ensuring that data from a single patient is not mixed across these splits.

Furthermore, for uniformity and compatibility with our experimental setup, we resize all images and their corresponding ground truth segmentations to a common resolution of \(224 \times 224\) using bilinear interpolation.

To assess the robustness and generalizability of PIAA, we perform a fivefold cross-validation (CV) on the Duke dataset, ensuring that no patient's data appears in both the training and test sets of any fold. The results of this fivefold CV analysis are presented in Table 1, summarizing the performance of our approach.
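Patient-wise fold assignment of this kind can be expressed, for example, with scikit-learn's GroupKFold; the helper below is a sketch under that assumption:

```python
# Sketch: patient-grouped five-fold CV so no patient spans train and test.
# scikit-learn's GroupKFold is our assumption, not the paper's tooling.
from sklearn.model_selection import GroupKFold

def patient_folds(scan_ids, patient_ids, n_splits=5):
    gkf = GroupKFold(n_splits=n_splits)
    # Each fold's test indices contain only patients unseen during training.
    return list(gkf.split(scan_ids, groups=patient_ids))
```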

Table 1 Results from a fivefold cross-validation (mean Dice scores ± standard deviation) for PIAA and other active learning techniques on the Duke dataset with the YN* segmentation model
Table 2 Overview of the test performance (average Dice score) achieved by different active learning algorithms when combined with various deep learning architectures
Fig. 3: The performance of the segmentation model (YN*), measured by the mean Dice score, for PIAA and various other active learning (AL) methods compared to baseline results across the Duke, AROI, and UMN datasets. The solid and dashed lines denote the 100% and 99% performance scores, respectively, of YN* trained with the entire labeled dataset

We conduct experiments using the Y-net (YN) [5], U-net (UN) [31], and DeepLab-V3 (DP-V3) [39] architectures, employing both ResNet and MobileNet backbones [31]. The results of these experiments are presented in Table 2.

It’s worth noting that we do not utilize any pre-trained weights in our experiments, except for the ablation study outlined in Table 2. We utilize a mixed loss combining Dice and Cross-entropy and employ the Adam optimizer with learning rates of 0.005 and a weight decay of 0.0004. The training process spans 100 epochs with a maximum batch size of 10, which remains consistent throughout all active learning iterations. Our hyperparameter settings and evaluation metric (Dice score) are in alignment with those specified in [5], which serves as the baseline for our experiments.

4.2 Comparisons

Figure 3 compares the performance of PIAA with other contemporary active learning algorithms for image annotation across three datasets. The results show that PIAA outperforms the other methods on all three datasets. Using only 12% (\(\sim\)8 samples), 2.3% (\(\sim\)16 samples), and 3% (\(\sim\)14 samples) of the labeled data on the Duke, AROI, and UMN datasets, respectively, our method consistently achieves 99% of the maximum model performance. Other AL methods, including CEAL, RMCDR, CORESET, and MAR, require significantly more samples to reach this level, and their performance is not consistent across the three OCT image datasets. For a fairer comparison, we report the results using the same segmentation network YN* and hyperparameters (described in Sect. 3.1) for all the active learning strategies.

Table 1 reports the fivefold cross-validation results on the Duke dataset as mean Dice scores. We observe that all AL methods yield similar segmentation performance after training on the 2% seed set. However, with 12% actively selected training data, PIAA approaches the performance of training on the full dataset and significantly outperforms the other AL approaches. Row p100 indicates the test-set performance when the model is trained on the complete training dataset. CEAL and MAXRPR reach comparable performance only after YN* is trained on 43% of actively selected samples.

Additionally, to investigate the robustness of PIAA independently of the backbone, we conduct experiments using four different network architectures with both PyTorch's default weight initialization (LeCun initialization) and ImageNet weight initialization. The results in Table 2 show that our proposed active learning method consistently outperforms the other AL methods across the different segmentation models. In contrast, active learning methods such as RMCDR and MAXRPR exhibit strong performance only when applied to pre-trained models (Table 2).

Class-wise segmentation performance of the different AL methods for the retinal and fluid layers of the Duke, AROI, and UMN datasets is reported in Tables 3, 4, and 5, respectively. For the Duke dataset (Table 3), training the Y-net segmentation model (YN*) with 12% actively selected data using the PIAA strategy achieves performance close to p100 across all nine segmentation classes. The other AL methods achieve significant scores for a few classes, including ONL-ISM and OS-RPE, but underperform on the rest. Similar trends are observed for the retinal and fluid layers of the AROI (Table 4) and UMN (Table 5) datasets. Moreover, Fig. 4 visually demonstrates the output of the segmentation model trained using our AL strategy.

To highlight the significance of partial annotation, qualitative examples are shown in Fig. 5. Instead of requiring full images to be annotated, our region-based active learning strategy, PIAA, identifies the most uncertain partial regions that need to be annotated, thus reducing the annotation effort significantly.

Fig. 4: Sample OCT (Duke) test images, along with manually annotated ground truth segmentation maps and our prediction results. The corresponding predictions are generated after training the model (YN*) using only 12% of the available samples

Table 3 The fivefold test result of the YN* segmentation model trained on 12% actively selected data (Duke) using different active learning methods (rows)
Table 4 The class-wise (columns) performance comparison between different active learning methods (rows) on AROI dataset when YN* model is trained on 3% actively selected data
Table 5 Class-wise (rows) performance comparison between different active learning methods (columns) on UMN dataset when YN* model is trained on 3% actively selected data
Fig. 5: Examples of (a) input images, (b) ground truth segmentations, and (c) the corresponding queries (highlighted white regions) from our active learning algorithm. Annotation beyond the white region can be skipped during training because the model has lower uncertainty there; as a result, partial annotation is sufficient for training

5 Conclusion

PIAA is a novel active learning technique for OCT image segmentation that achieves results similar to full training with a relatively small amount of data by utilizing edge information to identify regions of uncertainty. By harnessing edge information, our method not only minimizes the labeling effort but also shows significant promise for the medical domain, where labeled data is scarce. The performance of PIAA in OCT segmentation suggests that a large amount of data is not always required to learn the data distribution in medical imaging.

Furthermore, the reliance on edges as a fundamental image characteristic positions PIAA for potential adaptation and application in diverse domains with minimal model modifications. This adaptability opens up prospects for future research and development, encouraging exploration in other classical image processing and analysis areas. Our findings suggest that PIAA can significantly contribute to future active learning techniques for more resource-efficient and effective methods across multiple fields.

Our future work on the active learning framework will focus on enhancing scalability, security, and efficiency while maintaining data sovereignty. Maintaining the privacy and security of patient data is paramount. To address these concerns, we plan to implement the Minimum Data Transfer Service to anonymize structured and unstructured data, including medical images. By minimizing the data transferred for training, this should ensure that no identifiable patient information reaches any Cloud Computing Services, in alignment with data sovereignty requirements.

Storing entire medical datasets in the cloud can lead to privacy breaches, regulatory non-compliance, data sovereignty issues, increased security risks, and higher operational costs. The proposed PIAA active learning strategy will be used to selectively identify and label relevant data, minimizing unnecessary data exposure to the cloud. Currently, the active learning framework is deployed on GPUs with limited power; we plan to transition the model training step to a cloud platform. The use of a Sovereign Cloud may provide additional data protection measures compared to standard cloud platforms and could enhance trust among healthcare industry partners for hosting healthcare data securely.