Introduction

Coronary artery disease (CAD) is the main cause of death worldwide [1, 2]. Considering the burden of CAD to patients and the healthcare system, early detection of disease and prediction of patients’ risk of developing adverse cardiovascular events have become crucial to advancements in the medical field, leading to breakthroughs in disease treatment and patient care. Such risk stratification also plays an important role in defining when to initiate preventive therapies or change treatment strategies [2,3,4].

Coronary artery calcium scoring (CACS) is a non-invasive CT technique for the quantification of coronary artery calcium (CAC), used to determine the presence and extent of calcified atherosclerotic plaque and overall plaque burden. The use of CT imaging for direct visualization and characterization of plaque burden enables a more individualized approach for risk assessment than traditional risk factors [2, 4], providing insight into disease stages and response to treatment [5, 6]. Currently, CACS is most commonly performed in asymptomatic individuals, particularly those at intermediate cardiovascular risk. The test is used to estimate cardiovascular risk and predict future cardiac events by providing incremental risk information beyond mere risk factor-based approaches (e.g., Framingham Risk Score) [2].

Quantification of CAC usually requires human input to identify and mark calcified lesions in each image section [3, 7]. This is a time-consuming approach that requires a moderate level of expertise. Therefore, an automated, precise postprocessing method is desirable to reduce the need for human observer interaction [7, 8]. Recent advances in artificial intelligence (AI) modeling, including deep learning (DL) with convolutional neural networks (CNN), have provided promising applications in numerous industries, including medical imaging [9,10,11]. There is growing interest in the potential of AI to improve various steps of the medical imaging workflow, especially automatic detection and characterization of radiology findings [9,10,11,12,13,14,15]. Accordingly, many investigations have focused on the potential role of DL in CACS and have shown promising results for clinical application in this field, with the potential for growing demand [7, 8, 16].

This article reviews current applications of AI-based algorithms for CACS with their recent achievements, challenges, and potential clinical impact. This review also provides a brief summary of AI including important terms for a basic understanding of the technique.

Basics of AI: Terms and Concepts

The application of AI in medical imaging is becoming an integral part of clinical medicine [10, 12]. AI describes any computational program performing tasks that are typical of human intelligence. It incorporates a system that can make autonomous decisions based on input data, without immediate human control [9,10,11,12,13].

Machine learning (ML), a concept introduced as early as 1959, is a subfield of AI that provides computers with the ability to learn rules and extract patterns from data [10, 13, 17]. One subset of ML comprises computational models and algorithms inspired by the complex neuronal connections in the human brain, called artificial neural networks (ANNs). An ANN is structured in layers composed of interconnected nodes: one input layer, which receives input data; one or more hidden layers, which extract the pattern of data; and one output layer, which produces the results [9, 10]. In general, train-test systems are used to develop AI models. These rely on three data sets: training, validation, and testing [9•]. The training set is needed for the algorithm to learn from examples by fitting the model. For validation, a separate data set is used to evaluate different model fits and to adjust the model parameters for optimizing the initial algorithm. Then, the trained model is tested on a new data set to assess how well the algorithm performs under prespecified conditions [9, 13].
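The three-set scheme described above can be illustrated with a minimal sketch. The function name, the 70/15/15 proportions, and the fixed random seed are illustrative assumptions rather than values taken from the cited studies. The split is assumed to be made at the patient level, so that scans from one patient do not end up in more than one set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    cases: Sequence[T],
    train_frac: float = 0.70,   # assumed proportion, not from the source
    val_frac: float = 0.15,     # assumed proportion, not from the source
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle cases once and cut them into training, validation, and test sets."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for testing")
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]                 # used to fit the model
    val = shuffled[n_train:n_train + n_val]    # used to tune and compare fits
    test = shuffled[n_train + n_val:]          # held out for the final assessment
    return train, val, test


patient_ids = [f"patient_{i:03d}" for i in range(100)]
train_set, val_set, test_set = split_dataset(patient_ids)
```

The test set is touched only once, after all tuning on the validation set is finished, which is what allows it to serve as an estimate of performance on new data.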

AI has a broader scope than ML, covering intelligent functions performed by computer systems such as pattern recognition, planning, problem solving, object recognition, and language understanding. ML algorithms, by definition, have the ability to “improve by learning” through experience without explicit rules [9,10,11,12,13, 18, 19]. Therefore, rule-based algorithms such as computer-aided detection/diagnosis fall within the category of AI but are not considered ML. When used as a broad term, however, computer-aided detection/diagnosis may encompass ML approaches [9•].

Deep learning (DL) is a subset of ML and a special type of ANN that has multiple hidden processing layers, which characterize the depth of the network. Multiple processing layers enable mathematical calculations before producing outputs, thus allowing the DL model to learn the representation of data with high-level abstractions [9, 13, 20]. Among the different DL models, CNNs have gained much popularity in computer vision and medical image analysis, especially for extraction of visual features from images. These convolutional DL algorithms have exhibited robust performance similar to human-level performance in various areas, including medical imaging [13, 21, 22]. Recent successes of DL applications in medical imaging are possible because of a combination of accelerated computing power, advances in hardware technologies such as graphics processing units, increased available datasets, and user-friendly software needed to analyze the data [9, 10, 20]. Currently, DL has the potential to improve multiple steps of the workflow in medical imaging, such as patient scheduling, image acquisition, automated detection and interpretation of findings, and reporting and analysis. Automated detection and characterization of abnormalities on medical imaging is a domain where AI has exhibited significant progress and thus may have an immediate and positive impact [15, 21].

Application of AI in Detection and Quantification of Calcified Plaques

CACS is typically performed using non-contrast-enhanced, ECG-triggered calcium scoring CT. Beta blockers are not generally required but may be administered in cases of high heart rates, with a small benefit in accuracy. The scan parameters include a tube voltage of 120 kVp and variable tube currents according to patient size. Typically, using a prospectively ECG-triggered sequential scan, images are obtained during diastole and reconstructed with a 3 mm section thickness for CACS. Coronary calcification is defined as a lesion of at least 3 consecutive pixels (or 1 mm²) with an attenuation of ≥ 130 HU. Using dedicated software, calcifications in the coronary arteries are manually selected and quantified with the Agatston score, which is the sum over all calcified lesions of the lesion area multiplied by a weighting factor (between 1 and 4) related to the corresponding CT density [2,3,4].
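The scoring rule can be sketched in a few lines. The 130 HU threshold and the 1 mm² minimum area come from the definition above. The density bands used for the weighting factor (130–199, 200–299, 300–399, and ≥ 400 HU, applied to the peak attenuation of each lesion) follow the conventional Agatston scheme and are an addition not spelled out in the text. The function names and the input format, one (area, peak HU) pair per lesion on one image section, are illustrative assumptions.

```python
from typing import Iterable, Tuple


def density_weight(peak_hu: float) -> int:
    """Map the peak attenuation of a lesion to the Agatston weighting factor.

    The band limits are the conventional Agatston values (an assumption here).
    """
    if peak_hu < 130:
        return 0  # below the calcium threshold: not counted
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(
    lesions: Iterable[Tuple[float, float]],
    min_area_mm2: float = 1.0,
) -> float:
    """Sum area x weight over lesions given as (area in mm^2, peak HU) pairs."""
    total = 0.0
    for area_mm2, peak_hu in lesions:
        if area_mm2 < min_area_mm2:
            continue  # too small to count as a calcified lesion
        total += area_mm2 * density_weight(peak_hu)
    return total


# Four hypothetical lesions; the third is below the minimum area and is skipped.
example_lesions = [(2.0, 150.0), (3.0, 250.0), (0.5, 500.0), (1.5, 420.0)]
example_score = agatston_score(example_lesions)  # 2*1 + 3*2 + 1.5*4 = 14.0
```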

Currently, CACS is performed not only in dedicated calcium scoring CT, but also in the context of other types of CT studies, such as coronary CT angiography (CTA) or non-gated chest CT for lung cancer screening [23,24,25,26]. However, quantification of CAC requires slice-by-slice, expert-level detection and annotation of calcified lesions. Since this manual approach requires a certain degree of clinical experience and is a monotonous and time-consuming process, a more automated method is highly desirable, especially in large screening populations or in settings where it is not primarily intended but may be clinically useful [8, 27, 28].

Automated CACS

To overcome these drawbacks, several automated methods have been developed, from rule-based methods [29, 30] to ML and more recent DL approaches. The major obstacle in CACS is the differentiation of CAC from other structures with similar attenuation, such as mitral annulus calcifications [27].

ML-based methods for quantification of CAC on ECG-gated, non-contrast-enhanced cardiac CT were used prior to the introduction of DL. These approaches are based on identification of CAC among a large set of candidate lesions. Each candidate is described by features such as size, shape, texture, and location to discriminate CAC from neighboring candidates such as mitral annulus, aortic, and pulmonary calcifications [6•]. Among these, location features, which are constructed from anatomy-based approaches (heart coordinate system or spatial relation), are of particular importance [31,32,33]. Later, atlas-based localization of the coronary tree was introduced by creating a probabilistic CAC map [34]. The map determined the individual positions of the 3 major epicardial coronary arteries by using independent CTA-based atlases to compute the location estimate for each individual artery [35].

Automated CACS may also be performed on standard contrast-enhanced cardiac CTA scans. First, the contrast-filled coronary artery tree is segmented. Because calcifications are usually higher in attenuation than the contrast-enhanced vascular lumen, they can be detected by searching for high-attenuation structures along the segmented coronary tree. Since contrast-filled luminal attenuation varies with the injection protocol, from 250 to 600 HU, it is recommended to set the calcium detection threshold at 2 standard deviations or 120–150% of mean vascular attenuation rather than at a fixed value [36,37,38].
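A scan-specific threshold of this kind might be computed as sketched below. Reading "2 standard deviations" as the mean luminal attenuation plus two standard deviations is an interpretation, not something the text states explicitly. The function names and the sample HU values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def threshold_from_sd(lumen_hu: Sequence[float], n_sd: float = 2.0) -> float:
    """Threshold = mean luminal attenuation + n_sd standard deviations (assumed reading)."""
    return mean(lumen_hu) + n_sd * stdev(lumen_hu)


def threshold_from_percent(lumen_hu: Sequence[float], factor: float = 1.5) -> float:
    """Threshold = factor x mean luminal attenuation, with factor in 1.2-1.5 (120-150%)."""
    return factor * mean(lumen_hu)


# Hypothetical HU samples taken inside the contrast-filled lumen of one scan.
lumen_samples = [300, 320, 340, 360, 380]
sd_threshold = threshold_from_sd(lumen_samples)            # about 403 HU
percent_threshold = threshold_from_percent(lumen_samples)  # 510 HU
```

Both variants tie the threshold to the enhancement actually achieved in the scan, so a brightly enhanced lumen is not mistaken for calcium.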

Current Status of DL-Based CACS

Most recent approaches for automated CACS adopt DL algorithms using CNNs, which are well known for their capability to extract visual features automatically. In contrast to earlier ML methods, DL-based methods typically classify individual voxels rather than candidate lesions [6•]. In their early study, Lessmann et al. [39] used a single CNN that classified CAC lesions in lung cancer screening chest CT. Wolterink et al. [7], however, used a pair of CNNs to classify all voxels to identify CAC in coronary CTA. First, a CNN identified voxels of potential CAC and discarded the majority of non-candidate voxels, and then a second CNN further discriminated between CAC and neighboring similar negatives. In both studies [7, 39], to simplify the classification by reducing the volume of interest, a bounding box was created to localize the heart using a combination of three additional CNNs, each of which detects the heart in an orthogonal plane.
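The control flow of such a two-stage cascade can be sketched without any network code. In the snippet below, the two CNNs are replaced by arbitrary scoring functions that return a probability per voxel. The names, the thresholds, and the toy scores are hypothetical and are not taken from the cited studies.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Voxel = Tuple[int, int, int]        # (z, y, x) index into the CT volume
Scorer = Callable[[Voxel], float]   # stand-in for a CNN: probability in [0, 1]


def cascade_classify(
    voxels: Iterable[Voxel],
    first_stage: Scorer,
    second_stage: Scorer,
    first_threshold: float = 0.1,   # assumed: permissive, keeps most true CAC
    second_threshold: float = 0.5,  # assumed: stricter final decision
) -> List[Voxel]:
    """Stage 1 discards obvious negatives; stage 2 re-examines the survivors."""
    candidates = [v for v in voxels if first_stage(v) >= first_threshold]
    return [v for v in candidates if second_stage(v) >= second_threshold]


# Toy scores standing in for network outputs on four voxels.
stage1_scores: Dict[Voxel, float] = {
    (0, 0, 0): 0.02, (0, 0, 1): 0.40, (0, 1, 0): 0.90, (0, 1, 1): 0.30,
}
stage2_scores: Dict[Voxel, float] = {
    (0, 0, 0): 0.99, (0, 0, 1): 0.20, (0, 1, 0): 0.95, (0, 1, 1): 0.60,
}
cac_voxels = cascade_classify(
    stage1_scores, stage1_scores.__getitem__, stage2_scores.__getitem__
)
```

Because the second stage only sees what the first stage lets through, it can be a more expensive model focused on hard cases such as calcifications near the coronary arteries.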

More recently, without such localization methods, Lessmann et al. [40] used sequential CNNs to classify CAC as well as valve and aortic calcifications on chest CT images. With reinforced feature extraction capabilities and a large receptive field, the first CNN was used to identify and label potential calcification voxels according to their anatomical locations. Subsequently, a second CNN refined the output of the first by identifying true calcifications among the candidates with similar shapes and locations. Even though training and evaluation were challenging, mostly due to the low-dose protocol without ECG synchronization, this new strategy achieved good performance (F1 value of 0.68–0.90) in calcium detection and strong agreement (75–91%) with the manual reference standard in cardiovascular risk categorization. Another study [8] evaluated a DL-based automated calcium scoring method for non-contrast ECG-triggered cardiac CT, which showed high accuracy when compared to manually obtained reference scores in 511 patients. This calcium scoring application relied on a combination of multiple CNNs for understanding the context of the CT image (Fig. 1). The calcium scoring model was trained using an annotated dataset of 2000 coronary calcium CT scans to determine the probability of a voxel being a coronary calcification. With this model, 93.2% of patients were classified into the same risk category as by the human observers.
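The two kinds of figures quoted above, an F1 value for detection and a percentage agreement for risk categorization, can be computed as in the following sketch. The evaluation unit (voxels here), the category labels, and the function names are assumptions for illustration. The cited studies may define them differently.

```python
from typing import Hashable, Sequence


def f1_score(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for paired binary labels."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    if tp + fp + fn == 0:
        return 1.0  # convention chosen here: no calcium in either labeling
    return 2 * tp / (2 * tp + fp + fn)


def category_agreement(
    automatic: Sequence[Hashable], manual: Sequence[Hashable]
) -> float:
    """Fraction of patients assigned the same risk category by both methods."""
    if len(automatic) != len(manual) or not automatic:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == m for a, m in zip(automatic, manual)) / len(automatic)


# Hypothetical voxel labels and per-patient risk categories.
f1 = f1_score([True, True, False, False, True], [True, False, False, True, True])
agreement = category_agreement(
    ["0", "1-10", "11-100", ">400"], ["0", "1-10", "101-400", ">400"]
)
```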

Fig. 1
The combination of a convolutional neural network for the image features and a fully connected network for the spatial coordinate features. A pre-computed coronary territory map is also used as an additional input to specify the likelihood of different voxels belonging to coronary arteries (a). An example case with automated detection of calcifications in coronary arteries overlaid and color-coded in red (b)

Although several published DL-based CACS methods have achieved excellent performance, each has been specialized to a specific type of CT examination and has used two steps similar to the traditional CACS method: identification of calcium first and quantification thereafter. De Vos et al. [27], however, proposed a method for direct quantification of the calcium score in input image sections without calcium segmentation, using data from non-enhanced, ECG-synchronized cardiac CT and non-enhanced chest CT for lung cancer screening. By adopting a workflow for direct regression of CACS, this method achieved accurate prediction of CACS in different types of CT examinations in near real time, with an intra-class correlation coefficient (ICC) between automatic and manual CACS of 0.98 for both cardiac and chest CT.
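For readers unfamiliar with the ICC, the sketch below computes one common variant for paired automatic and manual scores: the two-way, single-measurement, absolute-agreement form, often written ICC(2,1). Which ICC variant the cited studies used is not stated here, so this choice, the function name, and the example scores are assumptions.

```python
from typing import Sequence


def icc_absolute_agreement(auto: Sequence[float], manual: Sequence[float]) -> float:
    """ICC(2,1) for two measurements (automatic, manual) per patient."""
    if len(auto) != len(manual) or len(auto) < 2:
        raise ValueError("need at least two paired scores")
    n, k = len(auto), 2
    rows = list(zip(auto, manual))
    grand = sum(a + m for a, m in rows) / (n * k)
    row_means = [(a + m) / k for a, m in rows]        # per-patient means
    col_means = [sum(auto) / n, sum(manual) / n]      # per-method means
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    ss_err = sum(
        (x - row_means[i] - col_means[j] + grand) ** 2
        for i, row in enumerate(rows)
        for j, x in enumerate(row)
    )
    ms_rows = ss_rows / (n - 1)                       # between-patient variance
    ms_cols = ss_cols / (k - 1)                       # between-method variance
    ms_err = ss_err / ((n - 1) * (k - 1))             # residual variance
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_err) / denom


# Hypothetical scores: perfect agreement, then a constant +10 offset.
manual_scores = [0.0, 10.0, 100.0, 400.0]
icc_identical = icc_absolute_agreement(manual_scores, manual_scores)
icc_offset = icc_absolute_agreement(manual_scores, [s + 10 for s in manual_scores])
```

Unlike a plain correlation coefficient, this form penalizes a systematic offset between the two methods, which is why the second example falls slightly below 1.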

In this regard, a recent multicenter study was performed to validate the performance of a DL algorithm in previously unseen types of CT examinations [41••]. The results showed that the method trained for lung-screening low-dose chest CT (baseline) also adapted well to several CT study types in a variety of population groups, yielding ICCs of 0.79–0.97 between automatically and manually obtained scores. When a few representative cases of the respective CT type were added to the baseline for protocol-specific training, ICCs between DL-based and manual CACS in that specific type of CT examination improved to 0.84–0.99. Furthermore, a combined DL model trained with all available CT protocols (combined training) also improved the DL performance for CACS, with ICCs ranging from 0.85 to 0.99. Although absolute scores may shift when scan protocols other than that of CACS are used (e.g., tube voltage of 100 kVp), this study showed promising results that suggest extensibility of DL, i.e., applying one DL model trained for a specific CT type to diverse types of CT examinations.

Advantages and Limitations of AI in CACS

While various automated methods have been examined for years, the recent emergence of ML followed by DL approaches has brought considerable progress in the field. Use of DL in CACS has the potential to reduce human interaction for time-consuming and monotonous tasks in the clinical setting [6,7,8,9,10]. The re-direction of a clinician’s role to more value-added tasks has the potential to reduce costs and increase the quality of healthcare [6, 8]. In addition, DL in CACS could be used as an add-on screening option for CT studies that include the heart, without significant additional effort in terms of human interaction and radiation [6, 8, 27]. Furthermore, recent DL-based approaches achieved performance similar to the manual reference but generated calcium scores as much as a hundred times faster, in near real time (within a few seconds) [8, 13, 27]. For widespread use of AI in CACS, such fast computation is highly desirable and particularly important for screening large populations.

However, the major obstacles to the development and adoption of AI in clinical practice are the need for large collections of high-quality ground truth data as well as standardization of diagnostic techniques [9, 42]. Historically, obtaining qualified training datasets has been very challenging due to small case numbers at an early stage of algorithm development [6, 9] or confounders such as motion or partial volume artifacts, which arise from the diversity of CT acquisition and reconstruction techniques, i.e., low-dose protocol, non-ECG synchronization, or thick slice reconstruction [43, 44]. Fortunately, for the development of DL-based CACS, large annotated datasets are readily available and image quality continues to improve thanks to constant advances in CT technology and reconstruction techniques [41, 45, 46]. Furthermore, CACS is a fairly straightforward task that does not require a high level of expertise, and the quantification method is highly standardized through the Agatston score. In this respect, CACS has become an excellent candidate for automated approaches, and several types of early solutions are currently commercially available for clinical use [16].

Finally, to date, most AI algorithms for CACS have been developed for specific CT protocols and validated mostly in single-center studies. Therefore, despite promising early results, their performance requires further validation in clinical practice using diverse sets of CT protocols in larger populations, in step with ongoing technical improvements in AI.

Conclusion

Since detection and quantification of coronary calcium play a key role in risk stratification of CAD and management of patients, automatic CACS may provide diagnostic aids to physicians in clinical practice. Although still under development and refinement, the current rapid progress in AI technology in this field may allow for future routine application of automated CACS in clinical practice.