Two related advancements are among the necessary requirements for 3D printing to more completely realize its potential for clinical care: the first is that models are reimbursed. The second is that a complete quality and safety program must be developed. This chapter will highlight advances that the field has made collectively, and it will also point out the deficiencies that should be viewed as “action items” for current and emerging leaders in the field to tackle. In some ways, 3D printing can be considered as a new method to display data, following the progression in technology that the picture archiving and communication system (PACS) made over the film alternator, and then to supplement that data with strategies to enhance care pathways. Regardless of how the field is considered, we believe that a very useful strategy to envision the work to be done is to follow the steps necessary to propel this new technology to wider use in patient care.

Recently, the Radiological Society of North America (RSNA) launched the Special Interest Group for 3D Printing, emphasizing the importance of 3D printing in medicine and providing an organizational infrastructure. The Guidelines Subcommittee of the RSNA Special Interest Group, led by Dr. Adnan Sheikh of the University of Ottawa, is actively working to establish recommendations that will represent important practice parameters. This includes both the conversion of DICOM images to Standard Tessellation Language (STL) files and the design of nonanatomic STL files (e.g., surgical guides) based on anatomy visualized in DICOM images and the subsequent 3D printing of models from those files.

One important pathway toward general acceptance, and ultimately reimbursement, of 3D printing for specific clinical scenarios is the development of guidelines akin to the American College of Radiology (ACR) Appropriateness Criteria® (AC). The RSNA Special Interest Group is formulating an algorithm, starting with well-established clinical scenarios. The usual three categories of appropriateness adopted by the AC are usually appropriate, may be appropriate, and rarely appropriate, and in general these have become integrated into clinical decision support engines. The role for appropriateness in 3D printing is critically important since the assessment for each of the clinical indications can be vetted among multidisciplinary groups, and the format of appropriateness enables organization of the literature.

Next to practice parameters and Appropriateness Criteria, the ACR model addresses quality control (QC) of a technology used for medical imaging. For 3D printing used to assist anatomic visualization, we believe QC will revolve around ensuring accuracy and reproducibility. At present, a printer producing anatomic models used for visualizing anatomy is viewed as equivalent to a film printer, making copies—albeit three-dimensional—of DICOM images, and thus is not regulated by the Food and Drug Administration (FDA) (Di Prima et al. 2016). However, this view may change in the future (Christensen and Rybicki 2017). It is however noted that 3D printer considerations are, and will remain, within the FDA purview when a printer is used in the process of manufacturing medical devices (FDA 2016).

Independent of the future landscape of FDA regulation, it is important to document the quality and safety of physical models produced by a 3D printer so that they can be most effective for their intended use. The ACR defines QC as “distinct technical procedures that ensure the production of a satisfactory product (i.e., high-quality diagnostic images)” (ACR 2012, 2015). These procedures are implemented primarily via the use of resolution and contrast phantoms to test imaging system fidelity. Similar to these QC testing guidelines, we believe that quality control testing of 3D printers will involve the use of specific phantoms that are to be regularly printed in order to ensure the production of a satisfactory product, in this case high-quality medical models. Much work in this arena, reviewed below, is currently underway to design and validate such phantoms specifically for use in clinical 3D printing. Whenever a digital reference standard of the intended medical model is available, mathematical metrics can also be used to establish procedures to determine the overall accuracy of a 3D-printed model. More importantly, such mathematical measures of accuracy can be used to develop interpretive quality assurance processes for radiologists and technologists involved in the creation of 3D-printed models (George et al. 2017). This is an active area of research in our group and elsewhere, and advances in this developing arena are also reviewed below. A final procedure that can be used for medical 3D printing QC is surgical or pathological correlation (Weinstock et al. 2015); this is also included in the ACR QC procedures (ACR 2012, 2015). This is straightforward for anatomic models that are 3D-printed for surgical planning or intraoperative navigation. Measurements made on the printed models can be directly compared to those made on the surgically exposed tissues (George et al. 2017; Gelaude et al. 2008) or on cadaveric specimens, with the proviso that the source DICOM images used to generate the 3D-printed model were acquired with the tissue in situ (George et al. 2017; Gelaude et al. 2008), to ensure that the segmentation and processing procedures are identical to those that would be used for in vivo images. Below, we describe techniques and advances for each of these QC procedures.

11.1 Phantom-Based Quality Control

In the context of 3D printing equipment, quality control is likely to rely on printer dimensional accuracy. As discussed in Chap. 2, 3D printer resolutions are typically significantly higher (<0.3 mm in all three axes) than those of most clinical imaging modalities. Resolution is the smallest scale that a 3D printer can reproduce and is only one factor affecting accuracy. Accuracy instead refers to the degree of agreement between the dimensions of the printed object and those intended, that is, the dimensions of the digital object stored in an STL or AMF file (Braian et al. 2016).

A number of meticulous studies using both geometric phantoms and anatomic models have reported that dimensional errors with most 3D printing modalities are <1 mm and, with current professional hardware, typically <0.5 mm (Table 11.1) (George et al. 2017). For most medical applications, this level of inaccuracy can be considered negligible. Furthermore, 3D printers have high reproducibility, as is expected since the components of well-calibrated, non-failing equipment tend to function nearly identically across runs. One study using SLA, for example, found the reproducibility of printing a skull model to be better than 0.07 mm in all three dimensions across seven prints (George et al. 2017).

Table 11.1 Studies reporting 3D printer accuracy by comparison of design STL versus printed model dimensions using commercial 3D printing equipment (>$5000)

Specific technical procedures that implement the basic methodology developed in these phantom-based studies have already been described in the medical literature toward establishing an in-hospital clinical 3D printer QC program (Matsumoto et al. 2015; Leng et al. 2017; Wake et al. 2017). In these procedures, QC phantoms containing features of sizes and shapes relevant for medical 3D printing have been digitally designed with precisely known dimensions in a computer-aided design (CAD) program. These digital QC models can be printed either at regular intervals (for preventive maintenance) or along with every patient model. Physical measurements of the printed QC phantom are then compared with the design dimensions of the digital model (Matsumoto et al. 2015; Wake et al. 2017). The first QC phantom proposed for medical 3D printing (Matsumoto et al. 2015; Leng et al. 2017) contained resolution bars at 0.5–2 line pairs per mm (Fig. 11.1). “Second-generation” phantoms have been developed to address more complex shapes, including spherical, cylindrical, hexagonal, conical, and spiral features, both extruding and negative-shaped (i.e., holes of the prescribed shape) (Leng et al. 2017). Whenever possible, manual Vernier caliper measurements should be replaced by more precise and more numerous dimensional measurements of the printed phantoms, for example, via the use of 3D laser scanning or CNC coordinate measuring machines (Liacouras 2017).
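The comparison of physical measurements with design dimensions can be sketched in a few lines of Python. This is only a minimal illustration and not the procedure of any of the cited studies: the feature names, the measured values, and the 0.5 mm tolerance (chosen here because it matches the typical error quoted above for professional hardware) are all assumptions.

```python
# Hypothetical QC phantom check: compare caliper or scanner measurements
# with the design dimensions stored in the CAD model (all values in mm).

def check_phantom(design_mm, measured_mm, tolerance_mm=0.5):
    """Return per-feature signed errors and the features that are out of tolerance."""
    errors = {}
    failures = []
    for feature, nominal in design_mm.items():
        error = measured_mm[feature] - nominal  # positive means printed too large
        errors[feature] = error
        if abs(error) > tolerance_mm:
            failures.append(feature)
    return errors, failures


# Illustrative (made-up) design and measured values.
design = {"sphere_diameter": 10.0, "cylinder_diameter": 5.0, "hole_diameter": 3.0}
measured = {"sphere_diameter": 10.12, "cylinder_diameter": 4.93, "hole_diameter": 2.35}

errors, failures = check_phantom(design, measured)
for feature, error in errors.items():
    status = "FAIL" if feature in failures else "ok"
    print(f"{feature}: {error:+.2f} mm ({status})")
```

In practice the tolerance would be set by the facility for each clinical application, and a log of the signed errors over time would reveal drift before a feature actually fails.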

Fig. 11.1

Example of phantom for implementing 3D printing equipment quality control procedures developed at the Mayo Clinic. Reproduced with Permission from Leng S et al., 3D Printing in Medicine, 2017:in press

Recently, QC phantoms composed of two components that contain mirror features (i.e., positive and corresponding negative) have been proposed (Leng et al. 2017). Such phantoms enable a fit test to be used instead of physical measurements (Leng et al. 2017), simply inserting the positive half of the phantom (with features extruding) into the negative side of the phantom (with the corresponding depressions). A successful fit with no visible gaps would presumably attest to printer accuracy. This approach should not be used without some physical measurements, as phantoms printed with an incorrect scaling factor will still pass a fit test. An alternative we propose is to manufacture one half of the fit test QC phantom using legacy manufacturing (e.g., injection molding, computer numerically controlled [CNC] milling, or laser cutting) and to print the other half with the 3D printer. A successful fit of these two halves would additionally confirm dimensional accuracy of the printed model.
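The reason a fit test alone cannot detect a scaling error can be shown with a toy numerical example. The sketch below assumes a single peg-and-hole feature, a uniform 5% printer scaling error, and a 0.1 mm limit for a "visible gap"; all of these values are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration: a peg (positive feature) and a hole (negative
# feature) are both designed at 5.00 mm. A uniform scaling error changes both
# printed halves by the same factor, so a printed-to-printed fit test passes.

def fits(peg_mm, hole_mm, max_gap_mm=0.1):
    """The peg fits if it is no larger than the hole and leaves no visible gap."""
    gap = hole_mm - peg_mm
    return 0.0 <= gap <= max_gap_mm


design_mm = 5.00
scale = 1.05  # assumed 5% uniform scaling error of the printer

printed_peg = design_mm * scale
printed_hole = design_mm * scale
reference_hole = design_mm  # half made by legacy manufacturing (e.g., CNC milling)

print("printed peg in printed hole:", fits(printed_peg, printed_hole))
print("printed peg in reference hole:", fits(printed_peg, reference_hole))
```

The first test passes despite the scaling error, while the second fails because the reference half retains the design dimension.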

It is important that QC phantoms for medical 3D printing contain features that extend in all three axes and that they also include overhangs that extend in all three axes, as different printer technologies have different accuracy characteristics for such features (George et al. 2017; Pang et al. 1995; Teeter et al. 2015). Furthermore, QC phantoms should ideally be printed using the same materials as the specific medical application for which quality control is being performed (Wake et al. 2017; Teeter et al. 2015), including color (Wake et al. 2017) as this may be achieved using different material chemistries.

11.2 Mathematical Metrics of Quality Control

Comparing agreement between two models of a tissue is a second approach toward establishing quality and safety of medical 3D printing. The two models can be two STL models, each derived from a different segmentation of a tissue depicted in a single DICOM image data set, for example, each segmentation performed by a different radiologist. This scenario is useful for quality assurance (QA). The two STL files can also be the initially designed STL to be printed and a digitized version of the printed model. This scenario is useful for QC of the individual print. A printed model can be digitized, for example, using 3D laser scanning, or tomographic imaging such as CT, and potentially even MRI (George et al. 2017; Mitsouras et al. 2017). Optical scanners are preferred as they have much higher precision (<0.01 mm) compared to CT and MRI, but they are limited to only assessing the outer surface of a model. Once the two STL models to be compared have been obtained, two mathematical procedures can be used to perform the comparison.

11.2.1 Model Surface Distances

The first approach is to compare the “distance” between STL models. Conceptually, there is a minimum distance from an arbitrary point located on one STL surface to the other STL surface. This distance can be computed for any number of representative points (typically the nodes of the triangular STL mesh), thereby yielding a distribution of distances that possesses an average and standard deviation that together convey a quantitative assessment of the overall difference between the two models (Fig. 11.2).
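A minimal sketch of this computation is given below. It is a simplification under stated assumptions: each surface is represented only by a small list of vertices, the distance is the unsigned distance to the nearest vertex of the other model (dedicated software typically reports a signed point-to-surface distance), and the two toy surfaces are made up for illustration.

```python
import math
from statistics import mean, stdev


def directed_distances(points_a, points_b):
    """For each vertex of model A, the distance to the nearest vertex of model B."""
    return [min(math.dist(p, q) for q in points_b) for p in points_a]


# Toy surfaces (mm): A is a flat 5 x 5 grid of vertices; B is the same grid
# displaced in z by 0.2 mm on even rows and by 0.4 mm on odd rows.
surface_a = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
surface_b = [(x, y, 0.2 if int(y) % 2 == 0 else 0.4) for (x, y, _) in surface_a]

d = directed_distances(surface_a, surface_b)
print(f"mean = {mean(d):.2f} mm, SD = {stdev(d):.2f} mm, max = {max(d):.2f} mm")
```

The brute-force nearest-neighbor search is adequate for a sketch; meshes with many thousands of vertices would call for a spatial index such as a k-d tree.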

Fig. 11.2

Humerus segmented from CT by two different operators; segmentation 1 was fully automated (bone 226 Hounsfield Unit threshold), while segmentation 2 was manually edited. The former model is missing a portion of the humeral head. Comparing the two models using an STL distance metric to quantitatively assess model agreement is not meaningful; the mean distance from model 1 to model 2 is −0.36 ± 0.43 mm (range, −2.72–2.22 mm), while that from model 2 to model 1 is 1.24 ± 2.48 mm (range, −3.28–16.41 mm). The metric can potentially be used to readily determine qualitative agreement vs disagreement using an acceptable cutoff (e.g., <|1.5| mm in this figure)

This approach provides a simple comparison between STL models (George et al. 2017; Leng et al. 2017; Mitsouras et al. 2017) that can be used for QC of individual printed models. Individual printed model QC is necessary since an anatomic model may fail to print in a given printing technology (Fig. 11.3), for example, one that requires appropriate support structures such as SLA or FDM (see Chap. 2). The same model may print successfully using a different technology that fully surrounds the model being printed with support material, such as binder jetting, but forces exerted during cleaning of a model printed with those technologies may then lead to breakage of important anatomic features (Fig. 11.3). Visual inspection of a printed model should always be used as part of standard operating QC procedures to ensure that each finished medical model reflects the intended, segmented anatomy. Visual inspection is nonetheless prone to operator variability. The distance metric between STLs offers an alternative that is less prone to operator error. Specifically, the printed model can be scanned with CT in air, and the resulting images can be segmented to produce an STL model. This STL model can be aligned to the initial design STL that was sent to the 3D printer, and the distance between the digitized model and the original intended model can then be calculated. For example, a prespecified distance cutoff that is likely to capture missing anatomy (that failed to print) can be used as a QC procedure to detect bulk errors in the printed anatomy (Fig. 11.2).
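The cutoff-based check can be illustrated with a toy example. The sketch below assumes that the two models are already aligned, represents each model by a list of vertices, and uses the 1.5 mm cutoff given as an example in Fig. 11.2. Distances are measured from the design vertices to the digitized print, an assumption made here so that design regions with no printed counterpart show up as large distances.

```python
import math


def directed_distances(points_from, points_to):
    """For each vertex in points_from, the distance to the nearest vertex in points_to."""
    return [min(math.dist(p, q) for q in points_to) for p in points_from]


def missing_anatomy(design_pts, printed_pts, cutoff_mm=1.5):
    """Design vertices lying farther than the cutoff from every printed vertex."""
    d = directed_distances(design_pts, printed_pts)
    return [p for p, dist in zip(design_pts, d) if dist > cutoff_mm]


# Toy design (mm): a 10 mm rod sampled every 1 mm; the print lost the last 4 mm.
design = [(float(x), 0.0, 0.0) for x in range(11)]
printed = [p for p in design if p[0] <= 6.0]

flagged = missing_anatomy(design, printed)
print("design vertices without printed counterpart:", flagged)
print("reverse comparison flags:", missing_anatomy(printed, design))
```

The reverse comparison flags nothing, because every vertex of the incomplete print lies on the design; the direction of comparison therefore has to be fixed in the QC procedure.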

Fig. 11.3

Glenoid component models printed with a bottom-up stereolithography printer (left panel) and a bilateral renal artery aneurysms model printed with a binder jet printer (right panel), exemplifying the need for per-model quality control procedures. A portion of the glenoid component failed to print (red arrows) due to large forces exerted during detachment of the model from the vat floor; additional supports (green arrow) enabled more of the component to successfully print, but a portion still failed. A small renal artery in the binder jet model broke during removal of the model from the printer. These failures are model specific and likely would not have occurred if the models had been printed with different printer technologies; for example, the glenoid would not have failed in a binder jet system, and the renal artery would not have broken off if printed with stereolithography, which uses stronger acrylic-based materials. A QC phantom printed at the same time as either of the models would have likely printed correctly, failing to capture these model-specific failures

This approach does, however, still have limitations that render it inappropriate for many 3D printing QC procedures (George et al. 2017). One limitation is that different quantitative results are obtained depending on which model is compared to which. This is readily understood for a humeral head that has been incompletely segmented by using a HU threshold for cancellous bone (226 HU). In this example, partial volume effects in locations where the bone is thin reduce the otherwise high HU of bone, and the resulting segmentation misses the bone in those locations. The distance from points on the incomplete bone to the manually fully segmented bone is likely small, since for every point on the incomplete bone model, there is a corresponding point a short distance away on the complete model. Reversing the order of comparison, the distance from a point on the complete bone that is located in a region where the bone is missing in the incomplete model can be as far as the opposite side of the bone (Fig. 11.2). Another limitation is that digitization of a printed model introduces the point-spread function as well as modality-specific artifacts of the imaging modality, in addition to 3D printing inaccuracies. For example, in one study that imaged a printed model with both CT and MRI, each modality led to a different distance to the originally designed STL model (Mitsouras et al. 2017). An important limitation, specific to using medical imaging modalities (as opposed to using an optical scanner) to digitize a printed model, arises from the need to segment the resulting images of the model. The resulting digitized model is highly dependent on the segmentation algorithm (George et al. 2017), even if the model is imaged in air and using an HU threshold in the range between that of air and the printed material’s CT number.
A study using simple cube phantoms made of materials with CT numbers equivalent to high-density bone exemplified this limitation by assessing different segmentation thresholds ranging from 25% to 95% of the difference between the HU of water (=0) and that of the material (=1400 HU). The difference between the physical phantom and its 3D-printed replica ranged from 1 mm larger to 1 mm smaller than the phantom depending on the threshold (Naitoh et al. 2006), an order of magnitude larger effect than print reproducibility. Thus, segmenting images of the printed model at different thresholds will yield different distances between the digitized printed model and the original STL model sent to the printer. A final limitation of digitizing a printed model for comparison to the initially designed model is that it is necessary to align the two STL models, as the scan of the printed model will inevitably use a different coordinate system reference (landmark) than the patient scan. Registration methods used for alignment, such as CloudCompare (Russ et al. 2015) or the global registration algorithm in 3-matic (Materialise NV, Belgium) CAD software, are iterative optimization algorithms and may not always find a single global minimum representing the best alignment. This precludes precise comparison of the digitized model and the initially designed model toward establishing printer QC (which would need a precision <0.5 mm in keeping with the resolution of typical clinical images), since different alignments will lead to different assessments of the distances between the models (Fig. 11.4).
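The threshold dependence described above can be reproduced with an idealized one-dimensional model. The sketch below is not the method of the cited phantom study: it assumes a box-shaped point-spread function, which turns each sharp edge of a bar into a linear intensity ramp centered on the true edge, and the bar width and blur width are made-up values.

```python
def measured_width(true_width_mm, blur_mm, threshold_fraction):
    """Width of a 1D bar obtained by thresholding its blurred intensity profile.

    Assumes a box point-spread function of width blur_mm (smaller than the bar),
    so that intensity rises linearly from background to material across each
    edge. threshold_fraction is the threshold expressed as a fraction of the
    background-to-material intensity difference.
    """
    # The threshold crossing moves each edge outward by blur * (0.5 - t).
    edge_shift = blur_mm * (0.5 - threshold_fraction)
    return true_width_mm + 2.0 * edge_shift


# Hypothetical 10 mm bar imaged with a 1 mm wide box point-spread function.
for t in (0.25, 0.50, 0.75, 0.95):
    print(f"threshold {t:.0%}: measured width {measured_width(10.0, 1.0, t):.2f} mm")
```

Only the 50% threshold recovers the true width in this idealized model; lower thresholds enlarge the segmented object and higher thresholds shrink it, in qualitative agreement with the phantom results cited above.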

Fig. 11.4

Scanning a printed model with an imaging modality for comparison to the designed STL model should not in general be used as a QC procedure. Beyond introducing the point-spread function of the imaging modality into the errors that are being measured, model alignment algorithms are iterative optimization procedures that may converge to a local minimum, leading to different comparisons of the difference between two models

11.2.2 Residual Volume

A second approach to assess the differences between STL models relies on application of mathematical set theory, considering the STL models (or segmentations) of a tissue as mathematical subsets of 3D space (George et al. 2017). In this approach, a model is intrinsically considered to define a subset of the imaged volume (i.e., of three-dimensional space) that is interpreted by the radiologist to be occupied by the tissue. Mathematical set operations can be used on these subsets to quantify differences and similarities between models. For example, agreement between two STL models can be defined by set intersection (A ∩ B) (George et al. 2017) (Fig. 11.5). For two models of a tissue created from interpretation of the same diagnostic images by two independent radiologists, the intersection of the two models is simply the volume of space that both readers agreed belongs to the particular tissue. An important assembly of set intersection and union (A ∪ B) operations yields the so-called residual volume (Cai et al. 2015; George et al. 2017), which can be used for medical 3D printing QA. It is defined as ((A ∪ B) − (A ∩ B)) (Cai et al. 2015), or, in shorthand notation, as ((A − B) + (B − A)). This is the volume occupied by one or the other model, but not both, and it directly quantifies the disagreement between the two models (Fig. 11.5).
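The residual volume can be sketched directly with set operations when each model is represented as a set of occupied voxels. This voxel representation is an assumption made for illustration; the cited work operates on STL models or segmentations, and the two toy cubes and the 1 mm voxel size below are made up.

```python
def residual_volume(voxels_a, voxels_b, voxel_volume_mm3=1.0):
    """Volume occupied by exactly one of the two models: (A ∪ B) − (A ∩ B)."""
    union = voxels_a | voxels_b
    intersection = voxels_a & voxels_b
    residual = union - intersection
    # Equivalent shorthand form: (A − B) together with (B − A).
    assert residual == (voxels_a - voxels_b) | (voxels_b - voxels_a)
    return len(residual) * voxel_volume_mm3


def box(x0, x1, y0, y1, z0, z1):
    """Set of integer voxel coordinates filling a rectangular block."""
    return {(x, y, z)
            for x in range(x0, x1)
            for y in range(y0, y1)
            for z in range(z0, z1)}


# Two hypothetical segmentations of the same 10 mm cube (1 mm voxels);
# the second reader placed the boundary one voxel further along x.
model_a = box(0, 10, 0, 10, 0, 10)
model_b = box(1, 11, 0, 10, 0, 10)

agreement = len(model_a & model_b) * 1.0
print(f"agreement = {agreement:.0f} mm3, "
      f"residual volume = {residual_volume(model_a, model_b):.0f} mm3")
```

Unlike the surface distance, the residual volume is symmetric: swapping the two models yields the same value.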

Fig. 11.5

CT of patient with superior sulcus tumor. Two qualified radiology staff members segmenting the tumor differ in their interpretation of what tissue is tumor versus what is not. The two STL models of the same tumor can be analyzed mathematically using set operations on three-dimensional space to define their disagreement and agreement. If one model is a gold standard (model A in the example shown), true positive, false negative, and false positive measures are readily calculated in terms of volume (18.1, 1.3, and 4.3 cm³, respectively). Sensitivity (true positive rate), false negative rate, and false discovery rate for the interpreter producing model B are thus readily calculated (18.1/19.4 = 93.3%, 1.3/19.4 = 6.7%, and 4.3/22.4 = 19.2%, respectively)

These two measures of agreement and disagreement from set theory can in turn be used to define parameters commonly used to assess diagnostic accuracy, such as true and false positives and false negatives (George et al. 2017). For example, if one model is a gold standard, the true positive is the volume of space included both in the test and the gold standard models, i.e., their intersection (Fig. 11.5). The volume of space included in the test model but that does not belong to the tissue according to the gold standard model, i.e., (B-A), if B is the test and A the gold standard model, is then a false positive (Fig. 11.5). Finally, the false negative volume of space is that occupied by the tissue according to the gold standard model, but that is not included in the test model (Fig. 11.5). A “true negative” volume of space is not as readily defined for general 3D printing, as it would involve the volume of space that is negative for the presence of the tissue. This could be taken to mean the entirety of a scan volume, which in most cases would be a large volume compared to that of the tissue (e.g., a single tumor seen in a chest-abdomen-pelvis CT), and would thus carry little clinical significance. However, in specific scenarios, it can be meaningfully defined, for example, for a tissue within an organ such as a renal mass. In this case, the total kidney volume (including tumor) can be used to define the entirety of space, for which a true negative is meaningful. The volume of space within the kidney that both the test model and gold standard model agree is not tumor tissue would be the appropriate definition of the true negative volume in this example.
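The volume-based measures defined above reduce to simple ratios. The sketch below uses the true positive, false negative, and false positive volumes reported in the caption of Fig. 11.5; the function and variable names are illustrative only.

```python
def overlap_metrics(tp, fn, fp):
    """Volume-based sensitivity, false negative rate, and false discovery rate.

    tp: volume in both the test and gold standard models (A ∩ B)
    fn: volume in the gold standard model only (A − B)
    fp: volume in the test model only (B − A)
    """
    gold_volume = tp + fn  # total volume of the gold standard model A
    test_volume = tp + fp  # total volume of the test model B
    sensitivity = tp / gold_volume
    false_negative_rate = fn / gold_volume
    false_discovery_rate = fp / test_volume
    return sensitivity, false_negative_rate, false_discovery_rate


# Volumes in cm3 from the superior sulcus tumor example of Fig. 11.5.
sens, fnr, fdr = overlap_metrics(tp=18.1, fn=1.3, fp=4.3)
print(f"sensitivity = {sens:.1%}, FNR = {fnr:.1%}, FDR = {fdr:.1%}")
```

Specificity and accuracy are omitted because they require a true negative volume, which is only meaningful when a bounding region such as the whole organ is defined.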

Using these definitions of true and false positives/negatives afforded by set theory, measures familiar to medical practitioners such as sensitivity, specificity, and accuracy can be defined for 3D-printed models whenever a gold standard (e.g., pathology findings or expert segmentation) is available. An appropriate QC program for a clinical 3D printing facility would calculate and rely on these metrics to ensure its practices enable the production of satisfactory medical models. Alternatively, when neither model can be considered a gold standard, measuring agreement and disagreement between models is an appropriate QA approach for a facility to compare different radiologists’ interpretations in creating 3D-printed models for individual cases. Furthermore, these metrics can be used toward optimizing specific protocols for specific indications of 3D printing (George et al. 2017). An example is optimizing CT radiation dose for generating accurate models of the skull for maxillofacial surgery. Using the residual volume, we found that the increase in signal-to-noise ratio possible with iterative CT image reconstruction does not increase accuracy (i.e., does not reduce the residual volume) compared to filtered back projection. Rather, accuracy (i.e., a small residual volume) is lost equally when reducing radiation dose, regardless of the image reconstruction technique used (Cai et al. 2015).

11.3 Self-Validating Models

When the intent is to perform QC procedures on individual 3D-printed models, both mathematical measures described above encounter the limitation of alignment of the digitized model to the initial designed model. A technique that can alleviate the need for registration to assess the accuracy of a printed model was recently proposed (George et al. 2017). It involves embedding markers in a prespecified pattern (such as small spheres arranged in a unit-spaced Cartesian grid) within the printed model. The embedded marker pattern can be printed with a material of similar mechanical properties as the medical model so as to not interfere with use of the model for surgical planning, but that has different radiographic properties, for example, a different CT number. Imaging the model with the corresponding imaging modality in which the marker and model material have different image intensities would then allow assessment of dimensional accuracy by ensuring that marker spacing reflects that intended. Similarly, counting and/or matching markers to those embedded in the particular model can rapidly detect bulk anatomy missing from the printed model due to printer failure. This technique is likely to simplify printed model QC as new printing materials that have different opacities are currently being developed.
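A registration-free check of an embedded marker grid can be sketched as follows. Because distances between markers do not change when the model is rotated or translated in the scanner, no alignment to the design is needed. The sketch assumes that marker centers have already been detected in the images; the 10 mm grid spacing, the 0.25 mm tolerance, and the function name are hypothetical.

```python
import math


def check_marker_grid(detected, expected_count, spacing_mm, tolerance_mm=0.25):
    """Check the marker count and the nearest-neighbor spacing of detected markers.

    Returns (count_ok, spacing_ok, worst_deviation_mm).
    """
    count_ok = len(detected) == expected_count
    worst = 0.0
    for i, p in enumerate(detected):
        # Distance from this marker to its nearest neighbor in the grid.
        nearest = min(math.dist(p, q) for j, q in enumerate(detected) if j != i)
        worst = max(worst, abs(nearest - spacing_mm))
    return count_ok, worst <= tolerance_mm, worst


# Hypothetical design: 2 x 2 x 2 markers on a 10 mm Cartesian grid.
intended = [(10.0 * x, 10.0 * y, 10.0 * z)
            for x in range(2) for y in range(2) for z in range(2)]

# Simulated scans: a print with a 5% scaling error, and a print that lost one marker.
scaled = [(1.05 * x, 1.05 * y, 1.05 * z) for (x, y, z) in intended]
incomplete = intended[:-1]

print("accurate print:  ", check_marker_grid(intended, 8, 10.0))
print("scaled print:    ", check_marker_grid(scaled, 8, 10.0))
print("incomplete print:", check_marker_grid(incomplete, 8, 10.0))
```

The count check captures bulk anatomy missing from the print, while the spacing check captures dimensional errors; a full implementation would also match detected markers to their intended grid positions.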

11.4 “End-to-End” 3D Printing Quality Control

Phantom-based QC procedures can help ensure and establish the accurate, safe function of a 3D printer used to produce medical models, as well as the quality of individual medical models printed with it. It should however be noted that at present, 3D-printed phantoms should be avoided for quality control of the entire “end-to-end” process of medical 3D printing, understood to include DICOM image segmentation, STL generation, and STL post-processing. Three-dimensional-printed materials do not produce image intensities characteristic of human tissues (Mitsouras et al. 2017; Mooney et al. 2017; Shin et al. 2017; Bibb et al. 2011; Leng et al. 2016), precluding the imaging of 3D-printed models toward providing any assurances regarding the quality and accuracy of DICOM image segmentation. Furthermore, even if a QC phantom with tissue-like image intensity characteristics is used, any difference or lack thereof between the STL model obtained by segmenting images of the phantom and the phantom’s known geometry will depend to some extent on the particular segmentation algorithm (e.g., the Hounsfield unit [HU] threshold) used. This is an innate limitation of all physical imaging systems, whose point-spread function does not have a vanishing full-width at half-maximum, complicating the assessment of model dimensions with high precision. To assess the end-to-end process of medical 3D printing, legacy (i.e., ordinarily manufactured) QC phantoms containing targets of known dimensions and different contrasts, such as the phantoms used in ACR QC procedures, should ideally be used, and then only in conjunction with specific imaging protocols and specific segmentation algorithms (e.g., predetermined HU thresholds) that have been preestablished to be appropriate for segmenting each individual target using FDA-approved software for DICOM image segmentation (Di Prima et al. 2016). Such phantoms and segmentations can be the topic of future studies.

11.5 Conclusions

Quality control procedures will involve the input and research of multidisciplinary experts in the field to ensure delivery of high-quality, safe models. Physicians and medical physicists should play as strong a role as reasonable in the development of these guidelines, following the general format of those that have successfully enhanced aspects of radiology practices.