Introduction

In modern radiotherapy, regions of interest (ROIs) designating tumor volumes for dose prescription are manually defined by human users, introducing possible geometric variability due to variation in target delineation. As the initial step in the treatment planning process, appropriate target delineation is the vital basis of precise treatment delivery. The most conformal radiotherapy plan, delivered with impeccable positional verification, is of little consequence if the prescription ROIs within the treatment planning software do not accurately depict tumor structures in 3D space [1]. Despite this great import, comparatively few data have been generated regarding the strategic optimization of target delineation tasks.

The vast majority of target delineation is performed with software developed by radiotherapy manufacturers, running on standard personal computer hardware and peripheral devices. While these commercial units may have platform-specific modifications with regard to imaging display, key functions, or mouse strokes, treatment planning systems have almost invariably utilized a mouse-based user input device (UID) and screen-based display system. The typical computer keyboard–mouse–screen arrangement arose as an outgrowth of previous information systems (e.g., typewriting, word and field data entry) and was not initially designed for the manipulation of complex visual datasets. Consequently, while familiar, this ubiquitous system may be suboptimal for specific image-based tasks required for precise target delineation. Previous human–computer interaction studies have demonstrated performance advantages for pen–tablet UIDs compared to a mouse for tasks resembling target volume delineation [2].

We hypothesized that the use of an improved UID interface would result in the reduction of contouring tasks, faster contouring times, and subjective preference for a tablet–pen interface. Consequently, we have undertaken a prospective, multisite study to investigate the utility of UID modification for target volume delineation.

The specific aims of this study include:

  1. Examination of the impact of distinct UID (pen–tablet compared to mouse–keyboard) interfaces on ROI delineation tasks, as defined by:

    (a) Task analysis of software-recorded objective inputs

    (b) Analysis of software-recorded objective contouring time and subject-reported contouring duration

    (c) Survey reports of subjective experience and UID preference

  2. Assessment of potential intra- and inter-observer volumetric differentials attributable to said UIDs

Materials and Methods

Research Approval and Study Design

The current study was conducted under the auspices of The University of Texas Health Science Center at San Antonio, under IRB no. 20080166E, as an exempt study. All patient DICOM images/files were anonymized to comply with HIPAA-specified requirements. The study was designed as a prospective test/retest sequence, with matched-pair comparison of continuous variables planned using a non-Bonferroni-corrected paired Wilcoxon signed-rank test. Power calculations were performed with G*Power 3 statistical software [3] for the parametric equivalent of the paired Wilcoxon signed-rank test (i.e., the paired t test, assuming a minimum possible asymptotic relative efficiency of 0.864) [4] using an a priori 1 − β = 0.8, two-tailed α = 0.05, and a specified effect size of 0.8; this resulted in a requisite minimum sample size of 14 observers in each comparison arm.

UID Implementation

At selected institutions (The University of Texas Health Science Center at San Antonio, San Antonio, TX; Netherlands Cancer Institute–Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Oregon Health & Science University, Portland, OR; The University of Texas M.D. Anderson Cancer Center, Houston, TX), a commercial UID with a wireless pen and screen display tablet (DTX 2100, Wacom, Inc., Vancouver, WA, USA) interface was installed on an existing radiotherapy treatment planning workstation (Fig. 1). At each radiotherapy workstation, an off-the-shelf commercially available QWERTY keyboard and optical two-button scroll wheel mouse were also in place (Microsoft Corp, Redmond, WA; Dell Computers, Austin, TX, USA). Individual faculty and residents at each institution were given a 15-min orientation on use of the devices. The pen–tablet and mouse–keyboard UIDs were available for daily radiotherapy plan ROI contouring during an acclimation phase for 3 months thereafter, to minimize the effects of a possible procedural learning curve, before the initiation of the test/retest sequence. Observers were instructed, if possible, to perform all ROI contouring tasks on the same workstation (e.g., using the tablet as a display screen for the mouse–keyboard cases) to minimize differentials attributable to image display parameter variability.

Fig. 1 UID configuration at the University of Texas Health Science Center at San Antonio; mouse–keyboard UID arrangement is shown on the left (a), with pen–tablet UID shown on the right (b)

Case Selection and Target Delineation Software

An expert-selected CT DICOM study case was procured and anonymized [5] for each of the following sites: head and neck, lung, brain, and prostate. Included with each case was a secondary DICOM image set consisting of a fused MRI for the brain and prostate cases and a fused PET scan for the head and neck and lung cases. For each image set, a fictional standardized case presentation (representative of common clinical situations) and ROI structure set instructions were created (see Electronic Supplementary Material). Observers were sent study documentation, which included a standardized case report, a description of the target volumes to be contoured, and a compact disc (CD) containing reconstructions of 3-mm axial CT images derived from the DICOM file of the standardized case study’s simulation CT, to be contoured twice within the Big Brother (BB) target delineation software program. Big Brother [6–8] (J. Duppen et al., Netherlands Cancer Institute, Amsterdam, NL) is a self-contained software interface showing axial, sagittal, and coronal reconstructions, along with window/level, slice selection, and ROI contouring tools common to standard radiotherapy treatment planning systems (Fig. 2). Additionally, BB has the capacity to reconstruct fusion datasets of supplementary DICOM-derived images, which were included for each case (e.g., fused MRI images for brain and prostate, PET/CT for lung and head and neck cases). BB consists of a user interface with target delineation features common to most commercial treatment planning systems and collects a wide array of volumetric and target delineation data unobtrusively during the contouring session. BB also collects task-specific contouring data, such as the number of ROI drawing tasks (defined as ROI point placement on an axial CT slice using the drawing tool), ROI corrections (designated as erasing, moving, or altering an ROI point on an axial CT slice, or erasing an ROI from an axial slice or image set), window/level changes, zoom, and scrolling/panning commands.

Fig. 2 Representative screen image from Big Brother software for brain case showing fused MR (superimposed window at left) and representative target volume ROI, outlined in red

Drawing/editing tasks were defined by the GUI such that after the drawing mode was selected (by mouse right click or pen tap), an ROI would be delineated as a continuous series of points on an axial image slice. ROI delineation was initiated with the depression of the left mouse button or touching of the pen tip to the tablet device. The ROI was then drawn as a continuous series of points until left mouse release or lifting of the pen from the tablet. A secondary command/pop-up menu was initiated by right clicking the mouse or depressing a click button on the pen. For all ROI drawing/editing/erasing tasks, left mouse button depression and pen–tablet contact were deemed equivalent inputs, with left mouse release and pen tip lifting likewise correlated. Scrolling functions with the mouse scroll wheel were mapped to bilateral touch strips on either side of the screen of the tablet device. Scrolling/panning commands using the keyboard (PgUp/PgDn) were mapped to touch keys on either side of the screen of the tablet device. After entering a participant ID to unlock each contouring session, the keyboard was not used for pen–tablet contouring. All contours were completed in full-screen mode. For drawing/correcting tasks, a task was demarcated as a command initiated by left mouse depression/pen–tablet contact and terminated by left mouse release/pen elevation.

Big Brother also collected software-recorded active contouring time, defined as the interval from user login to plan completion, excepting intervals of inactivity >30 s (i.e., if no keystroke, mouse, or pen command has occurred for 30 s, the interval until the next keystroke/command is not recorded).
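The active-time definition above can be illustrated with a short sketch. This is not the Big Brother implementation, which is not described in detail here; it assumes that each input event carries a timestamp in seconds and that any gap between consecutive events longer than 30 s is discarded in full. The function name and the example session are hypothetical.

```python
from typing import Sequence


def active_contouring_seconds(event_times: Sequence[float],
                              max_gap: float = 30.0) -> float:
    """Sum the gaps between consecutive input events, skipping any gap
    longer than max_gap seconds (assumed reading of the >30 s rule)."""
    times = sorted(event_times)
    total = 0.0
    for earlier, later in zip(times, times[1:]):
        gap = later - earlier
        if gap <= max_gap:
            total += gap  # gap counts as active contouring time
        # otherwise the whole idle interval is left out
    return total


# Hypothetical session: login at t = 0 s, with a 90 s pause between
# the events at t = 25 s and t = 115 s.
events = [0.0, 10.0, 25.0, 115.0, 120.0, 140.0]
print(active_contouring_seconds(events))  # 50.0
```

In this example the 90 s pause is dropped entirely, so only 50 of the 140 elapsed seconds are counted as active.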

Distributed CDs contained an automated HTML link which, after electronic submission of all contouring session ROIs for central review, initiated a subsequent electronic survey.

Target Volume ROI Delineation and Analysis

Observers were invited to contour specified target volume ROIs following standardized instructions, using both a standard keyboard–mouse interface and the installed pen–tablet UID. Observers were instructed to use the workstation on which the pen–tablet UID had been installed, or an equivalent-model screen–keyboard–mouse combination. Observers were allowed to contour the cases in any order over a period of 9 months. After the completion of each case, the online survey queried education level, site-specific expertise, contouring difficulty, estimated active contouring time, and device preference for each case.

Survey results, time, and ROI delineation data were collected electronically and centrally reviewed by study coordinators (CDF, CRNR, JCD). Survey results and task measures were tabulated. Data analysis was performed to evaluate potential differences between pen–tablet and mouse–keyboard contouring. Measures evaluated included observer UID preference, total observer-reported contouring time, total software-recorded contouring time (minus pauses of >30 s), total volume (in cm³) of gross tumor volume (GTV) ROIs, and intra-observer GTV ROI volumetric conformation.

Volumetric conformation was assessed using a modification of the van’t Riet et al. [9, 10] conformation number (CN), which was derived as \( \mathrm{CN} = \mathrm{CV}^2 / \left( \mathrm{V}_1 \times \mathrm{V}_2 \right) \), where CV is the common volume of two observer ROIs and V1 and V2 represent the total volume of a respective ROI. Consequently, a CN = 1 indicates 100% volumetric conformance between ROIs. Intra-observer conformation was assessed using paired pen–tablet ROI volume (in cm³) for V1 and mouse–keyboard ROI volume (in cm³) for V2 for the same observer. Inter-observer agreement was assessed by calculating the mean CN of each observer’s GTV ROI as compared to all other observer GTV ROIs for the specified anatomic site, substratified by UID. Measured values >5 SD from the cohort mean were deemed extreme outliers and censored from subsequent analysis.
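As a minimal sketch of the conformation number defined above, the following assumes each ROI is represented as a set of voxel index tuples on a common grid with equal voxel size, so that voxel counts can stand in for volumes (the unit cancels in the ratio). The representation, function names, and example ROIs are illustrative assumptions and are not taken from the study software.

```python
from statistics import mean


def conformation_number(roi_a: set, roi_b: set) -> float:
    """CN = CV^2 / (V1 * V2), with volumes counted in voxels."""
    if not roi_a or not roi_b:
        raise ValueError("both ROIs must contain at least one voxel")
    common = len(roi_a & roi_b)  # CV: voxels shared by both ROIs
    return common ** 2 / (len(roi_a) * len(roi_b))


def mean_cn_versus_others(rois: dict) -> dict:
    """For each observer, the mean CN against every other observer's ROI.
    Requires at least two observers."""
    return {
        name: mean(conformation_number(roi, other)
                   for other_name, other in rois.items()
                   if other_name != name)
        for name, roi in rois.items()
    }


# Hypothetical ROIs: a 4 x 4 x 4 voxel cube and the same cube shifted
# by one voxel along x, giving 48 shared voxels out of 64 each.
cube = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
shifted = {(x + 1, y, z) for (x, y, z) in cube}
print(conformation_number(cube, shifted))  # 0.5625
```

The same `conformation_number` call covers the intra-observer case (one observer's pen–tablet ROI against that observer's mouse–keyboard ROI), while `mean_cn_versus_others` mirrors the inter-observer comparison within one anatomic site and UID.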

Results

The self-reported level of training and expertise for all observers is listed in Table 1. A total of 25 individual observers submitted at least one contouring session. However, four observers used only one UID (mouse–keyboard); their results were omitted from all subsequent analyses. From the remaining 21 observers who completed at least one anatomic case with both UIDs, there were 129 submitted contour sets. A total of 124 contours were available from 62 paired sessions (i.e., where both mouse–keyboard and pen–tablet contours for a specified organ site were received), with five unpaired case sessions.

Table 1 Observer-reported position and site-specific expertise

Observers rated the head and neck case as the most difficult, with a majority of responses stating the case was either “Somewhat difficult” or “Difficult”; in contrast, the prostate case was deemed “Somewhat easy” or “Easy” in more than 90% of responses (Table 2).

Table 2 Number (and percentage) of responses to the query “Indicate the relative difficulty of the contouring session you just completed,” stratified by anatomic site

After completing ROI tasks with both the mouse–keyboard and pen–tablet UIDs, for 62 paired cases, when queried “If you had to choose between using a tablet or mouse for the contouring process on a case similar to the one you just completed, which device would you choose?” 52 of 62 users selected the tablet, 5 of 62 denoted no preference, and 5 of 62 chose the mouse interface.

Time parameter analysis revealed a statistically significant reduction in ROI contouring time using the tablet (Fig. 3). The mean ± SD active contouring time with the mouse was 31.7 ± 24.7 min (range 4–145) compared to 26.1 ± 22.5 min (range 2–139) with the tablet (paired Wilcoxon, p ≤ 0.01). The average active contouring time saved with tablet usage was 6.2 min per case (95% CI 2.9–9.8). Likewise, subjective estimation of time spent contouring was reduced using the tablet device, with a mean ± SD observer-reported subjective mouse contouring time of 30.8 ± 25.8 min (range 5–160) for all cases compared to 27.0 ± 21.8 min (range 5–120) for the tablet (paired Wilcoxon, p = 0.02). This resulted in an average perceived reduction in ROI contouring time of 5.5 min (CI 0–10.1 min).
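For readers unfamiliar with the paired Wilcoxon signed-rank test used for these comparisons, the sketch below shows one way it can be computed using only the Python standard library. It uses the large-sample normal approximation with a tie correction and no continuity correction, which is an assumption of this example and may differ from the statistical package used in the study; for small samples an exact method is preferable. The contouring times shown are hypothetical and are not study data.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped and tied absolute differences receive
    average ranks. Returns (W_plus, W_minus, p_value)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # extend j over the run of equal absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean_w) / math.sqrt(var_w)
    p_value = min(1.0, 2 * NormalDist().cdf(z))
    return w_plus, w_minus, p_value


# Hypothetical active contouring times (min) for eight paired sessions.
mouse = [30, 42, 18, 55, 27, 36, 61, 24]
tablet = [26, 35, 20, 47, 21, 31, 52, 25]
w_plus, w_minus, p = wilcoxon_signed_rank(mouse, tablet)
print(w_plus, w_minus, round(p, 3))
```

Here `w_plus` sums the ranks of pairs in which the mouse session took longer, so a large `w_plus` relative to `w_minus` points toward shorter times with the tablet.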

Fig. 3 Quantile boxplot of distribution of objective (a) and subjective (b) time parameters, substratified by anatomic case and UID. Whiskers represent the 0–100th percentile range, box encompassing 25th and 75th percentiles. Median is indicated by horizontal line within box, with mean line in green

Task analysis (Table 3) revealed no detectable difference in the number of ROI drawing tasks between mouse and tablet (p = 0.14, n.s.); however, the number of ROI correction tasks was reduced to a statistically detectable degree (p = 0.045), by an average of 8%. The frequencies of window/level changes (p = 0.53, n.s.) and image zoom commands (p = 0.72, n.s.) did not differ detectably between UIDs; however, the frequency of panning/scrolling tasks was reduced by 23% (p < 0.001) with tablet compared to mouse usage.

Table 3 Mean ± SD, for software-recorded target delineation tasks, in number of tasks

Paired GTV ROI volumes for each user were not substantially different (p = 0.29, n.s.) between mouse and tablet sessions (Table 4). Mean intra-observer CN between pen–tablet and mouse–keyboard sessions was 0.77 (CI 0.75–0.80), suggesting that observers were relatively reproducible across UIDs for all cases. Likewise, comparison of inter-observer agreement (Table 4) demonstrated no substantial difference (p = 0.09, n.s.) between input UIDs in terms of group agreement. However, for the head and neck case, analysis revealed a markedly poorer probability of covering the same voxel(s) as previously contoured, or as contoured by other users, than for the other tested anatomic sites; this was noted in both pen–tablet and mouse–keyboard sessions (Table 4).

Table 4 Mean volume, intra- and inter-observer conformation number, stratified by anatomic site (95% CI in parentheses)

Discussion

In radiotherapy treatment planning, target delineation is the premise of accurate and precise treatment delivery. Delineating treatment targets within DICOM files is complex, operator-dependent, and critical to the accurate delivery of conformal radiotherapy [11–15]. Variability in target volume delineation can be a major source of inaccuracy of dose delivery and treatment errors [16]. Consequently, efforts have been made to identify components of the target delineation process amenable to improvement, such as multimodality image incorporation [8, 17–22], instructional modification [23–25], visual atlas usage [11–15, 26, 27], window-level adjustment [28], auto-segmentation [29, 30], and software-assisted contouring [25]. While specialized data entry mechanisms for spatial data are common in other arenas (e.g., video games [31] and virtual simulation workstations [32, 33]), there have been comparatively few efforts to modify ROI definition at the hardware level in radiotherapy.

Previously, ergonomic analysis by Kotani and Horii [2] compared pen–tablet and mouse UIDs, demonstrating improved performance by pen–tablet UID on standardized repetitive computer drawing tasks (e.g., clicking, drag–dropping) and polygon tracing. The same series demonstrated EMG-detected reduction in muscular load to the flexor digitorum superficialis, extensor digitorum, and biceps brachii. Since, in its basic form, ROI outlining approximates polygon tracing as a fine motor task, evaluation of a pen–tablet UID was performed in this study.

The presented data suggest that specific aspects of the target volume ROI delineation process, across a range of tumor types and anatomic sites, are differentially impacted by the transition to a pen–tablet from a mouse–keyboard data entry system.

Both objectively measured active contouring time (i.e., excluding pauses >30 s in length) and subjectively estimated contouring time were reduced by a statistically detectable degree. However, the time savings did not appear to scale with total time required for each task and did not appear directly related to perceived case difficulty (Tables 2 and 3). The use of the pen–tablet UID was associated with a reduced number of corrective actions by individual users as well as reduced panning/scrolling functions, without reducing the frequency of contouring/drawing tasks recorded. Thus, it appears that the use of the pen–tablet results in ROIs that, while containing the same approximate number of drawn points, are less likely to be subsequently corrected than mouse-entered ROIs. The collected data also reveal no statistically detectable difference between pen–tablet and mouse–keyboard entry GTV ROI volume(s) for paired measurements (Table 4). Likewise, intra-observer and inter-observer CN values were roughly comparable to those seen in a previous series [26], suggesting that the ROI volumes designated were only minimally altered in a systematic manner based on data entry UID.

Subjectively, observers overwhelmingly preferred the pen–tablet entry system. Using collected data (Table 2) as an ordinal scale, observers rated cases contoured with the pen–tablet as comparatively easier (paired Wilcoxon signed-rank test, p = 0.04). Also, respondents were surprisingly accurate at time estimation, with a median difference between estimated and actual time spent delineating of <5 min. Consequently, not only was contouring with the pen objectively faster, but observers perceived the interface as faster.

While interest in target volume delineation has expanded in the conformal radiotherapy era, image perception evaluations [34–37], workstation design [38–40], and UID alteration [41, 42] have been less formally addressed than in diagnostic radiology. For example, Weiss et al. performed an evaluation comparing technologies used in diagnostic radiology. They compared a QWERTY keyboard and scroll wheel mouse to six different alternative UIDs (including five-button mouse, eight-button mouse, gyroscopic mouse, multimedia controller, handheld mouse/keyboard combination UID, and a gaming joystick) and found the standard mouse–keyboard UID least favorable. In a similar study, Sherbondy et al. [41] evaluated trackball, pen–tablet, jog–shuttle wheel, and mouse UIDs, finding the pen–tablet UID, in two distinct configurations, to perform faster than the mouse and trackball UIDs, respectively, at a simulated angiography localization task.

However, most of the diagnostic radiology literature on UID modification focuses on scrolling, ROI localization/identification [41], and annotation [43] rather than target delineation, as in radiotherapy. Dowsett et al. [44] reported a series of two gastro-/esophageal cancer cases, contoured by nine observers across several treatment planning systems (TPS). The treatment planning systems implemented distinct interface UIDs, including mouse–keyboard interface, light pen, and trackball input UIDs. In the Dowsett series, the TPS attached to each input UID was varied such that only a light pen and trackball could be compared on the same treatment planning system. Notably, in the present series, the use of the BB software substantially standardizes the contouring process such that all users were working with a common target delineation platform rather than on their typical institution-specific clinical workstations. Consequently, reflected task time estimates might be different with daily implementation on a commercially available TPS used regularly by the observer(s). More recently, Larsson et al. [45] presented two abstracts comparing mouse–keyboard and the same pen–tablet model used in this series using geometric shapes, as well as a lumbar vertebra [45], as contoured by physicians, nurses, radiotherapists/dosimetrists, physicists, and administrative staff. In each case, the authors reported the pen–tablet interface to be faster at drawing tasks. Interestingly, they also observed that radiation oncologists were the slowest performers of ROI contouring tasks. The presented findings of this series, using more elaborate target volume-specific tasks designed to approximate clinical cases, correlate with those observed in simple geometric/anatomical shapes by Larsson et al. Work by Anderlind et al. [46], using a pen UID with haptic feedback, furthermore suggests that the addition of tactile information might even further improve contouring efficiency.

Although this is the largest prospective UID comparison in therapeutic treatment planning, several caveats are apparent. Our sample size, though cumulatively robust, was limited to a selected subset of institutions, resulting in potential selection bias. Though multiple cases from distinct anatomic locations and of different perceived task difficulty were used, the limited number of cases in each site limits broad applicability. Additionally, the use of non-parametric analyses owing to sample size considerations, the presence of several statistical outliers in the dataset (Fig. 3), and multiple non-Bonferroni-corrected comparisons might make excessive generalization of the results erroneous. These caveats should also be considered in light of the large range of contouring times and the observations that times for contouring varied substantially for distinct cases (e.g., prostate case contouring was markedly faster than the head and neck case, regardless of device) and that individual users exhibited inter-observer differences in contouring speed severalfold greater than the intra-observer UID-associated time improvement.

We did not survey participants for degree of familiarity with, or preference for, either UID before the study, which might also serve as an unidentified source of bias. By allowing users to contour cases in an order of their choosing, potential learning effects might be obfuscated. Thus, we sought to ascertain whether contouring order had a systematic effect, using post hoc split-plot ANOVA of active contouring time, UID, and both absolute (i.e., order of submitted contour sets for all four anatomic sites) and relative order (i.e., first or second contour submission for each organ site). In secondary analysis, neither absolute (F test, p = 0.46) nor relative order of completion (F test, p = 0.14) was found to be associated with a paired user difference in active contouring time.

Despite the stated limitations, our data demonstrate that modification of the input UID can detectably alter ROI delineation tasks. Specifically, in the tested cases, pen–tablet use was associated with decrement in active contouring time, perceived contouring time, perceived case difficulty, number of corrective actions, and number of panning functions while leaving ROI drawing and volumetric measures unaltered. In sum, the use of a pen–tablet device resulted in improved efficiency in ROI delineation tasks. These data suggest potentially appreciable savings in terms of physician time commitment. For instance, using the number of conformal/IMRT cases performed annually between 2006 and 2007, derived from site-specific (brain, prostate, head and neck, lung) numbers of conformal radiotherapy cases at the University of Texas Health Science Center at San Antonio and using time savings between input UIDs from the current dataset for each anatomic site, an estimated average ± SE of 1,335 ± 916 min (22 ± 15 h) of direct physician work time might have been saved annually from target volume delineation alone (e.g., not including normal structure/OAR ROI input or other time components of the treatment planning process) [47] at a single participating site. Consequently, while time savings are obviously dependent on the number of cases, case mix, case complexity, and departmental size, among other factors, considerable institutional efficiency gains might potentially be realized from input UID optimization, especially given the relative time reimbursement costs for radiation oncologists [48].
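The annual estimate above is, in essence, a sum over anatomic sites of annual case count multiplied by per-case time saved. The sketch below illustrates that arithmetic only. The case counts, per-site savings, and standard errors are placeholder values chosen for illustration, not the institutional or study data, and the way the standard error is combined (treating the site estimates as independent) is an assumption, since the original calculation is not specified.

```python
import math


def annual_minutes_saved(cases_per_year, minutes_saved, se_minutes):
    """Return (total minutes saved per year, standard error).

    Each argument maps an anatomic site to a number. The standard error
    assumes independent per-site estimates (an assumption of this sketch)."""
    total = sum(cases_per_year[site] * minutes_saved[site]
                for site in cases_per_year)
    se = math.sqrt(sum((cases_per_year[site] * se_minutes[site]) ** 2
                       for site in cases_per_year))
    return total, se


# Placeholder inputs (hypothetical, for illustration only).
cases = {"brain": 40, "prostate": 60, "head and neck": 50, "lung": 70}
saved = {"brain": 5.0, "prostate": 3.0, "head and neck": 8.0, "lung": 6.0}
se = {"brain": 3.0, "prostate": 2.0, "head and neck": 4.0, "lung": 3.0}

total_min, se_min = annual_minutes_saved(cases, saved, se)
print(f"{total_min:.0f} min/year, about {total_min / 60:.0f} h/year")
```

With these placeholder inputs the total is 1,200 min, or 20 h, per year; a department would substitute its own case mix and measured per-site savings.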

Finally, our data point to a need for a more developed analysis of image perception and human–computer interfaces specific to target volume delineation. It is imperative that future studies optimize technical parameters for accurate dose prescription. While this series did not directly evaluate clinical outcomes, it is possible that distinct display or human–computer interfaces might alter radiotherapy dose prescriptions sufficiently to result in clinically meaningful sequelae, though such evaluation would require more robust numbers of cases.