1 Introduction

Colorectal cancer (CRC) is a preventable cancer that is diagnosed in around 150,000 people each year in the US. Although it can be prevented, about 50,000 patients die from CRC annually [1]. There is widespread consensus that the preventive strategies currently in place in the US should drastically reduce the incidence and mortality of CRC, yet for a multitude of reasons this has yet to happen [2, 3]. Of all the methods to prevent death from CRC, colonoscopy holds the most promise: it allows detailed inspection of the entire colon and, at the same time, removal of all premalignant lesions. The latter is commonly performed during the withdrawal phase of the procedure. Colonoscopy is also readily available in most geographical areas of the US, with widespread coverage of the procedure by payers.

The main problem with colonoscopy is the relatively limited protection against CRC it currently provides. Several studies have shown a limited or even absent protective effect (the latter in the right colon) against CRC mortality, especially outside carefully controlled trials [2, 3]. More recent studies have shown a definite protective effect, in particular for CRC of the left colon [4, 5]. CRC of the right colon appears to be more difficult to prevent, and numerous explanations for the relative failure of colonoscopy have been proposed. In general, these explanations can be divided into two sets. One set focuses on patient- and biology-related factors; these include poor preparation, an inability of the patient to cooperate during the procedure, abnormal anatomy, flat polyp morphology, or unfavorable polyp or tumor biology. Indeed, these factors may all be more common in right-sided CRC: bile and small bowel content frequently cover the right colon, the deep folds of the right colon make inspection difficult, and flat, more rapidly progressing tumor biology (CIMP pathway) is much more likely in neoplasia of the right colon. The other set of explanations focuses on procedure- and endoscopist-related factors: suboptimal equipment, failure to remove remaining debris, not reaching the cecum, fast withdrawal, no effort at inspecting areas behind folds and angulations, and inadequate polyp removal technique.

A key study supporting the second set of explanations was published in 2014; it showed that every 1 % increase in adenoma detection rate (ADR) was associated with a 3 % decrease in interval CRCs, and that the lowest interval CRC rate was observed among endoscopists with an ADR > 33.5 % [6]. Proponents of the first set of factors may point to the patient's responsibility for a clean colon, the type of preparation and patient compliance, outline the benefits of propofol sedation, and believe that interval CRC is a result of rapid growth. Proponents of the second set of factors are of the opinion that gastroenterology-trained endoscopists provide better quality than other endoscopists. They believe that removal of debris, complete inspection and total removal of all neoplasia can be achieved and should lead to nearly complete protection against CRC if screening and surveillance guidelines are followed; they consider interval cancers a result of missed lesions (polyp or small cancer) or incomplete resection of identified lesions at prior colonoscopy. In reality, opinions do not separate strictly into two sets but span a gradual range.

Several years ago the general opinion within gastroenterology, in particular related to interval cancers, was more along the first set of explanations, with a focus on tumor biology. Lately, research has shown serious gaps in procedure quality, suggesting that improvement in this area would dramatically improve patient outcomes. The most important observations that favor procedure- and endoscopist-related explanations are reports that show:

  1. vast differences in interval CRCs among endoscopists [7],

  2. a very low interval CRC rate among endoscopists with an adenoma detection rate (ADR) of at least 20 % in Poland or > 33.5 % in the US [6], and

  3. a very high CRC mortality reduction of 89 % when implementing a colonoscopy protocol that enforces high quality and results in a 34 % ADR [8].

The question, then, is how to improve endoscopist technique during the colonoscopy procedure within today’s challenging medical environment, in which physicians often feel pressure to see more patients in less time. Our hypothesis is that providing feedback during the critical phase of the procedure, the withdrawal phase, will result in improved endoscopist technique and higher ADRs.

2 Related Work

2.1 Endoscopic Multimedia Information System (EMIS)

Since 2003 our group has worked on creating an automated system to capture, analyze and summarize video files representing an entire endoscopic procedure [9]. We have called our system EMIS for Endoscopic Multimedia Information System. We have focused our efforts on colonoscopy. Our work has shown that our manual EMIS annotation technique is reproducible among annotators with fair to good inter-operator agreement; inter-operator agreement is best for very low and very high quality procedures, but varies when quality is average. Our automated EMIS technology results correlate with our manual annotation results, and both manual and automated annotations correlate with ADR – the most widely accepted main determinant of colonoscopy quality – for a set of video files representing the work of a single endoscopist or an endoscopy group.

2.2 Commonly Used Indicators of Quality

All commonly used indicators of colonoscopy quality, such as cecal intubation rate, average withdrawal time, ADR, polyp detection rate, and interval cancer rate, are averages and provide no information about a single procedure [10]. Instead, these quality parameters are summary data that reflect a group of procedures performed by an individual endoscopist or a group of endoscopists over a specific time period. An inherent feature of summary data is that a few really poor procedures combined with a larger set of higher quality procedures will result in acceptable overall quality scores. Intuitively it does not make sense to set as a goal a specific withdrawal time (several specific times have been proposed) or a specific number of cases in which polyps should be detected. Instead, it would make sense to measure features that directly define the quality of each procedure. However, an accepted method to measure quality of colonoscopy during the procedure does not exist at present. Therefore, our automated annotation, if it could be performed in real-time with real-time reporting of measured quality, has the potential to provide real-time feedback about quality, and thereby influence the outcome of the procedure.

2.3 Direct Indicators of Quality

Three things need to happen at the same time in order for a colonoscopy to be of high quality. First the colon needs to be well prepared (Clean). Second, most if not all of the mucosa needs to be inspected (Look Everywhere). And third, all neoplastic lesions, where possible, need to be completely removed (Abnormality Removal) [11]. We have combined these three features into the CLEAR acronym. EMIS uses computer-based algorithms to analyze the image stream generated during – not after – colonoscopy for specific metrics based on the CLEAR principle. EMIS does not interfere with actual colonoscopy as the same image stream is displayed on a monitor allowing the endoscopist to view the colonic mucosa and perform diagnostic and therapeutic procedures as indicated. To allow streaming video file analysis for CLEAR features we created SAPPHIRE: middleware that handles multiple simultaneous real-time algorithms and automatically distributes these to either one or more threads, CPUs or GPUs, all the time making sure that all single frame-related algorithms are completed before the next frame becomes available [12, 13]. With a video frame rate of 30 frames/s this means SAPPHIRE must complete all single frame-related analyses within a time span much shorter than 33 ms in order to process the results and generate feedback information.
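The per-frame time budget follows directly from the frame rate. The following Python sketch is illustrative only and is not part of SAPPHIRE: the function names, the per-algorithm timings and the 25 % reserve for reporting are assumptions, and it covers only the simplest case, in which all algorithms run sequentially on a single worker.

```python
from typing import Mapping

FRAME_RATE_HZ = 30.0  # frame rate stated in the text


def frame_budget_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / frame_rate_hz


def fits_budget(algorithm_ms: Mapping[str, float],
                frame_rate_hz: float = FRAME_RATE_HZ,
                reserve_fraction: float = 0.25) -> bool:
    """Check whether sequential per-frame analyses leave time for feedback.

    algorithm_ms maps a (hypothetical) algorithm name to its measured run
    time per frame. reserve_fraction is an assumed share of the frame
    interval kept free for aggregating results and generating feedback.
    """
    usable_ms = frame_budget_ms(frame_rate_hz) * (1.0 - reserve_fraction)
    return sum(algorithm_ms.values()) <= usable_ms


if __name__ == "__main__":
    timings = {"blur_detection": 5.0, "stool_detection": 10.0}  # made-up values
    print(f"budget per frame: {frame_budget_ms():.1f} ms")
    print("fits:", fits_budget(timings))
```

When analyses are distributed over several threads, CPUs or GPUs, the relevant quantity is the longest chain of dependent work per frame rather than the plain sum used here.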

2.4 Features of EMIS

EMIS can detect whether the colon is clean, whether the endoscopist removes remaining debris, whether the endoscopist tries to inspect the entire colon and whether polyps are removed. Using a graphics card attached to a workstation running MS Windows 7, we automatically capture the video stream from the endoscope image processor; algorithms process the video stream for many features related to quality. For each algorithm we went through a similar multi-step process. First, we decided what new or existing features needed to be derived to measure the desired quality metrics. Next, we created a training set of images that incorporated the presence or absence of the features; for some training sets we used a binary approach (biopsy cable in frame: present/absent), for others a continuous range (stool pixels per frame: 0–100 %). The third step consisted of creating algorithms that measured the features of interest. Initially we developed these in a high-level language such as MATLAB. We then used machine learning techniques to train our software on the training set and determined sensitivity and specificity using a test set. Our goal was to achieve around 95 % sensitivity and specificity. Once we achieved these marks, we rewrote our code in C/C++ to increase speed of execution or, when this did not result in fast enough execution, in assembly language.
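The evaluation step for a binary feature (for example, biopsy cable present or absent) can be sketched as follows. This is a minimal Python illustration, not the EMIS code; the function names and the example labels are assumptions, and only the 95 % target comes from the text.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[bool],
                            predicted: Sequence[bool]) -> Tuple[float, float]:
    """Compute sensitivity and specificity for a binary per-frame feature.

    truth holds the manual reference labels of the test set, predicted the
    labels produced by the algorithm under evaluation.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("test set needs both positive and negative frames")
    return tp / (tp + fn), tn / (tn + fp)


def meets_target(sensitivity: float, specificity: float,
                 target: float = 0.95) -> bool:
    """True when both measures reach the target stated in the text."""
    return sensitivity >= target and specificity >= target


if __name__ == "__main__":
    # Made-up labels for eight test frames.
    truth = [True, True, True, True, False, False, False, False]
    predicted = [True, True, True, False, False, False, False, True]
    sens, spec = sensitivity_specificity(truth, predicted)
    print(sens, spec, meets_target(sens, spec))
```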

2.5 Real-Time Feedback

In addition to automated real-time analysis, EMIS allows real-time feedback. For this we developed within SAPPHIRE a reporting module that combines the output of all algorithms into a summary state that is continuously updated in real-time; this summary state can be sent for display on a monitor. Recently we described all the features required to allow EMIS to function in real-time within a healthcare network; we encountered numerous challenges, yet eventually solved them all [14]. Thus we are ready to test our first real-time feedback modules.

3 Methods

3.1 Features Ready for Real-Time Feedback

The protective effect of colonoscopy is directly related to removal of all precancerous lesions; as lesions can only be removed if identified, the first thing that needs to occur during colonoscopy is inspection of all mucosa, and where mucosa is covered with remaining debris, this debris needs to be removed. This is not as simple as it seems as the colon is a convoluted, moving tubular organ constantly receiving bile-stained digestive juices and food particles from the small bowel. Our algorithms for features that measure colon preparation and mucosal inspection have gone through a number of iterations aimed at improving accuracy and speed; at present they are fast enough to provide real-time analysis and the results have been coupled to a feedback module. Thus we are ready to start providing feedback related to colonic preparation and mucosal inspection.

3.2 In-Person Real-Time Feedback

Real-time feedback is commonly provided to physicians in training; indeed, nowadays endoscopy is taught first by acquisition of basic endoscope handling techniques on endoscope simulators. These simulators provide real-time feedback about the force that is used to manipulate the endoscope within a simulated patient and, at the end of the simulation, may summarize in a report the amount of mucosa and the number of simulated polyps seen, as well as the extent of intubation and the duration of the withdrawal phase. Next, after having observed experienced endoscopists during colonoscopy on real patients, trainees themselves start performing colonoscopy on patients under continuous in-person supervision of experienced endoscopists. Especially in the first days and weeks of hands-on patient endoscopy, the trainee receives continuous feedback about how to insert the endoscope, how to clear remaining debris from the lens of the endoscope and the mucosa of the patient, and how to maneuver the tip of the endoscope in order to achieve optimal inspection of as much of the colon mucosa as possible.

3.3 Automated Real-Time Feedback

Our real-time feedback is targeted at experienced endoscopists as well as relatively advanced trainees. Therefore, the intention is to provide feedback only when endoscopic actions are seen that ideally are not present in endoscopies by experienced gastroenterologists performing high quality colonoscopy. We are unaware of an existing system that provides real-time analysis with real-time feedback during endoscopic or surgical procedures; thus there are no examples of when and how to provide real-time feedback. Based on domain expertise we defined four feature/time feedback triggers during the withdrawal phase. First, whenever the image was unclear or blurry due to debris covering the lens or the tip of the scope being stuck within mucosa, the endoscopist should show efforts at obtaining a clear image within 15 s. Second, when easily removable debris was present, an effort at removal should occur within 30 s. Third, we determined whether withdrawal speed was rather fast for 30 s or more. Fourth, we determined whether there was obvious circumferential inspection activity within a 30 s withdrawal segment. Whenever polyps were removed, the latter two could not be evaluated as absent. For each 30 s segment we scored expected behavior as “0” and behavior that would trigger feedback as “1”. Table 1 lists the feature annotations and scoring. A single annotator, with over 3 years of experience and thousands of annotated colonoscopy video files, manually annotated all video files. This annotator was trained in colonoscopy video file annotation and then benchmarked against a set of 10 video files annotated by a group of experienced endoscopists.
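The trigger definitions above can be illustrated with a short Python sketch. It is not the annotation procedure or the EMIS implementation: the per-second boolean input, the function names and the choice to re-trigger after every further threshold period of an ongoing run are assumptions; only the 15 s and 30 s thresholds and the 0/1 scoring come from the text.

```python
from typing import List, Sequence

BLUR_THRESHOLD_S = 15  # clear image expected within 15 s (from the text)
SEGMENT_S = 30         # segment length used for scoring (from the text)


def run_triggers(flags: Sequence[bool], threshold_s: int) -> List[int]:
    """Return the seconds at which an uninterrupted run of True flags
    reaches a multiple of threshold_s.

    flags holds one value per second of withdrawal (True = undesired state,
    for example a blurry image). Re-triggering after every further
    threshold_s of an ongoing run is an assumption of this sketch.
    """
    triggers = []
    run = 0
    for second, flag in enumerate(flags):
        run = run + 1 if flag else 0
        if run > 0 and run % threshold_s == 0:
            triggers.append(second)
    return triggers


def score_segments(activity: Sequence[bool],
                   segment_s: int = SEGMENT_S) -> List[int]:
    """Score consecutive segments: 0 if the expected activity (for example
    circumferential inspection) occurs at least once, 1 if it is absent.
    A shorter final segment is scored in the same way."""
    scores = []
    for start in range(0, len(activity), segment_s):
        segment = activity[start:start + segment_s]
        scores.append(0 if any(segment) else 1)
    return scores


if __name__ == "__main__":
    # Made-up example: 5 s clear, 15 s blurry, 10 s clear.
    blurry = [False] * 5 + [True] * 15 + [False] * 10
    print(run_triggers(blurry, BLUR_THRESHOLD_S))
    # Made-up example: no inspection in the first segment, some in the second.
    inspecting = [False] * 40 + [True] + [False] * 19
    print(score_segments(inspecting))
```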

Table 1 Features and annotation triggers

4 Results

Video files were selected from a large set of video files obtained automatically as part of our quality studies in January and February of 2014. Video files were captured at a rate of approximately 30 frames/s. A total of 100 video files were annotated for this study. There were 4 video files (4 %) where not a single feedback trigger annotation was made; an example is shown in Table 2. This example is of interest as the total withdrawal time, 3 min and 12 s, is far below the recommended minimum withdrawal time of 6 min, yet not a single time was any of the manual trigger thresholds reached. The remaining 96 video files each included from 1 to 44 feedback triggers. Table 3 summarizes the results for the 100 video files.

Table 2 Example of a colonoscopy video file without any feedback triggers
Table 3 Summary of results for four inspection features. Withdrawal Time and Trigger Interval in seconds

As can be seen from Table 3, there was a wide fluctuation within each feature trigger with a large range for clear images and circumferential withdrawal. The expected average number of feedback triggers given the current feature definition was 8 ± 7 (Mean ± SD) triggers per colonoscopy. The average interval between triggers was nearly 2 min with a very wide range from once every 35 s to once in nearly 12 min. Feedback triggers were not randomly distributed throughout video files. Table 4 shows part of the feedback trigger annotations for the colonoscopy with the largest number of triggers, 44. As can be seen many triggers were grouped together between minute 30 and 33 after the start of the procedure.
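Summary statistics of this kind can be computed as sketched below in Python. The text does not state how the trigger interval was derived; dividing withdrawal time by the number of triggers is an assumption of this sketch, and the function names and example values are hypothetical rather than study data.

```python
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def trigger_summary(trigger_counts: Sequence[int]) -> Tuple[float, float]:
    """Mean and sample standard deviation of triggers per colonoscopy."""
    return mean(trigger_counts), stdev(trigger_counts)


def trigger_interval_s(withdrawal_time_s: float,
                       n_triggers: int) -> Optional[float]:
    """Average time between triggers within one procedure, in seconds.

    Assumed definition: withdrawal time divided by the number of triggers.
    Returns None for a procedure without triggers.
    """
    if n_triggers == 0:
        return None
    return withdrawal_time_s / n_triggers


if __name__ == "__main__":
    counts = [0, 4, 8, 20]  # made-up trigger counts, not study data
    m, sd = trigger_summary(counts)
    print(f"{m:.1f} ± {sd:.1f} triggers per colonoscopy")
    print(trigger_interval_s(360.0, 3))
```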

Table 4 Example of video file annotation with triggers (red); not all triggers are shown

5 Educational Analysis

The two examples shown in Tables 2 and 4 were further analyzed by careful review of the events by an expert endoscopist with more than 20 years of colonoscopy experience to determine the potential educational value of the feedback. Annotations were found to be accurate. In the procedure shown in Table 2 the colon was well cleaned and most remaining debris was removed during insertion and withdrawal. The image was always clear and movement during withdrawal was consistently in the direction of the anus. Spiral activity was present in each 30 s segment. However, not all of the mucosa was seen, there was no inspection behind large folds and flexures, and the endoscopist did not go back to inspect mucosal areas that were missed. Automated analysis of the video file (data not shown) revealed a very low total spiral score (the number of completed 360 degree inspections) of 4; the absence of back and forth movement with inspection of folds explains the low spiral score and the very short withdrawal time. Thus the manual annotation was accurate, but the manual spiral annotation as performed was not sufficient to generate feedback. Automated analysis, however, can sum the complete circumferential inspection activity over the entire withdrawal phase and track the time spent inspecting the mucosa; it would provide real-time feedback of the score and withdrawal time, and inform the endoscopist that more effort towards inspection is warranted.
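The idea of a cumulative spiral score, counting completed 360 degree inspections over the withdrawal phase, can be illustrated with a minimal Python sketch. The text does not describe how EMIS computes this score; the input used here (a per-step estimate of the signed rotation of the view, in degrees) and the function name `spiral_score` are assumptions for illustration only.

```python
from typing import Iterable


def spiral_score(rotation_deg_per_step: Iterable[float]) -> int:
    """Count completed 360 degree rotations in a stream of signed rotation
    estimates (positive = one direction, negative = the other).

    Rotation in the opposite direction cancels earlier rotation, so back
    and forth movement that never completes a full circle is not counted.
    """
    completed = 0
    accumulated = 0.0
    for delta in rotation_deg_per_step:
        accumulated += delta
        while abs(accumulated) >= 360.0:
            completed += 1
            # Remove one full turn in the direction it was completed.
            accumulated -= 360.0 if accumulated > 0 else -360.0
    return completed


if __name__ == "__main__":
    # Made-up input: eight quarter turns in the same direction.
    print(spiral_score([90.0] * 8))
```

A running value of this kind could be updated after every frame and shown together with the elapsed withdrawal time.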

In the procedure shown in Table 4, the colon was fairly clean except for sporadic seeds, likely from a fruit or vegetable ingested in the days prior to colonoscopy. Around minute 20 a polyp was resected, which resulted in some bleeding, and while inspecting the bleeding site the endoscopist lost track of the polyp specimen. Around minute 27.5 the polyp specimen was found and suctioned into the instrument together with a number of seeds. Seeds are known to occlude the suction system. More seeds were seen and by minute 30 the endoscopist had lost suction; the endoscope tip was submerged in remaining debris, water used to wash the lumen, and seeds. After three minutes the suction ability for water, debris and air was restored. The withdrawal phase was long, 30 min, but at least 10 min were spent inspecting the bleeding polypectomy site, looking for the polyp specimen and trying to regain suction. The automated analysis showed a spiral score of 19, which seems correct but is inflated by the efforts at inspecting the bleeding site and trying to find the polyp specimen. Another algorithm, which determines forward and backward motion, can detect that the endoscope is not moving meaningfully in either direction; this lack of movement can be used to identify circumferential inspection that occurs without movement in the anal direction.

6 Discussion

Improving quality of colonoscopy with real-time feedback during the procedure has never been done. Therefore we are investigating each step in the development of real-time feedback. Here we determined how often feedback related to blurred images, failure to remove remaining debris, high speed of withdrawal and absence of circumferential withdrawal would occur if we used 15, 30, 30 and 30 s respectively as feature triggers activating feedback. Our results support the following conclusions. First, real-time feedback during colonoscopy is needed as quality fluctuates among procedures. Second, as shown in Table 3 real-time feedback is more likely to be beneficial for blurry images and absence of circumferential withdrawal. Third, the thresholds used in this study, which were based on domain expertise, for blurry frames and circumferential withdrawal with trigger time periods of 15 and 30 s respectively seem ready for testing in the clinical environment. Fourth, to our surprise, real-time feedback related to absence of removal of remaining debris and withdrawal speed is unlikely to greatly influence quality of colonoscopy. Fifth, feedback related to withdrawal time and cumulative circumferential withdrawal should be included with blurry image and circumferential withdrawal feedback. Sixth, the average interval between triggers of nearly 2 min seems acceptable; moreover, any improvement in technique due to real-time feedback will – automatically – decrease the number of trigger events and thereby increase the interval between triggers.

Our in-depth analysis of two outlier cases, one without any triggers and one with the maximal number of triggers, revealed that meaningful real-time feedback during colonoscopy will have to include a large number of features. Relative position of the endoscope, determined by the difference between forward and backward movement, will allow us to determine whether circumferential inspection is occurring at more or less the same location or during a gradual withdrawal of the instrument. 3D mapping of the colon mucosa from 2D images will provide a second means of determining endoscope movement in the direction of the anus [15]. Without any doubt, additional features will need to be included or developed once we have tested real-time feedback in clinical practice.

Use of real-time feedback by endoscopy trainees for only spiral score and cumulative spiral score was associated with significantly better inspection technique [16]. Thus EMIS has the potential to educate “on the job” and likely will be useful for objectively measuring and improving the endoscopic technique of staff endoscopists. Yet although these findings are very encouraging, several questions remain. First, will EMIS be acceptable to staff endoscopists, especially those who have been accustomed for many years to practicing endoscopy without any peer review? Second, will the effect of EMIS wear off over time? Third, more thorough inspection in general means longer inspection per procedure and fewer procedures per day; thus higher colonoscopy quality may lower endoscopist income. Lastly, EMIS itself comes with costs, which may further decrease endoscopist income.

In conclusion, we are one step closer to testing real-time feedback in the clinical practice of experienced staff endoscopists. The results shown here support testing a number of features and suggest that a moving feature trigger period of 15–30 s will provide meaningful feedback without too many feedback triggers per procedure. Additional studies are required to determine whether “on the job” education via real-time feedback is acceptable and will lead to persistent improvement in endoscopist technique and decreased CRC mortality.