Introduction

Colorectal cancer (CRC) is the fourth most common cause of cancer deaths worldwide [1]. Conventional white-light colonoscopy is the gold standard screening test for CRC [2]. Several randomized clinical trials and prospective cohort studies have demonstrated that conventional colonoscopy with polypectomy reduced the incidence of CRC by 40 to 90% and reduced mortality [3]. Thus, the demand for colonoscopy continues to increase.

The sequence of adenoma, high-grade dysplasia, and carcinoma leads to CRC development [4]. Thus, the adenoma detection rate (ADR) is an important index of the quality of screening and surveillance colonoscopy. However, ADR depends largely on the subjective judgment of the physician and relies heavily on the technique and expertise of the endoscopist [5, 6]. Adenomas overlooked during colonoscopy, which are reflected in a low ADR, can lead to the development of interval cancer [7, 8]. The miss rate of colonoscopy for polyps has been reported to be as high as 22%, and the rate of interval cancer is 3 to 6% [9]. Hence, endoscopy centers or clinics in rural areas with inexperienced medical staff could benefit from image processing software that detects polyps [10,11,12,13]. Even experienced endoscopists may miss polyps because of factors such as fatigue, patient status, equipment quality, bowel preparation quality, start time, and other circumstances [14,15,16,17,18].

Besides polyps, polypectomy sites and bleeding regions are important findings on colonoscopy [19]. Observation of the polypectomy site is necessary to confirm whether ablation and suturing were performed adequately and whether bleeding is present [20]. Scenes containing tools can be reviewed to check the physician’s skill in handling tools during colonoscopy [21]. In addition, intestinal clearance can be evaluated by examining stool and other residue [22]. Related research is being conducted in the capsule endoscopy field, and we demonstrate that these observations can be used not only in capsule endoscopy but also in colonoscopy [23, 24].

Images obtained on colonoscopy can be easily stored as digital video files, allowing reobservation by a physician as needed [25]. However, reobserving video files is burdensome because it requires much time and concentration [26]. To alleviate this problem, a method for analyzing and summarizing the colonoscopy videos using image processing software is needed [23, 27].

Taken together, there is a need for a system that automatically detects significant information, including polyps, and makes it accessible to the physician. In this study, we developed and verified a system that extracts meaningful information from recorded colonoscopy videos by hierarchically applying a support vector machine (SVM), a machine learning technique. The extracted information was then presented to the physician as a visual summary report in the form of a color-coded timeline. This approach follows current trends in the analysis of medical information using artificial intelligence, such as IBM’s Watson [28,29,30,31]. A summary report in the form of a color-coded timeline is more convenient, accessible, and readable than a video recording and is expected to be a useful addition to the medical record [32].

Methods

Study design and population

This prospective, single-center trial enrolled patients aged 19 to 75 years who underwent colonoscopy for screening, surveillance, or therapy such as polypectomy at Seoul National University Hospital, a tertiary referral center in Korea, from August 2016 to December 2016. Potential participants were excluded if they had a previous history of colorectal resection, severe constipation, idiopathic pseudo-obstruction, acute exacerbation of inflammatory bowel disease, impaired renal function (glomerular filtration rate < 30 mL/min/1.73 m2), other serious medical illnesses (major cardiac, metabolic, or psychiatric illness), or if they were pregnant or lactating.

Colonoscopy procedure

All patients received standard bowel preparation including 4 L of polyethylene glycol solution. All colonoscopies were performed by two experienced endoscopists, who have performed > 10,000 colonoscopies and are certified by the Korean Society of Gastrointestinal Endoscopy. Colonoscopies were performed with a high-resolution endoscopy device (CV260SL, Olympus, Tokyo, Japan). The size, location, morphology (Yamada type), and histology of all polyps detected during the procedure were reported. Bowel preparation quality was assessed using the Boston bowel preparation scale [9]. Adequate bowel preparation was defined as a score of ≥ 2 for each location [9, 33]. We calculated the ADR (the number of colonoscopies with at least one adenoma detected, divided by the total number of colonoscopies) [5].
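The ADR definition and the per-segment adequacy rule for the Boston Bowel Preparation Scale (BBPS) are simple enough to state as code. The sketch below is a minimal illustration only; the input format (a list of adenoma counts, one per colonoscopy, and a list of three BBPS segment scores) is an assumption, not the study's data structure.

```python
def adenoma_detection_rate(adenomas_per_colonoscopy):
    """ADR = colonoscopies with >= 1 adenoma / total colonoscopies.

    Input format is hypothetical: one adenoma count per colonoscopy.
    """
    if not adenomas_per_colonoscopy:
        raise ValueError("no colonoscopies given")
    with_adenoma = sum(1 for n in adenomas_per_colonoscopy if n >= 1)
    return with_adenoma / len(adenomas_per_colonoscopy)


def bbps_adequate(segment_scores):
    """Adequate preparation: every BBPS segment score (0-3) is >= 2.

    The BBPS scores the right, transverse, and left colon separately.
    """
    return all(score >= 2 for score in segment_scores)


# Five colonoscopies, three of which found at least one adenoma.
print(adenoma_detection_rate([0, 2, 1, 0, 4]))
print(bbps_adequate([3, 2, 2]), bbps_adequate([3, 1, 2]))
```

Note that ADR counts procedures, not adenomas: a colonoscopy with four adenomas contributes the same as one with a single adenoma.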

Acquisition of endoscopy video

The colonoscopy videos were acquired with a video capture card (SkyCapture U6T, Skydigital, Yongsan, Korea) connected to a branched video output of the CV260SL. The videos were encoded as MP4 files at 1920 × 1080 resolution and 30 frames per second to avoid loss of resolution. Videos were acquired from 113 patients, and each video was about 30 to 40 min long (6.5 GB).

Classification of informative frames

Because the videos were long and noisy, using the whole video was inefficient; it was more efficient to extract the frames that needed reobservation or image processing [34]. One original frame was extracted every 0.3 s and saved as a PNG file using VirtualDub software. All original frames were cropped to 850 × 750 pixels to keep only the colonoscopy area and exclude patient information and device settings.
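The sampling and cropping step can be sketched as follows. At 30 frames per second, keeping one frame every 0.3 s means keeping every ninth source frame. The authors used VirtualDub; this Python sketch only illustrates the arithmetic. The crop offsets (`left`, `top`) are hypothetical because the paper does not report where the 850 × 750 region sits in the 1920 × 1080 frame, and the width/height orientation of the crop is also an assumption.

```python
FPS = 30                   # capture frame rate stated in the text
INTERVAL_S = 0.3           # one frame kept every 0.3 s
CROP_W, CROP_H = 850, 750  # cropped endoscopic area (width x height assumed)


def sampled_frames(total_frames, fps=FPS, interval_s=INTERVAL_S):
    """Return (source_frame_index, time_in_seconds) for every kept frame."""
    step = round(fps * interval_s)  # 9 source frames per kept frame at 30 fps
    return [(i, i / fps) for i in range(0, total_frames, step)]


def crop(image, left, top, width=CROP_W, height=CROP_H):
    """Crop a row-major image (list of pixel rows) to the endoscopic view.

    left/top are hypothetical offsets: the actual crop position is not reported.
    """
    return [row[left:left + width] for row in image[top:top + height]]


# A 35-minute video at 30 fps has 63,000 source frames.
print(len(sampled_frames(35 * 60 * FPS)))
```

Under these assumptions a 30 to 40 min video yields roughly 6,000 to 8,000 extracted frames, which is the workload the later classification stages have to handle.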

Frames with meaningful information were termed informative frames; all others were termed non-informative frames. Only informative frames were used for image processing. To classify frames as informative or non-informative, an SVM was trained in MATLAB and evaluated with 5-fold cross-validation [34,35,36,37,38]. Frames that were difficult to analyze or meaningless because of color separation, motion blur, excessive darkness or brightness, or enlargement of the screen (Fig. 1a) were classified as non-informative. These frames were recorded in the summary report but were not used for image processing.
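The paper trained its SVMs in MATLAB and does not say which function, kernel, or features were used. To show what 5-fold cross-validation of a binary SVM involves, the dependency-free sketch below swaps in a linear SVM trained with the Pegasos stochastic sub-gradient method; this is not the authors' solver, and the feature vectors, regularization constant `lam`, and epoch count are illustrative assumptions.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def _augment(x):
    # Append a constant 1.0 so the last weight acts as the bias term.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    X: feature vectors, y: labels in {-1, +1}. Returns the weight vector.
    """
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularisation)
            if margin < 1.0:  # hinge loss active: step towards the example
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, _augment(x))) >= 0 else -1


def cross_validate(X, y, k=5):
    """Mean accuracy over k folds: train on k-1 folds, test on the held-out fold."""
    accuracies = []
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k
```

Each frame is tested exactly once, by a model that never saw it during training, which is why cross-validated accuracy is a fairer estimate than accuracy on the training set.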

Fig. 1

a Classification of non-informative frames and informative frames. b The classification of informative frames into seven types: bleeding, tool, polypectomy, residue, common, thin wrinkle, and folded wrinkle

Seven types of informative frames

The informative frames were classified into seven types: bleeding, polypectomy, tool, residue, thin wrinkle, folded wrinkle, and common (Fig. 1b). Three researchers carried out this classification over a period of about 3 months. “Bleeding” refers to a frame containing bleeding regions. “Polypectomy” refers to a frame containing polypectomy or injection regions; it can be used to confirm whether ablation and suturing have been performed and whether bleeding is present at the resected area after polyp ablation. “Tool” refers to a frame containing tools for biopsy, polypectomy, injection, or others. These frames can be used to monitor details of the physician’s actions during colonoscopy. “Residue” refers to a frame containing stool or other material remaining inside the large intestine and obstructing observation, which can be used to evaluate intestinal clearance.

The other three frame types were thin wrinkle, folded wrinkle, and common, referring to frames containing a thin wrinkle, many folded wrinkles, and a flat intestinal wall with no wrinkles, respectively. After classification, frames of these three types were passed to the polyp detection algorithm so that polyps could be detected efficiently.

Hierarchical SVM

In this paper, we applied the SVM model hierarchically according to the optimized order (Fig. 2). With this method, the seven frame types were classified and used to optimize the polyp detection algorithm.

Fig. 2

Overall structure of hierarchical support vector machine and application of the summary report of the colonoscopy video

First, “bleeding” frames were separated from the other frame types. These could be classified easily because of their distinctive red color [39,40,41]. Next, “polypectomy” frames were extracted. These were easily identified by their distinctive blue color, which is rare in the large intestine [41, 42]. Third, “tool” frames were extracted by identifying the color of metal, which is foreign to the large intestine [41]. Fourth, frames with yellow, brown, or green content were classified as stool or residue [41]. After extraction, all frames labeled “bleeding,” “polypectomy,” “tool,” and “residue” were recorded in the summary report and were not used in the next step.
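The key property of the hierarchy is that each stage only sees frames that no earlier stage claimed, and only the frames left over reach the wrinkle/common classification and polyp detection. The sketch below illustrates that routing. It uses the stage order listed in the Results (bleeding, tool, polypectomy, residue). Each stage is a hypothetical colour rule on a frame's mean RGB value standing in for a trained binary SVM; the thresholds and the `Frame` representation are illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical frame summary: mean (R, G, B) intensity, each 0-255.
Frame = Tuple[float, float, float]

# Stage order follows the Results section. Each predicate is a hypothetical
# colour rule standing in for a trained binary SVM; thresholds are illustrative.
STAGES: List[Tuple[str, Callable[[Frame], bool]]] = [
    ("bleeding", lambda f: f[0] > 150 and f[1] < 60 and f[2] < 60),  # strongly red
    ("tool", lambda f: max(f) - min(f) < 10 and f[0] > 170),         # bright grey (metal)
    ("polypectomy", lambda f: f[2] > f[0] and f[2] > f[1]),          # blue dominant
    ("residue", lambda f: f[1] > 120 and f[2] < 80),                 # high green, low blue
]


def cascade(frames: Sequence[Frame]) -> Tuple[Dict[str, List[int]], List[int]]:
    """Route each frame to the first stage that claims it.

    Returns frame indices per stage label, plus the indices left over for the
    wrinkle/common classification and polyp detection.
    """
    labelled: Dict[str, List[int]] = {name: [] for name, _ in STAGES}
    remaining = list(range(len(frames)))
    for name, is_type in STAGES:
        passed_on = []
        for i in remaining:
            if is_type(frames[i]):
                labelled[name].append(i)  # recorded in the summary report
            else:
                passed_on.append(i)       # handed to the next stage
        remaining = passed_on
    return labelled, remaining


frames = [(200, 30, 30), (200, 200, 200), (60, 80, 200), (150, 160, 40), (180, 110, 100)]
print(cascade(frames))
```

Because a frame leaves the pipeline at the first stage that claims it, it appears only once on the timeline, which matches the stated goal of avoiding duplicate notation in the summary report.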

All remaining frames fell into three types: “thin wrinkle,” “folded wrinkle,” or “common.” Because the characteristics of these three frame types differed, applying one polyp detection algorithm uniformly to all of them was inefficient. The performance of the SVM model used for polyp detection depends on its training dataset, so the polyp detection algorithm was trained separately for each type to increase the accuracy of the SVM model [34, 35]. For this reason, the three frame types had to be classified separately before the polyp detection algorithm could be trained optimally.

Fifth, “thin wrinkle” frames were extracted, and finally “folded wrinkle” frames were separated from “common” frames. In this process, scale-invariant feature transform (SIFT) descriptors and gray-level co-occurrence matrix (GLCM) features were used as inputs to the SVM classifiers [43, 44].
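A GLCM counts how often a pixel with gray level i has a neighbour, at a fixed offset, with gray level j. Texture statistics derived from it can separate smooth mucosa from densely folded wall. The paper does not state which offsets, number of gray levels, or GLCM statistics were used, so the sketch below is an assumption-laden illustration using one horizontal offset and three common statistics (contrast, angular second moment, homogeneity). SIFT is not sketched here.

```python
def glcm(img, levels, dx=1, dy=0):
    """Normalised gray-level co-occurrence matrix for the offset (dx, dy).

    img: 2-D list of ints in [0, levels). Pairs are counted in one direction
    only (non-symmetric matrix).
    """
    m = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    count = 0
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                m[img[r][c]][img[r2][c2]] += 1
                count += 1
    return [[v / count for v in row] for row in m]


def texture_features(p):
    """Contrast, angular second moment (energy), and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    asm = sum(p[i][j] ** 2 for i, j in cells)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in cells)
    return contrast, asm, homogeneity


flat = [[1, 1], [1, 1]]                 # uniform patch: no texture
stripes = [[0, 1, 0, 1], [0, 1, 0, 1]]  # alternating gray levels
print(texture_features(glcm(flat, 2)))
print(texture_features(glcm(stripes, 2)))
```

A uniform patch gives zero contrast and maximal homogeneity, while alternating gray levels push contrast up; feature vectors of this kind would then be fed to an SVM. MATLAB's `graycomatrix` and `graycoprops` compute comparable quantities.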

SVM application for polyp detection

The frame images passed to the polyp detection algorithm were too large to be processed efficiently, and the algorithm had difficulty detecting polyps at the original frame size because most polyps occupied less than a quarter of the frame. To resolve this problem, frames of the three types were divided into smaller images before analysis. To avoid missing polyps during division, a sliding window method was used (Fig. 3) [45]. Although this increased the amount of data to process, it prevented polyps from being lost. The frame size was 850 × 750 pixels, and the sliding window was set at 350 × 350 pixels with a 100-pixel interval, considering the size of polyps. In total, 30 window images were obtained from a single frame. The polyp detection algorithm was applied to each window image, and the frame number was recorded whenever the algorithm detected a polyp. Because the time of a frame can be calculated from its frame number, the time at which each polyp was found could be shown in the summary report.
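The window count follows directly from the stated sizes. Assuming the frame is 850 pixels wide and 750 high, a 350-pixel window moved in 100-pixel steps has 6 horizontal and 5 vertical positions (the count is the same if the orientation is swapped), giving 30 windows whose last positions end exactly at the frame edges. The sketch below also converts an extracted-frame index into video time, assuming one kept frame per 9 source frames at 30 fps. `is_polyp_window` is a hypothetical stand-in for the trained polyp SVM.

```python
FPS = 30   # capture rate stated in the text
STEP = 9   # source frames per extracted frame (one every 0.3 s at 30 fps)


def window_origins(width=850, height=750, win=350, stride=100):
    """Top-left corners of win x win windows stepped by `stride` pixels."""
    xs = range(0, width - win + 1, stride)
    ys = range(0, height - win + 1, stride)
    return [(x, y) for y in ys for x in xs]


def frame_time_s(sample_index, step=STEP, fps=FPS):
    """Video time (s) of the k-th extracted frame."""
    return sample_index * step / fps


def mmss(seconds):
    m, s = divmod(int(round(seconds)), 60)
    return f"{m:02d}:{s:02d}"


def flagged_times(frames, is_polyp_window):
    """Times of extracted frames in which any window is flagged as a polyp.

    frames: iterable of (sample_index, image). is_polyp_window is a
    hypothetical callable(image, x, y) standing in for the trained SVM.
    """
    origins = window_origins()
    return [frame_time_s(k) for k, image in frames
            if any(is_polyp_window(image, x, y) for x, y in origins)]


print(len(window_origins()), window_origins()[-1])
print(mmss(frame_time_s(2000)))
```

A frame is flagged if any of its 30 windows is flagged, so the timeline records one entry per frame rather than per window.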

Fig. 3

Sliding window method was used to apply the polyp detection algorithm efficiently

For the three types of frames, the SVM model was used as the polyp detection algorithm to classify the polyp and non-polyp window images with various classifiers [45,46,47,48,49]. The polyp detection algorithm was trained using 5-fold cross-validation with the polyp dataset that was selected for each type by fellow doctors and a professor in the Department of Gastroenterology [34, 35]. An overview of the entire process is shown in Fig. 4.

Fig. 4

Overview and process of the proposed system

Focus group interview for summary report

To obtain comments on the system and understand physicians’ requirements, we conducted a face-to-face focus group interview with five gastroenterology fellows. The meaningful information referred to in this paper was selected by the focus group, whose ideas about visualization were also obtained and applied.

Sample size calculation and statistical analysis

Statistical analyses were performed using SPSS 18.0 software (SPSS version 18.0 for Windows, SPSS, Chicago, IL, USA). Assuming that the accuracy of the system in detecting polyps was 80% and that of colonoscopy performed by a professional endoscopist was 90%, a sample size of 180 polyps was required to detect this difference with 80% power and an α of 0.05. According to several previous studies, about 72 polyps were detected with conventional colonoscopy in 40 patients [50]. Therefore, we assumed that a sample size of 100 patients would be sufficient to detect 180 polyps.
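The step from the required number of polyps to the number of patients is simple proportional arithmetic: the prior yield of 72 polyps in 40 patients is 1.8 polyps per patient, so 180 polyps require about 100 patients. The short check below reproduces only that step. It does not reproduce the power calculation itself, whose exact test and formula are not stated in the text.

```python
import math

required_polyps = 180                  # from the power calculation stated in the text
prior_polyps, prior_patients = 72, 40  # polyp yield reported in reference [50]

polyps_per_patient = prior_polyps / prior_patients  # 1.8
patients_needed = math.ceil(required_polyps * prior_patients / prior_polyps)
print(polyps_per_patient, patients_needed)
```

For comparison, the study ultimately enrolled 113 patients and recorded 351 polyps, both above these targets.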

The polyps detected by endoscopists during colonoscopy and the candidate polyps detected by the system were compared to determine sensitivity. Continuous variables are expressed as mean ± standard deviation and categorical variables as number (%). Student’s t test was used to compare continuous variables, and the χ2 test was used to compare categorical variables between the two groups. The relative risk and 95% confidence interval of the significant factors were calculated. A P value < 0.05 was considered statistically significant.

Results

Characteristics of patients and colonoscopy quality assessment

A total of 113 patients were enrolled prospectively from August 1, 2016, to December 31, 2016. To increase the reliability of the colonoscopies performed in this study, we recruited mostly patients receiving repeat colonoscopy (90.5%) within 2 years. The patients’ baseline characteristics and the results of the colonoscopy quality assessment are shown in Table 1. The patients’ mean age was 60.4 years (range, 47–73), and 62 patients (54.8%) were men. The indications for colonoscopy were referral for treatment of polyps or further evaluation of symptoms from our hospital or outside hospitals (72.6%) and screening or polyp surveillance (27.4%). Cecal intubation succeeded in 112 patients; the reason for failed cecal intubation was conversion to sigmoidoscopy because of a lack of patient cooperation. The mean withdrawal time was 19.7 min, and bowel preparation using the Boston Bowel Preparation Scale was adequate in 92.9% of the patients [9, 33].

Table 1 Characteristics of study patients and colonoscopy quality assessment

Polyp characteristics

The baseline characteristics of the polyps detected by the endoscopists are summarized in Table 2. The endoscopists detected a total of 351 polyps during the colonoscopy. The mean polyp size was 6.0 mm (range 2–60 mm), and the most common locations where polyps were found were the ascending colon (32.7%) and the transverse colon (31.0%). Yamada type I (47.0%) and type II (42.5%) were the most common morphologies of polyps detected on colonoscopy [51], and tubular adenoma (56.4%) was the most common histologically.

Table 2 Characteristics of the polyps detected during colonoscopy

Classification of frames

The SVM was applied hierarchically to optimize classification, in the following order: “bleeding,” “tool,” “polypectomy,” “residue,” “thin wrinkle,” and “folded wrinkle.” After classification, false positive and false negative extracted frames were confirmed manually. The accuracy and sensitivity of each frame type were averaged (Table 3).

Table 3 Mean accuracy and sensitivity of classification according to type

Effectiveness of colon polyp detection

The window images of suspected polyps were detected through a polyp detection algorithm. The sensitivity of polyp detection was validated differently than type classification. Because one polyp appeared in several frames, a frame-by-frame comparison was inefficient. Thus, the system counted the number of polyps present in the video, not in every frame where the polyp appeared. Based on clinical judgments, true positive (TP), true negative (TN), and false positive (FP) values were calculated by matching the polyps detected from the system and doctors’ clinical judgments. Among the 351 polyps found on colonoscopy, 288 polyps were detected by the system, and the average sensitivity of the system for polyp detection was 82.1%. In contrast, specificity was calculated using frame-by-frame comparison for each patient because the number of non-polyp parts can be counted only in frame units. Specificity was defined as follows, where TN represents non-polyps classified as non-polyps and FP represents non-polyps classified as polyps [52]:

$$ \mathrm{Specificity}=\frac{\mathrm{True}\ \mathrm{negative}\ \left(\mathrm{TN}\right)}{\mathrm{True}\ \mathrm{negative}\ \left(\mathrm{TN}\right)+\mathrm{False}\ \mathrm{positive}\left(\mathrm{FP}\right)} $$
(1)

The average specificity of the system for polyp detection was 89.1% with a standard deviation of 4.6%. The positive predictive value (PPV) was likewise calculated in frame units. PPV was defined as follows, where TP represents polyps classified as polyps:

$$ \mathrm{Positive}\ \mathrm{predictive}\ \mathrm{value}=\frac{\mathrm{True}\ \mathrm{positive}\ \left(\mathrm{TP}\right)}{\mathrm{True}\ \mathrm{positive}\ \left(\mathrm{TP}\right)+\mathrm{False}\ \mathrm{positive}\left(\mathrm{FP}\right)} $$
(2)

The average PPV was 39.3% with a standard deviation of 4.1%.
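The metrics above mix two units: sensitivity is counted per polyp across all videos, whereas specificity (Eq. 1) and PPV (Eq. 2) are counted per frame and then averaged across patients. The sketch below makes that distinction explicit. The per-patient `(tp, tn, fp)` tuple format is an assumption, and `statistics.stdev` computes the sample standard deviation; the paper does not say whether a sample or population standard deviation was reported.

```python
from statistics import mean, stdev


def sensitivity(detected, total):
    """Polyp-level sensitivity: reference polyps found by the system / all reference polyps."""
    return detected / total


def specificity(tn, fp):
    """Eq. (1): TN / (TN + FP), computed on frames."""
    return tn / (tn + fp)


def ppv(tp, fp):
    """Eq. (2): TP / (TP + FP), computed on frames."""
    return tp / (tp + fp)


def per_patient_summary(counts):
    """counts: one (tp, tn, fp) tuple of frame counts per patient (hypothetical format).

    Returns ((mean, SD) of specificity, (mean, SD) of PPV) across patients.
    """
    specs = [specificity(tn, fp) for _, tn, fp in counts]
    ppvs = [ppv(tp, fp) for tp, _, fp in counts]
    return (mean(specs), stdev(specs)), (mean(ppvs), stdev(ppvs))


print(f"{100 * sensitivity(288, 351):.1f}%")
```

The first line reproduces the reported 82.1% sensitivity from 288 of 351 polyps; the per-patient figures would require the study's frame counts, which are not given here.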

We compared the polyps detected and missed by the system (Table 4). The mean size of the detected polyps was greater than that of the missed polyps (6.3 vs. 4.9 mm, P = 0.003). Most polyps missed by the system were Yamada type I (81.5%). Yamada type IV polyps were not missed by the system. However, regarding the distribution and histology of the polyps, there were no statistically significant differences between the two groups.

Table 4 Performance of detecting polyp

Summary report of colonoscopy video

Using the proposed system, the positions of “bleeding,” “tool,” “polypectomy,” “residue,” and “polyp” frames in the colonoscopy video were identified. These results were displayed to deliver the information to physicians effectively. The focus group gave advice and comments during the design and display of the summary report of the colonoscopy video (SRCV). The SRCV is shown in Fig. 5. The white gaps in the timeline represent frames of the three types in which no polyps were found: “folded wrinkle,” “thin wrinkle,” and “common.” The SRCV is linked with the colonoscopy video frame by frame: clicking a point on the SRCV shows the frame at that point, and double-clicking plays the video from that point.

Fig. 5

Summary report of the colonoscopy video (SRCV). The SRCV matches the colonoscopy video frame by frame

To evaluate the utility of the SRCV, we adapted the System Usability Scale (SUS) questionnaire. The SUS is a popular, simple, ten-item attitude Likert scale assessing usability [53]. We modified the general questions for the SRCV evaluation (see Additional File 1). The questionnaire was administered to 30 physicians from the Department of Internal Medicine (gastroenterology, n = 15; non-gastroenterology, n = 5), Department of Family Medicine (n = 5), and the Department of Gastrointestinal Surgery (n = 5).

Figure 6 shows the answer distribution of the usability questionnaire, grouped into positive and negative questions. For positive questions (questions 1, 3, 5, 7, and 9), 73.3% of answers expressed agreement (scores 4 and 5) and 2.7% expressed disagreement (scores 1 and 2). For negative questions (questions 2, 4, 6, 8, and 10), 54.7% of answers expressed disagreement (scores 1 and 2) and 18.7% expressed agreement (scores 4 and 5). Nevertheless, all participants could understand the SRCV without any detailed help. We suppose that the need to explain “classification by frame unit” affected these results.
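The grouping used for Fig. 6 can be sketched as follows: answers to odd-numbered (positive) and even-numbered (negative) items are pooled, and the shares scoring 4 or 5 (agree) and 1 or 2 (disagree) are computed per group, with 3 counted as neutral. The input format (one list of ten Likert scores per participant) is an assumption. If all 30 participants answered all items, each group holds 150 answers, and the reported 73.3%, 2.7%, 54.7%, and 18.7% correspond to 110, 4, 82, and 28 answers.

```python
def agreement_shares(responses):
    """Percent agree (4-5) and disagree (1-2) answers for positive and negative items.

    responses: one list of ten Likert scores (1-5) per participant (assumed format).
    Positive items are questions 1, 3, 5, 7, 9; negative items are 2, 4, 6, 8, 10.
    """
    groups = {"positive": range(0, 10, 2), "negative": range(1, 10, 2)}
    shares = {}
    for name, items in groups.items():
        answers = [r[i] for r in responses for i in items]
        agree = 100 * sum(a >= 4 for a in answers) / len(answers)
        disagree = 100 * sum(a <= 2 for a in answers) / len(answers)
        shares[name] = (round(agree, 1), round(disagree, 1))
    return shares


example = [[5, 1, 4, 2, 5, 1, 4, 2, 5, 1], [3] * 10]  # made-up answers
print(agreement_shares(example))
```

Note that this is not the standard SUS score (which rescales each item and sums to 0-100); it reproduces only the agree/disagree breakdown described in the text.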

Fig. 6

Usability questionnaire results

Discussion

In this study, we developed and verified a system that extracts meaningful information from recorded colonoscopy videos by hierarchically applying an SVM. Most recent studies that used machine learning algorithms applied these algorithms to an optimized dataset and evaluated their accuracy [54,55,56]. However, the present study used all images extracted from real endoscopy videos, analyzing the entire videos. The results containing informative frames and suspected polyps were presented on a timeline. When the utility of the SRCV was evaluated by an SUS questionnaire, a large majority of participants expressed positive opinions.

The mean accuracy and sensitivity of the system for classification by frame type were over 90%. We introduced an efficient order by applying the SVM hierarchically. The frame types were extracted in order of importance as follows: “bleeding,” “tool,” “polypectomy,” and then “residue.” This order of the hierarchical SVM was determined to be the most effective for optimizing classification and avoiding duplicate notation in the SRCV.

The colonoscopy videos used in this research were reliable for polyp detection because experienced endoscopists performed the colonoscopies and most enrolled patients had undergone colonoscopy within the past 2 years, an interval shorter than the 3- to 5-year post-polypectomy surveillance period recommended by the American Society for Gastrointestinal Endoscopy and the American Gastroenterological Association [57, 58]. This may have reduced the number of false negative results when the system was applied to the videos.

The mean sensitivity and specificity of polyp detection were also over 80%, but the PPV was low. In recent years, considerable efforts have been made to develop efficient approaches to detect polyps automatically in colonoscopy videos. However, automated polyp detection in colonoscopy videos is very challenging because of high variations in polyp size, color, shape, texture, and location [59,60,61]. Despite considerable advances, these methods still have low detection accuracy. In several previous studies using datasets, sensitivity ranged from 36.9 to 71.4%, and the PPV ranged from 13.6 to 93.5% [62,63,64,65]. In the present study, the polyps detected by the system were larger and more often pedunculated than the polyps missed by the system. The proportion of advanced histologic features increases with polyp size, so it is important not to miss large polyps. Lesions are considered advanced when they are ≥ 1 cm in size [66]. A previous study showed that only 10.7% of adenomas ≤ 1 cm in size had a villous component, versus 56% of those larger than 1 cm [67]. However, in our study, when the size cutoff was set at 1 cm (< 1 vs. ≥ 1 cm), there was no statistically significant difference between the detected polyp group and the missed polyp group. The six polyps ≥ 1 cm missed by the system were all non-granular, pseudo-depressed type lateral spreading tumors. These lesions are difficult to distinguish visually, and the system needs to be improved to detect them in the future.

The low PPV of polyp detection in our study reflects our choice to favor false positives over false negatives when applying the polyp detection algorithm. Although false positives reduce efficiency by presenting unnecessary information to physicians, this is preferable to missing polyps. Moreover, flagging suspicious areas may help identify polyps missed during colonoscopy. In this study, we used the SVM to handle the whole series of processes with one tool, which is simple and efficient. The SVM can be replaced by another classification tool, and accuracy is expected to improve in future studies using deep learning methods.
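Favoring false positives amounts to accepting a lower decision threshold on the classifier output. This raises sensitivity at the cost of PPV. The paper does not report how this trade-off was set, so the sketch below only illustrates the general effect on made-up decision values; the scores and labels are not study data.

```python
def sens_ppv_at(threshold, scores, labels):
    """Classify score >= threshold as 'polyp' and return (sensitivity, PPV).

    scores: classifier outputs (e.g. SVM decision values); labels: 1 polyp, 0 not.
    """
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sens, ppv


# Toy decision values, made up for illustration (not study data).
scores = [2.1, 1.4, 0.6, 0.2, -0.1, -0.3, -0.4, -0.8, -1.2, -1.5]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
for t in (0.5, -0.5, -1.0):
    sens, ppv = sens_ppv_at(t, scores, labels)
    print(f"threshold {t:+.1f}: sensitivity {sens:.2f}, PPV {ppv:.2f}")
```

As the threshold drops, every positive is eventually caught, but more negatives are flagged along with them, which is the pattern of high sensitivity and low PPV described above.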

We tried to convey meaningful information from the colonoscopy video through data visualization. The non-informative frames were included in the SRCV because physicians want to know which segments were non-informative when they perform colonoscopy again. In this study, one frame was extracted every 0.3 s. If frames were extracted from the video more frequently, the SRCV would become more detailed and omissions would be reduced.

The proposed system that creates a summary report automatically from colonoscopy videos is expected to contribute to the gastroenterology and endoscopy field by giving physicians easier access to important colonoscopy data. The system can summarize meaningful colonoscopy information, provide visualized guidelines, or be utilized for educational purposes using artificial intelligence, similar to IBM’s Watson [28,29,30,31]. It can also be used to train non-experts in polyp detection [68, 69]. Moreover, when a physician who did not perform the colonoscopy reads the medical records, matching the video with the SRCV can increase the reliability of the report and the physician’s understanding of the results. Additionally, if the SRCV is included in the medical records, it will be helpful when medical records and patient information are shared among doctors or hospitals [32].

In conclusion, our study showed the usefulness of a system that extracts meaningful information from colonoscopy videos and provides a summary report on a color-coded timeline. The accuracy and sensitivity of frame-type classification and polyp detection were verified. Further validation is needed after the system is improved, including an increase in the PPV for polyp detection by using deep learning methods. If the system becomes more accurate and capable of real-time application, its clinical use can be expected to assist in the detection of polyps and ultimately increase the ADR in colonoscopy. Furthermore, when the utility of the SRCV was evaluated using an SUS questionnaire, a large majority of participants expressed positive opinions. Thus, our proposed system for analyzing colonoscopy videos is expected to be useful for physicians and patients, and it can also be utilized in capsule endoscopy.