Introduction

Biliary atresia (BA) is a rare pediatric disease caused by hepatobiliary destruction. BA has an estimated worldwide incidence of around 0.3–3.7 in 10,000 live births and an incidence of around 1.04–1.1 in 10,000 in Japan [1]. It is the most common cause of death from liver disease in children [2]. The therapeutic approach consists of hepatoportoenterostomy (HPE, or the Kasai procedure) as the first strategy and, in case of failure, liver transplantation [3]. The timing of the Kasai procedure is one of the main prognostic factors; a Kasai procedure performed before 30–45 days of age is associated with a better prognosis [1]. Over the past decade, however, the average age at surgery has remained between 60 and 70 days in Japan [4]. Detecting BA symptoms such as jaundice and pale stool at 2 weeks to 1 month of age has been challenging because physiologic jaundice in newborns is extremely common, particularly in Asian populations. Since the diagnosis of BA is made intraoperatively after a series of detailed examinations to exclude other causes of cholestasis [5], successful screening at 2 weeks–1 month after birth is crucial for the optimal timing of a Kasai operation.

Over the last 10 years, infants’ stool color has become a primary strategy for BA screening in several countries, including Taiwan, Japan, Switzerland, and Argentina [1, 6]. The first universal screening program using a stool color card was introduced in Taiwan in 2004, and it has served as a successful model for implementation in other countries [7, 8, 9]. In the US, PoopMD, a mobile application designed to assist in the identification of acholic stool, was introduced in 2015. Acholic and normal stool color ranges in that application were represented using color hexes captured from the stool images on the Taiwan stool card and accounted for variations in hue and brightness. According to the results of its pilot study, the sensitivity was 100% (7/7) and the specificity was 89% (24/27) [10]. In Japan, a new nationwide stool color card containing seven photographs of stool samples of different colors, ranging from 1 (acholic) to 7 (normal), has been included in the Maternal and Child Health Handbook since 2012. The results of a regional cohort study spanning 19 years showed that the sensitivity and specificity of the stool color card are 76.5% and 99.9%, respectively [11]. Despite these results, as previously reported, the stool colors of BA patients are not always acholic, as obliteration of the bile duct often occurs gradually [12]. In fact, according to the Japanese Biliary Atresia Registry, 72.9% (924/1267) of the parents of patients with BA initially identify their baby’s stool as normal [13].

More timely detection of BA symptoms through evaluation of stool may be achieved by enhancing the ability to detect subtle differences in color even for non-acholic BA stools in which bile drainage into the duodenum may lead to pigmented stool color. With this objective, we have developed a detection algorithm based on pattern recognition and machine learning processes. Our aim in this report is to introduce Baby Poop, a free iPhone application that evaluates infant stool images based on this detection algorithm and alerts the user when suspicious colors are detected, as well as describe a preliminary assessment of its performance and potential large-scale utility.

Materials and methods

Software development

Baby Poop (officially named Baby unchi in Japanese) was designed for caregivers of infants about 2 weeks–1 month of age. Based on the overall conceptual framework, user interface elements and application design were developed in collaboration with UNLOG K.K. (Tokyo). Using a touch screen interface, the user can read the explanation of the overall study protocol and the informed consent, answer the baseline questionnaire, and either take a photo of a baby’s stool with the camera or choose a stool photo from their digital photo library (Fig. 1). Standardizing the photo shooting method is crucial for accurate evaluation of stool color; therefore, the application provides instructions for an optimal shooting environment (e.g., under fluorescent lighting without flash) and explains the photo-taking method with sample images showing an ideal photograph of the stool. To further enhance photo shooting precision, the application includes an automated zoom-in function, allowing users to keep their phones at a reasonable distance from the stool sample. Once a new stool picture is taken or selected, the application automatically analyzes its color and provides feedback on the result of the evaluation. Images may be saved, which allows caregivers to track and log stool colors and facilitates the identification of changes in stool color over time. The application, Baby unchi, was released on September 29, 2016, in the Apple App Store, and is currently available only in Japanese.

Fig. 1

Baby Poop (Baby unchi) Screenshots (Link: https://itunes.apple.com/jp/app/babyunchi/id1152163386?mt=8). Using a touch screen interface, users are asked to take a photo of a baby’s stool using a camera, or the user may choose a stool photo from their digital photo library

Collection of pre-existing stool pictures

We assembled a total of 57 pre-existing BA stool pictures collected from several institutions and from the families of patients who agreed to provide their pictures to this study. These BA stool pictures included pigmented stool images that would not necessarily be screened as BA positive based on the stool color card (Fig. 2). Due to low resolution, 3 of the 57 BA stool pictures were excluded. An additional 100 non-BA stool images were collected through volunteers, mainly from affiliated birth clinics; for all of these images, the corresponding patients were confirmed not to have BA or any liver-related disease. The age at the time of the photograph was also recorded when available. All samples were from a Japanese population, and no secondary data (e.g., printed stool color cards) were used in our analyses. For the training data used in the machine learning process, images of the same stool taken under different light sources were also included.

Fig. 2

Example of indistinguishable stool images of BA and non-BA infants. The top picture shows a non-acholic BA stool, while the bottom picture shows a non-BA stool. Although the two pictures are visually indistinguishable, the HSV color values at three randomly selected points differ between them in each of the H, S, and V components. BA biliary atresia, HSV hue, saturation, and value

Stool color evaluation system

We first determined which parameters to use in developing our detection algorithm. Candidate parameters were red, green, and blue (RGB) values and hue, saturation, and value (HSV). We measured RGB and HSV information for 50 pre-existing stool pictures, including those of 19 BA patients and 31 healthy infants. With BA or non-BA stool as the outcome and RGB and HSV values as the explanatory variables, we performed logistic regression to examine the importance of each parameter in predicting BA or non-BA stool. A receiver operating characteristic (ROC) curve was plotted using the true-positive and false-positive rates. An area under the curve (AUC) between 0.9 and 1.0 is considered “excellent” for the prediction of the outcome [14].
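
As an illustration of how an AUC can be obtained from predicted probabilities, the following minimal sketch computes it as the fraction of (BA, non-BA) pairs in which the BA picture receives the higher score. The scores are hypothetical values chosen for illustration and are not data from this study; the function name `roc_auc` is likewise an assumption.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outranks a random negative."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of BA (illustration only, not study data).
ba_scores = [0.91, 0.85, 0.77, 0.30]
non_ba_scores = [0.10, 0.22, 0.35, 0.45, 0.05]

auc = roc_auc(ba_scores, non_ba_scores)
print(f"AUC = {auc:.2f}")
```

With these toy scores, 18 of the 20 pairs are ordered correctly, so the sketch gives an AUC of 0.90.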

After the baseline analysis, we developed a classifier based on pattern recognition and machine learning, specifically the class-featuring information compression (CLAFIC) method, one of the discriminant methods in pattern recognition [15]. In this process, we converted 30 BA and 34 non-BA pictures into numerical values using either RGB or HSV values, based on the results of our baseline analysis. We then optimized a tuning parameter using the CLAFIC method and K-fold cross-validation [16]. When the classifier is applied, a newly taken stool photo is divided into domains and analyzed, and is then given a label by the optimal CLAFIC classifier. For the baseline analysis and machine learning process, we selected stool pictures representing a broad range of colors so that the algorithm would be able to detect subtle differences in stool colors (i.e., pigmented BA stool colors and transient non-BA stool colors).
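
The general idea of a CLAFIC-style subspace classifier can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the application: each class subspace is fixed to one dimension (in practice the dimension is the kind of parameter that would be tuned by cross-validation), the feature vectors are raw HSV triples, and the training values are hypothetical.

```python
import math


def autocorrelation(samples):
    """Autocorrelation matrix R = (1/n) * sum(x x^T) of one class."""
    dim = len(samples[0])
    r = [[0.0] * dim for _ in range(dim)]
    for x in samples:
        for i in range(dim):
            for j in range(dim):
                r[i][j] += x[i] * x[j] / len(samples)
    return r


def dominant_eigenvector(matrix, iterations=200):
    """Power iteration: unit eigenvector belonging to the largest eigenvalue."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def fit_clafic(training):
    """Map each class label to the basis vector of its 1-D subspace."""
    return {label: dominant_eigenvector(autocorrelation(xs))
            for label, xs in training.items()}


def classify(model, x):
    """Assign x to the class whose subspace captures most of its energy."""
    energy = sum(c * c for c in x)

    def similarity(u):
        # Squared length of the projection of x onto u, relative to |x|^2.
        return sum(a * b for a, b in zip(u, x)) ** 2 / energy

    return max(model, key=lambda label: similarity(model[label]))


# Hypothetical (H, S, V) training vectors, for illustration only.
training = {
    "BA": [(50, 82, 68), (48, 85, 70), (52, 80, 66)],
    "non-BA": [(57, 49, 57), (58, 47, 55), (55, 52, 58)],
}
model = fit_clafic(training)
print(classify(model, (49, 83, 69)))
print(classify(model, (56, 50, 56)))
```

Because the subspaces here pass through the origin and have a single dimension, the margin between classes is small; a higher subspace dimension or a different feature encoding may behave quite differently.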

In addition to the two main labels (BA and non-BA stools), we also created “indeterminate” and “re-examination” labels. The “indeterminate” class covers colors that are unrelated to those of the images in the training data. Therefore, when a new unlabeled observation is classified as “indeterminate”, we can judge from its input data that the probability of it belonging to either BA or non-BA is low. The “re-examination” class covers stool colors that are suggestive of BA or non-BA but not conclusive. Hence, when a new unlabeled observation is classified as “re-examination”, we can judge from its input data that the probability of it belonging to BA or non-BA is somewhat elevated, but below the threshold required to categorize it as either BA or non-BA.
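
One way to picture the four output labels is as a thresholding rule on the two class probabilities. The sketch below is an assumption made for illustration: the function name and the threshold values `accept` and `review` are hypothetical and are not taken from the application.

```python
def assign_label(p_ba, p_non_ba, accept=0.7, review=0.4):
    """Map two class probabilities to one of four output labels.

    accept and review are hypothetical thresholds chosen for illustration.
    """
    # Start from the class with the higher estimated probability.
    if p_ba >= p_non_ba:
        best_label, best_p = "BA", p_ba
    else:
        best_label, best_p = "non-BA", p_non_ba

    if best_p >= accept:
        return best_label          # confident enough to report the class
    if best_p >= review:
        return "re-examination"    # suggestive, but below the class threshold
    return "indeterminate"         # neither class is plausible


print(assign_label(0.90, 0.20))
print(assign_label(0.55, 0.30))
print(assign_label(0.20, 0.10))
```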

In the iPhone application, once a newly taken stool photo is identified as “abnormal”, a message stating, “Please consult with a pediatrician or a pediatric surgeon as soon as possible” with the link to a list of institutions with certified pediatric surgeons is provided on the screen. Additionally, a message stating, “This is not an actual diagnosis of disease. Please consult with a medical specialist as soon as possible for accurate diagnosis.” is given on the same screen to make sure the user identifies this as just an alert and not a disease confirmation.

For the baseline statistical analysis, we used Stata 14.1 (StataCorp LP, TX, USA). For the development of the detection algorithm, we used R 3.3.1 [17].

Ethics statement

This study involved the use of de-identified photographs and was approved by the Institutional Review Board.

Results

The average age of the infants when the pictures were taken was 26.8 ± 11.3 days. The average RGB and HSV values were 171.9 (R), 149.1 (G), 31.7 (B), 49.9 (H), 82.2 (S), and 67.9 (V) for BA stools, and 144.1 (R), 139.2 (G), 77.5 (B), 56.9 (H), 49.4 (S), and 56.6 (V) for non-BA stools. The results suggested that hue and saturation are strongly associated with BA stools (OR 1.31, P < 0.001 and OR 0.87, P < 0.001, respectively; Table 1). The AUC was 0.95, demonstrating that HSV is an important component in the accurate identification of BA stools (Fig. 3). For this reason, HSV was used as the parameter in the pattern recognition process. After a majority-vote approach was used to assess all training data in the BA and non-BA groups, the CLAFIC method was applied to create the BA and non-BA classifiers.
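
For readers unfamiliar with the relationship between the two color spaces, the following sketch converts the reported mean RGB values to HSV with Python's standard `colorsys` module. It assumes that hue is expressed in degrees and that saturation and value are expressed as percentages, which the reported figures suggest but the text does not state. Converting a mean RGB triple is not the same as averaging per-picture HSV values, so only approximate agreement with the reported means should be expected.

```python
import colorsys


def rgb_to_hsv_units(r, g, b):
    """Convert 0-255 RGB to (hue in degrees, saturation in %, value in %)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s * 100.0, v * 100.0


# Mean RGB values reported for BA and non-BA stools.
ba_hsv = rgb_to_hsv_units(171.9, 149.1, 31.7)
non_ba_hsv = rgb_to_hsv_units(144.1, 139.2, 77.5)

print("BA     H=%.1f S=%.1f V=%.1f" % ba_hsv)
print("non-BA H=%.1f S=%.1f V=%.1f" % non_ba_hsv)
```

Under these assumptions the sketch yields roughly H 50.2, S 81.6, V 67.4 for BA stools and H 55.6, S 46.2, V 56.5 for non-BA stools, close to the reported HSV means.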

Table 1 Results of logistic regression of red, green, and blue (RGB) and hue, saturation, and value (HSV) on biliary atresia stools (n = 50)
Fig. 3

Receiver operating characteristic curve of predicting BA stool by HSV. Logistic regression to examine the importance of each parameter when predicting BA or non-BA stool was performed and its receiver operating characteristic curve was plotted. The area under the curve was 0.95, demonstrating that HSV is an important component in the accurate identification of BA stools. BA biliary atresia, HSV hue, saturation, and value

To test the performance of the detection algorithm, we used a sample of 40 pictures, comprising 5 BA stools and 35 non-BA stools, that were independent of our training data. The five BA stools represented a range of stool colors, some of which would not have been identified as acholic through visual assessment (Fig. 4). Additionally, relatively pale-colored stool pictures from non-BA children were included to test whether the detection algorithm could accurately detect faint differences in stool colors. Table 2 shows an example of BA or non-BA assignment based on the detection probabilities produced by the algorithm for 12 of the 40 pictures. When a picture was taken, the probabilities of being BA and non-BA were calculated, and the final output label was decided based on the higher of the two estimated probabilities. Sensitivity (i.e., the percentage of true BA stools that Baby Poop analyzed as abnormal) was 100% (5/5, 95% CI 0.48–1.00) and specificity (i.e., the percentage of true normal stools that Baby Poop analyzed as normal) was 100% (35/35, 95% CI 0.90–1.00). Additionally, two stool images from Alagille syndrome patients, one image from a patient with neonatal intrahepatic cholestasis caused by citrin deficiency (NICCD), and two images from patients with progressive familial intrahepatic cholestasis (PFIC) were also evaluated and correctly identified as abnormal stool colors. To investigate whether the performance of the classifier varies across versions of the device, we conducted performance evaluations using an iPhone 5s and an iPhone 6. Substantial agreement was defined as a kappa coefficient between 0.80 and 1.00 [18]. No variability between the iPhone 5s and the iPhone 6 was detected, resulting in a kappa coefficient of 1 between the two iPhone versions.
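
The interval estimates and the kappa coefficient above can be reproduced with a short calculation. The sketch below assumes that the confidence intervals are exact (Clopper-Pearson) binomial limits, which is consistent with the reported bounds although the method is not named in the text. The device label lists are reconstructed from the reported result that both phones classified all 40 pictures identically; the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def solve_increasing(f, target, steps=60):
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion x/n."""
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters (here, two devices) on the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)


sens_ci = clopper_pearson(5, 5)     # sensitivity 5/5
spec_ci = clopper_pearson(35, 35)   # specificity 35/35
print("sensitivity CI: %.2f-%.2f" % sens_ci)
print("specificity CI: %.2f-%.2f" % spec_ci)

# Labels assumed identical on both devices, as reported (illustration).
iphone_5s = ["BA"] * 5 + ["non-BA"] * 35
iphone_6 = ["BA"] * 5 + ["non-BA"] * 35
print("kappa:", cohen_kappa(iphone_5s, iphone_6))
```

The lower limits come out at about 0.48 and 0.90, matching the reported intervals and showing how strongly the sensitivity estimate depends on having only five BA pictures.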

Fig. 4

Sample of stools for assessing device accuracy. An example of stool images used to test the performance of the detection algorithm is shown which included visually non-acholic BA stools. We also included transient acholic non-BA stools in the sample. As a sample, 12 out of 40 are shown and the pictures marked “1–3” are from BA patients. BA biliary atresia

Table 2 Results of stool color detection algorithm on 12 pictures out of 40 pictures used in our sample data

Discussion

The stool color evaluation system in Baby Poop integrates pattern recognition and machine learning processes trained on BA and non-BA pictures, which allows the detection of differences in stool color that may be undetectable by a lay person (i.e., non-acholic BA stools and non-BA pale-colored stools). This is in contrast to the existing BA screening mobile application, PoopMD [10], which is based on color hexes captured from stool images on the Taiwan stool card and accounts for variations in hue and brightness. Baby Poop correctly identified all 5 BA stools and all 35 non-BA stools in the test sample, with no false-positive or false-negative designations. Furthermore, analyses of these images were consistent across different versions of the iPhone (5s and 6). Baby Poop also correctly identified images of stool from patients with Alagille syndrome, NICCD, and PFIC as abnormal. In our current stool color detection algorithm, the detection classifiers are set as either BA or non-BA; therefore, the abnormal stool colors of infantile cholestatic diseases such as Alagille syndrome, NICCD, and PFIC in our sample were judged based on the BA or non-BA categories. This suggests the potential utility of this technology in detecting color abnormality of stools in other cholestatic diseases as well. However, further studies specifically targeting those diseases are necessary.

The success of Taiwan’s nationwide stool color card program has not only led to a decrease in the median time to BA diagnosis from 47 to 43 days, but has also led to an increase in the percentage of patients receiving their Kasai procedure within 60 days (from 60% before the program to 74% after implementation). In addition, the program is associated with an increase in 5-year jaundice-free survival from 27 to 64% [8]. The implementation of this nationwide screening program included not only parental education through the distribution of the stool card, but also physician education. The use of the stool color card as a screening modality did not by itself guarantee this success; the program’s management efforts also contributed to the significant improvement in BA outcomes.

Mogul et al. [19] evaluated the cost-effectiveness of screening using the stool color card in the United States. In their Markov model-based study, they applied the results obtained from the Taiwan stool color card program. They concluded that, compared with no screening program, the implementation of a stool color card program was associated with nearly 30 life-years gained and a decrease in total costs of nearly $9 million, demonstrating cost-effectiveness. A study from Canada, which also applied the sensitivity and specificity estimates from the Taiwanese study, confirmed that, compared with no screening at all, home-based screening for BA using the stool color card is feasible and potentially highly cost-effective [20]. However, Taiwan’s successful implementation of the stool color card program cannot be assumed to apply to other countries. Additionally, previous studies have indicated that newborn screening for BA should include not only the use of stool color cards, but also the measurement of serum conjugated bilirubin concentrations to ensure sufficient sensitivity and specificity [6]. Additional studies are warranted to further evaluate the feasibility, effectiveness, and costs of potential screening strategies for early detection of BA in other settings. In our current setting, since the application is free, the cost for the consumer would consist of service provider charges for the data used in downloading the application, uploading a stool picture, and occasionally updating the system. Non-iPhone users need access to an iPhone to use this system. The various cost calculation scenarios depend on how direct and indirect costs are considered from both the individual and the broader national perspectives. The use of a mobile application could be a useful addition to the status quo of BA screening, and strategies to maximize its use should be further investigated.

According to the Japanese Ministry of Internal Affairs and Communications [21], at the end of 2014, the prevalence of mobile phone use was 119% (implying that some individuals own multiple phones), with 60% smartphone use. The rate of smartphone use is approximately 94% among those in their 20s and 82% among those in their 30s, both childbearing age groups. Therefore, a mobile application allows for the targeting of likely caregivers of infants between 2 weeks and 1 month old. Furthermore, a possible public dissemination strategy for Baby Poop could include distributing pamphlets at regional municipal offices, which all pregnant women visit at least once during their pregnancies to receive the Maternal and Child Health Handbook or to submit their child’s birth certificate. Since the coverage rate of Maternal and Child Health Handbook distribution is almost 100% [22], the chance that parents receive the information will be high.

Before expanding our application’s availability to other brands of smartphones, our detection algorithm should first be tested using a larger sample. In this study, only five BA stool images were used to assess the performance of the device because of the very limited number of pre-existing BA stool images; this is reflected in the wide confidence interval for the sensitivity estimate. To this end, our application is integrated with Apple’s ResearchKit, an iPhone application development framework for medical research [23]. The ResearchKit framework provides three crucial components required in clinical research: informed consent forms, surveys, and real-time active tasks (e.g., the taking of photographs). By utilizing those functions, we can collect not only stool images, but also participants’ demographic data and infant birth-related information (e.g., birthweight and birth-related complications, use of the stool color card, pre-existing knowledge of BA, and diagnosis of BA and other related diseases). This prospective cohort study will also allow us to collect further stool images, not only of BA and non-BA infants, but also of other related diseases that could potentially be screened by examining stool colors, such as Alagille syndrome, congenital biliary dilatation, NICCD, PFIC, and infant hepatitis syndrome. All prospectively collected data will be used in our future validation study to assess the baseline algorithm and will also serve as a new training data set for the machine learning process, so that parameter tuning can lead to more accurate results.

Conclusion

Stool color cards have been useful in the screening of BA, as shown in Taiwan, and later adopted by several other countries; however, BA stools can sometimes be fully or partially pigmented, as obliteration of the bile duct occurs gradually. Based on the results of this study, Baby Poop, a free iPhone application, allows for the detection of subtle differences in stool colors, even with stools that may be considered non-acholic by other screening methods. Therefore, Baby Poop has the potential to serve as an objective stool color evaluation tool for the early detection of BA and other related diseases. Furthermore, use of the Baby Poop application could also provide additional data to continue to test and improve the detection algorithm used within the application.