Introduction

Dysphagia affects more than 569,000 children in the United States and over 100,000 infants are discharged from acute care hospitals with a diagnosis of feeding/swallowing problems [1, 2]. The incidence of pediatric dysphagia is increasing as a result of advances in medicine that have improved the survival of children with histories of prematurity, low birth weight, and complex medical conditions [3,4,5]. It is well known that childhood dysphagia is associated with serious acute and long-term health consequences, and may predict future speech and language delays [6]. Associated complications have been implicated in the adult onset of respiratory diseases and other serious health conditions [7,8,9,10]. Therefore, pediatric dysphagia is a significant health problem. Given that early detection and prompt intervention have potential to reduce the co-morbidities associated with swallowing dysfunction [11, 12], there has been a rise in the number of instrumental swallowing evaluations performed in infants and young children. Increased use of Videofluoroscopic Swallow Studies (VFSS), also named Modified Barium Swallow studies (MBS), has raised concerns about the balance between the diagnostic and clinical efficacy of VFSS procedures and exposure to ionizing radiation [13,14,15,16,17].

The clinical utility of VFSS examinations has been established for adults with dysphagia [18]. Across the age spectrum, VFSS results provide information about physiologic swallowing impairments that go beyond describing the presence or absence of aspiration [14, 15, 18]. In addition to the detection of swallowing problems with potential findings of aspiration, VFSS examinations enable identification of interventions that may improve the safety and efficiency of swallowing, facilitate appropriate oral intake recommendations, and prompt appropriate referrals. Interpretation of videofluoroscopic images and the amount of radiation exposure associated with VFSS procedures are impacted by study protocols (e.g., bolus variables, order and method of contrast material administration) and the experience of the examining clinicians. Fluoroscopy times, which are associated with higher levels of radiation exposure, are typically longer for novice than for trained clinicians [19]. Standardized protocols can provide clear and consistent information for comparisons between and within patients, and thereby may ultimately decrease the lifetime radiation exposure by limiting the need for multiple VFSS studies [19]. The Modified Barium Swallow Impairment Profile (MBSImP)™© was developed to standardize the acquisition and interpretation of radiologic images in adults for accurate, practical, and clinically meaningful evaluation of swallowing function [20, 21]. Analogous tools do not exist for bottle-fed children. As a result, VFSS procedures in children are variable and determined by the skills of the examining clinician, institutional wisdom, and information derived from adults with dysphagia [14, 15].

The primary purpose of this study was to assess the reliability of making judgments of oropharyngeal and cervical esophageal swallowing from visual observations of bolus flow and oropharyngeal structural movements on VFSS images obtained from bottle-fed children. To achieve this goal, a novel tool was developed to incorporate physiologic components of swallowing known to be critical for safe and efficient swallowing across the lifespan while including attributes, such as sucking and swallowing interactions, that are unique to bottle feeding. The secondary purpose of this study was to gain information regarding the nature and challenges of learning, including the rating and interpretation of the range of observations characterizing each physiologic component. This information will help guide future decisions about the nature and intensity of training required to achieve accurate and reliable scoring using the proposed method. Taken together, this investigation represents the first steps in a large-scale investigation with the overriding goal of developing a novel assessment tool for the standardized interpretation of VFSS findings obtained from bottle-fed children. Such a tool has the potential for guiding clinical decision-making for enteral nutrition management, defining targets for physiologically based treatments for dysphagia interventions, and identifying biomarkers that signal the onset and natural history of swallowing disorders in affected children.

Materials and Methods

Content Validation

The MBSImP™© served as the conceptual impairment assessment model in this study for the identification and quantification of swallowing impairment(s) [20]. The study was approved by the Institutional Review Boards of the participating institutions. The multiple Principal Investigators (MPIs: ML-G, BM-H) combined information from years of field testing experience with the MBSImP™© and their extensive clinical experiences to develop the prototype of the pediatric tool. This prototype tool included components that captured physiologic parameters of oropharyngeal and cervical esophageal swallowing shared by adults and infants as well as physiologic components unique to bottle-fed children. Score variants for each component were developed to characterize the unique and distinguishable videofluoroscopic observations representing the range of function reported in the literature and confirmed by expert consensus. These scores were operationally defined and assigned rank-ordered numeric values, with the lowest number representing normal or typical function, and the highest number representing the worst function or performance. In this investigation, the number of score variants differed for individual components (Appendix), which is consistent with reliability training and testing from VFSS images in adults.

The prototype tool was disseminated for review to an international, multidisciplinary group of 20 experts representing 5 pediatric disciplines (speech-language pathology, otolaryngology, radiology, neonatology, and pulmonology) and with an average of 11.6 years of experience in the evaluation and management of infants and young children with swallowing disorders. This version of the tool included components of oropharyngeal swallowing that were divided into five hypothesized functional domains: (1) Nutritive Sucking/Oral Containment and Clearance, (2) Pharyngeal Swallow Initiation, (3) Pharyngeal Containment and Clearance, (4) Airway Protection, and (5) Esophageal Entry and Clearance. Experts completed a web-based survey regarding the importance of including the proposed components of swallow function and the clarity of the definitions for scoring the variants of each component. See Fig. 1 for a schematic of the tool development and rater training process. These survey results were used to refine the prototype tool, create a pilot tool, and develop a training manual that detailed the operational definitions for scoring each component. The hypothesized domains, components, and score variants in the refined tool (henceforth referred to as the tool) are displayed in the online Appendix.

Fig. 1 Tool development and rater training process

Training Materials, Raters, and Rater Training for Scoring of Swallowing Components and Reliability Testing

Training materials included 94 digitized motion clips obtained from clinically indicated VFSS examinations of bottle-fed children that were conducted at two academic medical centers. Samples were extracted from archived studies on Pentax Medical Digital Swallowing Workstations™. All studies were performed at 30 frames per second and used standardized thin liquid barium contrast materials (Varibar®) that were prepared according to the manufacturer’s instructions. Consistent with “real world practice,” there was no effort to control for nipple, bottle type, or positioning. Adaptations were based upon the needs of the individual child as determined by the examining speech-language pathologist (SLP).

The rater training/reliability testing process began with didactic component training and group interactive scoring practice, and was followed by independent scoring and reliability testing. Additional training sessions were provided, and the independent scoring and reliability testing process was repeated until all raters achieved the ≥80% reliability criterion for each component.

SLP Rater Training: Didactic Component Training and Interactive Scoring Practice

Seven SLPs (henceforth, raters) certified by the American Speech Language Hearing Association (ASHA) with 2–28 years (median 7 years) of experience participated in a 6-h face-to-face didactic training session conducted by the MPIs that introduced scoring for the components comprising the tool. Five of the raters had experience with pediatric dysphagia, which included conducting and interpreting VFSS examinations in bottle-fed infants and young children. Two of the raters had worked exclusively with adult dysphagic patients and completed training with the MBSImP™© approach.

Each rater was provided the scoring metric, scoring forms for the tool, and a detailed training manual. During the didactic training session, the MPIs reviewed definitions for all the components and their variants, and instructions for the overall impression (OI) scoring method as used in MBSImP™© [22]. The OI scoring method is a “charting by exception approach” that represents the worst performance (highest score on the ordinal scale) for each swallowing component across all thin liquid swallows and was developed to assist clinicians in the detection of impairment.
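The OI rule described above can be illustrated with a minimal sketch. It assumes each swallow is recorded as a mapping from component name to ordinal score; the function name and the component labels (`component_a`, `component_b`) are hypothetical and are not taken from the tool or from MBSImP™©.

```python
def overall_impression(scores_by_swallow):
    """Return the OI score per component: the worst (highest) ordinal
    score observed for that component across all scored swallows."""
    oi = {}
    for swallow in scores_by_swallow:
        for component, score in swallow.items():
            # Keep the highest score seen so far for this component.
            oi[component] = max(score, oi.get(component, score))
    return oi


# Hypothetical scores for three thin liquid swallows in one examination.
swallows = [
    {"component_a": 0, "component_b": 1},
    {"component_a": 2, "component_b": 0},
    {"component_a": 1, "component_b": 1},
]
print(overall_impression(swallows))  # {'component_a': 2, 'component_b': 1}
```

Taking the maximum means that a single impaired swallow determines the OI score for that component, which fits the stated aim of helping clinicians detect impairment.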

MPIs trained the raters using frame-by-frame review of digitized images for each swallowing component and associated score variants. Images were viewed on QuickTime software. Wording of operational definitions for some components and their variants was modified on the basis of rater input during the didactic training and interactive scoring practice sessions.

Reliability Training: Independent Scoring Practice

The revised scoring metric, scoring forms for the tool, and training manual, as well as a library of digitized video clips of component examples and their variants were disseminated to the seven raters for independent review and practice. Additional training via two videoconference sessions was conducted by MPIs to address any questions generated during independent scoring practice. The training manual and scoring sheets were edited to reflect clarifications, and adjustments were made to the library of video images. When raters were comfortable scoring, they began the process of reliability testing.

Reliability Testing Sessions

Each rater was provided ten de-identified and randomly assigned, full-length digitized examinations of bottle-fed children (age range 1 week to 23 months) for scoring. Consistent with everyday practices, no effort was made to control resolution of computer screens used for analysis. Independent of rater training sessions and by consensus, the MPIs established a “gold standard” OI score (henceforth gold standard score) for each of the components on all 10 examinations. In this feasibility study, the primary goal was agreement in detection of impairment, and therefore concordance between raters and the gold standard was defined as either exact agreement with the gold standard score or the gold standard score +1 (worse function) (i.e., detection of impairment but overcalled by one level). The exception was component #3 (Number of Sucks to Form Bolus), for which raters counted the number of sucks per swallow. Given the variability in the number of sucks per swallow on radiologic examination [23], concordance for this component was defined as equal to or up to two sucks greater than the gold standard score (i.e., the gold standard score, or the gold standard score +1 or +2).
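The concordance rule in this paragraph can be written as a short check. This is a sketch under stated assumptions: components are identified by their number, scores are integers, and the names `is_concordant` and `SUCK_COUNT_COMPONENT` are hypothetical rather than part of the study's materials.

```python
SUCK_COUNT_COMPONENT = 3  # component #3, Number of Sucks to Form Bolus


def is_concordant(component, rater_score, gold_score):
    """Apply the concordance rule: exact agreement or overcalling by one
    level; for the suck-count component, overcounting by up to two sucks.
    Scores below the gold standard are never concordant."""
    allowed_over = 2 if component == SUCK_COUNT_COMPONENT else 1
    difference = rater_score - gold_score
    return 0 <= difference <= allowed_over


# Hypothetical examples.
print(is_concordant(5, 2, 2))  # True: exact agreement
print(is_concordant(5, 3, 2))  # True: overcalled by one level
print(is_concordant(5, 1, 2))  # False: undercalled
print(is_concordant(3, 6, 4))  # True: two sucks above the gold standard
```

The rule is deliberately asymmetric: a rater who scores function as one level worse than the gold standard has still detected the impairment, whereas a rater who undercalls has not.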

After raters scored the OI for each component on the 10 examinations, they were provided summaries of the accuracy of their scoring. Each rater’s percent agreement on individual components was calculated and accuracy (% correct) was defined in relation to the gold standard scores. The reliability criterion was defined as ≥80% concordance with the MPIs’ gold standard scores for each of the 24 components.
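As a worked illustration of the criterion, the sketch below computes one rater's percent agreement on a single component across a set of examinations and compares it with the 80% threshold. It assumes each examination has already been marked concordant or not; the function names and the example data are hypothetical.

```python
def percent_agreement(concordant_flags):
    """Percent of examinations on which the rater's score was concordant
    with the gold standard score for one component."""
    if not concordant_flags:
        raise ValueError("at least one examination is required")
    return 100.0 * sum(concordant_flags) / len(concordant_flags)


def meets_criterion(concordant_flags, threshold=80.0):
    """True if percent agreement reaches the reliability criterion."""
    return percent_agreement(concordant_flags) >= threshold


# Hypothetical rater: concordant on 8 of 10 examinations for one component.
flags = [True] * 8 + [False] * 2
print(percent_agreement(flags))  # 80.0
print(meets_criterion(flags))    # True
```

With ten examinations per testing session, a rater can therefore be discordant on at most two examinations for a given component and still meet the criterion.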

Additional individualized training was provided via teleconference and on site for raters to review scoring for any of the components that did not achieve reliability. Following each re-training session, raters were provided a new set of de-identified, randomly assigned full-length examinations and asked to re-score the components missed during the previous testing session. Raters were re-trained and re-tested until the ≥80% criterion was achieved for all components.

Results

Reliability criterion (≥80%) for all of the 24 oropharyngeal swallow components was achieved by all seven raters after completion of three training–testing sessions. Table 1 displays the number of sessions required by raters to achieve the reliability criterion for each of the components and their hypothesized domains.

Table 1 Cumulative number (%) of raters (n = 7) by training sessions to achieve criterion (≥80%) for each component within hypothesized functional domains
1. Nutritive Sucking/Oral Containment and Clearance: Although all seven raters achieved criterion after one or two sessions for two-thirds of the components, they had the greatest difficulty in meeting the reliability criterion for Initiation of Nutritive Sucks and Oral Residue at End of Suck/Swallow.

2. Pharyngeal Swallow Initiation: The majority of raters achieved criterion on both Bolus Location at Initiation of Pharyngeal Swallow and Timing of Initiation of Pharyngeal Swallow following one training–testing session.

3. Pharyngeal Containment and Clearance: With the exception of one rater, all achieved reliability after one or two training–testing sessions. Tongue base retraction was the most challenging component in this domain.

4. Airway Protection: This domain was the most challenging for raters. Three raters required two training–testing sessions, and four required three training–testing sessions to reach the reliability criterion.

5. Esophageal Entry and Clearance: Esophageal entry was determined at the level of the Pharyngoesophageal Segment (PES; also called the Upper Esophageal Sphincter) and included distension (esophageal level maximal opening of the PES), duration of PES opening, and obstruction to bolus flow. All raters achieved criterion for reliability upon completion of the second training–testing session for both components comprising this domain.

Discussion

VFSS examinations are frequently used to diagnose oropharyngeal dysphagia and to guide appropriate, targeted interventions across the lifespan. Establishing standardized procedures and reliable identification of salient physiologic swallowing impairments are essential first steps in determining the clinical utility of these examinations. Standardization of VFSS procedures has enabled the reliable and accurate identification of swallowing impairments that translate to clinical practice and targeted treatment for adults with dysphagia [18, 22]. To date, comparable standardized and validated VFSS procedures have not been developed for bottle-fed children. Currently, little is known about the reliability, feasibility, or clinical utility for quantifying observations of impairment from VFSS recordings in bottle-fed children.

To our knowledge, this is the first study to report on a novel tool that enables identification and reliable judgments of swallowing physiology and airway protection across clinically indicated, full-length VFSS examinations obtained from bottle-fed children. Although three previous investigations reported on swallowing and bolus characteristics, these studies described VFSS characteristics from a small number of swallows. Newman et al. reported on full-length studies from 43 consecutive infants referred for VFSS examinations; however, only 4–5 swallows per study were extracted for reliability measurements [4]. Weckmueller and colleagues selected VFSS studies that were judged to be “normal” and analyzed reliability of observations for three swallows per study [24]. In the third study, Gosa et al. reported on the reliability measures from a total of 25 swallows extracted from images of 10 children who had been selected on the basis of having airway invasion on VFSS [25]. Although all of these investigations showed high inter-rater agreement, data were extracted from observations of a limited number of swallows and, in two of the studies, from pre-selected patient populations. Additionally, none of these investigations reported whether the reliability measures represented the best, worst, or most common patterns of swallowing function. Therefore, it is unclear whether the previously reported findings are generalizable to clinical practice using full-length VFSS examinations obtained across the range of swallowing characteristics observed in bottle-fed children seen for VFSS examinations.

Although all raters in the current study achieved the established reliability criterion on all components, the number of unique and unambiguous observations required to capture variations in impairment (scores) differed by component. Additionally, some of the components/variants proved more difficult to distinguish than others. Specifically, variants most difficult to score during the reliability testing included those assigned to components comprising the hypothesized domains of nutritive sucking/oral containment and clearance, pharyngeal containment and clearance, and airway protection.

Variability in oral function coupled with age and developmental changes may have contributed to challenges in scoring components related to sucking patterns and oral containment [23, 26,27,28]. Components related to airway protection also required additional training sessions. Plausible explanations include differing resolution of viewing screens, difficulty detecting very small amounts of thin barium in the airway, and incomplete calcification of structures that delimit the airways of young children. Nonetheless, these findings are of particular concern because in current practice clinicians frequently emphasize the detection of airway invasion as the primary finding to guide medical management and therapeutic interventions.

In summary, standardization of operational definitions and training is essential for reliable scoring of physiologic swallowing impairments and may eventually limit radiation exposure [19]. Our study demonstrates the feasibility of identifying swallowing impairment and reasonable reliability for quantifying variations in impairment from VFSS observations using a rank-order, clinically practical method. These data provide the evidence necessary to move forward with testing of these swallowing measures in a large-scale, prospective study of bottle-fed children that will clearly delineate the relevance of these diagnostic measures to the health and well-being of bottle-fed children with dysphagia. Our next steps in the development of this novel tool are to further investigate within- and between-rater agreement in a powered cohort of 300 VFSS examinations of bottle-fed children, test construct and external validity, and reduce the number of unique observations into relevant clusters that simplify scoring for clinical practice. Together these efforts should culminate in the development of a standardized training and interpretation method that improves the ability to quantify swallowing impairments in bottle-fed children, reproduce results across clinics and labs, target appropriate interventions, and detect changes in function over time including patterns of development.