Introduction

Kawasaki disease (KD) is a systemic febrile illness with vasculitis that has a predilection for the coronary arteries (CA). It is the leading cause of acquired heart disease in children in the United States and many other developed nations [1]. CA lesions develop in 15–25% of untreated children and 3–5% of those treated with intravenous immunoglobulins (IVIG) [1]. These lesions may cause myocardial ischemia and potentially sudden death [2, 3]. Other cardiac abnormalities associated with KD include myocarditis [4, 5] and valvular insufficiency [6].

In 2017, the American Heart Association (AHA) guidelines for the diagnosis, treatment, and long-term management of KD revised the criteria for diagnosing incomplete KD (iKD) [6]. Complete KD is diagnosed clinically in children with four or more days of fever and at least four of the following manifestations: changes in the extremities, polymorphous exanthem, bilateral conjunctival injection, changes in the lips or oral cavity, and cervical lymphadenopathy. iKD is considered in children who lack a sufficient number of manifestations to meet the classic definition of KD, and it is more challenging to diagnose. In the updated guidelines, the diagnosis of iKD takes into account additional laboratory findings often associated with KD and recognizes infants ≤ 6 months of age as a higher-risk group. Other than an aneurysmal coronary diameter, the “suggestive” echocardiographic parameters include mitral regurgitation, pericardial effusion, depressed left ventricular function, and a right coronary artery (RCA) or left anterior descending (LAD) z score of 2–2.5 [6]. This differs from the previous 2004 guidelines, which also included lack of tapering (LT) and perivascular brightness (PB) among the “suggestive features” in the diagnostic algorithm for iKD [7].

Timely diagnosis and treatment of iKD are essential to reducing the rates of dangerous complications. Over the course of their illness, children with iKD have a similar or higher occurrence of CA aneurysms compared with children with complete KD [8], which may be attributed in part to delays in diagnosis and initiation of therapy, particularly in infants younger than 6 months of age [9]. A nationwide Japanese survey of 15,857 cases found a CA aneurysm prevalence of 18.1% in complete KD and 19.3% in iKD [10].

Few studies have examined the reliability of LT and PB as diagnostic parameters for iKD, so there is little evidence to support or refute their inclusion in, or exclusion from, the published diagnostic algorithms. Decisions to include or exclude these “suggestive” echocardiographic features have therefore relied on expert panel opinion. Because convincing evidence for or against these features is lacking, the currently practiced guidelines may be flawed. In this study, we sought to determine whether LT and PB are non-specific findings that are seen in children with KD, in children with iKD, in children without KD who have a systemic inflammatory febrile illness, and in healthy controls.

Patients and Methods

The study was approved by the Northwell Health Institutional Review Board with a waiver of informed consent.

We performed a single-center retrospective cohort review at Cohen Children’s Medical Center of New York of cases from January 1, 2008 to December 31, 2016. The study population was limited to children 0–10 years of age who had an echocardiogram (echo) performed before IVIG administration and who met the criteria for one of the following groups:

  • Healthy group: children who underwent echocardiography for evaluation of a murmur that was found to be benign with a normal echo.

  • KD group: children with an acute, daily, febrile illness (≥ 38 °C) lasting 72 h or more, treated with IVIG, and meeting four or more of the clinical criteria for KD.

  • iKD group: children with an acute, daily, febrile illness (≥ 38 °C) lasting 72 h or more, treated with IVIG, and meeting 1–3 clinical criteria for KD.

  • Febrile group: children with an acute, daily, febrile illness (≥ 38 °C) for 72 h or more, due to an acute, inflammatory, systemic illness, not treated with IVIG.

The KD and iKD groups were identified from a list, maintained by our hospital infection control service, of all children treated with IVIG. We assigned patients to the KD or iKD group by reviewing the clinical criteria recorded in the initial cardiology and infectious disease consultation notes. The febrile subjects were identified and grouped by searching our institution’s echo library for hospitalized children whose echo indication included fever and suspicion of an illness other than KD. The medical record at the time of admission was reviewed to confirm that these patients had been febrile for at least 72 h at the time of the echo and had an established diagnosis for their febrile illness. The healthy group was identified by searching the echo library for outpatients of the appropriate age whose echo indication was a murmur; the echo and the medical record confirmed that these children were healthy. We excluded patients who underwent echo after IVIG therapy, as well as those with structural congenital heart disease, with the exception of a patent foramen ovale or a trivial patent ductus arteriosus.

CA images were obtained according to a standard institutional protocol that includes parasternal short-axis imaging at the level of the CA. Although transducer frequency, gain, and dynamic range are preset to default institutional settings, sonographers often adjust them to optimize image quality. CA images were acquired on Philips iE33 and Philips EPIQ 7C systems; a Siemens ACUSON SC2000 system was used infrequently (< 10 cases). Study investigators isolated 3–4 parasternal short-axis echo clips at the level of the CA. Instructions for appropriate CA diameter measurement, following previously described methods [11], were provided to each reading cardiologist as part of the data collection sheet. Identifying information, age, weight, height, body surface area (BSA), and vital signs were removed.

The isolated echo clips were reviewed by six blinded pediatric cardiologists specializing in imaging, each with more than 5 years of experience in echocardiographic interpretation since completing training. None of the cardiologists was involved in the selection of echo clips, and all remained blinded to the patients’ clinical category. Each of the six reading cardiologists interpreted the same clips independently, was free to select the frame in which to perform measurements, and recorded the presence or absence of LT and PB as well as the diameters of the left main coronary artery (LMCA), LAD, and RCA.

Although we are not aware of a clear, objective definition of either LT or PB in the literature, our institution defines LT as the absence of the normal tapering expected in the distal CA without the presence of an aneurysm, and PB as echogenicity or brightness in the coronary arterial wall. Ectasia, which our institution defines as coronary artery dilation not reaching the aneurysmal level (i.e., z score 2–2.5), is a term not typically used in our practice and was therefore not included in this study.

To assess intra-observer reliability, at least ten duplicate cases were selected at random and incorporated into each reading cardiologist’s set of echo clips. After data collection was complete, study investigators separately calculated z scores of CA diameter using the Boston Children’s Hospital z score system. Because the reading cardiologists were blinded to clinical information including age, weight, height, and BSA, they could not calculate or predict z scores themselves.
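The z score step can be illustrated with a minimal sketch. It assumes the generic form of a z score, (measured diameter − predicted mean) / predicted standard deviation; the Boston Children’s Hospital equations that derive the predicted mean and SD from BSA are not reproduced here and may involve transformations not shown, so `predicted_mean_mm` and `predicted_sd_mm` are hypothetical inputs. The cut-points in the hypothetical `classify_z` helper follow those used in this text (2–2.5 as the “suggestive” range, > 2.5 as aneurysm).

```python
def coronary_z_score(measured_mm: float, predicted_mean_mm: float,
                     predicted_sd_mm: float) -> float:
    """Generic z score: (measured - predicted mean) / predicted SD.

    predicted_mean_mm and predicted_sd_mm are hypothetical inputs; in practice
    they would come from a BSA-based reference model such as the Boston
    Children's Hospital system, which is not reproduced here.
    """
    if predicted_sd_mm <= 0:
        raise ValueError("predicted SD must be positive")
    return (measured_mm - predicted_mean_mm) / predicted_sd_mm


def classify_z(z: float) -> str:
    """Label a z score using the cut-points quoted in this text."""
    if z > 2.5:
        return "aneurysm"            # z > 2.5
    if z >= 2.0:
        return "suggestive (2-2.5)"  # the 2-2.5 range
    return "below 2"


# Hypothetical example values, not study data.
z = coronary_z_score(measured_mm=3.2, predicted_mean_mm=2.4, predicted_sd_mm=0.3)
print(round(z, 2), classify_z(z))
```

Computing z scores centrally, after the reads, mirrors the blinding described above: readers supply only diameters, and the reference model is applied afterward.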

Statistical Analysis

Inter-rater reliability among the six readers was assessed using Fleiss’ kappa for categorical variables (LT and PB) and the intra-class correlation coefficient (ICC) for continuous variables (coronary artery z scores). Intra-rater reliability between each reader’s paired measurements was assessed using Cohen’s kappa. Several classifications describing the degree of reliability are reported in the literature. A commonly used scheme by Landis and Koch categorizes agreement as follows: < 0.01, poor agreement; 0.01–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; 0.81–1.00, almost perfect agreement [12]. Age and sex were compared across the four groups using the Kruskal–Wallis and chi-square tests, respectively. All analyses were performed using R version 3.3 (R Foundation for Statistical Computing, Vienna, Austria).
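For readers less familiar with Fleiss’ kappa, the sketch below computes it from a subjects × categories table of rating counts and maps the result onto the agreement cut-points exactly as listed above. It is a minimal pure-Python illustration, not the R code used for the analysis; the function names (`fleiss_kappa`, `binary_counts`, `landis_label`) are hypothetical, and the ICC and the Kruskal–Wallis and chi-square tests are not shown.

```python
from typing import List, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N subjects, each rated by the same number of raters.

    counts[i][j] = number of raters who assigned subject i to category j.
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of raters")
    n_categories = len(counts[0])
    # P_i: proportion of agreeing rater pairs for subject i.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # p_j: share of all ratings falling in category j.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    if sum(1 for p in p_j if p > 0) < 2:
        raise ValueError("kappa is undefined when all ratings share one category")
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


def binary_counts(positive_reads: Sequence[int], n_raters: int = 6) -> List[List[int]]:
    """Turn per-subject positive-read counts into [negative, positive] rows."""
    return [[n_raters - k, k] for k in positive_reads]


def landis_label(value: float) -> str:
    """Agreement label using the cut-points listed in the text."""
    if value < 0.01:
        return "poor"
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


# Illustrative positive-read counts from six readers, not study data.
kappa = fleiss_kappa(binary_counts([6, 0, 5, 1, 3, 6, 0, 2]))
print(round(kappa, 2), landis_label(kappa))
```

For a yes/no finding such as LT or PB, each subject reduces to a [negative, positive] pair, which is why `binary_counts` only needs the number of positive reads per subject.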

Results

Medical records of 143 subjects were reviewed, of whom 117 had adequate CA imaging and comprised the study group (Table 1). Median ages ranged from 1.7 to 3.9 years across groups, and 20.7–51.9% of subjects in each group were female. A review of available medical records showed subungual peeling and/or CA aneurysm (i.e., RCA or LAD z score > 2.5) during convalescence in 21 (70%) of 30 KD patients and 19 (59%) of 32 iKD patients, corroborating the initial diagnosis in most cases. The febrile group (N = 28) consisted of subjects without structural heart disease who were being assessed for infective endocarditis that was eventually excluded (N = 9), subjects with bacteremia or sepsis (N = 6), subjects with a rheumatologic illness (N = 4), subjects with a malignancy (N = 3), subjects with pneumonia (N = 2), subjects with a confirmed viral illness (N = 2), and subjects with a fever of unknown origin not due to endocarditis (N = 2).

Table 1 Patient characteristics by group (N = 117)

Among the six reading cardiologists, inter-observer agreement for CA z score measurements was moderate, with reliability coefficients (ICC) ranging from 0.52 to 0.60. Agreement was lower for LT and PB, with Fleiss’ kappa values of 0.36 (fair) and 0.13 (slight), respectively (Table 2).

Table 2 Agreement analysis of echo parameters

The proportion of subjects in whom LT and PB were detected varied markedly by patient, group, and reader. The number of positive LT and/or PB interpretations made by individual cardiologists in the healthy and febrile groups at times exceeded the total number of positive interpretations in the KD or iKD groups. The median frequency of LT detection across the six reading cardiologists was 11% for the healthy group, 53% for the KD group, 44% for the iKD group, and 24% for the febrile group. The median frequency of PB detection was 4% for the healthy group, 17% for the KD group, 14% for the iKD group, and 11% for the febrile group. The median frequency of combined LT and PB detection was 3% for the healthy group, 15% for the KD group, 13% for the iKD group, and 11% for the febrile group.
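One plausible way to obtain a “median frequency of detection” for a group is to compute each reader’s proportion of positive reads within that group and then take the median across the six readers; with an even number of readers, the median is the mean of the two middle values. The sketch below assumes that reading of the text and uses made-up reads, not study data; `median_detection_rate` is a hypothetical helper.

```python
from statistics import median
from typing import Sequence


def median_detection_rate(reads_by_reader: Sequence[Sequence[bool]]) -> float:
    """Median, across readers, of each reader's proportion of positive reads.

    reads_by_reader[r][s] is True if reader r called subject s positive.
    """
    if not reads_by_reader or any(len(r) == 0 for r in reads_by_reader):
        raise ValueError("each reader needs at least one read")
    rates = [sum(reads) / len(reads) for reads in reads_by_reader]
    return median(rates)


# Made-up example: six readers, four subjects in one group.
group_reads = [
    [False, False, False, False],
    [True, False, False, False],
    [False, True, False, False],
    [True, True, False, False],
    [True, True, True, False],
    [True, True, True, True],
]
print(median_detection_rate(group_reads))
```

Using the median rather than the mean keeps a single unusually liberal or conservative reader from dominating the group figure.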

Nearly all subjects were interpreted as having LT and/or PB by at least one cardiologist, yet agreement on a positive read by a majority of cardiologists (i.e., four or more of six) was exceedingly rare. For LT, 60–90% of subjects in each of the four patient groups were read as positive by at least one cardiologist. For 62% of patients, a majority of cardiologists (i.e., four or more) agreed on a negative LT read, whereas a majority agreed on a positive LT read in only 25% of patients (Fig. 1). For PB, 36–53% of patients in each of the four groups were read as positive by at least one cardiologist. For 92% of patients, a majority of cardiologists agreed on a negative read, whereas a majority agreed on a positive read in only a single patient (Fig. 1). For the combination of LT and PB, 0–29% of patients in each of the four groups were read as positive for both findings by at least one cardiologist. Similarly, a majority of cardiologists agreed on a combined positive read for both LT and PB in only a single subject, in the KD group.
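The per-patient tallies behind these percentages can be sketched as follows, assuming each subject is summarized by the number of the six readers who called the finding positive. A majority is taken as four or more readers, as in the text, so a 3–3 split counts toward neither majority. `agreement_summary` is a hypothetical helper, and the counts are illustrative only.

```python
from typing import Dict, Sequence


def agreement_summary(positive_reads: Sequence[int], n_readers: int = 6,
                      majority: int = 4) -> Dict[str, float]:
    """Summarize per-subject counts of positive reads (0..n_readers).

    Returns the share of subjects read positive by at least one reader,
    read positive by a majority, and read negative by a majority.
    Subjects with no majority either way (e.g., a 3-3 split) fall in
    neither majority bucket.
    """
    n = len(positive_reads)
    if n == 0:
        raise ValueError("no subjects")
    if any(not 0 <= k <= n_readers for k in positive_reads):
        raise ValueError("counts must lie between 0 and n_readers")
    return {
        "any_positive": sum(k >= 1 for k in positive_reads) / n,
        "majority_positive": sum(k >= majority for k in positive_reads) / n,
        "majority_negative": sum(n_readers - k >= majority for k in positive_reads) / n,
    }


# Made-up counts for six subjects, not study data.
print(agreement_summary([0, 1, 2, 3, 4, 6]))
```

The gap between `any_positive` and `majority_positive` is what the text highlights: a finding can be flagged by someone in most patients while almost never being confirmed by a majority.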

Fig. 1

Frequency of lack of tapering (LT) and perivascular brightness (PB) interpretations per patient. Each column denotes the frequency of positive findings, as interpreted by the six cardiologists, for each group of subjects. The number of positive reads per subject ranges from zero (read as positive by none of the cardiologists) to six (read as positive by all of them). For PB in particular, and to a lesser extent for LT, a substantial majority (i.e., ≥ 4 cardiologists) agreed on a negative read, whereas a majority agreed on a positive read in very few cases.

Intra-observer reliability for both LT and PB varied markedly among cardiologists and was very low in many cases (Table 2), indicating poor reproducibility of these findings even by the same cardiologist. Across the six reading cardiologists, the reliability coefficient ranged widely, from 0.14 to 0.79 for LT and from 0.0 to 0.61 for PB.

A review of the medical records of the iKD cohort indicated that LT and/or PB were detected in 10 of the 32 patients at the time of initial diagnosis. For four of these ten patients (one with LT, one with PB, and two with both LT and PB), infectious disease consultation and progress notes indicated that these findings played a substantial role in the decision to diagnose iKD and treat with IVIG and aspirin. For one of the four (with both LT and PB), subsequent subungual peeling corroborating the diagnosis of iKD was documented. None of the four developed CA aneurysms on subsequent echocardiograms. Overall, none of the ten iKD patients with LT and/or PB had refractory iKD requiring multiple IVIG infusions or therapy other than IVIG, and none developed long-term CA aneurysms.

Discussion

To our knowledge, the use of PB was first reported in an abstract by Takahashi et al. at the 7th International Kawasaki Disease Symposium in 2001 [13]. Soon afterward, PB and LT were adopted into the 2004 AHA/AAP guidelines for the diagnosis of KD, on the premise that LT or PB may represent arteritis before true CA dilation, which generally presents in the second week of illness. Only one published study has evaluated LT or PB in KD, and no studies have evaluated LT or PB in iKD. A study performed in Korea found that the proportion of children with PB did not differ significantly between subjects with KD (N = 58) and control subjects, defined as healthy children of comparable age who underwent echocardiography for evaluation of a murmur (N = 34); the KD group did not include patients with iKD [14].

The inclusion of LT and PB in the previous AHA guidelines, and their exclusion from the current guidelines, were decided largely without compelling evidence. This study was initiated and conducted before publication of the 2017 AHA guidelines, with the aim of examining the utility of LT and PB in the diagnosis of iKD. Although this had been suspected previously, we provide evidence supporting the removal of LT and PB as contributing criteria for the diagnosis of iKD, as was done in the 2017 AHA guidelines. The major finding of this study is that LT and PB are poorly reliable, poorly reproducible echocardiographic findings that are not specific to iKD or KD. They can be seen on echocardiograms of healthy children and of children with inflammatory illnesses of etiologies other than KD or iKD.

The inter-reader reliability analysis showed inadequate agreement among six pediatric cardiologists specializing in non-invasive imaging when they independently and blindly assessed the presence of LT or PB. In contrast, the inter-reader analysis of CA diameters showed moderate agreement, possibly because CA diameter is a quantitative measurement and therefore more objective than the qualitative assessment of LT and PB. These findings make it unlikely that the suggestive features LT and PB help differentiate KD patients from non-KD patients when the reader is blinded to the clinical history.

The few previous studies that attempted to quantify PB using integrated backscatter analysis in children with KD yielded conflicting results and could not demonstrate its diagnostic validity [14], although they suggested that serial echo evaluations using this technique may reflect the success of therapy [15]. However, integrated backscatter analysis is neither widely used nor standardized for assessing PB, so PB, like LT, remains a subjective finding for most cardiologists.

The reproducibility of LT and PB, both among cardiologists and within the same cardiologist, was poor. Our data show that although a majority of cardiologists (i.e., four or more of six) agreed on a negative read in most subjects, agreement on a positive read was poor for LT and PB, both individually and in combination (as low as 1% for PB). The distribution in Fig. 1 shows essentially no agreement on positive PB interpretations, with a similar but weaker trend for LT. To our knowledge, an analysis of this magnitude has not previously been conducted to assess the reliability and agreement of LT and PB as diagnostic features of iKD.

Marked variability in intra-observer reliability for both LT and PB was also noted among the reading cardiologists; to our knowledge, this has not been studied previously. Although the precision of our intra-reader reliability analysis was limited by small sample sizes (only 10 to 22 echo clips were read twice by the same cardiologist), the computed reliability coefficients indicate that many readers were poorly consistent with themselves when evaluating LT and PB (Table 2). Positive LT interpretations were frequent enough to compute reliability coefficients, demonstrating a wide range of intra-observer agreement. PB, however, was detected less often. Two cardiologists who rarely read echo clips as positive for PB were consistently negative on their duplicate reads; because Cohen’s kappa is undefined when all paired reads fall into a single category, no reliability coefficient could be computed for these readers (noted as NA in Table 2).
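The NA entries follow from the definition of Cohen’s kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between the two passes and p_e is the chance agreement computed from each pass’s marginal proportions: if every duplicate read in both passes is negative, p_e = 1 and κ is 0/0. The sketch below is a minimal pure-Python illustration rather than the study’s R analysis; it returns `None` in that case. `cohens_kappa` is a hypothetical helper and the reads are made up.

```python
from typing import Hashable, Optional, Sequence


def cohens_kappa(first: Sequence[Hashable],
                 second: Sequence[Hashable]) -> Optional[float]:
    """Cohen's kappa between two reads of the same clips by one reader.

    Returns None when kappa is undefined, i.e., when every read in both
    passes falls into one category, so chance agreement p_e equals 1.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty read lists of equal length")
    n = len(first)
    categories = set(first) | set(second)
    if len(categories) == 1:
        return None  # p_e == 1, so (p_o - p_e) / (1 - p_e) is 0/0
    p_o = sum(a == b for a, b in zip(first, second)) / n
    p_e = sum((first.count(c) / n) * (second.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)


# Made-up duplicate reads for one reader, not study data.
print(cohens_kappa([True, True, False, False], [True, False, False, False]))
print(cohens_kappa([False] * 10, [False] * 10))  # uniformly negative: undefined
```

Returning `None` rather than 0 keeps an undefined coefficient distinct from genuinely chance-level agreement, which matches reporting such readers as NA.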

In this study, we found that LT and PB are non-specific: neither reliably differentiated subjects diagnosed with iKD from febrile patients and healthy controls. This was demonstrated by the considerable percentage of healthy controls, and of patients with a febrile illness unrelated to KD or iKD, who received positive LT and/or PB interpretations. Pediatric cardiologists specializing in imaging interpreted the same images differently, which is particularly notable because all of the echocardiograms were read at a single institution using similar technical and interpretation modalities. A high percentage of patients were read as having LT and/or PB by at least one cardiologist, signifying the potential for over- or under-diagnosis due to the unreliability of these findings.

In four of the 32 patients with iKD, the presence of LT and/or PB played a pivotal role in making the diagnosis and in the decision to treat with IVIG and aspirin. In these four cases, infectious disease documentation attributed treatment to LT and/or PB, suggesting that under the current AHA guidelines these patients would likely not have been treated for iKD, or at least not on the day of the echo. These four subjects are a potential limitation of our study because they were included in the iKD group after having been diagnosed with iKD on the basis of LT and/or PB. However, when their echo clips were read by the blinded cardiologists as part of our cohort, results varied: 1–4 of six cardiologists interpreted these four subjects as having LT, and 0–2 of six interpreted them as having PB. Interestingly, for the only subject with the late finding of subungual peeling, who was originally read as having both LT and PB, two of six cardiologists interpreted the studies as having LT and none detected PB.

This study has several limitations. The design was retrospective and relied on medical record documentation to assign patients to the four study groups. Age differed significantly between the groups. LT and PB lack standardized echocardiographic criteria for diagnosis. Although studies varied in transducer frequency, gain, and dynamic range, which may have influenced the interpretation of PB in particular, all six cardiologists interpreted the same images acquired with the same transducer settings and would therefore generally be expected to reach the same conclusion in most cases. As noted above, in at least four cases the clinical diagnosis of iKD may have depended on the unreliable echo findings of LT and PB, and some of these patients may not in fact have had iKD. However, a majority of patients with iKD had the late findings of subungual peeling and/or CA aneurysms, findings generally considered pathognomonic for the condition. The clinical diagnosis of iKD was made by several different pediatric infectious disease specialists, who may have varied in their threshold for diagnosing iKD and prescribing IVIG. This study was conducted at a single institution and could be strengthened by a multi-center approach. However, conducting the study at a single institution largely removed variations in echocardiographic technique and interpretation modality as confounders that could explain the lack of agreement among reading cardiologists.

In conclusion, guidelines without supporting evidence may be flawed. We provide data showing that LT and PB are subjective, poorly reproducible features that may be seen in febrile patients without KD as well as in healthy children. LT and PB should not be assessed when evaluating patients with proven or suspected KD or iKD. We agree with the 2017 AHA guidelines for the diagnosis of KD, which excluded LT and PB as diagnostic parameters.