Introduction

Constipation is a frequently encountered problem in children and may cause considerable distress to the child and family [1]. There is no generally accepted definition of constipation [2]. Subjective history criteria, such as defecation frequency, straining on defecation and stool consistency, are used most often. Objective criteria, such as whole gut transit time and mean daily stool weight, have also been used, but are cumbersome in that they require stool collection or daily abdominal radiographs for 4–5 days [3, 4].

Plain abdominal radiography is frequently used to complement clinical history and examination. An abdominal radiograph is simple and cheap, and does not expose the patient to a high radiation dose [1]. However, in the clinical setting there is no common policy on how to review the plain abdominal film. Over previous decades, three different scoring systems for assessment of fecal retention on a single radiograph of the abdomen have been proposed [5, 6, 7]. The earliest was reported by Barr et al. [5] in 1979, followed by Blethyn et al. [6] in 1995 and Leech et al. [7] in 1999. The systems have in common that they score both the amount of feces and its location in the different colon segments. Previous studies have assessed the diagnostic accuracy of these systems and independently concluded that all three methods are reliable, with both sensitivity and specificity up to 90% [7].

No studies have investigated and compared the reproducibility of these systems. To assess which scoring system is the most useful in the clinical setting we determined intra- and interobserver variability of these three scoring systems.

Materials and methods

We retrospectively reviewed 40 plain abdominal radiographs of children consecutively referred to our hospital for assessment of constipation between January and December 2001. The radiographs were taken at presentation. Patients complained of infrequent defecation, soiling, encopresis, or abdominal pain.

Masked plain abdominal radiographs of the 40 children were independently evaluated by two observers, both experienced paediatric radiologists. The observers assessed each abdominal radiograph on two separate occasions, 6 weeks apart. Each abdominal radiograph was scored according to the three different scoring systems.

The first method, described by Barr et al. [5], quantifies the amount of feces in four different bowel segments (ascending colon, transverse colon, descending colon and rectum) and also scores the consistency of the feces, i.e. granular or rocky stools. Constipation is defined as a Barr score >10.

The second method, described by Blethyn et al. [6], is a coarse scoring system used to assess the amount of feces in the large bowel. In this system the degree of constipation is graded as follows: grade 0, normal, feces in rectum and caecum only; grade 1, mild constipation, feces in rectum and caecum and discontinuous elsewhere; grade 2, moderate constipation, feces in rectum and caecum, continuous and affecting all segments; grade 3, severe constipation, feces in rectum and caecum, continuous elsewhere, with dilated colon and impacted rectum.

The third method was described more recently by Leech et al. [7]. In this system the colon is divided into three segments (Fig. 1): (1) ascending colon and proximal transverse colon; (2) distal transverse colon and descending colon; and (3) rectosigmoid. The amount of feces in each segment is scored from 0 to 5, with 0 indicating no feces and 5 indicating severe fecal loading and bowel dilatation. The total score ranges from 0 to 15, and a score of >8 is considered to indicate constipation.
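The arithmetic of the Leech score and the cut-off values quoted above can be illustrated with a short sketch. This is only an illustration of the scoring rules as stated in the text, not software used in the study; the class name `LeechScore`, its field names and the constant names are hypothetical.

```python
from dataclasses import dataclass, fields

# Cut-offs as quoted in the text; the constant names are illustrative.
LEECH_THRESHOLD = 8   # a total Leech score >8 indicates constipation [7]
BARR_THRESHOLD = 10   # a Barr score >10 defines constipation [5]

# Blethyn grades as listed in the text [6].
BLETHYN_GRADES = {
    0: "normal",
    1: "mild constipation",
    2: "moderate constipation",
    3: "severe constipation",
}


@dataclass(frozen=True)
class LeechScore:
    """One reading of a radiograph; field names are hypothetical."""

    proximal: int      # segment 1: ascending and proximal transverse colon
    distal: int        # segment 2: distal transverse and descending colon
    rectosigmoid: int  # segment 3: rectosigmoid

    def __post_init__(self) -> None:
        # Each segment is scored from 0 (no feces) to 5 (severe loading).
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 5:
                raise ValueError(f"{f.name} must be between 0 and 5, got {value}")

    @property
    def total(self) -> int:
        # Sum of the three segment scores, giving a range of 0 to 15.
        return self.proximal + self.distal + self.rectosigmoid

    @property
    def constipated(self) -> bool:
        # Strictly greater than the threshold, as stated in the text.
        return self.total > LEECH_THRESHOLD
```

For example, segment scores of 3, 3 and 3 give a total of 9, which exceeds the threshold, whereas scores of 3, 3 and 2 give a total of 8, which does not.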

Fig. 1 Systematic review of an abdominal radiograph according to the method developed by Leech et al. [7]

Intraobserver variability was determined for each scoring system by comparing data from the same observer at the two reading sessions. Interobserver reproducibility was determined by comparing data from the two observers within the same reading session. Thus two intraobserver values (one per observer) and two interobserver values (one per reading session) could be derived for each scoring system. Kappa coefficients were calculated as indicators of intra- and interobserver variability, and coefficients ≤0.20, 0.21–0.40, 0.41–0.60, 0.61–0.80, and 0.81–1.00 were considered to indicate poor, fair, moderate, good, and very good agreement, respectively [8].
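As an illustration of how such coefficients are obtained, the following sketch computes an unweighted Cohen's kappa for two sets of ratings and maps the result to the agreement categories listed above. It assumes the unweighted form of the statistic, since the text does not state which variant was used; the function names `cohen_kappa` and `agreement_label` are hypothetical, and the ratings in the usage note are invented for demonstration only.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(ratings_a: Sequence[Hashable], ratings_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same, non-empty set of films")
    n = len(ratings_a)
    # p_o: observed proportion of films on which the two readings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # p_e: agreement expected by chance from each reading's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both readings used one identical category throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa: float) -> str:
    """Map a kappa value to the categories used in the text [8]."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"
```

With four sets of readings per scoring system (two observers, two sessions), `cohen_kappa` would be called once per observer on that observer's two sessions for the intraobserver values, and once per session on the two observers' readings for the interobserver values. For instance, `agreement_label(0.45)` returns `"moderate"` and `agreement_label(0.88)` returns `"very good"`.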

Results

Table 1 shows the age and sex distribution of the total study population. The mean age of the patients was 7 years (range 3–12 years); 18 patients (45%) were girls and 22 (55%) were boys. The most common presenting symptom was diffuse abdominal pain (38%). Other symptoms included encopresis/soiling (23%), low defecation frequency (27%) and diarrhoea (10%).

Table 1 Baseline characteristics of the study population

The intra- and interobserver coefficients according to the three different scoring systems are presented in Table 2. The Leech score showed the highest reproducibility with high intraobserver agreement for both observers (κ 0.88 and 1.00, respectively), and high interobserver agreement (κ 0.91 in the first round and κ 0.84 in the second round).

Table 2 Interobserver and intraobserver variability (κ values) according to different scoring systems

Agreement on the existence of constipation as measured by the Barr score between the first and second scoring round was good for both observers, with intraobserver agreement (κ) of 0.75 and 0.66, respectively (P<0.05). However, interobserver agreement in the first round was only moderate (κ 0.45). The reproducibility of the Blethyn score was the lowest of the three scoring systems, with κ values ranging from 0.61 to 0.65 for intraobserver agreement and from 0.31 to 0.43 for interobserver agreement.

Discussion

Constipation is a common clinical problem and frequently dominates the life of a child and his or her family. There is no consensus on the criteria to be used in the diagnosis of constipation [2]. Usually, an arbitrary combination of subjective indicators (history data) and objective indicators is used [9]. Various objective methods have been used to assess constipation, including rectal examination after defecation, rectal manometry with balloon insufflation and anal sphincter electromyography, but all are invasive. Others have used radiopaque markers for the assessment of constipation. This type of study produces semiquantitative information on the colorectal transit time, but requires up to 6 days to complete [10].

The plain abdominal radiograph is another diagnostic tool that can be used to quantify constipation [3, 4, 5]. The test is convenient for both patient and physician, takes little time and is less cumbersome than, for instance, measurement of the whole gut pellet transit time. Furthermore, previous studies have shown that a plain abdominal radiograph is as reliable as measurement of fecal weight or marker transit and can thus be advocated as one of the first diagnostic procedures.

During the past decades three different scoring systems for systematic review of a plain abdominal film, used to estimate the amount of fecal retention, have been described [5, 6, 7]. The first one was developed by Barr and colleagues in 1979 [5]. They stated that in the absence of a clear diagnosis of constipation, only one abdominal film is necessary for detecting and grading occult stool retention in children with symptoms of constipation. More recently, Blethyn et al. [6] devised a simple scoring system and reported a strong correlation between their score and a symptom score based on bowel frequency. Their findings were supported by the study of Leech et al. [7], who developed another scoring system and concluded that it was a clinically useful tool with sensitivities and specificities up to 91% and 85%, respectively.

Although the diagnostic accuracy of these methods has been evaluated in several studies, their reproducibility has never been properly determined. To our knowledge, our study is the first to assess which scoring system is the most reproducible in a clinical setting by calculating and comparing the intra- and interobserver variability of the three different scoring systems. We found that the Leech score is the most reproducible grading system for assessment of childhood constipation on a plain abdominal radiograph, with high intra- and interobserver agreement (κ values ranging from 0.84 to 1.00). We therefore conclude that this system could be of value in clinical practice for systematic quantification of constipation on plain abdominal radiographs in children.