Introduction

Arthroscopy is an important tool in the diagnosis and treatment of wrist pathologies [1–4]. Imaging is mandatory for documentation and reproducibility of the diagnoses. If the diagnosis remains doubtful despite photo documentation [5, 6] by one surgeon, it may in certain cases be better to repeat the wrist arthroscopy when the patient is subsequently treated by another surgeon. Attempts have been made to improve reproducibility by adding videos to the photo documentation. However, in a study by Löw et al. [7], the videos were criticized for being too short to allow the correct diagnosis to be made.

The purpose of this study was to analyze the relationship between video length and interobserver reliability for detection of intra-articular structures and/or lesions assessed by two independent examiners.

The hypothesis of this study was that the interobserver reliability would be higher based on long video documents than based on short video documents. We further expected a lower rate of false-positive cartilage lesions after reviews of longer videos compared to reviews of shorter videos.

Materials and methods

The study was approved by the institutional ethics committee of the local university. In a prospective study, videos of 100 consecutive wrist arthroscopies (57 right, 43 left) were made by the first author. Indications for arthroscopy were evaluation of the scapholunate (SL) ligament in eight patients, evaluation of the triangular fibrocartilage complex (TFCC) in 74 patients, and assessment of other issues (such as cartilage status) in 18 patients. All operations were performed in a standardized manner as described by Löw et al. [1] under axillary plexus anesthesia and tourniquet control with 4 kg of axial traction. The arthroscopy medium was 0.9 % saline. The 3–4 and 4–5 radiocarpal portals and the radial and ulnar midcarpal portals were used. The optic was always inserted radially. Intra-articular structures were visualized using a 2.7 mm 30° optic (Stryker, Kalamazoo, MI, USA). Two videos of the wrist were produced, showing the radiocarpal structures (cartilage of the scaphoid, lunate, scaphoid fossa, and lunate fossa; palmar radiocarpal ligaments; scapholunate and lunotriquetral (LT) ligaments; triangular fibrocartilage complex; synovia) and the midcarpal structures (cartilage of the capitate, hamate, and triquetrum; SL and LT joint gaps). The first video was twice as long as the second video.

Cartilage lesions were classified according to Outerbridge, with grades 2 and 3 combined into one grade [8]. Tears of the palmar radiocarpal ligaments and of the SL and LT ligaments were graded as partial or complete. Additionally, the SL ligament was graded according to its examination with the probe from the midcarpal joint: whether the probe could be inserted into the SL interval, whether the probe could be turned inside the interval, and whether the optic could be inserted into the SL interval. This grading was based on Geissler’s classification [9], excluding its radiocarpal assessment. The TFCC was assessed for a trampoline effect [10], and lesions were classified according to Palmer [11]. Synovitis was rated separately as present or absent.

The first author randomly mixed the 200 pairs of long and short videos and presented them for assessment to two independent surgeons (hereafter “examiners”) who were highly experienced in wrist arthroscopy. The examiners were neither involved in the treatment of the patients nor informed about the study’s hypothesis. The videos were presented in pseudonymized form to blind the examiners as well as the first author (hereafter “surgeon”) to the actual diagnosis. The examiners were asked to assess all intra-articular structures and lesions mentioned above and to give a diagnosis, which was compared to the actual diagnosis of the surgeon.

Statistical methods

Since the Kolmogorov–Smirnov test revealed significant deviation from normal distribution, the lengths of the video sequences were compared using the Wilcoxon test. Cohen’s Kappa coefficients were calculated to measure the interobserver agreement. Concerning the cartilage status, Kappa coefficients were calculated for the presence or absence of a cartilage lesion anywhere in the joint and separately for each carpal bone. For the latter, Kappa coefficients were also calculated for the assessment according to Outerbridge. Additionally, the rates of false-positive estimations including the 95 % confidence intervals were calculated. The contingency tables were analyzed for specifics that influenced Kappa statistics. Relevant findings were outlined in a descriptive manner.
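For readers less familiar with Kappa statistics, the calculation can be illustrated with a minimal sketch. The ratings below are invented for illustration and are not study data. The function names are hypothetical. The verbal labels follow the commonly used Landis and Koch scale, which is assumed here because the text uses terms such as "moderate" and "almost perfect" without naming the scale.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items on which both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout; kappa is formally
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch_label(kappa):
    """Verbal label for a kappa value (Landis and Koch scale, assumed)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical example: surgeon's diagnosis vs. one examiner on ten videos.
surgeon = ["lesion"] * 4 + ["none"] * 6
examiner = ["lesion", "lesion", "lesion", "none", "lesion",
            "none", "none", "none", "none", "none"]

kappa = cohens_kappa(surgeon, examiner)
print(f"kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```

Because Kappa corrects for chance agreement, a small number of actual lesions can yield a low coefficient even when the raw percentage of agreement is high.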

Results

The prevalence of intra-articular pathologies—cartilage and ligament lesions—according to the surgeon’s diagnosis is outlined in Table 1.

Table 1 Intra-articular findings according to the diagnoses of the surgeon

Length of the video documents

The median length of long videos was 56.50 s for the radiocarpal and 41.50 s for the midcarpal joint. These lengths were approximately twice as long as the short video sequences, which lasted 26.50 and 23.00 s, respectively. The difference was statistically significant (P < 0.001) as shown in Fig. 1a, b.

Fig. 1 Video lengths of the radiocarpal (a) and midcarpal (b) joints in wrist arthroscopy. The median length of the videos corresponds to the documentation of “simple” wrists. For such wrists, the long video sequences lasted 56.50 s for the radiocarpal and 41.50 s for the midcarpal joint. “Difficult” wrists necessitated correspondingly longer video documents. The short video sequences in this study lasted 26.50 s for the radiocarpal and 23.00 s for the midcarpal compartment

Cartilage

According to Kappa statistics, no difference was observed between long and short videos. For the Outerbridge classification of each carpal bone, Kappa coefficients were inhomogeneous, ranging from poor to substantial agreement (Tables 2, 3). Both examiners rated twice as many false-positive cartilage lesions on short videos as on long videos (Table 4). The differences were not statistically significant.

Table 2 Kappa coefficients for cartilage assessment of examiner 1
Table 3 Kappa coefficients for cartilage assessment of examiner 2
Table 4 Percentage rates of false-positive findings and 95 % confidence intervals for cartilage assessment
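
As an illustration of how a false-positive rate and its 95 % confidence interval can be obtained, the following sketch uses the Wilson score interval. This is an assumption: the interval method used in the study is not stated, and the denominator chosen here (all cases without a lesion according to the reference diagnosis) is likewise assumed. Function names and data are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95 %)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


def false_positive_rate(reference, ratings, positive="lesion"):
    """Count false positives among reference-negative cases (assumed definition).

    Returns (false_positives, negatives, rate).
    """
    if len(reference) != len(ratings):
        raise ValueError("reference and ratings must have equal length")
    negatives = [r for ref, r in zip(reference, ratings) if ref != positive]
    if not negatives:
        raise ValueError("no reference-negative cases")
    false_positives = sum(r == positive for r in negatives)
    return false_positives, len(negatives), false_positives / len(negatives)


# Hypothetical data: 50 wrists without a lesion, 5 of them rated as "lesion".
reference = ["none"] * 50
ratings = ["lesion"] * 5 + ["none"] * 45

fp, n_neg, rate = false_positive_rate(reference, ratings)
low, high = wilson_interval(fp, n_neg)
print(f"false-positive rate {rate:.1%}, 95 % CI {low:.1%} to {high:.1%}")
```

With small counts such intervals are wide, which is one reason why a doubling of the false-positive rate may still fail to reach statistical significance.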

Table 5 shows Kappa coefficients calculated for assessment of SL, LT, palmar radiocarpal ligaments, TFCC, and synovitis. The following paragraphs outline the specifics of the contingency tables that help explain Kappa statistics results.

Table 5 Kappa coefficients for assessment of ligaments, TFCC, and synovitis

SL ligament

Kappa coefficients revealed moderate to almost perfect agreement with no obvious advantage of long video sequences. Examiner 1 identified six of the ten complete SL ruptures when assessing short videos but correctly diagnosed nine of the ten when assessing long videos. For examiner 2, there was no difference. In 94 and 103 of the 200 midcarpal videos, respectively, the two examiners observed that the probe could be inserted or even turned inside the SL joint from the midcarpal joint. Examiner 1 rated this as “partial tear” in 32 cases and as “complete tear” in one case, whereas examiner 2 rated this as “partial tear” in 22 cases and as “complete tear” in three cases. Both examiners rated the insertion of the optic into the SL joint as a complete SL lesion.

LT ligament

The assessment of the LT joint was more accurate using long videos compared to short videos (Table 5). Both examiners identified the single complete LT ligament tear only on the long video. Examiner 1 identified seven, whereas examiner 2 identified two of 15 partial LT ligament lesions on long videos. Only one partial lesion was detected by examiner 1 on a short video, while examiner 2 saw none.

Palmar radiocarpal ligaments

Examiner 1 rated 35 of the short and 12 of the long video sequences as inadequate for the assessment of the palmar radiocarpal ligaments. Examiner 2 rated only one short and none of the long video sequences as inadequate. In these cases, the examiners felt that the moment when the ligaments were displayed was too short to adequately assess their integrity.

TFCC

Interpretation of the trampoline sign by examiner 2 revealed a higher interobserver reliability on long videos compared to short videos. According to Palmer’s classification, no correlation between the length of the video and interobserver reliability was found. Examiner 1 detected one of 12 ulnar TFCC lesions on long videos. No lesions were identified on short videos. Examiner 2 detected TFCC lesions in three long and three short videos.

Synovitis

Neither the Kappa coefficients nor the analysis of the contingency tables revealed a specific influence of video length on the inter-rater agreement for the assessment of synovitis.

Discussion

According to Kappa statistics, long videos have no advantage over short videos for the assessment of the most important intra-articular structures and/or lesions (cartilage, SL ligament, and TFCC) in wrist arthroscopy. Nevertheless, both examiners rated twice as many presumptive cartilage lesions in the short videos as in the long videos. This illustrates the need for a slow-motion video to adequately display articular surfaces. Among the studies that examined interobserver reliability for the assessment of cartilage, the authors of only one study mentioned the length of the video sequences [12]. Their videos of knee arthroscopies lasted 1 min each for the patellofemoral, medial, and lateral compartments. Despite these relatively long sequences, Kappa coefficients between 0.43 and 0.49 revealed only moderate agreement. Cameron et al. [13] videotaped the arthroscopies of six cadaveric knees. They calculated the percentage of agreement between the grades determined during arthroscopy and at subsequent arthrotomy and found the Outerbridge classification moderately accurate for grading chondral lesions arthroscopically. Marx et al. [14] examined the interobserver reliability of the grading of cartilage lesions among surgeons at different institutions. They concluded that the arthroscopic grading of articular cartilage lesions was reproducible, although their calculated Kappa coefficients ranged between 0.34 and 0.87. None of these studies considered the video length to be relevant in the assessment of cartilage lesions. In a prospective study, Javed et al. [15] found an interobserver variation of 18 % for the evaluation of articular surface defects. They stated that the examiners’ level of experience influences the accuracy of cartilage assessment. In the present study, the surgeon and the two examiners were highly and equally experienced in wrist arthroscopy.

As it is, we have to be aware that the assessment of articular surfaces by viewing video documents differs among surgeons [12–14]. Spahn et al. [16] have shown that poor interobserver reliability can be expected if different surgeons examine the same joint arthroscopically. Consequently, it seems necessary to improve the quality of video documentation in wrist arthroscopy. Determining a minimum length for such videos, as in this study, is an important part of quality improvement.

In this study, the video lengths varied over a wide range (Fig. 1a, b). This seems to be due to technical issues arising from the patients’ anatomy. Synovitis and fibrosis may make continuous examination of the joint difficult. Moreover, the videos must not stop at a site where the view is limited, a shortcoming that was criticized in an earlier study [7]. It is all the more necessary to display unclear parts of a joint, as these parts usually depict relevant articular pathologies. Therefore, in this study, single nonstop videos of each joint compartment were provided.

Regarding the SL ligament, complete lesions are not necessarily easy to diagnose when viewing a video document. The moment when the optic is inserted between the scaphoid and the lunate should last long enough to allow an independent reviewer to recognize the capitate glancing through the SL gap. In this study, SL ligament lesions were obscured when the examiners assessed the short videos. Kappa statistics were not able to depict this issue. Nevertheless, it makes sense not to pass difficult parts of the joint too quickly during video recording.

Limitations

Some bias may be assumed because the surgeon who performed the arthroscopies knew the study’s aim and could therefore have influenced the quality of the short video sequences. Another limitation of this study is the small number of actual intra-articular lesions, which resulted in low Kappa coefficients.

Conclusions

Despite the lack of statistical significance, adequate video documentation of findings in wrist arthroscopy seems to require video sequences of adequate length. To avoid false-positive diagnoses of cartilage lesions and to facilitate the detection of relevant ligament lesions, the video documents should be sufficiently long. Assuming that the median length of the long videos in this study adequately displays the findings in a simple wrist, we recommend that a sequence of the radiocarpal joint last about 60 s and that a sequence of the midcarpal joint last about 45 s. Videos of difficult joints should be correspondingly longer.