Introduction

Connective tissue diseases (CTDs) are a heterogeneous group of immunologically mediated systemic diseases that may affect every organ and system. Interstitial lung disease (ILD) is one of the pulmonary manifestations of CTDs and can lead to significant morbidity and mortality [1, 2]. The incidence of ILD varies according to the specific CTD, but it is often underestimated because the disease may remain asymptomatic for many years. In fact, from a clinical point of view, ILD in CTDs is generally subclinical and manifests with dyspnea and respiratory failure in only a few cases. Nevertheless, ILD is the second leading cause of death in systemic sclerosis (SSc) and rheumatoid arthritis (RA) [3].

In daily clinical practice, conventional chest radiography is the first imaging tool used to evaluate the presence of ILD, but it is of limited use due to its low sensitivity, particularly in the early stages of disease. Chest high-resolution computed tomography (HRCT) is considered the gold standard for diagnosing ILD and for identifying the lung patterns of the different interstitial pneumonias [4].

In the last 20 years, many authors have demonstrated the utility of ultrasound (US) in the evaluation of lung and pleural diseases, offering new tools for the management of acute and chronic pulmonary conditions [5,6,7,8,9,10,11,12]. Lung ultrasound (LUS) has also been applied to the evaluation of ILD in CTDs [13,14,15,16,17]. The elementary detectable findings are artifacts generated by thickened interlobular septa at the lung surface. The artifact suggestive of ILD is a hyperechoic, narrow-based reverberation artifact, termed the US B line, which appears as a laser-like ray extending to the edge of the screen (Fig. 1) [18].

Fig. 1

Example of B lines in systemic sclerosis. B lines are the “comet tail” artifacts extending from the pleural line (∆) to the edge of the screen (*). Three B lines are detectable in this scan. m: chest wall muscles; p: pleural line; l: lung parenchyma

Considering the numerous studies carried out in this field, the aim of this paper is to describe the state of the art regarding the role of LUS in the management of ILD associated with CTDs.

Methods

We reviewed all relevant scientific articles regarding ILD in CTDs published in the last 18 years, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [19]. We performed a systematic search of the electronic databases PubMed and EMBASE using the following search terms in all possible combinations: ultrasound, ultrasonography, sonography, interstitial lung disease, pulmonary fibrosis, interstitial pulmonary fibrosis, interstitial fibrosis, systemic sclerosis, rheumatoid arthritis, Sjögren’s syndrome and systemic lupus erythematosus. We evaluated all articles concerning studies in humans published between January 2000 and December 2018, after removing duplicates. Two independent rheumatologists (MG, MT) screened all the titles, abstracts and full reports of the articles identified; in case of disagreement, a third investigator (CB) was consulted to reach a consensus. Exclusion criteria were case reports, letters to the editor, nonhuman studies and articles not published in English. To evaluate the methodological quality of the included studies, we applied the Newcastle-Ottawa Scale (NOS), a tool developed to assess the quality of nonrandomized observational studies [20].

Results

We identified 713 publications, of which 20 were included. Table 1 summarizes the demographic data, the number of patients enrolled and the types of disease included.

Table 1 Demographic data, number of patients enrolled and type of diseases of patients involved in the review studies

The largest number of articles concerns SSc, since the incidence of ILD in this disease is higher than in other CTDs. The first study was conducted by an Italian group in 2009 [13]; they performed a LUS assessment of the anterior, middle and posterior chest in 33 consecutive SSc patients, using a 2.5–3.5 MHz cardiac transducer. Sixty-two lung intercostal spaces (LIS) were examined, with an average LUS examination time of 10 min. LUS data were correlated with HRCT data, using the score proposed by Warrick et al. [21]. They found a linear correlation between the LUS and HRCT data and a significant correlation between the number of B lines and the diffusing capacity of the lungs for carbon monoxide (DLco).

The same group of authors [22] conducted a study on 25 patients to compare LUS performed with two different probes, a 2.5–3.5 MHz cardiac probe and a 6–12 MHz linear probe used at 6 MHz, and to compare both with chest HRCT data scored with the Warrick score. They found a significant correlation between the LUS evaluations obtained with the two transducers and moderate-to-good correlations between the cardiac probe and HRCT (ICC = 0.547) and between the linear probe and HRCT (ICC = 0.600).

Another Italian group evaluated 34 consecutive patients with different CTDs (26 SSc, two Sjögren’s syndrome, two antisynthetase syndrome, two dermatomyositis, one mixed CTD and one undifferentiated CTD) [16]. LUS evaluations were performed on 50 LIS in the anterior, middle and posterior chest with a 2–7 MHz convex broadband multifrequency transducer, taking an average of 23 min per patient. They correlated the LUS results with pulmonary function tests (PFTs), with DLco and with the HRCT score proposed by Warrick. In addition, to grade the severity of ILD, they proposed a semiquantitative LUS score: grade 0 = normal (< 10 B lines); grade 1 = mild (11 to 20 B lines); grade 2 = moderate (21 to 50 B lines); and grade 3 = marked (> 50 B lines). They found a positive correlation between the LUS data and the DLco values and, in addition, a significant linear correlation between the semiquantitative LUS score and the Warrick score. Interobserver reliability between two sonographers in the detection of B lines was excellent. A larger number of B lines was observed in the LIS of the lower posterior regions of the chest, which the authors proposed as the first LIS to be evaluated in the initial phase of ILD.
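
The semiquantitative grading described above amounts to a threshold lookup on the total B-line count. The following Python sketch is purely illustrative: the function name is hypothetical, and because the published cutoffs leave a count of exactly 10 B lines unassigned, the sketch assumes that value belongs to grade 0.

```python
def lus_grade(total_b_lines: int) -> int:
    """Map a total B-line count to the semiquantitative LUS grade (0 to 3)."""
    if total_b_lines < 0:
        raise ValueError("B-line count cannot be negative")
    if total_b_lines <= 10:   # reported as "< 10"; placing exactly 10 here is an assumption
        return 0              # normal
    if total_b_lines <= 20:   # 11 to 20 B lines
        return 1              # mild
    if total_b_lines <= 50:   # 21 to 50 B lines
        return 2              # moderate
    return 3                  # marked (> 50 B lines)
```

For example, a patient with 35 B lines summed over all scanned LIS would fall into grade 2 under this reading of the cutoffs.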

The same group of researchers [17] proposed a simplified LUS evaluation in a post hoc analysis. They included 36 CTD patients (28 SSc, two Sjögren’s syndrome, one undifferentiated CTD, two antisynthetase syndrome, two dermatomyositis, one mixed CTD), evaluated by LUS with a 2–7 MHz broadband multifrequency convex transducer. The 14 LIS with the highest prevalence of US B lines were chosen: the second LIS along the parasternal lines, the fourth LIS along the anterior thoracic wall (CW), the anterior and middle-axillary lines at the center CW and the eighth LIS along the paravertebral, sub-scapular and posterior axillary lines at the posterior CW. They proposed a simplified LUS score: 0 = normal (< 5 B lines); 1 = mild (6 to 15 B lines); 2 = moderate (16 to 30 B lines); and 3 = marked (> 30 B lines). They found a significant correlation between the complete and simplified LUS scores (p = 0.0001) and a positive correlation between the simplified LUS score and the HRCT score (p = 0.0006), both in the quantification and extension of the ILD. A notable result is the significant difference in the average time taken to perform the simplified LUS assessment (8.3 min) compared to the comprehensive LUS assessment. Excellent reliability was also reported, and the feasibility of the simplified score was acceptable.
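
As a rough illustration of how the simplified assessment could be tallied, the sketch below sums the B lines counted at each of the 14 LIS and grades the total. It rests on two assumptions that the review does not state explicitly: that the grade is applied to the sum over all 14 sites, and that a total of exactly 5 B lines (unassigned by the published cutoffs) belongs to grade 0. The names are hypothetical.

```python
SIMPLIFIED_SITES = 14  # number of LIS in the simplified protocol

def simplified_lus_score(b_lines_per_lis: list[int]) -> tuple[int, int]:
    """Return (total B lines, simplified grade 0 to 3) for a 14-site scan."""
    if len(b_lines_per_lis) != SIMPLIFIED_SITES:
        raise ValueError(f"expected {SIMPLIFIED_SITES} sites, got {len(b_lines_per_lis)}")
    if any(n < 0 for n in b_lines_per_lis):
        raise ValueError("B-line counts cannot be negative")
    total = sum(b_lines_per_lis)
    if total <= 5:       # reported as "< 5"; placing exactly 5 here is an assumption
        grade = 0        # normal
    elif total <= 15:    # 6 to 15 B lines
        grade = 1        # mild
    elif total <= 30:    # 16 to 30 B lines
        grade = 2        # moderate
    else:                # > 30 B lines
        grade = 3        # marked
    return total, grade
```

Keeping the per-site counts as input, rather than only the total, preserves the information about where B lines cluster, which the authors found to be mainly the lower posterior regions.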

The evaluation of pleural features was introduced by Moazedi-Fuerst et al. [23], who performed a single-center study comprising 25 patients with SSc and very early SSc and 40 healthy volunteers. They correlated the LUS data with chest HRCT and proposed defining pleural irregularity as a thickening of the pleural line greater than 2.8 mm, together with a pleural score: 0 = no areas of irregularity, 1 = 1–5 areas of pleural irregularity and 2 = more than 5 areas of pleural irregularity. They found B lines in 7% of healthy subjects, and in 44% of patients with pathological LUS, ILD was confirmed by chest HRCT. Interestingly, all SSc patients with B lines also showed significant pleural irregularities, opening up a new area of LUS research.
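
The pleural definition and score above can be sketched in a few lines of Python. This is only an illustration of the thresholds as reported here; the function names are hypothetical, and how individual "areas" are delimited on the chest wall is not specified in this review.

```python
PLEURAL_THICKNESS_CUTOFF_MM = 2.8  # pleural line thicker than this counts as irregular

def is_pleural_irregularity(thickness_mm: float) -> bool:
    """True when the measured pleural line thickness exceeds the 2.8 mm cutoff."""
    return thickness_mm > PLEURAL_THICKNESS_CUTOFF_MM

def pleural_score(thicknesses_mm: list[float]) -> int:
    """Score 0 (no irregular areas), 1 (1 to 5 areas) or 2 (more than 5 areas)."""
    n_irregular = sum(is_pleural_irregularity(t) for t in thicknesses_mm)
    if n_irregular == 0:
        return 0
    if n_irregular <= 5:
        return 1
    return 2
```

Here each list entry stands for one scanned area, so `pleural_score([2.0, 3.1, 2.9])` counts two irregular areas and yields a score of 1.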

Barskova et al. [24] conducted a pilot study in 58 consecutive SSc patients, including 32 patients with very early SSc. They performed LUS evaluation in 62 LIS of the CW, using a 2.5–3.5 MHz cardiac transducer with a length of 2.5 cm, and correlated ultrasound data with HRCT and pulmonary function tests. ILD, detected by HRCT, was observed in 41% of the population with very early SSc. They found 100% sensitivity and 59% specificity when the total number of B lines was > 5, and an agreement of 83% between HRCT and LUS for ILD detection. All discrepant cases were false positives on LUS, giving a 100% negative predictive value in both SSc and early SSc.

Mohammadi et al. [25] proposed a reduced scoring system for the evaluation of LUS. They performed a cross-sectional study in 70 patients with SSc, correlating LUS data with chest HRCT as the imaging standard to investigate its concurrent validity. The LUS evaluation covered ten LIS: for the anterior chest, the fourth LIS along the midclavicular line; for the lateral chest, the fourth LIS along the anterior and middle-axillary lines; and for the posterior chest, the eighth LIS along the sub-scapular and posterior axillary lines. A significant positive correlation was found between the LUS data and the severity of lung involvement on chest HRCT, resulting in sensitivity, specificity, positive and negative predictive values of 73.58%, 88.23%, 95.12% and 51.72%, respectively, compared to HRCT.

Sperandeo et al. [15] proposed a new method for evaluating pleural abnormalities (thickening, pleural/subpleural nodules and other subpleural lung abnormalities), performing a LUS evaluation of all CW LIS with a 3.5–5 MHz convex probe. The purpose of the study was to evaluate the ability of LUS to detect pleural line thickening (normal thickness is usually < 3.0 mm) as a sign of subclinical ILD in patients with SSc, in order to plan HRCT evaluation. They performed chest LUS and HRCT in 175 patients with SSc. Pleural line thickening (3.0–5.0 mm) was found in 97 patients, subpleural nodules in 32 patients and major pleural line thickening (> 5.0 mm) in 35 patients, while normal pleural line thickness was found in 26 patients without ILD. All LUS data showed good agreement with the HRCT score, classified as extended pulmonary fibrosis (PF) (definitely involving the medium–high lung), limited or baseline PF (only involving the posterior lower-base lung) or absent PF (no apparent sign).

In accordance with the previous study, Buda et al. [26] proposed a new set of LUS criteria for ILD: pleural line irregularity, pleural line narrowing, fragmentation of the pleural line, pleural line blurring, pleural line thickening, B-line artifacts ≤ 3 and subpleural consolidations < 5 mm. They performed a single-center study to correlate LUS results with HRCT results, using Warrick’s score, in a cohort of 52 ILD patients compared to 50 healthy subjects. They evaluated all the LIS of the CW, dividing them into upper, middle and lower fields. Pleural line irregularity was most often found in the lower fields of both lungs (100% of ILD patients). The most frequent finding was thickening of the pleural line (thickness ≥ 2 mm), mainly in the lower lung fields in patients with SSc, while in severe cases of ILD a blurred pleural line was detected, a finding seen in patients with honeycombing on the HRCT scan. Sensitivity and specificity of the blurred pleural line were 0.59 and 0.82, respectively. B lines were observed in 92.3% of patients with SSc, of whom 69.3% had numerous B lines (≥ 4), especially in the lower fields. B lines were also observed in the middle and upper fields when ILD was severe. The authors also noted that numerous B-line artifacts occur when a blurred pleural line is detectable on LUS (p < 0.001). The authors defined “white lung syndrome” as the LUS finding of numerous B-line artifacts merging into a single large vertical artifact that meets the definition of a B line; this finding showed a strong correlation with the presence of ground-glass opacity on HRCT, with sensitivity and specificity of 0.95 and 0.99, respectively.

Recently, Gigante et al. [27] conducted a cross-sectional study with the aim of correlating the results of LUS, thoracic HRCT and PFTs in 39 patients with SSc, including the modified Rodnan skin score (mRSS) as a clinical variable. A positive correlation was found between the number of B lines and the HRCT score (r = 0.81, p < 0.0001), and a negative correlation between the number of B lines and DLco (r = −0.63, p < 0.0001), while no significant correlation was found between the LUS and mRSS data.

Forty-eight consecutive SSc patients were also evaluated by Cakir et al. [28] in a study aimed at investigating the ability of LUS to assess the severity of SSc-ILD. In this study, the authors demonstrated a good correlation between the B lines and HRCT (r = 0.89; p = 0.0001) and, interestingly, with the Medsger disease scale (r = 0.55; p = 0.0001). Sensitivity, specificity, positive predictive value and negative predictive value of the LUS data were 100%, 84.2%, 90.6% and 100%, respectively, when chest HRCT was taken as the gold standard.

Tardella et al. [29] in 2018 designed a cross-sectional study to determine a cutoff point for the number of B lines that detects the presence of significant ILD in 40 consecutive SSc patients in relation to the Warrick HRCT score. The authors adopted the previous 14-LIS LUS score [17] for the evaluation of each patient and showed that a value of 10 B lines is highly predictive of significant SSc-ILD on HRCT, using a Warrick score of 7 as the external criterion. They also found a strong correlation between the total LUS score and DLco and a moderate correlation between the total LUS score and quality of life measures.

Also in 2018, Hassan et al. [30] conducted a prospective cohort study to demonstrate that LUS is a useful screening tool for ILD in patients with SSc versus HRCT. This study involved 67 patients with SSc. In 29 patients with abnormal HRCT (Warrick score > 7) and LUS, two had a low score (6–15 B lines) and 27 had moderate or severe scores (≥ 16 B lines). Of the 38 patients with negative HRCT, 25 had some degree of lung involvement on LUS. LUS reported a sensitivity of 100% and a specificity of 34%. A significant relationship between the number of B lines and the presence of ILD on HRCT was demonstrated (area under the curve, 0.80; 95% confidence interval, 0.69–0.90).
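
The sensitivity and specificity quoted for this study can be recovered from the patient counts given above. The Python sketch below is a hedged reconstruction, not the authors' analysis: it assumes that any degree of B-line involvement on LUS counts as a positive test, so that all 29 HRCT-positive patients are true positives and 25 of the 38 HRCT-negative patients are false positives. The function name is hypothetical.

```python
def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Compute sensitivity and specificity from a 2x2 table against a reference test."""
    sensitivity = tp / (tp + fn)  # share of reference-positive patients detected
    specificity = tn / (tn + fp)  # share of reference-negative patients ruled out
    return sensitivity, specificity

# Counts inferred from the text, with HRCT as the reference standard:
# 29 HRCT-positive, all with abnormal LUS; 38 HRCT-negative, 25 with abnormal LUS.
sens, spec = sensitivity_specificity(tp=29, fn=0, fp=25, tn=38 - 25)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Under these assumptions the counts reproduce the reported 100% sensitivity and 34% specificity (13 of 38), which illustrates why LUS is discussed as a rule-out screening tool rather than a confirmatory test.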

The studies above were designed to evaluate ILD in SSc patients using LUS and demonstrated a good correlation with chest HRCT. The studies discussed next included patients with different CTDs.

Aghdashi et al. [31] evaluated a cohort of 31 consecutive patients with RA and suspected pulmonary involvement. The included patients underwent HRCT and LUS evaluations. Taking HRCT as the gold standard, the authors obtained sensitivity, specificity, positive and negative predictive values of LUS of 73.58%, 88.23%, 95.12% and 51.72%, respectively.

Cogliati et al. [32] designed a monocentric study to verify the accuracy of LUS in the diagnosis of ILD in RA patients. They included 39 patients and evaluated LUS with both standard equipment and a pocket-size US device (PS-USD) as a screening tool. A full LUS scan of 72 LIS (28 anterior and 44 posterior) was performed, and a total of more than 10 B lines was considered positive. The sensitivity and specificity of standard LUS versus chest HRCT were 92% and 56%, respectively. The B-line score was highly correlated with the HRCT score. A total of 29 patients were studied with the PS-USD, whose sensitivity and specificity with respect to chest HRCT were 89% and 50%, respectively.

An interesting element was introduced by Hasan et al. [33], who studied the accuracy of LUS in ILD diagnosis by comparing it with chest HRCT findings (including ground-glass, reticular, nodular or honeycombing patterns) and with PFTs. Sixty-one patients with ILD secondary to several diseases were included, eight of whom had CTD. LUS was performed using a 3.5 MHz convex probe. They divided the chest into four areas, considered a region positive when they found three or more B lines in a longitudinal plane between two ribs and defined an examination as positive when there were two or more positive regions bilaterally. All patients showed bilateral B lines that correlated positively with the severity of the disease on chest HRCT, as graded by the score proposed by Warrick et al. The new element of this study was the evaluation of the distance between two adjacent B lines: ground-glass opacity on HRCT correlated with a distance of 3 mm, while extended fibrosis and honeycombing correlated with a distance of 7 mm.
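
The positivity rule used in this study can be written as a two-step check. The sketch below is one possible reading only: it assumes that the four chest areas are scanned on each side and that "two or more positive regions bilaterally" means at least two positive regions on the left and at least two on the right. Both points are interpretations, and the function names are hypothetical.

```python
REGION_B_LINE_CUTOFF = 3   # three or more B lines make a region positive
MIN_POSITIVE_REGIONS = 2   # positive regions required per side (assumed reading)

def region_positive(b_lines: int) -> bool:
    """A region is positive when it shows three or more B lines."""
    return b_lines >= REGION_B_LINE_CUTOFF

def exam_positive(left_regions: list[int], right_regions: list[int]) -> bool:
    """An examination is positive when both sides have enough positive regions."""
    left_count = sum(region_positive(n) for n in left_regions)
    right_count = sum(region_positive(n) for n in right_regions)
    return left_count >= MIN_POSITIVE_REGIONS and right_count >= MIN_POSITIVE_REGIONS
```

With this reading, `exam_positive([3, 4, 0, 0], [5, 3, 1, 0])` is positive, whereas an examination with many B lines confined to one side is not.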

Moazedi-Fuerst et al. [34] conducted a cross-sectional study to estimate the value of LUS as a diagnostic screening tool in patients with RA who did not show clinical signs or symptoms of ILD. Sixty-four patients with RA were included, with 40 healthy volunteers as a control group. They investigated not only the presence of B lines but also pleural irregularities, introducing this new aspect in the study of ILD in RA. All patients underwent PFTs and DLco determination, chest HRCT and LUS evaluation. LUS detected pleural nodules or B-line artifacts in 28% of patients with RA. In these patients, HRCT scans showed signs of incipient subclinical interstitial lung disease. On the other hand, LUS showed sporadic abnormalities in 7% of healthy controls.

Recently, the same authors [14] conducted a new study with the aim of determining the diagnostic value of LUS in the diagnosis of ILD in patients with CTDs [RA, SSc and systemic lupus erythematosus (SLE)] and, as a second aim, of determining the possible correlation between the frequency of pathological LUS findings and the underlying disease. Forty-five patients (25 with RA, 14 with SSc and six with SLE) were included, with 40 healthy subjects as a control group. Chest HRCT was adopted as the gold standard for ILD diagnosis. In this study, B lines, subpleural nodules and pleural irregularities were considered features of the LUS examination. Twenty-eight percent of the RA cohort, 64% of SSc patients and four out of six SLE patients showed some degree of ILD on HRCT. Pathological US findings were significantly more frequent in the group of patients with ILD than in those without ILD (B lines: 100% vs. 12%, p < 0.001; subpleural nodules: 55% vs. 17%, p = 0.006; pleural line thickening: 95% vs. 12.5%, p < 0.001). Among patients with ILD, subpleural nodules were present ultrasonographically in 100% of patients with RA versus 22% of SSc patients (p = 0.003) and 50% of SLE patients (p = 0.049). An irregular pleural line > 3 mm was documented in 100% of SSc and SLE patients with ILD, compared to 86% of RA patients with ILD.

Pinal-Fernandez et al. [35] proposed pleural irregularities as a new LUS sign for ILD detection in patients with SSc and antisynthetase syndrome (ASS). In their study, all patients underwent HRCT, PFTs, DLco determination and LUS. US pleural irregularities and B lines were evaluated using a 72-LIS LUS score, while lung abnormalities on HRCT were quantified by the Warrick score. Thirty-seven patients were included (21 with ASS, two of them without ILD, and 16 with SSc, six of them without ILD). This study reported a positive correlation between the LUS pattern of pleural irregularities and the Warrick score in both SSc and ASS patients, with pleural irregularities performing better than B lines in detecting ILD.

Recently, Vasco et al. [36] studied the accuracy of LUS in diagnosing ILD in patients with Sjögren’s syndrome who had PFT alterations or respiratory symptoms. LUS was compared with chest HRCT, showing a sensitivity of 1 (95% CI 0.398–1.0), a specificity of 0.89 (95% CI 0.518–0.997) and a positive likelihood ratio of 9.00 (95% CI 7.1–11.3) for the detection of ILD. LUS achieved an excellent correlation with HRCT data in Sjögren’s syndrome with ILD.
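
If the value of 9.00 reported above is read as a positive likelihood ratio (LR+), it follows from the sensitivity and specificity through the standard relation LR+ = sensitivity / (1 − specificity). The Python sketch below illustrates the arithmetic. The fraction 8/9 used for specificity is a hypothetical unrounded value chosen to be consistent with the reported 0.89; the review does not give the underlying patient counts.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity); undefined when specificity is 1."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    if specificity == 1.0:
        raise ValueError("LR+ is undefined (infinite) when specificity is 1")
    return sensitivity / (1.0 - specificity)

# Using the rounded values as reported: about 9.09.
lr_rounded = positive_likelihood_ratio(1.0, 0.89)

# Using a hypothetical unrounded specificity of 8/9 (an assumption): about 9.0.
lr_unrounded = positive_likelihood_ratio(1.0, 8 / 9)
```

An LR+ near 9 means that a positive LUS is roughly nine times more likely in a patient with ILD than in one without, although the wide confidence intervals for sensitivity and specificity point to a small sample.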

Table 2 shows the feasibility, reliability, sensitivity and specificity of LUS, while the technical characteristics of LUS and the type of score used in all studies included in the review are presented in Table 3.

Table 2 Feasibility, reliability, sensitivity and specificity of US in the studies included in the review
Table 3 Technical characteristics of US and types of scoring used in the studies involved in the review

Discussion

Identification and quantification of early manifestations of ILD in CTDs is an important objective, both to improve the quality of life of affected patients and for prognosis [1, 37,38,39,40,41]. In this regard, LUS has recently demonstrated a high sensitivity in the detection of signs indicative of ILD, even in the early stages of the disease and, especially, a high negative predictive value. LUS has therefore been proposed as a potential screening tool in ILD evaluation. HRCT remains the gold standard in ILD identification; however, LUS, due to the absence of ionizing radiation and its simplicity of execution, even at the bedside, can be considered an excellent screening tool. LUS has demonstrated encouraging validity, reliability and feasibility, and it could currently be considered an excellent method for establishing the correct timing of HRCT in ILD assessment.

How to manage the preclinical stages of disease, however, remains a topic of debate [30]. In this regard, a recent study conducted by our group on the diagnostic potential of LUS in the detection of subclinical ILD in 133 patients with SSc revealed that 40.6% of patients with SSc showed signs of subclinical ILD, compared to 4.8% of healthy controls (p = 0.0001). Preliminary data demonstrate that the sensitivity and specificity of LUS in detecting even greater ILD are 91.2% and 88.6%, respectively [42]. Also of interest is the cutoff point of ten B lines for detecting the presence of SSc-ILD, proposed as a threshold for referring patients for HRCT [29].

Despite the progress of studies in the literature, there is still much to be done, and some crucial points need to be discussed. In the first instance, the studies currently available (Table 1) have included small cohorts of patients. Secondly, how to perform a standardized ultrasound examination of the lung should be precisely defined, particularly in terms of which and how many LIS to assess. Currently, the number of LIS used in the studies is very variable, from 10 to 72 (Table 2). Thirdly, there is no consensus on the semiquantitative scoring system for ILD quantification.

On the other hand, new possibilities for interpreting pathological findings are emerging, in particular with regard to pleural abnormalities. Pleural abnormalities appear to be adequately related to specific HRCT findings such as ground-glass opacity and extensive fibrosis and could therefore be considered an ILD imaging biomarker [15]. Moreover, interest is likely to grow in pocket-sized ultrasound machines, which can be used to evaluate lung diseases and acquire reliable information without the need for top-of-the-range machines.

In conclusion, this review showed how LUS can become an important technique for assessing lung disease in rheumatic diseases with suspected lung involvement. Applications of this imaging method are still growing, and further opportunities are likely to arise in the light of the vibrant field of research.