Introduction

Systemic sclerosis (scleroderma, SSc) is a disease characterized by the complex interplay of vascular injury, immune system activation, and fibrosis. Of the autoimmune connective tissue diseases, SSc has the highest case-specific mortality rate, with half of patients dying of pulmonary or cardiac causes [1, 2]. SSc is heterogeneous: some patients experience rapidly progressive, fatal disease, while others have a benign course. SSc is divided into two subtypes: (1) diffuse cutaneous (dc)SSc, characterized by rapidly progressive, widespread (diffuse) skin thickening and early internal organ involvement, and (2) limited cutaneous (lc)SSc, with slow accumulation of organ manifestations. Patients with dcSSc have a higher risk of death, but patients with either subtype experience internal organ manifestations and disease-related morbidity. Although there have been substantial advances in our understanding of its pathogenesis, the cause (or causes) of SSc remain incompletely understood.

The study of treatment in SSc is challenging for multiple reasons. SSc is an uncommon disease, with a prevalence of 276 cases per million adults [3]. It is heterogeneous, with different clinical phenotypes and rates of progression, and clinicians are limited in their ability to predict prognosis and the risk of future organ complications. While outcome measures for some specific disease manifestations, such as the degree and extent of skin involvement assessed by the modified Rodnan skin score (mRSS), have been validated and are accepted by the research community [4], outcome measures for other aspects of the disease are less widely accepted. Manifestations of SSc may improve or worsen slowly, and for some organ systems stabilization, rather than continued worsening, may itself be a desirable outcome. Distinguishing these outcomes may require clinical trials of long duration. Finally, although some researchers have focused on developing biomarkers as surrogate endpoints to shorten trials, such surrogate markers are not presently accepted as substitutes for clinical endpoints. The objective of this paper is to review the challenges that clinical investigators face as they design and conduct studies of targeted therapy in SSc.

Clinical Trial Design

A well-designed randomized clinical trial provides the highest level of evidence for the management of patients and the development of practice guidelines. However, there are potential pitfalls in designing clinical trials (see Table 1). These involve patient selection, the choice of primary and secondary outcomes (Table 2), the intervention or therapy, the randomization and blinding methods, and the power calculations used to determine sample size.

Table 1 Challenges to clinical trials in SSc
Table 2 Outcome measures in systemic sclerosis trials

Patient Selection

The choice of eligibility criteria is fundamentally important in defining the patient population to be studied. It requires balancing the selection of patient groups likely to demonstrate an effect of the intervention against the feasibility of recruitment. In SSc it can be difficult to identify the population at risk of disease worsening, particularly in trials where skin or interstitial lung disease (ILD) parameters are the primary outcome. While there are externally validated models to predict mortality in early diffuse SSc [5•, 6•], there are no externally validated models to predict which patients are likely to have progression of skin thickening or clinical worsening of ILD.

The rarity of SSc also creates a significant hurdle to designing clinical trials, making a multicenter or multinational approach necessary to achieve sufficient power. Complicating this, SSc patient phenotypes, even within a disease subtype, differ between geographic regions, particularly with respect to autoantibody profile, which varies greatly across continents [7, 8, 9•]. There are well-recognized associations between autoantibodies and both cutaneous subtype and internal organ involvement, such as anti-RNA polymerase III with diffuse disease and scleroderma renal crisis (SRC) [10], and anti-Scl70 with diffuse SSc and ILD [11]. In our opinion, all clinical trials should collect information on autoantibody profile using gold-standard testing. We predict that stratification by autoantibody will be necessary in dcSSc clinical trials.
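Stratified randomization is one way to act on this prediction: allocation is balanced separately within each autoantibody group, so that, for example, anti-Scl70-positive patients (with their higher ILD risk) are not concentrated in one arm by chance. The sketch below uses permuted blocks within strata, a common general-purpose technique; the class name, the two arms, the block size of 4, and the stratum labels are illustrative assumptions, not a protocol from any cited trial.

```python
import random
from collections import defaultdict


class StratifiedBlockRandomizer:
    """Permuted-block randomization run separately within each stratum.

    Hypothetical illustration: arms, block size and stratum labels are
    assumptions, not taken from any specific SSc trial protocol.
    """

    def __init__(self, arms=("active", "placebo"), block_size=4, seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.arms = list(arms)
        self.block_size = block_size
        self.rng = random.Random(seed)
        # stratum label -> assignments still unused in that stratum's current block
        self._pending = defaultdict(list)

    def _new_block(self):
        # Equal numbers of each arm, shuffled, so every completed block is balanced.
        block = self.arms * (self.block_size // len(self.arms))
        self.rng.shuffle(block)
        return block

    def assign(self, stratum):
        pending = self._pending[stratum]
        if not pending:
            pending.extend(self._new_block())
        return pending.pop(0)


# Usage: strata keyed by autoantibody (hypothetical labels).
randomizer = StratifiedBlockRandomizer(seed=2024)
for patient_id, antibody in [("P01", "anti-Scl70"), ("P02", "anti-RNAP III"),
                             ("P03", "anti-Scl70"), ("P04", "ACA")]:
    print(patient_id, antibody, randomizer.assign(antibody))
```

Keeping a separate block queue per stratum keeps each autoantibody group close to 1:1 even when one group recruits slowly; in practice, block sizes are often varied and concealed so that site investigators cannot predict the next assignment.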

Clinical Outcome Measures

Skin Disease

The mRSS is an assessment of skin thickness that serves as the standard primary efficacy outcome in skin-focused SSc trials and as a secondary outcome in many other SSc trials. It is calculated by adding skin thickness grades (0 = normal, 1 = mild thickening, 2 = moderate thickening, 3 = severe thickening) at 17 anatomic areas (maximum score 51), and it has been shown to correlate with histologic evidence of dermal thickening [12]. The mRSS has been shown to be a reliable outcome measure based on inter- and intra-observer reliability [13]. Moreover, a higher mRSS has been associated with worse outcome and an increased risk of cardiac involvement and SRC. Improvement in the mRSS is associated with improvement in hand function, inflammatory indices, joint contractures, arthritis signs, and overall functional ability [13]. The rate of skin thickness progression has also been shown to predict short-term mortality [6•]. Certain body sites (hands, forearms, and chest) have been observed to be more sensitive to change than others (lower extremities, face, abdomen, and fingers) [14]. It is possible that excluding areas where skin thickness is relatively static would further increase the sensitivity of the mRSS to change over time.
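The scoring arithmetic described above can be written out directly. The sketch below assumes grades are recorded per area in a dictionary; the area keys are hypothetical names for the 17 standard mRSS areas, and the "sensitive" subset (built from the sites named above as more responsive to change) is an exploratory idea only, not a validated instrument.

```python
# The 17 mRSS areas; paired areas are graded separately on each side.
MRSS_AREAS = (
    "fingers_right", "fingers_left", "hands_right", "hands_left",
    "forearms_right", "forearms_left", "upper_arms_right", "upper_arms_left",
    "face", "chest", "abdomen",
    "thighs_right", "thighs_left", "lower_legs_right", "lower_legs_left",
    "feet_right", "feet_left",
)

# Hypothetical exploratory subset: areas reported as more sensitive to change
# (hands, forearms, chest). Not a validated score.
SENSITIVE_AREAS = ("hands_right", "hands_left", "forearms_right",
                   "forearms_left", "chest")


def mrss(grades, areas=MRSS_AREAS):
    """Sum the 0-3 skin-thickness grades over the given areas."""
    total = 0
    for area in areas:
        grade = grades[area]
        if grade not in (0, 1, 2, 3):
            raise ValueError(f"{area}: grade must be 0, 1, 2 or 3, got {grade!r}")
        total += grade
    return total


# Usage: a hypothetical examination with distal-predominant thickening.
exam = {area: 0 for area in MRSS_AREAS}
exam.update(fingers_right=3, fingers_left=3, hands_right=2, hands_left=2,
            forearms_right=1, forearms_left=1, face=1)
print(mrss(exam))                   # 13
print(mrss(exam, SENSITIVE_AREAS))  # 6
```

Passing the area list as a parameter keeps the full 51-point score as the default while making any subset analysis explicit rather than silently changing the instrument.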

Pulmonary Disease

Interstitial lung disease (ILD) is a major cause of morbidity and mortality in SSc [15]. The reported prevalence of ILD varies because there is no standard definition of, or method of detecting, ILD in SSc [16]. When pulmonary function tests (PFTs) are used as the primary definition of ILD, the frequency of SSc-related ILD risks being underestimated; radiologic evidence of ILD should be included in the definition. In assessing ILD treatment, PFT parameters including the forced vital capacity (FVC) and the diffusing capacity for carbon monoxide (DLCO) are frequently used. A clinically significant change in these parameters has not yet been defined for SSc-ILD, and stability can itself be interpreted as a treatment effect, which makes it difficult to define a meaningful PFT outcome measure for SSc-ILD trials. More recently, changes in quantitative high-resolution computed tomography (HRCT) measures of ILD have been shown to provide a sensitive indication of disease progression and response to treatment, and these will likely be used increasingly as objective outcomes in clinical trials. The importance of patient-reported dyspnea for assessing prognosis and disease progression in ILD is well recognized. According to recent recommendations from the OMERACT connective tissue disease-associated ILD (CTD-ILD) group, the Dyspnea 12 [17] and the Medical Research Council Dyspnea Scale [18] are the “best currently available instruments in CTD-ILD” [19]. Both have demonstrated truth (face and criterion validity) and feasibility. The clinical course of patients with SSc-ILD is variable; this variability means that large randomized trials are necessary, and it must be accounted for in power calculations for SSc-ILD clinical trials.
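To see why variability in progression drives trial size, consider the standard normal-approximation formula for the number of patients per arm needed to detect a difference δ in mean FVC change between two arms with common standard deviation σ: n = 2(z₁₋α/₂ + z₁₋β)² σ² / δ². The sketch below applies it with hypothetical inputs (a 3-point difference in % predicted FVC change, σ of 7 or 10); a real SSc-ILD trial would estimate σ from prior data and would also inflate n for expected dropout.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided, two-sample comparison of means
    (normal approximation, 1:1 allocation, no dropout adjustment)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


# Hypothetical inputs: detect a 3-point difference in % predicted FVC change.
for sd in (7, 10):
    print(f"SD {sd}: {n_per_arm(delta=3, sd=sd)} per arm")
# SD 7: 86 per arm
# SD 10: 175 per arm
```

Because n scales with σ², a modest increase in between-patient variability of FVC decline (here from 7 to 10 points) roughly doubles the required enrollment.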

Vascular: Raynaud Phenomenon and Digital Ulcers

Raynaud phenomenon (RP) is the most common symptom directly attributable to SSc and has been shown to have the second-highest impact on daily life [20]. RP has been assessed in clinical trials with a combination of patient-reported and objective outcomes. In 2002, the Scleroderma Clinical Trials Consortium (SCTC) proposed a core set of measures for clinical trials of RP in SSc, including the Raynaud condition score (RCS), patient and physician visual analog scale (VAS) ratings of RP activity, a digital ulcer/infarct measure, measures of disability and pain (HAQ), and a measure of psychological function (AIMS2). This set was further reviewed by OMERACT in 2003 and endorsed by the SCTC in a Delphi exercise [21]. However, these measures have been used to varying degrees in clinical trials, the most frequent being the RCS and the Raynaud VAS. The RCS is a self-assessment tool that uses a 0–10 ordinal scale to capture the daily frequency, severity, impact, and duration of Raynaud attacks. The Raynaud VAS asks the patient how severe Raynaud symptoms were in the last week. The reliability and validity of both the RCS and the Raynaud VAS have been demonstrated [22]. Another commonly used outcome measure is the daily diary, which consists of daily logs of the number of attacks, duration of attacks, pain, numbness, tingling, and the RCS. Some investigators favor the diary because it is patient-reported and can assess symptoms over a longer period (for example, up to 14 days). More recently, an analysis of the placebo groups of three RCTs of Raynaud drugs [23•] observed high placebo response rates for the individual outcome measures, with lower variability and smaller placebo responses when several core measures were combined. This led the authors to suggest a composite score of Raynaud measures, although its discriminatory properties still require further assessment.
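One simple way to use such a diary is to reduce each patient's daily records to averages over the recording window before comparing treatment arms. The sketch below assumes one record per day containing the items listed above; the class and field names are hypothetical, the 0-10 scales for pain, numbness, and tingling are assumptions, and whether symptom ratings should be averaged over all days or only over days with attacks is a protocol decision this sketch does not settle (it averages over all recorded days).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DiaryDay:
    """One day of a Raynaud diary (field names and scales are assumptions)."""
    attacks: int       # number of attacks recorded that day
    minutes: float     # total minutes spent in attacks that day
    pain: float        # assumed 0-10 rating
    numbness: float    # assumed 0-10 rating
    tingling: float    # assumed 0-10 rating
    rcs: float         # Raynaud condition score, 0-10


def summarize_diary(days):
    """Average each diary item over all recorded days (e.g. a 14-day window)."""
    if not days:
        raise ValueError("diary is empty")
    return {
        "attacks_per_day": mean(d.attacks for d in days),
        "minutes_per_day": mean(d.minutes for d in days),
        "mean_pain": mean(d.pain for d in days),
        "mean_numbness": mean(d.numbness for d in days),
        "mean_tingling": mean(d.tingling for d in days),
        "mean_rcs": mean(d.rcs for d in days),
    }


# Usage with two hypothetical diary days.
summary = summarize_diary([DiaryDay(2, 30, 4, 3, 2, 5), DiaryDay(4, 50, 6, 5, 4, 7)])
print(summary["attacks_per_day"], summary["mean_rcs"])
```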

Several methods of measuring digital blood flow have been evaluated and applied in proof-of-concept studies, mainly thermography and laser Doppler perfusion imaging (LDPI). Thermography is expensive and requires tightly controlled environmental conditions, which poses a practical barrier in clinical trials. LDPI has been used in proof-of-concept and open-label studies of several medications, with positive results [24]. Two recently published placebo-controlled RCTs showed improvement with ambrisentan based on the RCS [25] or on the frequency and severity of Raynaud attacks [26], but no change in digital blood flow in these short-term studies. Conversely, there are proof-of-concept studies demonstrating improvement on LDPI but no change in pain, tingling, or numbness scores [27]. Taking this clinical trial experience in aggregate, we believe that current subjective and objective assessments do not correlate well and that further investigation is needed. A newer imaging technique, laser speckle contrast analysis (LSCA), assesses cutaneous microcirculation. An advantage of LSCA over LDPI in SSc patients is that LSCA is not dependent on capillary density. LSCA has been studied in SSc patients, who show a distinct pattern compared with healthy controls [5•, 28, 29]. Pauling et al. [30] demonstrated moderate to good correlation between LSCA and infrared thermography (r = 0.58–0.84). This is a promising technique for SSc-associated microvascular clinical outcomes but needs further development.

Gastrointestinal

Approximately 90 % of patients with SSc have gastrointestinal (GI) involvement, and these problems contribute significantly to morbidity [31, 32]. The earliest and most common GI manifestation in SSc is esophageal disease, including reflux and dysmotility, but any part of the GI tract may be involved. Severe GI involvement, including malabsorption and intestinal pseudo-obstruction, is seen in less than 10 % of SSc patients but is associated with increased mortality [33]. Although small SSc-specific clinical trials have been performed for various aspects of SSc-GI disease, much of the therapy literature consists of case reports or small case series. The relative paucity of high-quality literature in this area reflects the heterogeneity of SSc-GI manifestations in addition to the rarity of SSc. Multiple objective outcome measures are possible in GI disease, depending on the section of the gut involved; for esophageal disease, examples include impedance monitoring, manometry, and pH monitoring. However, few studies have examined these outcomes, and these measures need to be validated in SSc before they can be considered for use in a large-scale clinical trial [34•].

Patient-reported outcomes (PROs) that assess SSc-GI involvement have been developed. The UCLA Scleroderma Clinical Trial Consortium Gastrointestinal Tract 2.0 (GIT 2.0) instrument includes 34 items, 7 multi-item scales (reflux, distention/bloating, diarrhea, fecal soilage, constipation, emotional well-being, and social functioning), and a total GIT score to assess health-related quality of life (HRQOL) and GI symptom severity. The GIT 2.0 has been shown to be feasible, valid, and reliable, and it discriminates between mild, moderate, and severe disease [35]. It has been used in observational studies [36], in small clinical trials [37], and in clinical trials with non-GI primary outcomes as a way to monitor GI side effects [38]. The GIT 2.0 is a well-developed PRO measure that is well suited for use in GI-specific clinical trials.

Pulmonary Hypertension

Pulmonary hypertension (PH) studies in SSc have used outcome measures similar to those in non-SSc pulmonary arterial hypertension (PAH) trials. The primary endpoints in PH trials are usually surrogate measures, most commonly the 6-min walk test (6MWT) [39]. One criticism of the 6MWT in SSc patients is that the result can be affected by other SSc complications, such as lower-extremity musculoskeletal disease that limits ambulation. Although hemodynamic measurements have been proposed as an outcome, and may be informative secondary outcome measures, their baseline variability in small studies (such as those in SSc-PAH) has made it difficult to use them as primary outcomes. More recent PAH studies have used composite endpoints, such as time to clinical failure (hospitalization, death, or worsening PH) [40, 41], as the primary outcome. This practice will likely be adopted in future SSc-associated PH trials.
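A composite endpoint of this kind is scored as the time to whichever component event occurs first, with patients who have no event censored at the end of follow-up. The sketch below shows only that bookkeeping step, assuming event days are recorded per patient; the data class and field names are hypothetical, and an actual trial would use adjudicated event definitions and analyze the resulting times with survival methods (for example, Kaplan-Meier curves and a log-rank test).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUp:
    """Hypothetical per-patient record; days are counted from randomization."""
    followup_days: int
    hospitalization_day: Optional[int] = None
    death_day: Optional[int] = None
    worsening_ph_day: Optional[int] = None


def time_to_clinical_failure(p):
    """Return (days, event_observed, first_component) for the composite endpoint."""
    components = {
        "hospitalization": p.hospitalization_day,
        "death": p.death_day,
        "worsening PH": p.worsening_ph_day,
    }
    # Keep only component events that occurred within follow-up.
    observed = {name: day for name, day in components.items()
                if day is not None and day <= p.followup_days}
    if not observed:
        # No component event: censored at the end of follow-up.
        return p.followup_days, False, None
    first = min(observed, key=observed.get)
    return observed[first], True, first


print(time_to_clinical_failure(FollowUp(365, hospitalization_day=120, death_day=200)))
# (120, True, 'hospitalization')
print(time_to_clinical_failure(FollowUp(365)))
# (365, False, None)
```

Recording which component came first, not just the time, makes it possible to report how much each component contributes to the composite.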

Other SSc Manifestations

Several aspects of SSc are less frequently studied, including calcinosis, myopathy, arthritis, and cardiac disease (excluding pulmonary hypertension). They are studied less often for several reasons; again, rarity and heterogeneity figure importantly, as does a paucity of validated outcome measures. The SCTC has devoted working groups to addressing these clinical problems and developing outcome measures [42].

Composite Measures

A composite response index in SSc (CRISS) has been developed for use in clinical trials using expert consensus and data-driven approaches [43••]. The CRISS can discriminate between dcSSc patients who have and have not improved over one year, as judged against SSc experts' opinions of 150 patient profiles. The CRISS incorporates 1-year changes in the mRSS, FVC, the Health Assessment Questionnaire disability index (HAQ-DI), and patient and physician global assessments. It is currently considered provisional and is to be validated in RCTs of diffuse SSc, where it may prove a valuable outcome measure. A scleroderma damage index is also under development through the SCTC and may help quantify permanent disease burden in the future [44].

Biomarkers in SSc

Biomarkers are objectively measured characteristics that can be used to detect disease, indicate the prognosis of disease course, or evaluate response to treatment [45]. Serologic biomarkers are used in SSc in varying contexts [46•]. The use of biomarkers in SSc clinical trials is a topic of great interest, but their utility is not clearly defined at present.

Autoantibodies are used in diagnosis, in cutaneous subtyping, and in assessing the risk of internal organ involvement. There are ten confirmed SSc-associated autoantibodies. Three of them, anti-centromere (ACA), anti-topoisomerase I (TOPO, anti-Scl-70), and anti-RNA polymerase III (RNAP), are part of the 2013 ACR/EULAR classification criteria for SSc and are widely available clinically [47]. They also provide prognostic information about the development of specific clinical features and about mortality [48]. RNAP and TOPO are associated with diffuse cutaneous involvement. Patients with TOPO are at high risk of developing clinically important ILD, and those with RNAP are at particular risk of developing severe skin disease and SRC. ACA is associated with limited cutaneous involvement and with an increased risk of developing PAH [10]. Autoantibodies in SSc have been reviewed in depth elsewhere [5•, 49]. As biomarkers, they are useful in diagnosis and risk stratification, but at present they have no role as measures of disease activity or response to therapy.

C-reactive protein (CRP) is a general inflammatory marker used in multiple rheumatic diseases. A high CRP in patients with SSc has been associated with the diffuse cutaneous subtype and with more severe skin and lung involvement. Higher CRP has also been associated with progression of ILD and decreased survival, so CRP may serve as a prognostic biomarker [50]. The utility of CRP as a secondary endpoint, or as a way to enrich trial populations for active disease, is under evaluation.

Newer, investigational biomarkers include gene-expression profiles derived from skin or peripheral blood. Gene-expression profiling of whole blood has shown that patients with SSc have a type I interferon signature that correlates with the severity of skin, lung, and muscle involvement [51, 52]. In SSc skin, distinct gene-expression profiles dubbed “intrinsic subsets” may be associated with differential response to treatment [53, 54]. Beyond the intrinsic subset genes, gene-expression profiles assessed by whole-genome microarray may change with treatment [55], and subset assignment may also change over time. It is possible that patients with an inflammatory signature in the blood respond better to immunosuppression, but more work is needed before these profiles can be used in clinical trials as more than an exploratory outcome [56, 57].

Gene expression can also be measured by quantitative PCR or custom NanoString assays; expression levels of THBS1, MS4A4A, CTGF, CD163, CCL2, and WIF1 have correlated strongly with the mRSS in independent studies [58•]. This approach was used successfully in a recent clinical trial in which treatment with fresolimumab, an anti-TGFβ monoclonal antibody, led to a rapid decline in the expression of thrombospondin-1 (THBS1), a TGFβ-regulated gene [59••]. Other TGFβ-regulated genes, such as cartilage oligomeric matrix protein (COMP), SERPINE1, and CTGF, declined as well. This decline in gene expression correlated with the mRSS and may prove to be an objective readout in future clinical trials.
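The correlations referred to above are, in their simplest form, Pearson coefficients between expression values and mRSS across patients. The sketch below computes one from scratch; the function name and the five-patient numbers are synthetic and hypothetical, not data from any cited study, and because the mRSS is an ordinal sum, a rank-based (Spearman) coefficient is a common alternative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Synthetic illustration only: change from baseline in a hypothetical
# gene's expression and in mRSS for five patients.
delta_expression = [-1.8, -1.1, -0.4, 0.1, 0.6]
delta_mrss = [-9, -6, -2, -1, 3]
print(f"r = {pearson_r(delta_expression, delta_mrss):.2f}")
```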

Proof-of-Concept and Early-Phase Clinical Studies

Early-phase clinical studies in SSc need to demonstrate safety and provide enough preliminary data to justify larger and more expensive trials. Open-label trials are easier to recruit for because every patient is certain to receive active treatment. Such trials can provide important safety data as well as biological samples from which target engagement of the interventional therapy can be determined. Open-label studies provide information on potential effect size if further drug development is pursued, but they do not answer the question of efficacy. Placebo-controlled studies are necessary for this, although efficacy can be hard to gauge with small numbers of patients.

Engaging Patients in Placebo-Controlled Studies

Placebo-controlled studies can be difficult to recruit for, particularly because the ideal trial design for SSc skin disease involves patients with very early disease. Patients may prefer to try more conventional therapies and to enter a placebo-controlled trial of an investigational drug only if their disease worsens or fails to improve with first-line therapy. In small pilot studies, background therapy or an actively treated control arm may dampen clinical or biological effects; to truly learn from these studies, a placebo control is needed.

The ethical use of placebo in SSc is important to consider and differs depending on the organ systems involved. For example, two randomized, placebo-controlled, double-blind trials have studied the efficacy of methotrexate (MTX) for skin disease. Neither demonstrated a statistically significant improvement in skin score, although a trend toward improvement in the treatment group was observed [60, 61]. Based on these and other data, the European League Against Rheumatism (EULAR) endorses the use of MTX for skin involvement in early SSc [62]. Mycophenolate has been used increasingly to treat skin disease, and its benefit has been shown in small observational studies [63–65], mostly with a larger effect size than reported for MTX. However, no placebo-controlled RCT has been performed, and one is unlikely to be performed in the future. Given this lack of clear evidence, a placebo arm in a clinical study of skin disease treatment is ethical, because no clearly proven effective intervention exists. However, clinicians are accustomed to offering immunosuppressive treatment to patients with SSc, and patients may expect it. This can lead to a complex shared decision-making process when a placebo-controlled trial is considered.

Approaches to improve patient acceptance of placebo-controlled studies include an escape plan providing treatment for patients with progressive disease, a placebo period followed by an open-label extension, and a crossover design (appropriate only for certain manifestations of SSc, such as RP). Another approach is to allow active background therapy, although this may make biological signals harder to read. For other SSc manifestations, such as ILD, a benefit of immunosuppressive treatment over placebo has been shown, and a placebo control would not always be necessary. However, uncertainty remains about the size of the treatment effect relative to side effects, so a placebo-controlled trial may be considered in some scenarios, even in ILD.
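The appeal of a crossover design for a fluctuating manifestation such as RP is that each patient serves as their own control. In the classic two-period AB/BA crossover, the treatment effect is estimated from within-patient period differences, and comparing the two sequences cancels any period effect. The sketch below implements that estimator; the function name and the weekly attack counts are hypothetical, and it assumes an adequate washout (no carry-over) and no treatment-by-period interaction, both of which a real protocol would need to justify.

```python
from statistics import mean


def crossover_effect(ab_pairs, ba_pairs):
    """Estimate the (active - placebo) difference from a two-period AB/BA crossover.

    ab_pairs: (period 1, period 2) outcomes for patients given active, then placebo.
    ba_pairs: (period 1, period 2) outcomes for patients given placebo, then active.
    Assumes no carry-over and no treatment-by-period interaction.
    """
    if not ab_pairs or not ba_pairs:
        raise ValueError("each sequence needs at least one patient")
    # Within-patient period differences remove between-patient variability.
    d_ab = mean(p1 - p2 for p1, p2 in ab_pairs)  # (active - placebo) + period effect
    d_ba = mean(p1 - p2 for p1, p2 in ba_pairs)  # (placebo - active) + period effect
    # Subtracting cancels the period effect; halving leaves the treatment effect.
    return (d_ab - d_ba) / 2


# Hypothetical weekly Raynaud attack counts, (period 1, period 2) per patient.
ab = [(6, 10), (4, 9), (7, 11)]   # active first
ba = [(12, 7), (9, 5), (10, 6)]   # placebo first
print(crossover_effect(ab, ba))   # negative means fewer attacks on active treatment
```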

Conclusions

In 1995, White et al. published guidelines to improve the design of clinical trials of disease-modifying agents in SSc [66]. Their recommendations included recruiting patients with less than 24 months' disease duration from the first symptom attributable to SSc, a preference for double-blind RCTs, and trials of sufficient duration. They urged that response measures accurately reflect disease activity and be sensitive to change, and noted that surrogate responses were desirable but not necessarily validated. These recommendations still hold true. Despite considerable progress in the last two decades, work is still needed on response measures and on the development of surrogate responses. In the last several years, our understanding of the cellular and molecular mediators of SSc pathogenesis has improved greatly, and the number of options for targeted therapy has increased, most of all for patients with early diffuse cutaneous disease. This has created a new and important challenge: competition for patients with early diffuse SSc, as several trial options are available to each individual patient. This makes the careful design and execution of clinical trials an imperative for the scleroderma research community.