Introduction

During the last century, the evidence base of treatment for patients with myasthenia gravis (MG) has been derived largely from retrospective case series, open-label studies, and expert opinions. More recently, evaluation of therapeutics for MG has become much more rigorous using randomized clinical trial methodologies, and several phase III trials have been successfully undertaken. Advances in immunotherapeutics and the biological basis of MG have accelerated the number of drugs under development for MG leading to a greater necessity to optimize approaches for evaluation of new treatments. Clinical trials are rigorously constructed experiments designed to restrict confounding variables beyond the influence of the therapy under evaluation while attempting to mimic standard medical practice. There is limited value to a trial that may be so controlled that it could not be reproduced in a typical outpatient or inpatient setting. Internationally accepted guidelines have been established for clinical trial performance and reporting [1]. This chapter will provide a broad overview of clinical trial development for MG based on general principles and experience from completed trials.

Study Rationale

The heart of clinical trial development lies with the question, “what compelling reason justifies the expenditure of time, effort, and money as well as the exposure of subjects to harm, including the potential for death?” Present therapies (see Chap. 11) have significant adverse effects, and upward of 30% of patients are resistant to current therapies. MG is not the nearly uniformly lethal condition that it once was, but death may occur with myasthenic crisis, which one third of patients still experience [2, 3]. A need exists for improved therapeutics, with the ultimate goal of a cure. Further, advances in understanding the mechanisms of autoimmune disorders are leading to therapies that more rationally target MG pathology, providing a clearer rationale for specific trial performance.

Trials must be expected to improve on present therapies through improved efficacy or reduced adverse effects. Given the effectiveness of prednisone but its numerous complications, a focus of clinical trials has been to evaluate immunosuppressive agents for their corticosteroid-sparing effect while achieving improvement in clinical manifestations. Despite excellent biological and clinical rationale, the failure of trials of mycophenolate, tacrolimus, and methotrexate for generalized MG [4,5,6,7] suggests a need to determine optimal outcome measures and trial design more precisely.

Trial Design

The four phases of clinical trials (I, II, III, and IV) each present unique processes and challenges for patient identification, enrollment, and retention. Phase I trials are the first investigation of a potential new drug in humans and address its mechanism of action and, often, its pharmacokinetic properties and safety. This phase is most commonly conducted in a small number of subjects, who may be healthy volunteers or patients with the target disease, and can last several months. Phase II and III trials focus on the safety and then the effectiveness of the drug and are conducted in patients with the target disease. Phase II trials are meant to determine the short-term side effects and identify safety risks associated with the investigational drug, as well as to provide some information on dose and efficacy through either a biomarker or a clinical endpoint. These trials can last several months to 2 years. By phase III, the obvious side effects are known, and the drug is compared against a placebo or a drug currently on the market for the target disease. Phase III trials are designed to provide more complete safety information in a larger population, to uncover rarer adverse effects, and to weigh these against efficacy in assessing benefit versus risk. Phase III trials can last up to 4 years or longer, as in some cancer trials. Phase IV studies are post-marketing trials that examine longer-term safety issues and sometimes the durability of treatment effects; they usually last several years.

Phase I

Phase I studies focus on dose finding and schedules of administration in healthy humans based on pharmacokinetic and pharmacodynamic properties. They examine characteristics of response as well as the immediate safety of a treatment. At times phase I trials are performed only in subjects with the disease of interest because the potential toxicity of the agent cannot be justified for testing in a healthy individual. Most studies are based on the assumption that the more agent, the greater the response. Often a number of animal studies have been used to inform dose ranging in humans, which assumes some direct generalization from animal to man. So-called adaptive designs are a newer consideration. A few newer designs, such as the continual reassessment method (CRM), adjust the dose based on response as well as toxicity and side effects. Many past designs were built around a monotonic dose response, but newer agents may have varied effects at differing doses. These agents challenge the assumption that more is better on the efficacy side of the equation, which requires many of the dose escalation designs to be reconsidered. Without the monotonic assumption on dose, there may be substantial increases in sample size even in these early studies. To date, all therapeutics for MG have gone through phase I to III evaluations in other disorders prior to evaluation for MG.
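The dose-adjustment idea behind the CRM can be made concrete with a minimal sketch. It assumes a one-parameter power model, a common textbook form of the method and not one drawn from any MG trial. The skeleton of prior toxicity guesses, the target toxicity rate of 0.25, and the prior standard deviation are all hypothetical values chosen for illustration.

```python
import math

def crm_recommend(skeleton, dose_indices, toxicities, target=0.25, prior_sd=1.16):
    """Recommend the next dose level under a one-parameter power-model CRM.

    skeleton      prior guesses of toxicity probability per dose (increasing)
    dose_indices  dose level (0-based) given to each treated subject
    toxicities    1 if that subject had a dose-limiting toxicity, else 0
    Model (an assumption of this sketch): p_i(a) = skeleton[i] ** exp(a),
    with a ~ Normal(0, prior_sd ** 2).
    """
    # Evaluate the unnormalized posterior of a on a fixed grid.
    grid = [-4.0 + 8.0 * k / 400 for k in range(401)]
    weights = []
    for a in grid:
        log_w = -0.5 * (a / prior_sd) ** 2  # log prior, up to a constant
        for d, tox in zip(dose_indices, toxicities):
            p = skeleton[d] ** math.exp(a)
            log_w += math.log(p) if tox else math.log(1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)

    # Posterior mean toxicity probability at each dose level.
    post_tox = [
        sum(w * skeleton[i] ** math.exp(a) for a, w in zip(grid, weights)) / total
        for i in range(len(skeleton))
    ]
    # Recommend the dose whose estimated toxicity is closest to the target.
    recommended = min(range(len(skeleton)), key=lambda i: abs(post_tox[i] - target))
    return recommended, post_tox

# Hypothetical skeleton for five dose levels; three subjects treated at level 2.
skeleton = [0.05, 0.10, 0.25, 0.40, 0.55]
print(crm_recommend(skeleton, [2, 2, 2], [0, 0, 0]))
print(crm_recommend(skeleton, [2, 2, 2], [1, 1, 1]))
```

Each new cohort updates the toxicity estimates, so the recommended dose moves down after observed toxicities and can move up when none occur. The sketch targets toxicity only and retains the monotonic assumption; a design for agents with non-monotonic efficacy would need a different model.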

Phase II

Phase II studies serve as proof-of-concept evaluations for preliminary safety and estimates of treatment efficacy. Phase II trials offer greater flexibility in design, especially for newer designs, than do phase III trials. Phase II trials have multiple goals and even an exploratory component, whereas the central focus of phase III is demonstrating predefined effectiveness and safety. Phase II trials can be designed to define the best dosage, estimate efficacy to plan a phase III trial, identify the presumed safest dose, examine the responsiveness of various endpoints, or identify target populations most likely to respond. Phase II designs may use biomarkers or purported surrogate outcomes as outcomes, whereas phase III designs usually have a clinical measure as the primary outcome. In MG, phase II trials have nearly all had straightforward designs as randomized, double-blind controlled investigations comparing active drug versus placebo [5, 7,8,9,10,11,12,13]. Some trials have compared treatments [14,15,16,17]. Safety is an important consideration for all trials but is of primary importance for phase I and II trials.

Phase III

Phase III trials are considered pivotal in that their goal is to alter practice; they are generally directed at so-called clinically meaningful endpoints and are usually large multicenter trials. Trials have generally been designed to demonstrate superiority of therapy compared to placebo. Superiority trials aim to demonstrate that one treatment is better than another. Non-inferiority trials attempt to show that one treatment is not worse than another, and equivalence trials attempt to show that a treatment is neither worse nor better than a standard treatment. For MG, investigations of mycophenolate, azathioprine, and methotrexate [6] were designed as superiority trials with placebo controls, but there was no evidence of a difference between groups. MGTX compared thymectomy plus prednisone versus prednisone alone and demonstrated the superiority of the surgical arm; had the outcomes been equivalent, however, common practice would also have changed, since thymectomy could no longer have been justified.
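The distinction among the three trial types is often expressed through a confidence interval for the treatment difference and a prespecified margin. The following is a minimal sketch under assumptions not stated in the text: the difference is defined as new minus control with higher values better, and `margin` is a hypothetical clinically acceptable margin fixed before the trial.

```python
def classify_result(ci_low, ci_high, margin):
    """Interpret a two-sided confidence interval for (new - control).

    Assumes higher values favor the new treatment and that `margin` (> 0)
    was prespecified as the largest clinically acceptable difference.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    conclusions = []
    if ci_low > 0:
        conclusions.append("superior")      # whole interval above zero
    if ci_low > -margin:
        conclusions.append("non-inferior")  # lower bound above -margin
    if ci_low > -margin and ci_high < margin:
        conclusions.append("equivalent")    # interval inside (-margin, margin)
    return conclusions or ["inconclusive"]

print(classify_result(0.5, 3.0, margin=2.0))   # ['superior', 'non-inferior']
print(classify_result(-1.0, 1.0, margin=2.0))  # ['non-inferior', 'equivalent']
print(classify_result(-3.0, 1.0, margin=2.0))  # ['inconclusive']
```

Which conclusion a trial may claim depends on the hypothesis it was designed and powered to test, so the labels above describe what an interval is compatible with, not what can be reported.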

Randomization

Randomization is a key requirement for treatment trials and serves to control for potential confounding factors by completely unbiased allocation of treatments to participants. These confounding factors range from the obvious such as gender and age to much more challenging factors, which include genetic makeup and environment. As medicine enters the era of whole genome sequencing, even more information will be available to define an individual’s phenotype, and methods will be needed to incorporate such considerations into trial assessment beyond the bucket or basket and umbrella approaches of today. The bucket design tests the effect of one or more drugs on one or more single mutations in a variety of diseases versus the umbrella designs which test the impact of different drugs on different mutations in a single disease. This becomes additionally complicated when one considers ongoing advances in the understanding of the microbiome of humans and its potential influence on disease and alteration by therapeutics.
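As an illustration of how an allocation sequence can be generated, the sketch below uses permuted blocks. This is one common scheme and is an assumption here, since the text does not specify a method; the arm names, block size, and seed are hypothetical.

```python
import random

def permuted_block_schedule(n_subjects, block_size=4, arms=("active", "placebo"), seed=None):
    """Build an allocation list in which every full block is balanced across arms."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        # Each block holds equal numbers of every arm, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_subjects]

schedule = permuted_block_schedule(12, block_size=4, seed=2024)
print(schedule)
print(schedule.count("active"), schedule.count("placebo"))
```

Blocking keeps the arms balanced over time, which matters when enrollment is slow. Simple randomization protects against the same confounders on average but can produce unequal group sizes in a small trial.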

Entrance Criteria

Trials must consider a host of decisions that affect both the generalizability of results and the likelihood of successfully performing an investigation. There is often a trade-off between generalizability (the more heterogeneous the sample, the more widely the results can be applied) and the desire for homogeneity among patients, which minimizes extraneous variability and makes treatment differences easier to see. This dilemma is often faced in designing the specifications of the study population.

Diagnostic Specificity

Assuring that subjects have the diagnosis of MG is a critical first step in trial design. All subjects must have typical clinical manifestations of MG, including objective weakness, and studies to date have required elevations of serum autoantibodies, essentially always the acetylcholine receptor (AChR) binding antibody, as entrance criteria. Recent phase II studies have included a mix of AChR antibody and muscle-specific kinase (MuSK) antibody-positive subjects. Both these tests are highly specific for MG and simple to obtain, which is the rationale for assuring diagnostic specificity. Because of variability in performance of AChR antibody assays, the thymectomy plus prednisone versus prednisone alone (MGTX) trial used a higher cutoff level than set by commercial labs to allow entrance into the study. In contrast, repetitive stimulation and single fiber EMG require specialized personnel to perform and have unknown reliability dependent on examiner skill making them a challenge to use for diagnostic confirmation for a trial.

Investigations of ocular myasthenia face a particular challenge in that upward of 50% of patients with the disease do not have AChR antibodies detected by the standard radioimmunoassay. Trials have thus far relied on identification of a characteristic clinical presentation and exclusion of other diagnoses as an entrance requirement with the addition of at least one confirmatory test including a positive response to a cholinesterase inhibitor, elevation of serum AChR antibody, decremental response to repetitive stimulation, or abnormal single fiber EMG [18]. Therefore, ocular myasthenia trials would be expected to have a greater number of inclusion criteria than generalized MG trials.

Therapeutic Target Decisions

As described in several chapters of this text, MG is a heterogeneous disorder with differences based on autoantibody status, age of onset, association with thymoma, and clinical presentation (ocular vs. generalized) [19,20,21]. Genetic factors associated with pathogenesis and treatment response are beginning to be defined [22,23,24,25]. The critical aspect for trial design is that these subtypes are likely to have a differential response to therapeutics. For example, the primary effector mechanism of AChR antibodies is through activation of complement, while MuSK antibodies induce disease mediated by IgG4, which does not activate complement [26]; excluding MuSK patients from trials of complement inhibition may therefore be appropriate. The thymus of MuSK antibody patients does not show the characteristic hyperplasia of early-onset AChR antibody-positive patients, which brings thymectomy into question as a treatment for this group as well. Patients with ocular manifestations may remain with ocular symptoms or develop generalized weakness. These distinctions suggest variations in pathophysiology from patients with initially generalized disease as well as a need for weakness-specific outcome assessments, i.e., double vision and ptosis vs. generalized weakness. Persistent symptoms among patients with double vision may be a function of the requirements of the ocular motor system to maintain precise ocular muscle alignment even in the face of a largely suppressed immune attack [27]. Statistical designs often assume a common response to therapy. Potential differences in response among patient subtypes should therefore be considered, and those unlikely to respond, or likely to respond differently, may be excluded via inclusion or exclusion criteria.

Another aspect of target engagement relates to the expected duration for a biological response. An appreciation of the expected biological effect of a drug is required to determine study duration. The failure of trials of mycophenolate and tacrolimus for generalized MG was at least partially related to the trials being too short for a meaningful reduction of circulating lymphocytes to occur as well as not adequately dealing with the durability of prednisone treatment [4, 7, 28,29,30]. The expected mechanism of action of an intervention is critical for trial design and is a review criterion for NIH in assessment of clinical trials for funding.

Age and Gender

Clinical trials for MG typically restrict enrollment to individuals over the age of 18 years with a variable older age cutoff. No restrictions have been made based on gender. However, gender and age may be factors in underlying pathophysiology (see Chap. 3) [21]. At present there is no compelling evidence that existing therapeutics have the potential for a differential effect based on gender. However, given the differential occurrence between men and women over the life span, it may simply be that sufficiently large trials have yet to be performed to elucidate differences in response. Clinical subtypes of MG are characterized and grouped into early- and late-onset, with a poorly defined dividing line of 45–60 years of age. Thymic pathology observed among patients with MG differs between early- and late-onset patients, which suggests fundamental differences in pathogenesis and therefore potential differences in response to therapeutics. Adverse effect profiles are likely to differ based on age and gender. This must be considered carefully in the design of MG trials.

Disease Severity

A critical consideration is the level of weakness for trial entrance. Therapeutic trials for MG have been performed almost exclusively on generalized MG patients with MGFA classifications of II–IV. No investigations beyond retrospective evaluation of myasthenic crisis (MGFA classification V) have been performed, with the exception of evaluations of IV Ig or plasma exchange, and none of these were performed in a randomized, controlled fashion [14, 31,32,33]. Given the uniform clinical agreement on the efficacy of plasma exchange for severe MG, it is unlikely that a trial can be performed in an ethical fashion from the practicing physician’s perspective [34]. From the societal perspective, which often grounds the FDA’s perspective, exceptional efforts may be needed to design ethical trials [35]. Outcome measures, such as the Quantitative MG (QMG) score and MG Composite, are now also used to set a level of weakness for study entrance; the MG Foundation of America Clinical Research Standards Committee set a QMG score of 12 or greater as the general minimal severity of disease for a clinical trial [36]. Designs need to consider floor effects as well as ceiling effects when selecting patient populations. The critical importance of disease severity is illustrated by the negative results of a trial of tacrolimus, which entered subjects in minimal manifestation status, a population in which a treatment effect would likely be difficult to demonstrate [28]. Further, if too stringent criteria for weakness are used, the trial may have excessive regression toward the mean as a result of the high hurdle to qualify, thereby overaccentuating the benefits. Whether there are fundamental differences to be expected in treatment response based on severity of weakness, as assessed by QMG or other measures, is an important question to consider in design. In addition, the outcome scales are nonlinear ordinal scales, and, therefore, a three-point improvement in QMG from 20 to 17 versus 3 to 0 has different clinical and biological significance.
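Regression toward the mean under a severity threshold can be illustrated with a small simulation. This is a toy model under loudly stated assumptions: scores are treated as continuous, each subject has a stable true severity plus independent visit-to-visit measurement noise, no treatment is given, and the distribution parameters are hypothetical rather than estimated from QMG data.

```python
import random
import statistics

def simulate_regression_to_mean(n=20000, entry_cutoff=12.0, seed=0):
    """Mean score at screening and at follow-up among subjects who qualified.

    No treatment effect is simulated; any drop is a selection artifact.
    """
    rng = random.Random(seed)
    baseline, follow_up = [], []
    for _ in range(n):
        true_severity = rng.gauss(10.0, 4.0)           # hypothetical stable severity
        visit1 = true_severity + rng.gauss(0.0, 3.0)   # screening measurement
        visit2 = true_severity + rng.gauss(0.0, 3.0)   # later measurement
        if visit1 >= entry_cutoff:                     # entrance criterion
            baseline.append(visit1)
            follow_up.append(visit2)
    return statistics.mean(baseline), statistics.mean(follow_up)

screening_mean, follow_up_mean = simulate_regression_to_mean()
print(round(screening_mean, 2), round(follow_up_mean, 2))
```

Enrolled subjects appear to improve even though nothing was done, because the threshold preferentially selects subjects whose screening score was inflated by noise. A concurrent randomized control arm experiences the same artifact, which is one reason change from baseline in a single arm is hard to interpret.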

Exclusion Criteria

Exclusions for MG trials are those typical for any disease based on significant coexistent medical or psychiatric disorders for safety considerations, competency for informed consent, and likelihood that the participant can follow the instructions of the trial. MG trials typically have excluded individuals with history of thymoma. The basis for this exclusion presumably lies with concerns of an immunosuppressive agent leading to recurrence of tumor and the observation that thymoma-associated MG patients have greater weakness and are more likely to be treatment-resistant [37]. This exclusion criterion may needlessly eliminate a subgroup of patients for involvement in many therapeutic trials, when the decision to exclude is based on convention rather than a specific biological risk. In the clinical setting, treatment approaches for thymoma-associated MG are often identical beyond the obligatory indication for thymoma resection and monitoring for tumor recurrence.

Outcome Measures

Objective outcome measures are of critical importance for all trials, and the last decade has seen efforts to more precisely develop outcome measures specific for MG (see Chap. 19) [38,39,40]. For MG trials, a grading of weakness severity and a reduction of cumulative corticosteroid exposure have been used as primary outcome measures [36, 41]. The QMG, manual muscle testing, and modified Besinger’s score are examples of simple ordinal scales that have been used [12, 16, 42,43,44,45,46,47] with the QMG being the most extensively studied and used in clinical trials [30, 44, 46, 47]. A three-point reduction in the QMG has been considered to be clinically significant [47].

In the last 15 years, the FDA has placed greater emphasis on patient-reported outcomes and is further refining expectations of definitions of a positive clinical response [48]. An analysis of outcome measures of a trial of mycophenolate mofetil for generalized MG revealed that the MG-ADL could serve as a reliable substitute for the Quantitative MG score and be easier to administer [49]. A prospective study suggested that a two-point reduction on the MG-ADL was clinically significant [38]. A phase III trial of eculizumab for treatment-resistant MG used the MG-ADL scale as the primary outcome measure; the MG-ADL showed no statistically significant difference between treatment and control, whereas the secondary outcome measures, including the QMG and MG Composite, were all significantly improved in the treatment group [50]. At the time of this writing, the results have not appeared in a peer-reviewed format, and therefore, explanations for this discrepancy are not immediately apparent.

A Task Force of the MG Foundation of America with international representation has recommended the MG Composite as the preferred quantitative measure for assessment of changes in subject response for generalized MG. The MG Composite is a mix of examination and patient-reported measures and is easily administered [40, 51, 52]. The scale has thus far not been used as a primary outcome measure in a clinical trial.

Reduction of corticosteroid treatment has been used as a primary outcome measure in several trials. The principle that underlies the use of steroid sparing as a primary outcome measure is its importance as a safety measure. The adverse effects of corticosteroids are so severe that therapies limiting their use would be beneficial for patient care [53]. Investigations of azathioprine and mycophenolate assessed the difference in corticosteroid dose at each assessment time over the course of the study [4, 54]. Steroid sparing was demonstrated in the azathioprine study but not until the 18-month time point, while the 36-week-long mycophenolate study was negative. In contrast, the MGTX trial used a measure of prednisone dose over the 3-year study, an integrated assessment that reflects the cumulative exposure to prednisone. MGTX also assessed the QMG score over time and found that subjects not only were on a lower cumulative dose of prednisone but also had improved cumulative MG scores [44]. An investigation of methotrexate for corticosteroid-sparing effect also used the area under the dose-time curve as a primary outcome measure [6]. Comparison of these studies emphasizes the need to account for appropriate study duration and the expected action of the therapeutic in study design [55].
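An integrated measure of steroid exposure can be computed as the area under the dose-time curve. The sketch below uses the trapezoid rule, which is one reasonable choice and not necessarily the method of any cited trial; the visit days and doses form a hypothetical taper.

```python
def dose_time_auc(days, doses_mg):
    """Area under the dose-time curve (mg x days) by the trapezoid rule."""
    if len(days) != len(doses_mg) or len(days) < 2:
        raise ValueError("need matching lists with at least two visits")
    auc = 0.0
    for i in range(1, len(days)):
        if days[i] <= days[i - 1]:
            raise ValueError("visit days must be strictly increasing")
        # Interval length times the average of the doses at its two ends.
        auc += (days[i] - days[i - 1]) * (doses_mg[i] + doses_mg[i - 1]) / 2
    return auc

def time_weighted_average_dose(days, doses_mg):
    """Average daily dose over the whole observation period."""
    return dose_time_auc(days, doses_mg) / (days[-1] - days[0])

# Hypothetical taper: 60 mg at entry, 40 mg at day 30, then 20 mg.
visit_days = [0, 30, 60, 90]
daily_dose = [60, 40, 20, 20]
print(dose_time_auc(visit_days, daily_dose))  # 3000.0
print(round(time_weighted_average_dose(visit_days, daily_dose), 2))  # 33.33
```

Unlike a comparison of doses at a single time point, the integrated measure credits a treatment arm for every interval during which the dose was lower, which is why study duration shapes what it can detect.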

Trial Duration

There is no reproducible process that reliably predicts the duration of a clinical trial. However, it is important to understand that all trials for MG have lasted longer than originally anticipated for reasons such as the variation in the durations of specific phases, rates of enrollment, or unanticipated events. In the MGTX trial [56], wide variations in the regulatory process were evident in start-up.

A major determinant of trial duration is the ability to enroll subjects. Recruitment rates for recent multicenter trials have varied from less than one to at best two subjects per month [6, 30, 44], and all studies initially overestimated the ability to identify and enroll patients for the trials. Inclusion criteria for some investigations, in particular the MGTX study, led to a reliance on the incidence rate of the disease, which for MG is extremely low, thereby extending the duration of recruitment.
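The effect of these recruitment rates on trial duration is simple arithmetic, sketched below. The target of 120 subjects is a hypothetical figure, and the rates are chosen to bracket the range quoted above.

```python
import math

def months_to_enroll(target_subjects, subjects_per_month):
    """Whole months needed to reach the enrollment target at a constant rate."""
    if subjects_per_month <= 0:
        raise ValueError("recruitment rate must be positive")
    return math.ceil(target_subjects / subjects_per_month)

# Hypothetical target of 120 subjects across the whole trial.
print(months_to_enroll(120, 2.0))   # 60 months at two subjects per month
print(months_to_enroll(120, 0.75))  # 160 months at under one subject per month
```

Even the most favorable rate implies years of enrollment for a modest sample, so an optimistic rate assumed at the planning stage translates directly into a longer trial.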

The regulatory burden for trials is large and adds to trial duration. Over the past 20 plus years, a steady increase in the rigor and requirements of trials has occurred. In part, these increased efforts have been in response to poor practices. Good clinical practice (GCP) is an international ethical and scientific standard for design and performance and reporting of clinical trials that involve human subjects [57]. A thorough discussion of GCP is beyond the scope of this review but involves items ranging from traceable data input to source documentation to investigator training and competence in protocol compliance. GCP increases the cost and complexity of trials, and there has been little to no measurable impact on the quality of the outcomes or the information [58]. There is no class I evidence that all of the GCPs improve drug discovery or treatment outcomes.

In addition to GCP, other regulatory requirements and trial activities all conspire to increase trial duration. For example, the time for MGTX study centers to obtain full regulatory approval to recruitment was approximately 10 months for US sites and for non-US sites 13.5 months [56]. The difference related to non-US sites needing Federal Wide Assurance certification and State Department clearance along with ethics reviews, which can be more involved with surgical trials. An investigation of slightly more than 10,000 trials from 1999 to 2005 in a variety of phases found that procedural frequency had grown by nearly 9% annually and the mean number of inclusion criteria escalated threefold [59]. Burden of work for study sites increased by 10%. Investigations of phase III and phase IV protocols identified increases in study endpoints, procedures, and inclusion/exclusion criteria, while subject randomization actually decreased. There is a need to make trials as simple as possible without reducing the key processes for successful trial completion. While there is a call for more pragmatic trials, they often become much more complicated when the regulatory requirements and ethics committee considerations begin to encroach on the concept of pragmatic designs. It is then the role of the investigators, sponsor, and coordinating center to anticipate and minimize these potential challenges when designing the study.

Statistical Considerations

There are two major philosophies of statistical analysis for trial designs: the classical or frequentist approach and the Bayesian approach. Frequentist approaches use p-values to summarize how likely an outcome is under a hypothetical repetition of the experiment: if we repeated the same trial numerous times, how often would we observe the result obtained by chance? Bayesian methods take prior knowledge and, based on an experiment, update that knowledge to yield a degree of belief in a result. The two approaches do differ. The former is useful for setting up a straw man and finding results that refute it, whereas the latter is superior for estimation, making optimal use of all available data. For example, a clinician considers a certain complex of symptoms (variable ptosis, double vision, and generalized weakness) and determines that there is a high likelihood of myasthenia gravis. She then orders an AChR antibody test. This is a Bayesian interpretation of the findings of the physical examination: she believes that there is a high probability of MG.
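The clinician's reasoning can be written out with Bayes' theorem. In this minimal sketch the pretest probability, sensitivity, and specificity are illustrative assumptions, not values taken from this chapter or from any particular assay.

```python
def post_test_probability(pretest, sensitivity, specificity, positive=True):
    """Update the probability of disease after a diagnostic test result."""
    if positive:
        true_pos = sensitivity * pretest                 # diseased and test positive
        false_pos = (1.0 - specificity) * (1.0 - pretest)
        return true_pos / (true_pos + false_pos)
    false_neg = (1.0 - sensitivity) * pretest            # diseased but test negative
    true_neg = specificity * (1.0 - pretest)
    return false_neg / (false_neg + true_neg)

# Assumed values for illustration only.
prior = 0.70         # belief in MG after the examination
sensitivity = 0.85   # hypothetical antibody test sensitivity
specificity = 0.99   # hypothetical antibody test specificity

print(round(post_test_probability(prior, sensitivity, specificity, positive=True), 3))   # 0.995
print(round(post_test_probability(prior, sensitivity, specificity, positive=False), 3))  # 0.261
```

With these assumed numbers, a positive result raises the belief to near certainty, while a negative result lowers it without excluding the diagnosis. The same updating logic applies when prior trial data are combined with new trial results.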

Bayesian approaches have intuitively familiar interpretations and in fact offer a more natural interpretation. Bayesian methods have another benefit. Describing a result as having a trend toward significance when the observed p-value is not significant is not appropriate within the frequentist hypothesis-testing paradigm. From a frequentist perspective, being close is one of the many outcomes that can occur by chance under the null hypothesis with a frequency that is not rare enough to reject it. The result falls in either the rejection region or the failure-to-reject region, and thus the outcome is binary. However, the concept of trending toward significance has more meaning in a Bayesian context.

The Bayesian approach uses information in a way that enables one to use prior information to make an assessment and then updates that assessment with each new piece of data. In terms of clinical trials, the idea of incorporating prior information into the final analysis seems natural. The fundamental question is whether one believes a trial is designed to find the best estimate of the treatment effect (the Bayesian approach) or whether the expectation is an independent demonstration of the effect of a treatment garnered from the prior work (the Frequentist approach). This perspective leads to greatly different views on the value of Bayesian statistics in phase III trials, whereas there is much less controversy regarding their use in phase II trials.

Adaptive designs are now under consideration for trials at several phases. Adaptive designs include (1) adapting on allocations (how subjects are allocated to treatments), (2) sampling rule (adapting how many subjects are used in each stage), (3) stopping rule (adapting when to stop the trial, e.g., for efficacy, harm, or futility), and (4) decision rule (adapting to how the next steps move forward). Such designs may save resources and time, if there are unequivocal signs that a treatment is not effective.

There are challenges with the decision rules and the implementation of the changes. Adaptive designs could seamlessly move from phase I through phase II and on to phase III. The same information could be obtained without the expected time loss between phase II and phase III. A portion of the data is monitored as it is being generated, which allows for design adjustment, typically by re-estimating the sample size and/or stopping a trial arm while the others continue. This process would eliminate noninformative or poorly performing doses and remove the interval between phase II and phase III. However, the design commits investigators in advance to the predefined corrections from the phase II trial. When a classical phase II approach is followed by a phase III trial, similar adjustments may be made, but with a time lag to work through the alterations, which often can be as much as 1–2 years. Hence, for an adaptive design there must be a better understanding of endpoints, recruitment patterns, and expected treatment responses at the outset. The knowledge gained from a typical phase II trial on other outcome assessments would not be available. Further, the decision rules must be completely and clearly spelled out in advance. Decisions to add subjects may be required if initial assumptions were incorrect, and this could lead to increased sponsor costs, investigator frustration, and even lower subject participation. Recruitment motivation may diminish if safety signals or a lack of efficacy are observed. Nevertheless, there are advantages to altering the design when initial assumptions are wrong (for example, when the event rate in the control group differs greatly from that expected) and to limiting unrealistic expectations for trial outcomes. Logistical decisions must be made as to when to make adaptations and, critically, who sees results and makes decisions. A problem for MG trials is that rates of events are not necessarily constant in time; therefore, if data are reviewed too early during follow-up, the assessment may not accurately represent the study period. However, delaying adaptation decisions to obtain longer-term outcomes creates its own challenge, as it delays enrollment and compromises the next enrollment period.
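Sample size re-estimation, one of the adjustments mentioned above, can be sketched with the standard normal-approximation formula for comparing two means. The three-point QMG difference echoes the clinically significant change cited earlier in the chapter; the standard deviations, alpha, and power are hypothetical planning values.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Subjects per arm for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_(1-alpha/2) + z_(power))^2 * sd^2 / delta^2
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

planned = n_per_arm(delta=3.0, sd=5.0)   # assumption made at the design stage
revised = n_per_arm(delta=3.0, sd=6.5)   # pooled SD observed at a hypothetical interim look
print(planned, revised)  # 44 74
```

Because the required sample grows with the square of the standard deviation, a modest underestimate of variability at the design stage can call for many more subjects. This is why the rule for when and how to re-estimate has to be fixed in advance.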

Conclusion

The design of clinical trials involves a multidisciplinary approach encompassing the disease and its characteristics, the drug or treatment and its putative mechanism of action, the inclusion and exclusion criteria, the endpoints, the logistics and duration of the study, and the ability to recruit and complete the trial in a reasonable time frame. The range of possible designs is wide and includes what has been done in the past as well as what might be more optimal. No design is perfect, and all designs involve trade-offs. The rigor required in clinical trials today is increasing, but the payoff from high-caliber clinical trials may well be worth the effort.