
1 Introduction: Evidence-Based Decision-Making in Health Care

1.1 The Primary Cause of Things

In an early publication over a decade ago [1], we proposed that it was necessary to approach the then novel and emerging field of evidence-based dentistry from a teleological perspective. To state it in brief, the Aristotelian concept of teleology proposes that we must look for the primary causes of things, acts, or events, and that when such things, acts, or events are initiated and sustained toward a clear and definable end, then they are “teleological.” In simpler terms, a process or action is teleological when it is driven and conducted for the sake of an end that can be outlined; that is to say, a teleological process is one that displays an articulated finality, that has a cogent final cause, a “telos” (Greek for final cause, τέλος).

Aristotle (384 BC–322 BC) identified and discussed two types of final causes. He proposed that:

  • A process endowed with an “external” final cause is one driven for the sake of, and dependent upon, a primary reason external to itself, such as, for example, a parent acting to ensure the well-being of a child.

  • By contrast, a process is said to have intrinsic telos (an intrinsic cause of finality) when it operates for itself, and not for the sake of something external to itself. One could cite the example of the crook taking advantage of the rich widow for the simple outcome of enriching himself by tricking her into bestowing expensive gifts upon him.

Contemporary philosophy of science errs in dismissing the fundamental tenets of teleology, since it is evident that its principles could significantly contribute to the interpretation of existing observations in a range of fields, from behavioral neuroscience to clinical psychology. Furthermore, teleological propositions could help advance the field of health care by providing sound emphasis to novel models for increasing the health literacy of patients and their compliance with treatment, as well as toward yielding more powerful, Footnote 1 valid, and reliable modes of treatment intervention.

Whether one consults the tradition of what has now been called “Western medicine,” or the traditional medical models found in innumerable cultures and societies around the world, which are subsumed under the term “alternative medicine,” and whether one considers either of these medicines alone or in combination with each other, in a “complementary” mode as we now call it, it is clear that there is a fundamental finality to the actions of those who provide health care to another: to assist and to cure, causing no harm, while grounded in a code of moral conduct that strives to respect, to celebrate, and to benefit the life of the person being cared for [2].

Clearly, the actions of those who provide health care, be they physicians, dentists, nurses, or clinical psychologists, have a certain, clear, and distinct extrinsic teleological finality: the health-care provider’s actions, decisions, and interventions are driven for the sole purpose Footnote 2 of ensuring that the patient regains, sustains, or maintains his or her physical, mental, and psychoemotional well-being. The health-care provider’s action may therefore be either predominantly interventional in nature, that is to say, there is a disease process or an illness that needs to be countered so that the patient can regain physical and mental well-being; or it can encompass largely preventive measures that are meant to assist the patient to sustain and to maintain health and well-being.

The finality of Western medicine is principally interventional, although clearly not totally so: many Western medical interventions (increasingly so, one should note) are directed toward preventive medicine, but it remains the case that the Western health-care tradition is called on, predominantly, to cure disease rather than to prevent it. By contrast, non-Western health-care traditions—“alternative medicines,” such as the traditional Indian Ayurvedic corpus of medical interventions—generally aim primarily to prevent illness and, secondarily, when necessary, to intervene to regain the state of non-illness.

In part because of this fundamental divergence in the extrinsic teleological primary cause of Western and “alternative” health care, and because of their distinct philosophical traditions, Western medicine has come to rely more on continual novel developments in experimental evidence than on rigid and outdated tenets. By contrast, “alternative” medicines are grounded primarily in centuries-old knowledge and tradition, which now are tested, characterized, and confirmed by sound experimental designs.

Thus has evolved the misgiving that whereas Western medicine is based on research evidence, “alternative” medicines are not. Indeed, it has also been charged that “alternative” medicines cannot, perhaps can never, be based on research evidence, for the simple reason that “alternative” medical procedures and protocols have not been, and in some instances cannot be, subjected to “mainstream” experimental methods [3]. That is, of course, a fallacy akin to arguing that something cannot be proven to be round because it is not round to begin with.

What we know today in the Western world as Western medicine has grown and evolved from our Western scientific tradition, and it is what it is because of its foundations in Western culture and the Western view of science, health, and disease. It gathers experimental evidence to prove or to disprove hypotheses in support of the further growth and evolution of that particular—one could say biased—view of health and illness.

By contrast, “alternative” medicines have arisen from distinct philosophical traditions and are grounded in views and tenets of health and illness that are different from, sometimes diametrically opposed to, the Western view. To subject “alternative” medicines to the criteria of the Western tradition is as ill-conceived as the reverse: to subject the Western medical tradition to the test of “alternative” medical schemata. Whereas we, in the West, will seek to characterize molecular pathways, proteomic signatures and profiles, and cell lineages, “alternative” scientific traditions will seek to capture the essence of the totality of the balance among the multiple aspects—mind and body—that constitute the individual and determine his or her state of health or ill-health, ease or disease.

To be overly simplistic, one could state that the Western view of the health sciences holds that the primary cause of a disease process might be certain cellular, biochemical, or molecular events gone wrong. Therefore, the primary cause of health care might be to identify, by means of stringent experimental work, what the responsible biological events might be, so that, based on that research evidence, they can then be targeted in focused and directed therapeutic interventions.

By contrast, the “alternative” medicines view would suggest that the primary cause of a disease might be an imbalance of certain energies (cf. Qi in Traditional Chinese Medicine) among certain organs, bodily regions, or systems. Therefore, the teleological drive behind health-care intervention might be a complex, well-articulated, individualized program, which might include certain manipulations (e.g., acupuncture), medicaments (e.g., infusions), or self-directed interventions (e.g., Tai Chi).

Recent decades have witnessed an attempt by many to bring together these two fundamentally distinct traditions for the ultimate benefit of the patient. The Western tradition has become increasingly well-versed in “integrative medicine” and “systems biology,” in an effort to elucidate, based on the traditionally Western hypothesis-driven “scientific process,” the fundamental tenets of mind–body interaction. Footnote 3

Be that as it may, what is clear is that health care is an evolving science. Whether we consider it from the current Western perspective or from the “alternative” views held in non-Western cultures, the challenge of procuring interventions for our fellow humans, for the purpose of preventing disease, maintaining and sustaining health, or countering an illness and regaining health, is always dependent upon our training, skill, and expertise. Whereas we receive our training during our formative years, we are called to sharpen our skills and expertise continuously through a process of continuing education. Our continued improvement of our clinical judgment rests upon our sustained effort to update our knowledge base with the best newly available information, and our continuously improved clinical decisions depend upon the extent to which we utilize the informational evidence at our disposal to ensure effective and efficacious preventive or treatment interventions.

In other words, and to return to the primary cause of the point brought forth here, the extrinsic teleologic determinant of our service to our fellow human beings, the external final cause of our providing health care, whether in the Western medicine context or in the context of “alternative” medicine, is to proffer benefit to them in the form of maintaining, sustaining, or regaining health. That is achieved, regardless of the medical tradition we espouse, largely through our training and through how well we keep sharpening our skills and expertise with updates of the best and most reliable new information and evidence.

We could say therefore that, whether we are trained in Western health care or in “alternative” health care, we must update our skills and expertise continuously only with the best available evidence in order to ensure that we provide effective and efficacious care to our patients. The external final cause, the prima causa, Footnote 4 from a teleological viewpoint, of our updating our skills and expertise with the best available evidence is to perfect our clinical decision-making.

1.2 Based on the Evidence Versus Evidence-Based

It is argued that the Western view of delivering health care is superior because it rests on research evidence. This, one might argue, is a fallacy as well, simply because there is good research evidence and there is bad research evidence. If research evidence is tainted by suboptimal research methodology, if bias and error abound, or if data are misanalyzed and misinterpreted, then it is possible and even probable that the utilization and integration of that evidence in the clinical decision-making process will result in, at best, a useless and, at worst, a harmful intervention for the patient. It is critical that the research evidence we utilize to sharpen our skills and expertise be the best available.

That, in and of itself, seems self-evident and routine: do we not have a reliable, albeit complex, responsible, albeit overburdened, efficient, albeit imperfect system of peer review to assess and to determine the quality of the research evidence, which we might eventually integrate into our clinical decisions? Indeed, we do—but precisely because of its inherent complexity, our system overburdens the reviewers and more often than not leads to imperfect reviews, incomplete assessments, and biased evaluations of research; hence, we face the real risk of encountering, all too often, peer-reviewed published evidence that is laced with errors, bias, and weaknesses.

Should we be so fortunate as to have the material time to scrutinize each report, so as to eliminate what is not acceptable owing to excessive error and bias, and to keep only the best available evidence with which to update and to sharpen our skills and expertise, then we would indeed do a great service to our patients. Short of that, short of ensuring that we integrate into our clinical decisions only the best available evidence, we put our patients at risk, and we contradict the very oath we hold dear: “do no harm.”

The medical literature is gargantuan. Even if we had the expertise to do so, we could not exhaustively peruse the published reports in the manner just outlined and still have the material time to take care of our patients. Therefore, we become selective about which reports we peruse. By doing so, inevitably, we insert into the very process the gravest fault of all research: the bias of selection. By selecting which reports we shall consider in our perusal, we de facto select the kind of evidence we will be willing to utilize in the process of sharpening our skills and expertise: we de facto taint the very process of our clinical decision-making with a bias that is inappropriate because it is not related to the condition of the patient, to the intervention we are considering, or to the outcome sought.

That is to say, health care based on the evidence suffers from an inalienable bias. It is thus inappropriate and can even be dangerous to the well-being of our patients. By contrast, when a systematic process of synthesis is applied to the entire body of the available evidence, such that the acceptable evidence can be obtained, from which a consensus of the best available evidence can be derived, evidence-based health care is procured, which is the optimal and safest manner to update skills and expertise to provide effective and efficacious health care. In brief, the best available evidence emerges from a concerted process of systematically synthesizing and analyzing all of the available evidence that pertains specifically to the patient under consideration, the interventions under consideration, and the clinical outcome under consideration.

Evidence-based Western medicine, therefore, entails making fully informed clinical decisions that integrate not only the patient’s medical history and clinical test results but also the training of the clinician, and his or her skills and expertise updated by the consensus of the best available research evidence, itself derived from a systematic process of research synthesis. To exactly the same extent, evidence-based complementary and alternative medicine (CAM) [4] utilizes and integrates the patient’s information with the clinician’s training, skills, and expertise, as well as the consensus of a research synthesis process that yields the best available evidence for judicious clinical decision-making that relies upon comparative effectiveness and efficacy research and analysis for practice (CEERAP) [5].

2 Research Synthesis

2.1 Introduction

The essence of the science and role of research synthesis in the context of CAM can be rendered by the following two quotes:

  • The French moralist and essayist, Luc de Clapiers, Marquis de Vauvenargues (August 6, 1715 to May 28, 1747), stated in his Réflexions et Maximes that …il est plus aisé de dire des choses nouvelles que de concilier celles qui ont été dites: that is to say, it is easier to say new things than to reconcile those that have already been said. That is precisely the purpose and ultimate goal of research synthesis for evidence-based medical practice: to reconcile research evidence toward obtaining the best available evidence for effective and efficacious treatment intervention.

  • The British physicist John William Strutt, the 3rd Baron Rayleigh (November 12, 1842 to June 30, 1919), also said that “… the work which deserves, but I am afraid does not always receive, the most credit is that in which discovery and explanation go hand in hand, in which not only are new facts presented, but their relation to old ones is pointed out…”. In the vast and complex domain of alternative and complementary medicine, that is particularly the case: that is, the clinical importance and relevance of juxtaposing new facts and evidence with centuries-old non-Western medical tradition.

2.2 Protocol

Research synthesis follows the scientific method [6–10], which can be outlined in brief as follows:

  • Statement of the hypothesis and research question

  • Crafting of the research approach to test the hypothesis and answer the research question (i.e., research design, sampling issues, tools of measurement)

  • Presentation of the findings and summary of the results by means of descriptive statistics

  • Statistical analysis of the data

  • Inferences, discussion of limitations and intervening variables, and identification of future research toward further testing the hypothesis and answering the research question in greater detail

2.2.1 The Question

Firstly, it is critical to set the question of the research at hand and to realize that a research question, when stated in the affirmative, is nothing but the study hypothesis. Thus, for instance, one could set out to test the research query of whether or not an Ayurvedic intervention can prevent the onset of ulcerative colitis by stating the following research question:

  • Is Ayurvedic intervention effective and efficacious in preventing the onset of ulcerative colitis?

In the same vein, the research hypothesis will become:

  • Ayurvedic medicine is effective and efficacious in preventing the onset of ulcerative colitis.

When a piece of research is built and crafted to answer one such specific research question, it is qualified as a hypothesis-driven study. The search for the best available evidence, which is obtained through the research synthesis design, is hypothesis-driven because it addresses a specific type of research question that is rendered by the acronym P.I.C.O. (patient, interventions under consideration, outcomes). The more specific nature of the comparative effectiveness question, as it entertains timeline and settings considerations as well, engenders a more specific acronym for those studies: P.I.C.O.T.S. [5, 10, 11].

The P.I.C.O. and P.I.C.O.T.S. research questions direct the search for evidence about which intervention under consideration may, or may not, be more effective or efficacious Footnote 5 for the particular patient population targeted in the study, and in light of the specific clinical outcome of interest. In that regard, the P.I.C.O. and P.I.C.O.T.S. questions drive the process of search and analysis of the best available evidence by means of the research synthesis design.

In brief, the P.I.C.O. and P.I.C.O.T.S. questions define and determine the sample of publications to be scrutinized to obtain the available evidence, the tools of evaluation that serve to assess the best evidence, the statistical analysis required to establish the reliability and validity of the results, and the inference of the findings for immediate implication for clinical practice. The P.I.C.O. and P.I.C.O.T.S. questions also set the criteria for deductive reasoning leading to incremental progress of research in the future.

The question is crafted based on descriptors of:

  • The clinical problem and patient population (P)

  • The clinical interventions (I) under consideration/comparison/contrast (C), and

  • The clinical outcome (O) of interest: PICO

The PICO question may undergo minor changes and alterations, as per the specific research question: it may examine a predictive (P), rather than a comparative, model (hence, PIPO); or it may incorporate elements of time (T) and settings (S) (hence, PICOTS, PIPOTS).
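To make the structure of such questions concrete, the following is a minimal sketch, in Python, of a P.I.C.O.T.S. question rendered as a data structure; the class name, fields, and example values are hypothetical illustrations built around the six descriptors listed above, not a prescribed format.

```python
# A minimal sketch of a P.I.C.O.T.S. question as a data structure;
# all field values below are hypothetical illustrations.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PicotsQuestion:
    population: str                 # P: clinical problem and patient population
    intervention: str               # I: clinical intervention under consideration
    comparison: str                 # C: comparator/contrast intervention
    outcome: str                    # O: clinical outcome of interest
    timeline: Optional[str] = None  # T: time frame (P.I.C.O.T.S. only)
    settings: Optional[str] = None  # S: clinical settings (P.I.C.O.T.S. only)

    def keywords(self) -> list[str]:
        """Derive candidate search keywords from the question's fields."""
        fields = [self.population, self.intervention, self.comparison, self.outcome]
        return [term for term in fields if term]

# The ulcerative colitis example from the text, rendered as P.I.C.O.
q = PicotsQuestion(
    population="adults at risk of ulcerative colitis",
    intervention="Ayurvedic intervention",
    comparison="standard preventive care",
    outcome="onset of ulcerative colitis",
)
print(q.keywords())
```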

In brief, the creation of new knowledge through research is driven by the scientific method, which consists of a series of sequential steps that arise from a theory, a hunch, or a simple observation.

2.2.2 The Methodology

Secondly, it is important to note the two principal domains of methodology as they pertain to the research synthesis process. On the one hand, the sample of a research synthesis design consists of the peer-reviewed and non-peer-reviewed published research literature, as well as unrecorded observations. Thus, the term “available” underscores the fact that we limit the subjects of study in a piece of research synthesis investigation, in the same manner as in any other piece of research, to the accessible sample: that is to say, the accessible research literature that specifically targets Footnote 6 the question under study.

Unpublished evidence and evidence that is published in non-peer-reviewed journals are often excluded from a research synthesis design, in part, because it is exceedingly difficult to obtain these types of evidence in a valid and reliable manner. The literature available through the proceedings of scientific meetings, dissertations, and non-peer-reviewed journals is termed “gray literature” and is likewise most often excluded from research synthesis endeavors. In brief, it is argued that the evidence that has not been sifted through the widely accepted peer-review process is likely to be fraught with issues of validity, quality, and bias, which will interfere with the research synthesis process.

The research synthesis process is most often focused, unless otherwise indicated, on the peer-reviewed literature. That sample is obtained by utilizing the medical subject headings (MeSH terms) and keywords that can be derived from the P.I.C.O./P.I.C.O.T.S. question. That is, the attention given to crafting a superior P.I.C.O./P.I.C.O.T.S. question will determine the quality of the sample.

The search is actualized by accessing the National Library of Medicine (PubMed-Medline, www.ncbi.nlm.nih.gov/pubmed) and usually at least two other search engines (e.g., Cochrane, www.cochrane.org; Bandolier, www.jr2.ox.ac.uk/bandolier; EMBASE, www.embase.com; Centre for Reviews and Dissemination, www.york.ac.uk/inst/crd; Google Scholar; etc.). Footnote 7 The purpose of the multiple searches is to ensure comprehensive inclusion of all of the available literature within the confines of the inclusion/exclusion criteria dictated by the research synthesis process, while at the same time minimizing as much as possible the dangers of selection bias and systematic sampling errors.
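As a concrete illustration of how such a search might be actualized programmatically, the sketch below queries PubMed-Medline through the public NCBI E-utilities "esearch" endpoint; the query string itself is a hypothetical rendering of the ulcerative colitis example as MeSH terms and keywords, not a validated search strategy.

```python
# A minimal sketch of querying PubMed-Medline via the NCBI E-utilities
# esearch endpoint; the query string is a hypothetical example.
import json
import urllib.parse
import urllib.request

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
query = '("ulcerative colitis"[MeSH Terms]) AND (ayurveda OR "ayurvedic medicine")'
url = BASE + "?" + urllib.parse.urlencode(
    {"db": "pubmed", "term": query, "retmode": "json", "retmax": 100}
)

with urllib.request.urlopen(url) as resp:
    result = json.load(resp)["esearchresult"]

print(f"{result['count']} records; first PMIDs: {result['idlist'][:5]}")
# The same query terms would then be run through at least two other
# search engines (e.g., Cochrane, EMBASE) to minimize selection bias.
```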

It must be underscored that some degree of publication bias cannot be avoided, simply because, as a general rule, papers that report statistically significant findings, whether they demonstrate clinical relevance or not, tend to be preferentially published in the scientific literature, compared to reports that demonstrate clinical relevance but fail to reach statistical significance. The problem of publication bias is inherent to our present system of scientific literature and is an unavoidable issue of the research synthesis process; it is generally discussed as a limitation on the utilization of the best available evidence in considerations of the clinical relevance of the findings and in clinical decision-making [5–10].

As noted, a well-stated P.I.C.O./P.I.C.O.T.S. question will reveal embedded keywords for the literature of interest. When the sample of literature thus obtained is very small, a reconsideration of the P.I.C.O./P.I.C.O.T.S. question will be required to broaden it, and therefore encompass a larger segment of the available research bibliome. Footnote 8 That is so, principally, because a research synthesis protocol run on a sample of fewer than five reports may lead to meaningless analyses and interpretations. By contrast, when the resulting sample of literature is very large, inclusion and exclusion criteria must be set to restrict the search outcome in order to make it more specific to the P.I.C.O./P.I.C.O.T.S. question.

It may actually occur that the sample of literature produced by the initial search remains gargantuan, despite stringent inclusion and exclusion criteria. Then, a process of random sampling of the resulting literature subpopulation may be confidently entertained, and the research synthesis design may be conducted on a random sample, in a process akin to that used to obtain a random sample of subjects in an experimental design or a clinical trial.
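Since the process is explicitly likened to randomly sampling subjects, a minimal sketch might look as follows; the pool of PMIDs and the sample size are hypothetical.

```python
# A minimal sketch of randomly sampling an oversized literature pool,
# mirroring random sampling of subjects; `pmids` is a hypothetical list.
import random

random.seed(42)  # fixed seed so the sample is reproducible and auditable
pmids = [str(10_000_000 + i) for i in range(2_400)]  # hypothetical large pool
sample = random.sample(pmids, k=200)                 # unbiased random subset
print(len(sample), sample[:3])
```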

Furthermore, the sampling process in research synthesis suffers from the same threats and limitations as the process of sampling in other research designs (i.e., observational designs, experimental designs, randomized clinical trials). For example, the threat of selection bias adulterates the sampling process in experimental studies when sampling is driven by convenience rather than by chance. Sampling of the literature likewise suffers from selection bias when, for instance, our evaluation capabilities (i.e., critical reading, assessment tools) fail to be all-inclusive, owing to such barriers as language, search engine, and library availability, among others. That is another facet of the publication bias noted above.

On the other hand, the second major domain of methodology in research synthesis designs pertains to the assessment of the level and quality of the evidence. Whereas the sampling process described above yields the available evidence, the assessment of the quality of the evidence uncovers the best evidence.

The goal of research synthesis is to obtain the best research evidence pertaining to any given scientific question and to make it available and accessible. At issue, therefore, lie the specific definition and the practical quantification of the term “best.” Two contemporary schools of thought can be succinctly described as follows:

  • On the one hand, there are those who defend the original proposition that a ranking system can be arbitrarily devised to evaluate the strength of the results of a study purely on the basis of the nature of the design.

  • On the other hand, some argue that the best research is that which most strictly adheres to the fundamental tenets and standards of research methodology, design, and analysis.

The first system inevitably establishes one research design as superior and another as inferior, and it has evolved into a pictorial representation that is, as we have stated elsewhere [7–10], as ludicrous as it is useless to the pursuit of the best available evidence.

To represent a ranking system graphically, such as a pyramid that places clinical trials at the top and animal studies at the bottom, is to ignore two important facts about health-care research:

  1. Animal studies are a sine qua non for clinical trials: no clinical intervention trial on a group of patients can be initiated unless the proper safety and toxicity studies have been run on animal models.

  2. Clinical trials in fact encompass a family of research protocols. They begin with fundamental mechanistic studies on human materials (which is why, even at that very early stage, the National Institutes of Health Footnote 9 refers to this research as “clinical research”), continue with testing on animal subjects, and, only when deemed safe, proceed to testing for efficacy and effectiveness, first with normal human subjects; only then is a sample of patients tested (clinical trial, Phase III), and ultimately a larger group of patients across study centers (Phase IV). Footnote 10

The level of evidence pyramid simply ignores these facts and, in a wantonly oversimplified approach—some would say—assigns a rank close to the best to any study that tests an intervention on patients. This is achieved by means of a checklist, the consolidated standards of reporting trials (CONSORT) [5–11, 13]. Originally developed over a decade ago [13], it continues to suffer from its fundamental flaws even in its most recent upgrade and revision [14], and in its varied applications and modifications, including the 22-item checklist for evaluating the conduct of randomized controlled trials in livestock with production, health, and food-safety outcomes (REFLECT) [15], the statement developed to strengthen the reporting of observational studies in epidemiology (STROBE) [16, 17], and the statement for strengthening the reporting of genetic association studies (STREGA; strega-statement.org) [18–20].

In exactly the same mode, the STandards for Reporting Interventions in Clinical Trials of Acupuncture (STRICTA) were developed, published in their original format about a decade ago [21], and revised recently [22]. To this end, a collaboration between the STRICTA Group, the CONSORT Group, and the Chinese Cochrane Centre was established, and a panel of experts was consulted. A consensus was obtained for the revised STRICTA checklist to include 6 items and 17 subitems, which probe the acupuncture rationale, the details of needling, the treatment regimen, other components of treatment, the practitioner background, and the control or comparator interventions. The revised STRICTA benefits from a set of clear explanations of the criteria for each item, as well as examples of good reporting for each item. The revised STRICTA checklist is intended for use in conjunction with both the CONSORT statement and its extension for non-pharmacological treatment [22].

In the case of Ayurvedic medicine, concerted efforts have been deployed to utilize and to integrate the CONSORT criteria [23, 24], but they have met with fundamental difficulties because of the characteristic complexities of the multimodal facets of this mode of integrative medicine. Of late, it has been recognized Footnote 11 that individual standards of a nature similar to CONSORT, STROBE, and STRICTA are needed for all systems of traditional complementary and alternative medicine, including Ayurveda [25, 26].

In the assessment of the level of evidence, the very top level of the pyramid is given to the systematic reviews, perhaps because early on in the establishment of research synthesis in evidence-based and comparative effectiveness research, it was presumed that systematic reviews in the health sciences ought to incorporate clinical trials exclusively.

The level of evidence is established on the basis of the type of study design that was used to generate the evidence under evaluation. Typically, a hierarchy is generated as follows (cf. US Preventive Services Task Force):

  • Level I: Evidence obtained from at least one properly designed randomized controlled trial.

  • Level II-1: Evidence obtained from well-designed controlled trials without randomization.

  • Level II-2: Evidence obtained from well-designed cohort or case–control analytic studies, preferably from more than one center or research group.

  • Level II-3: Evidence obtained from multiple time series with or without the intervention. Dramatic results in uncontrolled trials might also be regarded as this type of evidence.

  • Level III: Opinions of respected authorities, based on clinical experience, descriptive studies, or reports of expert committees.

The UK National Health Service uses a similar system with categories labeled A, B, C, and D:

  • Level A: Consistent Randomized Controlled Clinical Trial, cohort study, with clinical decision rule validated in different populations

  • Level B: Consistent Retrospective Cohort, Exploratory Cohort, Ecological Study, Outcomes Research, case–control study; or extrapolations from level A studies

  • Level C: Case-series study or extrapolations from level B studies

  • Level D: Expert opinion without explicit critical appraisal, or based on physiology, bench research or first principles
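For illustration only, the USPSTF-style hierarchy quoted above could be encoded as a simple lookup table; the normalized design labels are hypothetical, and the mapping merely restates the listed levels (it does not endorse design-based ranking, which the following paragraphs criticize).

```python
# A minimal sketch encoding the quoted USPSTF-style hierarchy as a lookup;
# the normalized design labels are hypothetical illustrations.
EVIDENCE_LEVEL = {
    "randomized controlled trial": "I",
    "controlled trial, non-randomized": "II-1",
    "cohort study": "II-2",
    "case-control study": "II-2",
    "multiple time series": "II-3",
    "expert opinion": "III",
}

def level_of_evidence(design: str) -> str:
    """Return the hierarchy level for a study design, if classified."""
    return EVIDENCE_LEVEL.get(design.lower(), "unclassified")

print(level_of_evidence("Cohort study"))  # -> "II-2"
```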

In more recent years, since the rapid emergence of systematic reviews, it has become generally accepted that systematic reviews have a level of evidence even higher than I or A—a level “super-I/A.” The complication arises at present, of course, when one considers that the science of research synthesis continues to evolve, such that multiple systematic reviews on a given clinical question can now be pooled into what has been referred to as complex systematic reviews, and as clinically relevant complex systematic reviews (CRCSR) [27].

The initial attempt to quantify the CONSORT checklist was the Jadad scale [28], which suffers overwhelmingly from low reliability (i.e., unsatisfactory inter-rater and intra-rater reliability, unsatisfactory Cohen kappa coefficient of agreement) and from weak construct and content validity, as discussed elsewhere [7–10]. Nevertheless, proponents of the assessment of the level of evidence purport to establish the best available evidence based on those criteria alone.

We and others have proposed that the best available evidence is not what stands atop a pyramid, but rather the research evidence that emerges from top-quality research: that is, research that satisfies the fundamental and widely accepted standards of superior research methodology, design, and data analysis. High-quality research answers the research question and tests the hypothesis in the soundest scientific approach possible, considering all the limitations, intervening variables, and other possible confounders. Therefore, research, whether it is a clinical trial, an observational study, or an experimental design, and whether it addresses Western medicine, Chinese medicine, or Ayurvedic medicine, will be of high quality if it satisfies the criteria and standards of sound research methodology, design, and analysis; and in that regard, it promises to generate the best evidence.

What really is important is not so much what type of research was done, but how it was conducted; that alone determines the excellence of the evidence produced [7–11, 13].

That is the view espoused by the second school of thought about how to obtain the best available evidence. The best evidence is not to be inferred by a checklist, but rather quantified on the basis of stringent and commonly shared criteria of excellence.

Increasingly, systematic reviews address the concern of the quality of the evidence. Usually, this assessment is made by means of an in-house tool developed ad hoc and only briefly described. Increasingly, however, well-constructed instruments are used to assess the quality of the evidence; these are psychometrically tested for reliability and for validity, and they generate continuous, or semicontinuous, score measurements [29–31].

Specifically with respect to evaluating the quality of systematic reviews, Shea and colleagues developed and characterized the assessment of multiple systematic reviews instrument (AMSTAR), through a process of factor and cluster analyses of previously existing instruments for this purpose (e.g., Overview Quality Assessment Questionnaire, OQAQ; Sacks’ checklist; quality assessment of studies of diagnostic accuracy included in systematic reviews, QUADAS) [32, 33]. This process resulted in the identification of 11 domains that are essential for high-quality systematic reviews:

  • “A priori” design provided

  • Duplicate study selection and data extraction

  • Comprehensive literature search

  • Status of publication (i.e., gray literature) used as an inclusion criterion

  • List of studies (included and excluded) provided

  • Characteristics of the included studies provided

  • Scientific quality of the included studies assessed and documented

  • Scientific quality of the included studies used appropriately in formulating conclusions

  • Appropriateness of the methods used to combine the findings of studies

  • Assessment of the likelihood of publication bias

  • Statement of conflict of interest

The AMSTAR was recently updated, revised, and made stringently quantifiable [31].
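To illustrate what “stringently quantifiable” might mean in practice, here is a minimal sketch of an R-AMSTAR-style total score, under the assumption that each of the 11 domains is rated on a 1–4 scale (so that totals range from 11 to 44); the ratings shown are hypothetical.

```python
# A minimal sketch of a quantified AMSTAR-style score, assuming each of
# the 11 domains is rated 1-4 (as in the revised R-AMSTAR); the ratings
# below are hypothetical.
AMSTAR_DOMAINS = [
    "a priori design", "duplicate selection/extraction", "comprehensive search",
    "publication status as inclusion criterion", "list of included/excluded studies",
    "characteristics of included studies", "quality assessed and documented",
    "quality used in conclusions", "methods to combine findings",
    "publication bias", "conflict of interest",
]

def total_score(ratings: dict[str, int]) -> int:
    """Sum per-domain ratings into a single quantified quality score."""
    assert set(ratings) == set(AMSTAR_DOMAINS), "rate all 11 domains"
    assert all(1 <= r <= 4 for r in ratings.values()), "ratings are 1-4"
    return sum(ratings.values())

ratings = {domain: 3 for domain in AMSTAR_DOMAINS}  # hypothetical uniform ratings
print(total_score(ratings))  # -> 33 of a possible 44
```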

2.2.3 The Design

The third main point of the protocol of a scientifically driven research process is the design. Studies aimed at obtaining the best available evidence for the effectiveness and efficacy of clinical interventions are, by definition, research synthesis designs. The elements of these designs are the very components we have outlined to this point. When research designs in general, and research synthesis designs in particular, are planned and conducted correctly, they must produce quantifiable measures, which can be analyzed statistically.

2.2.4 The Analysis

Thus, the fourth and critical step in the pursuit of comparative effectiveness and efficacy is the analysis of the data. Over a decade ago, it became apparent that standards had to be established for the appropriate reporting of meta-analyses, especially when these pertained to the identification of the best available evidence for health care. The Quality of Reporting of Meta-analyses (QUOROM) statement [34] was presented in 1999 as a checklist and a flow diagram to outline the optimal flow of presentation of the abstract, introduction, methods, results, and discussion sections of a report of a meta-analysis. These were structured and organized into 21 headings and subheadings, which had the advantage of providing a set of guidelines for investigators, but were often arduous for neophytes to understand and follow. In a recent development, QUOROM was revised and improved, and presented as the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement in 2009 [35, 36]. Although longer and more complex than CONSORT, PRISMA Footnote 12 consists of a 27-item checklist and a four-phase flow diagram, which is actually more user-friendly than QUOROM.

It is customary to think of research synthesis and meta-analysis as one and the same. But whereas research synthesis is the structure by which the investigator obtains the systematic review, the meta-analysis is one of the protocols that the investigator will utilize judiciously to obtain one specific aspect of the analysis of the data of the systematic review. There may be instances where a meta-analysis is not needed, or is impossible to conduct, in a given systematic review. That, in and of itself, does not diminish the value of the systematic review product or the strength of the evidence it presents [6, 7].

In and of itself, meta-analysis is simply a statistical protocol, a combinatorial process of analysis that is extraordinarily sensitive to several properties of the data. Two principal properties deserve mention in the context of this discussion: heterogeneity/homogeneity of outcomes and data quality.

  1. Clinical outcomes, whereas they may seem to be clear and crisp measurable entities, more often than not can be quantified in more than one way. Heterogeneity in the outcome measure is one clear danger for the validity of any meta-analytical reasoning, because it speaks directly to what, really, we are combining together and what, really, we are making overall inferences about. There are statistical tests that we must run on the outcome measurements to establish whether or not homogeneity is verified (cf. the Cochran Q test and its transformation, the I² statistic; see the sketch following this list)—that is to say, whether or not the extent of outcome measure heterogeneity is within the level of confidence and is, in fact, not statistically significant [37].

  2. The data pooled together into a meta-analysis must come from reports that are deemed to be of good quality. If the data in the input are all of high quality, then the variability due to residual inexplicable error will be small, and the effect, if there is one, will be apparent and clearly statistically significant. If, on the other hand, the data used in the meta-analysis originate from studies that are fraught with serious quality issues, then each of these sets of data will carry into the meta-analysis its contribution of residual inexplicable error, and the total overall variability will be so large as to negate the ability of a statistically significant overall effect to become apparent over this residual error “noise.” Similarly, albeit not as dramatically, if a meta-analysis should incorporate some solid, good studies and a few studies with serious quality issues, the contribution of the former to the variability due to residual inexplicable error will be small, but the contribution of the latter to the overall error will be disproportionately large. That will, more often than not, mask a statistically significant overall effect.
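A minimal sketch of the homogeneity computation mentioned in item 1 follows: fixed-effect (inverse-variance) pooling with Cochran's Q and the derived I² statistic. The effect sizes and variances are hypothetical per-study values.

```python
# A minimal sketch of fixed-effect (inverse-variance) pooling with
# Cochran's Q and the I^2 statistic; the effect sizes and variances
# below are hypothetical per-study values (e.g., log odds ratios).
import math

def fixed_effect_meta(effects, variances):
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))      # standard error of the pooled effect
    # Cochran's Q: weighted squared deviations from the pooled effect
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: percentage of total variation attributable to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, q, i2

pooled, se, q, i2 = fixed_effect_meta(
    [0.32, 0.18, 0.45, 0.25, 0.38],  # hypothetical effect sizes
    [0.04, 0.06, 0.09, 0.05, 0.07],  # hypothetical variances
)
print(f"pooled = {pooled:.3f} (SE {se:.3f}), Q = {q:.2f}, I2 = {i2:.1f}%")
```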

For that reason, investigators divide into two schools of thought regarding the process of data analysis for systematic reviews:

  • One school of thought argues in favor of including all—bad and good—studies in a meta-analysis, akin perhaps to including all—good and bad—materials in the construction of a skyscraper. Should we be surprised if a high proportion of meta-analyses conducted in this manner evince no statistical significance overall?

  • The other school of thought proposes to establish first the quality of the research evidence by acceptable sampling analysis [6, 7, 31]. Then, based on these assessments, the studies that demonstrate excessive flaws, as determined by the scores of the quality-of-evidence assessment tools (i.e., acceptable sampling analysis), are eliminated. The studies that remain are tested for homogeneity, and if no significant heterogeneity is noted among the accepted studies, the meta-analysis is run (see the sketch after this list). The forest plot thus generated has the best likelihood of evincing overall significance, if there is any to be shown. Stated in statistical terms, it is necessary to perform both acceptable sampling and homogeneity analyses in order to ensure the power of a meta-analysis.
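A minimal sketch of the second school's two-step workflow follows, reusing the fixed_effect_meta helper from the sketch above; the study records, quality scores, cutoff, and I² benchmark are all hypothetical.

```python
# A minimal sketch of acceptable sampling followed by homogeneity testing,
# reusing fixed_effect_meta from the previous sketch. Each study record
# carries a hypothetical quality score (e.g., an R-AMSTAR-like total).
studies = [
    {"effect": 0.32, "variance": 0.04, "quality": 38},
    {"effect": 0.18, "variance": 0.06, "quality": 35},
    {"effect": 0.95, "variance": 0.05, "quality": 17},  # seriously flawed study
    {"effect": 0.25, "variance": 0.05, "quality": 33},
]

QUALITY_CUTOFF = 30  # hypothetical acceptable-sampling threshold

# Step 1: acceptable sampling analysis -- drop studies with excessive flaws
accepted = [s for s in studies if s["quality"] >= QUALITY_CUTOFF]

# Step 2: homogeneity check, then pool only if heterogeneity is acceptable
pooled, se, q, i2 = fixed_effect_meta(
    [s["effect"] for s in accepted],
    [s["variance"] for s in accepted],
)
if i2 < 50:  # a common, though arbitrary, low-heterogeneity benchmark
    print(f"meta-analysis permissible: pooled = {pooled:.3f}, I2 = {i2:.1f}%")
else:
    print("substantial heterogeneity: pooling is not advisable")
```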

The question remains as to what might be the recommended statistical approach to follow when performing a CRCSR, a synthesis of several systematic reviews. Following assessment of quality (e.g., R-AMSTAR [30]) and acceptable sampling, the CRCSR must be tested for homogeneity, as noted. If homogeneity is established, then meta-analysis will be permissible. But, from a purely statistical standpoint, our current conceptualization of the meta-analytical protocol pertains to coalescing data obtained from primary studies (e.g., clinical trials), not secondary studies (i.e., systematic reviews) that themselves present their own individual meta-analyses. The current attempts to generate “cumulative meta-analyses” as the simplistic additive product of a new meta-analysis generated every time a new piece of evidence emerges Footnote 13 [38, 39] appear to be incongruent with statistical theory on several grounds [40]. For example, the suggested approach implies repeated analytical testing of the data set (n) as it grows to include each new piece of evidence (n + 1). As stated, the principles do not proffer any limit to these repeated testing events, which seem, prima facie, to incorporate the same bias Footnote 14 one finds upon performing repeated t tests (a simulation of this inflation follows below). Further exploration of the theoretical tenets that impinge upon cumulative meta-synthesis is urgently needed, lest cumulative meta-analyses accumulate in the literature needlessly.
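The repeated-testing bias alluded to here can be illustrated with a small simulation: under a true null effect, testing the accumulating data set at every update inflates the overall false-positive rate well above the nominal level of a single final test. The sketch uses a normal approximation to the t test, and all parameters are arbitrary.

```python
# A minimal simulation of the repeated-testing bias: re-testing a growing
# data set at every update inflates the chance of a spurious "significant"
# finding, even when the true effect is zero. Normal approximation to the
# t test; all parameters are arbitrary illustrations.
import math
import random
import statistics

random.seed(1)
TRIALS, START_N, MAX_N, Z_CRIT = 2000, 10, 60, 1.96

hits_sequential = 0  # flagged "significant" at any interim look
hits_final_only = 0  # flagged "significant" at the single final analysis
for _ in range(TRIALS):
    data = []
    flagged = False
    for n in range(1, MAX_N + 1):
        data.append(random.gauss(0, 1))  # null hypothesis: true mean is 0
        if n >= START_N:  # test the accumulating data set at each update
            z = statistics.fmean(data) / (statistics.stdev(data) / math.sqrt(n))
            flagged = flagged or abs(z) > Z_CRIT
    hits_sequential += flagged
    z = statistics.fmean(data) / (statistics.stdev(data) / math.sqrt(MAX_N))
    hits_final_only += abs(z) > Z_CRIT

print(f"test at every update: {hits_sequential / TRIALS:.1%} false positives")
print(f"single final test:    {hits_final_only / TRIALS:.1%} false positives")
```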

2.2.5 The Consensus Inference

The question that researchers ask pertains to whether statistically significant differences obtain. Then, somehow, forest plot summary data and confidence intervals, which are coalesced and analyzed group data, are transformed, by means of the seemingly magical process of interpretation and inference, into clinical relevance: a consensus inference.

The consensus statement must be a clear statement of the clinical implication and relevance of the research synthesis and meta-synthesis. It must clearly present the strength of the clinical recommendation thus conceptualized. The GRADE (Grades of Recommendation, Assessment, Development, and Evaluation) approach is an instrument for grading the quality of the underlying evidence and the strength of clinical recommendations [31, 32]. In a similar vein, the AGREE (Appraisal of Guidelines for Research and Evaluation, Europe) is an instrument developed to provide a basis for defining the steps of a shared development approach to produce high-quality clinical practice guidelines, revised based upon the best available evidence [33, 34].

A case in point is a recent systematic review of clinical trials and quasi-experimental studies aimed at testing certain among the plethora of Ayurvedic pharmacopoeic herbs that might contribute to a decrease in cholesterol and therefore reduce the risk of ischemic heart disease (P.I.C.O.). For this particular investigation, the pertinent literature was searched in the National Library of Medicine, the National Center for Complementary and Alternative Medicine, Ovid, and EBSCO Information Services at three time points (T). Three standardized reviewers were used to ascertain the inter-rater reliability of the quality-of-evidence assessments. Both issues of effectiveness and efficacy were examined and led to the overall consensus inference that Ayurvedic herbs can significantly benefit patients with hyperlipidemia [41]. In a similar vein, a retrospective meta-analysis of observational studies conducted on 85 Ayurvedic herbal interventions with reported anticancer efficacy pointed to the fact that herbs with Katu, Tikta, Kasāya Rasa (i.e., bitter, pungent, and astringent taste), Usna Virya (i.e., hot biopotency), and Katu Vipāka (i.e., catabolic active metabolites), and herbs with dry, coarse, light, and sharp biophysical properties, are endowed with both effective and efficacious anticancer properties [42].

In brief, perhaps the single most important use of the science of research synthesis and research meta-synthesis in the health sciences, including complementary and alternative medicine, pertains to empowering the clinician to make fully informed treatment decisions that rest not only on the patient’s wants and needs, clinical tests and history, or the clinician’s experience and personal awareness of the available research, but also on the best available evidence. It is important to stress the summative quality of this sine qua non: in addition to all of the preceding, which equates to the best current clinical practice, reliance on the science of research synthesis and meta-synthesis signifies adding the best available evidence to the decision-making process. Hence the need for reliable instruments to assess and to establish the strength of the clinical relevance and recommendations for the uncovered best available evidence.

Whereas both the GRADE and the AGREE instruments are laudable efforts in the direction of fostering the growth and expansion of research synthesis and meta-synthesis, they also suffer from inherent psychometric weaknesses. Therefore, we have endeavored to expand the GRADE tool (Ex-GRADE Footnote 15), in an effort to emphasize not only its dual applicability to systematic reviews and to CRCSRs but also the solid conceptualization it offers of the strength of the clinical recommendation it proffers. The information produced by the research synthesis process, and by the Ex-GRADE evaluation of the best available evidence thus obtained, can then be safely utilized in clinical decision-making for effectiveness and efficacy [9, 10].

As stated above, and in absolute terms, efficacy refers to whether or not a clinical intervention tested in the context of a clinical trial yielded valid and replicable outcomes. In lay language, we might say that efficacy tells us whether or not the treatment “worked,” and it does so because of its inherent dependence upon the effort of the investigator in constructing the research project correctly and fractionating out as much of the random error as possible. In that regard, efficacy establishes the replicability of the clinical outcome. By contrast, effectiveness relates to the experiential reality of clinical practice and pertains to whether or not the intervention minimizes risk, maximizes benefit, and yields these outcomes at the lowest (or at least the most reasonable) cost. It is fair to say that effectiveness does not pertain to a clinical trial study per se, but rather to the pragmatic implementation of its findings amid the intricate complexities of clinical treatment. Considerations of effectiveness seek to evaluate the costs, benefits, and harms of clinical interventions, such as complementary and alternative medicine in general and Ayurvedic medicine in particular, to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. Their purpose is to assist consumers, clinicians, purchasers, and policymakers to make informed decisions that will improve health care at both the individual and population levels [43, 44].

2.3 Resources (Appendix) for Complementary and Alternative Medicine

It should be evident from the discussion above that, as the body of scientific information in health care and in complementary and alternative medicine grows, and because of differing criteria for establishing the quality of research reports, the scientific literature is becoming replete with multiple systematic reviews that pertain to the same original clinical question but that may differ in their conclusions. This observation leads to the realization that the science of research synthesis is itself growing. Therefore, it is important that resources be identified to guide the search for, and the interpretation of, the best available evidence.

One such resource is CAMline, an evidence-based website on complementary and alternative medicine (CAM) for health-care professionals and the public. It represents a successful collaboration of conventional and CAM organizations, interests, and expertise (www.camline.ca/index.html). Other similar resources are provided in the Appendix.

3 Conclusion

In the preceding paragraphs, we have touched upon the salient fundamental elements of the evidence-based process in complementary and alternative medicine. We have proposed that, in the domain of the science of research synthesis, we must engage in the gargantuan task of establishing methodologies, designs, modes of statistical analysis, and appropriate inferential criteria for the process of synthesizing systematic reviews into “meta-systematic” reviews. We need to go beyond the current protocols of research synthesis, which pertain to primary research reports, and develop and validate new and effective procedures of “research meta-synthesis” for the evaluation of the best available evidence now existing in the form of systematic reviews. We have identified established protocols and uncovered new avenues for the emergence of this rich and active field.

Specifically, we have proposed as the thesis of this chapter that CAM is enriched by the systematic approach of comparative effectiveness and efficacy research and analysis for practice (CEERAP). We have discussed the implications and applications of EBDM in CAM, the systematic nature of the CEERAP process toward EBDM, and the pitfalls and limitations of this approach as it pertains specifically to Ayurvedic medicine.