Introduction

Despite impressive interindividual differences at the clinical level, individuals with ‘autism spectrum disorder’ (ASD) essentially share two major characteristics: (a) deficits in social interaction and communication and (b) behavioural abnormalities, including stereotypic behaviours, insistence on sameness and/or restricted interests. Genetic factors largely contribute to the pathophysiological mechanisms underlying ASD (Persico and Bourgeron 2006; Persico and Napolioni 2013). Twin studies performed two decades ago recorded concordance rates of 73–95 % in monozygotic twins compared to 0–10 % in dizygotic twins, yielding heritability estimates above 90 % (Steffenburg et al. 1989; Bailey et al. 1995). Moreover, first-degree relatives of individuals diagnosed with autism often display behavioural traits qualitatively similar, but much milder in severity, compared to those present in their affected siblings (Piven et al. 1997): This ‘autism spectrum’ or ‘extended phenotype’ strongly points towards the existence in the general population of several continuous dimensions pertaining to social cognition, rather than ‘health-or-disease’ categorical conditions. Unfortunately, the genetic underpinnings of ASD are neither simple nor consistent. In approximately 10 % of cases, autism is secondary to known genetic or chromosomal syndromes, including fragile X syndrome, tuberous sclerosis, 15q chromosomal syndromes and many others (Zafeiriou et al. 2013; Persico and Napolioni 2013). An understanding of the pathophysiological underpinnings of syndromic forms, especially fragile-X and tuberous sclerosis, has been instrumental in defining targeted molecular strategies currently under scrutiny for ASD (Vorstman et al. 2013). 
Another estimated 7–10 % carry monogenic forms, due to de novo pathogenic mutations or copy number variants (CNVs), the latter including either microdeletions or microduplications; the majority of cases suffer from oligogenic or polygenic forms, stemming from ‘multiple-hit’ gene–gene and gene–environment interactions, typical of non-linear complex genetics (Persico and Napolioni 2013). For example, a recent twin study produced heritability estimates as low as 37 %, while as much as 55 % of variance was explained by shared environmental factors (Hallmayer et al. 2011). Other approaches yield somewhat higher heritability, estimated at approximately 40–60 % (Klei et al. 2012), but nonetheless all these recent heritability estimates are much lower than those reported in the 1990s (Steffenburg et al. 1989; Bailey et al. 1995). During the same two decades, the incidence of ASD has dramatically risen from 2–5/10,000 to approximately 1–2/1,000 children for strict autism (Fombonne 2009) and 6–10/1,000 for the broader spectrum (Baron-Cohen et al. 2009). Broader diagnostic criteria and increased medical awareness have certainly contributed to this trend (Rutter 2005; King and Bearman 2009). However, a real increase in incidence is also likely (Grether et al. 2009; Hertz-Picciotto and Delwiche 2009), especially considering the progressive increase in parental age at conception which has characterized Western societies in recent decades, a well-known risk factor for autism (Parner et al. 2012). Conceivably, ASD may be drifting from a low-incidence, highly heritable, primarily monogenic disorder, compatible with latent class analyses, family and twin studies performed in the 1990s (Pickles et al. 1995; Steffenburg et al. 1989; Bailey et al. 1995), to a high-incidence, primarily oligogenic/polygenic disorder with complex genetic and epigenetic components, as supported by lower heritability estimates and relatively large contributions by common variants (Hallmayer et al. 2011; Klei et al. 2012).
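
The partition of variance into heritable and shared environmental components discussed above can be illustrated with Falconer's approximation, which splits liability variance into additive genetic (A), shared environmental (C) and non-shared environmental (E) shares starting from monozygotic and dizygotic twin correlations. The sketch below is only a didactic shortcut: the function name and the two twin correlations are assumptions, back-calculated so that the output matches the 37 % and 55 % figures quoted in the text, and they are not values reported by the cited studies, which typically rely on formal model fitting.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Falconer's approximation of the A, C and E variance shares.

    r_mz, r_dz: twin correlations in liability for monozygotic and
    dizygotic pairs (hypothetical inputs, not published estimates).
    """
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared environmental share
    e2 = 1.0 - r_mz            # non-shared environment plus measurement error
    return {"A": a2, "C": c2, "E": e2}


# Illustrative correlations chosen so that A = 0.37 and C = 0.55.
estimates = falconer_ace(r_mz=0.92, r_dz=0.735)

if __name__ == "__main__":
    for component, share in estimates.items():
        print(f"{component}: {share:.2f}")
```

Under these assumed correlations, the dizygotic correlation is far higher than half the monozygotic one, which is what shifts variance from the genetic to the shared environmental component.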

This clinical heterogeneity and its molecular complexities have spurred increasing interest into biomarkers and endophenotypes, measurable quantitative parameters able to facilitate earlier and more reliable diagnoses, as well as the identification of subgroups of patients possibly sharing common pathophysiological underpinnings.

Biomarkers and endophenotypes: similarities and differences

The term ‘endophenotype’ was initially coined by John and Lewis (1966) who, working in insect biology and evolution, determined that the geographic distribution of grasshoppers depends primarily on ‘the endophenotype, not the obvious and external but the microscopic and internal’, rather than on their visible characteristics or ‘exophenotypes’. This concept was later applied to behavioural genetics by Gottesman and Shields (1973) in the context of genetic theories of schizophrenia and here refers to internal phenotypes detectable using a ‘biochemical test or microscopic examination’. By definition, an ‘endophenotype’ must satisfy five criteria, namely it must be: (a) associated with the disease in the general population, hence significantly more frequent or elevated among patients compared to population controls; (b) associated with the disease within the family, where endophenotype and illness must co-segregate; (c) heritable, indicating that it must have a genetic basis; (d) familial, meaning that it should have the highest frequency or amounts among patients, intermediate levels among their unaffected first-degree relatives and lowest levels among population controls, especially if screened for unaffected status and (e) trait dependent and not state dependent, i.e. it must reliably tag vulnerable individuals regardless of whether they are in a state of acute illness or in remission (Gottesman and Gould 2003). Endophenotypes, also defined as ‘intermediate phenotypes’, ‘subclinical traits’, and ‘vulnerability markers’, can potentially be very helpful in autism research, as simple, quantitative and heritable phenotypes can be typically linked to smaller sets of underlying genes compared to complex dimensions of human social cognition.

‘Biological markers’ of disease or ‘biomarkers’ can be defined as biological variables associated with the disease of interest and measurable directly in a given patient or in his/her biomaterials using sensitive and reliable quantitative procedures. Biomarkers are not necessarily genetically based, familial or trait dependent. Hence, all endophenotypes are also biomarkers, but not vice versa. Reliable sets of autism biomarkers would be immensely useful in clinical practice, as they could: (a) provide risk estimates at birth in ‘baby siblings’ of children already diagnosed with autism, in order to design and pursue preventive health care strategies; (b) foster earlier and more reliable diagnoses, especially between the ages of 12 and 30 months; (c) predict spontaneous developmental trajectories; (d) predict treatment response to specific rehabilitation strategies and (e) identify individuals pharmacogenetically at high risk for rare and severe adverse reactions to psychoactive drugs. Meanwhile, endophenotypes possibly included in broader biomarker panels could further contribute to defining biologically homogeneous subgroups of ASD patients, uncover as yet unknown causes of autism and promote a deeper understanding of pathophysiological processes underlying the disorder. In particular, the numerous genetic and environmental factors contributing to autism pathogenesis are bound to impinge upon a much smaller set of neurodevelopmental mechanisms which, once identified through appropriate biomarkers, should be amenable to partial or complete restoration by administering personalized molecular therapies (Vorstman et al. 2013).
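
The relationship between the two definitions (every endophenotype is a biomarker, but not vice versa) can be expressed as a simple checklist. The following is a minimal sketch under stated assumptions: the class, its field names and the labels returned are hypothetical conveniences, and the evidence flags in the usage lines are invented for illustration rather than taken from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerEvidence:
    """Evidence flags for one candidate marker (hypothetical field names)."""
    associated_in_population: bool     # criterion (a)
    cosegregates_in_families: bool     # criterion (b)
    heritable: bool                    # criterion (c)
    familial_gradient: bool            # criterion (d)
    trait_dependent: bool              # criterion (e)
    reliably_measurable: bool = True   # requirement for any biomarker


def classify(ev: MarkerEvidence) -> str:
    """Return the most specific label supported by the evidence."""
    # A biomarker needs disease association plus reliable measurement.
    if not (ev.associated_in_population and ev.reliably_measurable):
        return "neither"
    # An endophenotype must additionally satisfy criteria (b) to (e).
    remaining = (ev.cosegregates_in_families and ev.heritable
                 and ev.familial_gradient and ev.trait_dependent)
    return "endophenotype (and biomarker)" if remaining else "biomarker only"


# Invented examples: one marker meeting all five criteria, one lacking
# family data, one not associated with the disease at all.
full = MarkerEvidence(True, True, True, True, True)
partial = MarkerEvidence(True, False, False, False, True)
unrelated = MarkerEvidence(False, True, True, True, True)
```

Because the endophenotype branch can only be reached after the biomarker test has passed, the asymmetry stated in the text is built into the control flow.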

Established biomarkers and endophenotypes in autism research

The search for biomarkers in autism has suffered from non-replications, primarily due to either unrecognized clinical heterogeneity (ironically often accompanied by insufficient clinical characterizations and descriptions of patient samples), or methodological limitations and bias. Not surprisingly, the most replicated biomarkers also coincide with endophenotypes, whose heritability, familial aggregation and trait dependence ensure greater reliability and more solid biological underpinnings. The following discussion does not represent a systematic review, but rather focuses on biomarkers selected for reliability and consistency, as well as for their heuristic potential.

Biochemical biomarkers/endophenotypes

Elevated blood serotonin (5-HT) levels, consistently recorded in 25–41 % of individuals with ASD (Anderson et al. 1990; Gabriele et al. 2013), represent to this date the most consistent and best-characterized biomarker in autism research. Hyperserotoninemia in autism is seemingly due to excessive accumulation of 5-HT inside platelets, while free 5-HT plasma levels are not affected (Cook et al. 1988; Anderson et al. 1990; Piven et al. 1991). Serotonin uptake in platelets is mediated by the same 5-HT transporter expressed in neurons (Lesch et al. 1993). Two studies report increased density of 5-HT transporters on the platelet membrane (Vmax) in autism, while transporter affinity (Kd) for 5-HT appears unchanged (Katsui et al. 1986; Marazziti et al. 2000). Blood 5-HT levels are especially elevated in pre-pubertal autistic children, whereas after puberty this excess is less pronounced (McBride et al. 1998). Hyperserotoninemia is both a biomarker and a genetically based endophenotype, since blood 5-HT levels (a) in first-degree relatives are intermediate between autistic and control levels and (b), compared to unaffected controls, are higher in autistic individuals from simplex families (i.e. with only one affected child) and the highest in patients from multiplex families (i.e. families with two or more autistic children) (Piven et al. 1991).

An excess of urinary solutes has been described in 10–60 % of autistic individuals, depending on ethnicity (Reichelt et al. 1997; Yap et al. 2010). These solutes were initially designated as ‘oligopeptides’ (Reichelt et al. 1997), but many are not peptidic and the existence of casein-derived oligopeptides with opioid activity has not been confirmed (Hunter et al. 2003; Dettmer et al. 2007; Cass et al. 2008). Urinary solutes instead represent a chemically heterogeneous set of small molecules, seemingly able to produce diagnostically useful metabolomic patterns (Emond et al. 2013). Some of these compounds originate from gut bacteria and subsequent hepatic metabolism, such as p-cresol and p-cresylsulphate, respectively (Yap et al. 2010; Altieri et al. 2011). Collectively this excess of urinary solutes has been shown to also represent an endophenotype, both associated with autism and familial (Sacco et al. 2010). Whether and to what extent familiality also applies to specific metabolomic patterns awaits further investigation.

Morphological biomarkers/endophenotypes

Head circumference measures above the 97th percentile have been consistently found in 18 % of autistic children recruited in all 23 studies published to date on this biomarker (Sacco et al. 2007; R. Sacco and A.M. Persico, submitted for publication). Despite great interindividual differences, on average head growth in autistic children follows a peculiar developmental trajectory: (a) It is within normal limits or slightly below average at birth; (b) it starts accelerating during the first year of life, peaking sometime between 6 months and 4 years of age and (c) it then decelerates so that at puberty head size typically does not significantly differ between autistic individuals and controls (Courchesne et al. 2007). Cranial development is paralleled by the overgrowth of the frontal and temporal lobes, as documented in post-mortem and brain imaging studies (Courchesne et al. 2007). Macrocephaly is highly familial, since 45 % of macrocephalic autistic patients have at least one macrocephalic parent (Miles et al. 2000; Sacco et al. 2010). In most autistic children, macrocephaly is part of a broader macrosomic phenotype characterized also by excessive height and weight (Sacco et al. 2007; Chawarska et al. 2011). Macrosomy in autism is interestingly associated with the presence of allergies or autoimmune disorders in the patient and in his/her first-degree relatives, as well as with obstetric complications during pregnancy (Sacco et al. 2007). Enlarged head size thus appears as part of a systemic overgrowth, mechanistically linked with abnormal functioning of the CNS and immune system. The mTOR pathway represents the most likely candidate both linked with autism and designated to mediate at the intracellular level genetically based or immune-produced overstimulation of brain and systemic growth (Ma and Blenis 2009; Crino 2011).
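
The 97th percentile threshold used to define macrocephaly can be turned into a z-score cutoff, which also makes the degree of enrichment explicit: by definition only about 3 % of a reference population exceeds this threshold, against the 18 % reported above. The sketch below assumes that head circumference is approximately normally distributed for a given age and sex; the function name and the reference mean and standard deviation are hypothetical placeholders, not growth-chart values.

```python
from statistics import NormalDist

PERCENTILE_CUTOFF = 0.97
# z-score above which a measurement exceeds the 97th percentile
Z_CUTOFF = NormalDist().inv_cdf(PERCENTILE_CUTOFF)


def is_macrocephalic(head_circumference_cm: float,
                     ref_mean_cm: float,
                     ref_sd_cm: float) -> bool:
    """True if the measurement lies above the 97th percentile of a
    normally distributed reference (age- and sex-specific by assumption)."""
    z = (head_circumference_cm - ref_mean_cm) / ref_sd_cm
    return z > Z_CUTOFF


# Enrichment of macrocephaly relative to what the cutoff implies.
expected_fraction = 1.0 - PERCENTILE_CUTOFF   # about 3 % in the reference
observed_fraction = 0.18                      # frequency quoted in the text
enrichment = observed_fraction / expected_fraction

if __name__ == "__main__":
    # Hypothetical reference values, for illustration only.
    print(is_macrocephalic(53.5, ref_mean_cm=50.0, ref_sd_cm=1.5))
    print(f"z cutoff: {Z_CUTOFF:.2f}, enrichment: {enrichment:.1f}-fold")
```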

Minor dysmorphic features represent another morphological biomarker of interest in ASD. An abnormal cephalic index and palate dysmorphology represent the most frequent minor physical anomalies in ASD individuals (Tripi et al. 2008). Atypical facial asymmetry, especially prominent in right supraorbital and in anterior periorbital regions, was detected in 72 ASD children, contrasted with 128 first-degree relatives and 254 controls using dense surface-modelling techniques (Hammond et al. 2008). Also mothers of ASD children display a vertical asymmetry, especially visible in orbital regions. Conceivably, the same genetic factors can influence both facial morphology and brain development (Hammond et al. 2008). Alternatively, since during embryonic development the skull assumes the underlying shape of the brain, these orbitofrontal asymmetries could represent further evidence of abnormal frontal lobe development, in accordance with neuroimaging studies (Courchesne et al. 2007).

Hormonal biomarkers/endophenotypes

Physiologically oxytocin (OT) plays a major role in the establishment of affiliative bonds (Young et al. 1998; Feldman 2012). Reductions in mean plasma OT levels are particularly prominent in a subgroup of autistic children (Modahl et al. 1998). Interestingly, OT in autism is negatively correlated with 5-HT blood levels (Hammock et al. 2012). Polymorphisms of the OXTR gene, encoding for the OT receptor, are associated not only with autism but also with pair bonding, social behaviour, emotional affect and autism spectrum traits continuously distributed in the general population (Lucht et al. 2009; Walum et al. 2012). In general, the major limitation of these studies is the uncertain correlation between CNS and plasma OT levels, as well as the lack of studies documenting familiality. Nonetheless the neurobiological relevance of the OT system in human social cognition remains unquestionable. Not surprisingly, initial randomized clinical trials of intranasal oxytocin are yielding promising results both on autism core symptoms (Anagnostou et al. 2012; Tachibana et al. 2013) and on parent–child play interactions (Weisman et al. 2012; Naber et al. 2013), when administered to ASD children and to their parents, respectively.

Melatonin (MT) plays a well-known role in circadian and seasonal rhythms, in the modulation of immune responses and in neuronal plasticity. MT is synthesized from 5-HT, which is transformed into N-acetylserotonin and then into MT, the latter step through the action of the enzyme acetylserotonin methyltransferase (ASMT). This process is inhibited by light and stimulated by darkness. Plasma levels of MT are abnormally low in many autistic children, seemingly due to a deficit in ASMT activity (Melke et al. 2008), and circadian rhythmicity in MT synthesis and release is altered (Rossignol and Frye 2011; Tordjman et al. 2012). ASMT gene variants are associated with autism and possibly with the absence of physiological nocturnal increases in MT plasma levels (Melke et al. 2008). The same gene also hosts disruptive coding mutations in six of 398 (1.51 %) ASD individuals, compared to none of 437 controls (Wang et al. 2013). Blunted MT plasma levels are especially interesting for their potential, yet unproven, link with the disrupted sleep-wake cycle frequently seen in many autistic children, especially during their early infancy.
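
The mutation counts quoted above (six of 398 ASD individuals versus none of 437 controls) lend themselves to a short worked calculation. The sketch below recomputes the carrier percentage and applies a one-sided Fisher's exact test; the choice of test and the function name are assumptions made here for illustration, and the source does not state which statistic the original authors used.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing a or more carriers in the first
    row, given fixed row and column totals (hypergeometric null).
    """
    row1, row2 = a + b, c + d
    col1 = a + c                    # total number of carriers
    total = row1 + row2
    denominator = comb(total, col1)
    upper = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(row2, col1 - k)
                    for k in range(a, upper + 1))
    return numerator / denominator


# Counts from the text: carriers and non-carriers in cases and controls.
asd_carriers, asd_total = 6, 398
ctrl_carriers, ctrl_total = 0, 437

carrier_rate_pct = 100.0 * asd_carriers / asd_total
p_value = fisher_one_sided(asd_carriers, asd_total - asd_carriers,
                           ctrl_carriers, ctrl_total - ctrl_carriers)

if __name__ == "__main__":
    print(f"carrier rate in ASD: {carrier_rate_pct:.2f} %")
    print(f"one-sided Fisher p: {p_value:.4f}")
```

With mutations this rare, an exact test is preferable to a chi-square approximation, since the expected counts in the carrier column are small.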

Immunological biomarkers/endophenotypes

Many autistic individuals display immune abnormalities (Ashwood et al. 2006), including elevated IL-1, IFN-γ and TNF-α levels in the plasma and/or cerebrospinal fluid, increased production of the anti-inflammatory cytokine IL-10 and abnormal post-thymic maturation of T lymphocytes with increased ‘naïve’ and decreased differentiated (i.e. CD4+ and CD8+) T cell counts. Similar abnormalities, albeit less prominent, are also present among unaffected first-degree relatives of ASD patients (Saresella et al. 2009), demonstrating their familiality and a likely genetic basis. Conceivably, autistic individuals may either be more prone to neuroinflammation, or probably less protected than their non-autistic first-degree relatives, who may possess more efficient anti-inflammatory mechanisms. The consistent subgroup of ASD patients characterized by dysfunctional immunity seemingly shares some distinguishable features even at the clinical level (see the ‘ICS’ patient cluster, enriched in ‘immune, circadian and sensory abnormalities’, as described in Sacco et al. 2012). Finally, approximately 7 % of mothers and 21 % of ASD children carry autoantibodies directed against a variety of brain antigens, localized primarily in GABAergic neurons (Croen et al. 2008; Rossi et al. 2011). Preliminary results limited to ASD children positive to 45 and 62 kDa autoantibodies indicate that these children do not belong to the ‘ICS’ patient cluster (Sacco et al. 2012), but either display greater cognitive impairment (45 kDa), or tend to fall into the ‘S’ cluster enriched in motor stereotypes (62 kDa), respectively (I.S. Piras, J. van de Water, A.M. Persico et al., manuscript in preparation). These results converge with previous reports linking stereotypic behaviours, cognitive deficits and language impairment to the presence of these two anti-cerebellum antibodies (Goines et al. 2011; Wills et al. 2011).

Neurophysiological and neuroanatomical biomarkers/endophenotypes

This section shall focus on clinical imaging in ASD, while neuroimaging in animal models is the object of another contribution in this same Special Issue (Petrinovic et al. 2013). In general, despite significant interindividual differences and contrasting results, fMRI findings in ASD tend to converge upon the following abnormalities (for review, see Dichter 2012): (a) social processing tasks yield hypoactivation of regions involved in the ‘social brain’, such as the fusiform gyrus and the amygdala (Pierce et al. 2001), although this may largely depend upon reduced gaze fixation and lack of familiarity with the social stimulus (Dalton et al. 2005; Pierce and Redcay 2008); (b) cognitive control tasks produce aberrant frontostriatal activation, relevant to repetitive behaviours and insistence on sameness (Schmitz et al. 2006; Gomot et al. 2008); (c) verbal language and communication yield reduced left > right lateralization, decreased network synchrony often involving areas that do not typically process language, decreased automaticity of language processing and greater neurofunctional deficits for speech than songs (Kleinhans et al. 2008; Tesink et al. 2009); (d) social and nonsocial rewards are associated with anomalous mesolimbic responses involving the anterior cingulate cortex, nucleus accumbens, amygdala and ventromedial prefrontal cortex (Schmitz et al. 2008; Dichter et al. 2012) and (e) long-range functional hypoconnectivity and short-range hyperconnectivity are demonstrable in most (Kana et al. 2007), though not all tasks (Shih et al. 2010). Diffusion tensor imaging studies generally highlight initially increased and later decreased fractional anisotropy, supporting age- and region-specific delayed and abnormal myelination patterns (Wolff et al. 2012).

Some studies have explored these features both in ASD patients and in their unaffected siblings, providing strong evidence for abnormal connectivity in the former and for compensatory mechanisms in the latter. Hypoactivation of regions involved in the social brain is particularly well-exemplified by the reduced response to biological motion of a neural network encompassing the left ventrolateral prefrontal cortex, the right amygdala, the right posterosuperior temporal sulcus (pSTS), the ventromedial prefrontal cortex and the fusiform gyrus bilaterally (Kaiser et al. 2010). Here a ‘state-dependent’ hypoactivation of the pSTS is specific to autistic individuals and correlates with the severity of their social deficits, whereas ASD patients and first-degree relatives share a ‘trait-dependent’ hypoactivation of the fusiform gyrus bilaterally, the left dorsolateral prefrontal cortex and the right inferior temporal gyrus (Kaiser et al. 2010). An over-activation of the right pSTS and the ventromedial prefrontal cortex, present only among unaffected siblings, again suggests the existence of compensatory mechanisms able to efficiently counteract the increased liability to develop an ASD shared by autistic and non-autistic family members (Kaiser et al. 2010).

On the other hand, the existence of long-range hypoconnectivity in ASD patients is well-exemplified by an abnormally delayed and long-lasting activation of the prefrontal cortex during a non-social visual attention task (Belmonte et al. 2010). Importantly, unaffected brothers display an atypically enhanced activation of the prefrontal cortex in the presence of intact functional connectivity (Belmonte et al. 2010). This enhanced activation points towards compensatory mechanisms likely involving broader recruitment and alternative routes for information processing in non-autistic first-degree relatives.

Several other electrophysiological and brain imaging parameters can be regarded as ASD biomarkers, although the lack of studies involving first-degree relatives does not yet allow their potential as ASD endophenotypes to be assessed:

(a) Autistic individuals display an atypical activation of the operculum in the inferior frontal gyrus, during the imitation and observation of human actions and emotional expressions (Dapretto et al. 2006). These studies were spurred by early findings from single-cell electrophysiological recordings in monkeys unveiling a set of motor neurons whose firing rate increases regardless of whether the action is performed by the animal itself or is observed in another individual performing the same action (Gallese et al. 1996). ‘Mirror neurons’, whose putative human homologue can be indirectly studied using fMRI, are even able to encode motor acts in accordance with their final goal. This preliminary mapping between self-other actions is indeed required to develop empathy, an interpersonal mirroring at the motor, emotional and cognitive levels (Blair 2005), the latter extending into a broader ‘theory of mind’, the ability to understand mental states, intentions, goals and beliefs, irrespective of the emotional state (Baron-Cohen 1995; Leslie et al. 2004). Interestingly, ASD patients do recognize the motor act itself, but seemingly lack an understanding of the goal of the action. It is still debated whether this mirroring deficit is primary or whether it stems from an insufficient feeding of sensory information to the mirror system, especially in the social and affective realms (Enticott et al. 2013).

(b) The combination of cortical thickness and cortical surface measures assessed in the entire neocortex using multidimensional MRI-based techniques, paired with measures involving many subcortical regions, is seemingly able to distinguish autistics from controls and from attention deficit hyperactivity disorder (ADHD) patients with up to 90 % sensitivity and 80 % specificity (Ecker et al. 2010, 2012). Whether these neocortical and subcortical differences are genetically based and familial, or the consequence of long-standing pathological functioning, remains to be established.

(c) Many of the abnormalities in neural connectivity uncovered using fMRI and summarized above have also been demonstrated using electrophysiological methods, such as auditory, visual, somatosensory evoked potentials and especially event-related potentials. In general, electrophysiological findings provide converging evidence for face-specific recognition memory impairment and for deficits in holistic processing, as reviewed elsewhere (Jeste and Nelson 2009; Gomot and Wicker 2012). In addition, low consistency and poor evoked response reliability, even after non-social visual, auditory and somatosensory stimuli, yield smaller signal-to-noise ratios in all sensory systems and less predictable perceptions (Dinstein et al. 2012). Whether and to what extent all these biomarkers are specific for ASD patients or are also abnormal in non-autistic family members is currently unknown.
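
The sensitivity and specificity figures quoted in item (b) above describe how a classifier performs on known cases and controls; what they imply for an individual test result also depends on how common the condition is among those tested. The sketch below applies Bayes' rule to derive positive and negative predictive values. The function name and both prevalence settings are assumptions chosen for illustration: 50 % mimics a balanced case-control sample, whereas 1 % lies at the upper end of the population figures cited in the Introduction.

```python
def predictive_values(sensitivity: float,
                      specificity: float,
                      prevalence: float) -> tuple:
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(disease | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | negative test)
    return ppv, npv


# Sensitivity and specificity as quoted in the text; prevalences assumed.
ppv_balanced, npv_balanced = predictive_values(0.90, 0.80, prevalence=0.50)
ppv_population, npv_population = predictive_values(0.90, 0.80, prevalence=0.01)

if __name__ == "__main__":
    print(f"balanced sample:   PPV = {ppv_balanced:.3f}")
    print(f"population screen: PPV = {ppv_population:.3f}")
```

Under these assumptions the positive predictive value falls sharply when the same test is moved from a balanced research sample to unselected screening, which is one reason why multi-marker panels and enriched high-risk groups, such as infant siblings, are of interest.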

Neuropsychological biomarkers/endophenotypes

In addition to theory of mind and empathy, described above, several other neuropsychological constructs display clear deficits in ASD, including joint attention, central coherence, face/emotion processing and executive functions. Joint attention is intimately connected with the development of theory of mind skills, as well as with verbal and nonverbal communication. It involves the coordinated sharing of attention between the patient, another person and an object or event (Bakeman and Adamson 1984). This complex response involves several triadic behaviours including gaze and object following, showing and pointing. Deficits in joint attention are clearly present in individuals with ASD (Bakeman and Adamson 1984). No consistent difference in joint attention has been reported between unaffected siblings and typically developing individuals after age 5, although during early infancy the former do display less joint attention than the latter, suggesting again the existence of compensatory mechanisms among non-autistic family members (Malesa et al. 2012).

Central coherence is a neuropsychological construct describing balanced attention between global patterns and specific details in perception. Autistic individuals tend to display weak central coherence, characterized by difficulties in putting information together to perceive global or gestalt patterns (Happé 1999), and/or even more by a superior ability to capture details (Mottron et al. 2003; Happé and Frith 2006). This excessive attention to details is accompanied in visuospatial tasks by greater activation of visual processing areas, as compared to greater activation in controls of mainly frontal brain regions involved in executive functions and higher perceptual skills (Kumar 2013). Weak central coherence may not significantly contribute to social deficits in ASD, but may represent an independent feature (Happé and Frith 2006), which is also present to an intermediate degree in many parents of ASD children (Briskman et al. 2001; Happé et al. 2001).

Visual scanning of human faces likely represents one of the most promising neuropsychological parameters in autism research and can be reliably explored thanks to the advent of ‘eye tracking’ technologies. Autistic individuals indeed spend significantly more time scanning the mouth and neck regions compared to the eyes, which instead represent the area most targeted by typically developing controls (Klin et al. 2002, 2003; Spezio et al. 2007; Rice et al. 2012). This abnormal face scanning must, however, be considered within the overarching social deficits of the autistic child, who in naturalistic contexts focuses less on human faces altogether (i.e. eyes + mouth) and more on body and object regions (Rice et al. 2012). High-risk infant sibling studies suggest that greater mouth over eyes fixation recorded as early as 6–9 months is associated not with autism per se but with superior verbal language development at 24–36 months, which in turn reduces autism severity (Young et al. 2009; Elsabbagh et al. 2013). Importantly, parents of autistic children display similar abnormalities in visual face processing when aloof and socially isolated, whereas face scanning strategies are superimposable to those applied by controls when social skills are well-developed (Adolphs et al. 2008).

Executive functions underlie goal-directed behaviour, which requires holding plans on-line until executed, inhibiting irrelevant actions, planning a sequence of actions and shifting plans if needed. Deficits in executive functions, particularly spatial working memory, response inhibition, cognitive flexibility and strategic planning, have been recorded in autistic patients and in their unaffected siblings (Delorme et al. 2007; O’Hearn et al. 2008; Corbett et al. 2009). Interestingly, both autism and obsessive–compulsive disorder share many of these cognitive abnormalities (Delorme et al. 2007). The developmental trajectory of deficits in executive functions may vary, due to interindividual variability and function specificity, with some studies reporting age-related improvements (O’Hearn et al. 2008) and others worsening by adolescence (Rosenthal et al. 2013).

Behavioural biomarkers/endophenotypes

Phenotypic measures used in genetic studies to stratify patient samples have sometimes been inappropriately designated as ‘endophenotypes’. Examples include (a) IQ; (b) age at first words, age at first sentence and presence/absence of verbal language and (c) Autism Diagnostic Interview—Revised (ADI-R) scores in social interaction, stereotypic behaviours and restricted patterns of interests adaptation (Bradford et al. 2001; Spence et al. 2006; Liu et al. 2008; Alarcón et al. 2008). While this experimental approach is well-justified and has been quite successful in identifying several autism genes, it is inappropriate to designate observable or measurable behavioural parameters as ‘biomarkers’ and especially as ‘endophenotypes’, particularly when referring to signs or symptoms listed among the diagnostic criteria for autism. On the other hand, hyperdeveloped cognitive abilities (i.e. ‘savant’ skills) represent an independent behavioural phenotype present in a small subset of ASD individuals. The study of a single autistic subject with extraordinary skills in mathematics and art provides an interesting paradigm for the neuropsychology and brain morphometry associated with savant skills (Wallace et al. 2009). The neurocognitive profile of this autistic patient was characterized by exceptional memory, mathematical skills and visuospatial functions, as well as knowledge of calendar structure and weak central coherence. This translates into an extraordinary memory for details and a relative inability to recall essential data in ecological contexts, normal implicit learning and insufficient visual exploratory skills. Brain imaging analysis revealed significantly reduced neocortical thickness in regions involved in social cognition, as well as in the medial and superior prefrontal cortex, whereas the superior parietal lobule, involved in visuospatial and mathematical functions, was significantly thicker. However, even in this case, savant skills would be better designated as a peculiar phenotypic or clinical feature rather than a ‘biomarker’ or ‘endophenotype’.

Towards the discovery of new laboratory biomarker panels

Multiple approaches are being used to discover new biomarker panels for ASD. For example, novel strategies in neuroimaging have been recently discussed (Ecker et al. 2013). Special interest is raised by the identification of molecular markers that could be easily implemented in clinical practice through conventional laboratory medicine, following the routine collection of bodily fluids, such as blood, urine, or saliva. Molecular biomarkers can be searched at different levels: genomic, epigenomic, transcriptomic, proteomic and metabolomic (Fig. 1). In a complex disorder with strong genetic underpinnings, the genetic/genomic level is conceivably closest to its biological origin. Unbiased methods to uncover genetic/genomic markers include array CGH for CNVs (namely microdeletions and microduplications), genome-wide association studies for common variants and whole-exome or whole-genome DNA sequencing using next-generation sequencing (NGS) technologies to identify rare variants. A wealth of hypothesis-driven and unbiased genetic studies has now identified over 100 autism genes (Persico and Napolioni 2013; Vorstman et al. 2013), and sex-specific genetic biomarker panels have recently been proposed to estimate autism risk even in clinical settings (Carayol et al. 2011).

Fig. 1 Schematic representation of the identification of a putative 36-item multibiomarker panel, including biomarkers pertaining to the genetic, epigenetic, transcriptomic, proteomic and metabolomic levels. The combinations of biomarkers carrying maximum predictive power are determined using artificial neural networks (see text). Biomarker levels closer to ASD-related abnormal functioning may be predicted to be overrepresented in the final biomarker panel.

The epigenetic level is represented by the ‘methylome’, which can be best studied at single-base resolution using bisulphite conversion of genomic DNA followed by NGS, in order to identify methylated and unmethylated cytosine residues (Krueger et al. 2012). Although this approach is still technically challenging, it holds great promise especially in combination with genome-wide expression analysis performed using microarray technologies or RNA sequencing (Lintas et al. 2012). These combined strategies can boost informativeness in a diagnostic setting (Luo et al. 2012) and can be beneficial in driving therapeutic choice, as in the case of the mGluR5 antagonist AFQ056 which seemingly ameliorates behavioural symptoms only in fragile-X patients with full methylation of the FMR1 promoter and no FMR1 mRNA copies (Jacquemont et al. 2011). Within the framework of transcriptomics (i.e. patterns of transcripts typically measured in peripheral and easily accessible cells, such as leukocytes), regulatory microRNAs (miRNAs) represent an especially interesting target for biomarker studies (De Smaele et al. 2010; Sarachana et al. 2010; Mellios and Sur 2012).

The next level of biological complexity is investigated by proteomic studies, typically contrasting protein/peptide patterns and amounts in peripheral tissues or bodily fluids of cases and controls. Proteomics generally uses rapidly developing unbiased techniques based on mass spectrometry (MS), such as matrix-assisted laser desorption/ionization time-of-flight tandem mass spectrometry (Pan et al. 2008; Altelaar et al. 2013). Finally, metabolomics profiles the small molecules present in complex biological fluids using nuclear magnetic resonance spectroscopy or MS-derived techniques (Nicholson and Lindon 2008). Proteomic and metabolomic studies have already provided initial evidence of a strong potential for biomarker identification in ASD (Schwarz et al. 2011; Yap et al. 2010; Emond et al. 2013).

A multimarker panel encompassing biomarkers derived from different levels of biological analysis (i.e. genetic, epigenetic, gene expression and miRNAs, proteomic, metabolomic) is likely to possess the greatest information content and predictive power, as compared to biomarker panels tapping into single levels of analysis (Mayr et al. 2013). Identifying the most informative combination of biomarkers in large data sets greatly benefits from the use of artificial neural networks (ANN) over classical parametric statistics (e.g. principal component and cluster analyses) (Grossi and Buscema 2007; Bradley 2012; Orrù et al. 2012). In fact, the a priori assumptions required by parametric approaches and the near impossibility of computing all the necessary joint probabilities in the presence of a large number of variables hamper the reliability of traditional parametric methods. Instead, ANN-based approaches, such as the autocontractive map (Buscema et al. 2012), ‘spatialize’ the correlation among all variables, ultimately producing a graph-theoretic representation of the underlying phenomenon whereby all relevant correlations are selected and organized into a coherent picture. In addition, proteomic and especially metabolomic targets, representing the cellular and systemic levels of biological complexity farthest away from the genome but closest to abnormal function, are likely to possess the greatest heuristic potential. In fact, disease-specific proteomic and metabolomic biomarkers may enjoy broader generalizability and greater patient subtyping power: on one hand, they have a greater chance of bypassing some limitations intrinsic to purely genetic biomarkers, such as interethnic differences in linkage disequilibrium and population genetic structure; on the other, they rely upon functional rather than structural data. 
RNA splicing, post-translational modifications and differential protein–protein complex formation indeed create great functional divergence, as well as tissue- and time-dependent specification at the proteomic level, often starting from a single genomic DNA sequence. Similarly, metabolic phenotyping contrasting the global, dynamic metabolic response of individuals with ASD and unaffected individuals would be predicted to carry maximum levels of informativeness. Hence, molecular multimarker panels for ASD can be predicted to encompass many of the known biomarkers/endophenotypes described above and several genetic variants, especially of known pathophysiological function, but are likely to benefit from larger subsets of proteomic and metabolomic biomarkers (Fig. 1).
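The idea of turning the correlations among many variables into a graph can be illustrated with a deliberately simplified sketch. The code below is not the autocontractive map and involves no neural network: as a stand-in, it computes pairwise Pearson correlations between markers and links them with a minimum spanning tree (Prim's algorithm), so that strongly correlated markers become neighbours. The marker names and all measurements are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (each must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_tree(markers):
    """Return the edges of a minimum spanning tree over the markers.

    The distance between two markers is 1 - |r|, so markers that are
    strongly correlated (in either direction) end up adjacent in the tree.
    """
    names = list(markers)
    dist = {
        frozenset((a, b)): 1.0 - abs(pearson(markers[a], markers[b]))
        for a, b in combinations(names, 2)
    }
    in_tree = {names[0]}
    edges = []
    # Prim's algorithm: repeatedly attach the closest marker not yet in the tree.
    while len(in_tree) < len(names):
        a, b = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda e: dist[frozenset(e)],
        )
        edges.append((a, b, round(dist[frozenset((a, b))], 3)))
        in_tree.add(b)
    return edges


# Hypothetical, made-up measurements for five subjects (illustration only).
toy_panel = {
    "snp_dosage": [0, 1, 2, 1, 0],
    "methylation": [0.1, 0.5, 0.9, 0.5, 0.1],
    "mirna_level": [5.0, 3.0, 1.0, 3.1, 5.2],
    "metabolite": [2.0, 2.1, 1.9, 2.6, 1.4],
}

for a, b, d in correlation_tree(toy_panel):
    print(f"{a} -- {b} (distance {d})")
```

A tree of this kind only summarises linear pairwise relationships; the ANN-based methods cited above are intended to capture richer, non-linear structure.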

The identification of biomarker panels in complex disorders like autism builds upon strong logistical foundations, including broad-based collaborative recruitment of large samples of cases and controls, detailed clinical phenotyping, solid infrastructures for biomaterial collection and storage, up-to-date technologies, reliable laboratory procedures and efficient data management using databases able to support large data collections. These logistical components will now be reviewed.

Bioresource infrastructures in biomarker discovery

Systematic collections of biological samples, referred to as biobanks or bioresources, linked to phenotypic data are crucial to investigate the biological mechanisms underlying diseases, to discover and validate biomarkers used in clinical diagnostics and to develop treatments. Currently there are two major autism bioresources, both of which are located in the USA. These are the Autism Genetic Resource Exchange (AGRE) (Lajonchere 2010) and the Simons Simplex Collection (SSC) (Fischbach and Lord 2010). AGRE (http://agre.autismspeaks.org) is a DNA repository and family registry, housing a database of genotypic and phenotypic information on 1,264 multiplex pedigrees. The overall aim of AGRE is to identify heritable genetic factors for autism. Diagnoses of individuals within AGRE have all been made using the ADI-R algorithm—a so-called Gold Standard research diagnostic instrument. In addition, extensive behavioural characterization has been conducted by AGRE, and screens for Fragile X mutations and other karyotypic abnormalities as well as genome-wide microsatellite and SNP analyses have been carried out in the majority of AGRE families. The SSC (https://sfari.org/simons-simplex-collection) has established a permanent repository of genetic samples from over 2,000 simplex families (i.e. encompassing one child affected with ASD and his/her parents). The SSC is geared towards identification of rare de novo genetic mutations causing ASD. Each sample has a uniform phenotypic characterization, which includes ADI-R as well as extensive behavioural and neuropsychological assessments. Genome-wide genotyping, including exon sequencing, is also conducted in this sample.

Both bioresources have a focus on genetics, with genomic DNA being the main target of collection. The European Autism Interventions—A Multicentre Study for Developing New Medications (EU-AIMS) project (Murphy and Spooren 2012) offers a unique opportunity to create a European bioresource that will complement these US-based biobanks by allowing identification and validation of non-genetic biomarkers.

While the importance of establishing bioresources is widely recognized, their development still presents many challenges of a scientific, organizational and financial nature. The first challenge is the selection of biological materials that will make it possible to answer the scientific questions of the project, also taking costs and feasibility into consideration. In addition to DNA extraction, the EU-AIMS bioresource will allow extraction of RNA, proteins and analytes for discovery and validation of non-genetic biomarkers (Fig. 2). Saliva will be collected from participants up to 3 years old for DNA extraction; collection of saliva is non-invasive and more suitable for participants of such a young age. For the other participants and for their family members, the EU-AIMS bioresource aims to collect a subset of the UK Biobank sample set including whole blood and urine (UK Biobank 2007). UK Biobank is a large prospective study in the UK that collects biological samples from a cohort of healthy volunteers recruited by the NHS. It collects whole blood for DNA and RNA extraction, fractions of whole blood such as serum, plasma, buffy coat and red cells, immortalized peripheral blood lymphocytes and urine. It has developed a standardised protocol and standard operating procedures (SOPs) for collecting, processing and storing samples in order to ensure high-quality specimens; these procedures are regarded as a gold standard for biobanking (Peakman and Elliott 2008; Elliott and Peakman 2008).

Fig. 2 Schematic representation of the standard operating procedures for biomaterial collection, as implemented by the EU-AIMS consortium.

The EU-AIMS project will focus its efforts on the collection of whole blood for DNA and RNA extraction, as well as for plasma separation (Fig. 2). While blood collection is more invasive than saliva collection, it allows isolation of a wider range of biomolecules, permitting interrogation not only of the genome and epigenome but also of the proteome and metabolome. Tubes will be prioritised so that, if the collection cannot be completed, the most important samples, whole blood for DNA and whole blood for serum separation, will still be collected.
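The prioritisation logic can be sketched in a few lines. This is only an illustration under stated assumptions: the text specifies that whole blood for DNA and whole blood for serum separation come first, whereas the order of the remaining tubes, all tube volumes and the function name `plan_collection` are hypothetical and are not taken from the EU-AIMS SOPs.

```python
def plan_collection(tubes, available_ml):
    """Select tubes in priority order within the available blood volume.

    `tubes` is a list of (name, volume_ml) pairs, highest priority first.
    Returns (collected, skipped) lists of tube names.
    """
    collected, skipped = [], []
    remaining = available_ml
    for name, volume_ml in tubes:
        if volume_ml <= remaining:
            collected.append(name)
            remaining -= volume_ml
        else:
            skipped.append(name)
    return collected, skipped


# Hypothetical priority list and volumes (illustration only). Only the first
# two positions reflect the priorities stated in the text.
PRIORITY = [
    ("whole blood for DNA", 4.0),
    ("whole blood for serum separation", 4.0),
    ("whole blood for RNA", 2.5),
    ("whole blood for plasma separation", 4.0),
]

collected, skipped = plan_collection(PRIORITY, available_ml=9.0)
print("collected:", collected)
print("skipped:", skipped)
```

Recording the skipped tubes alongside the collected ones makes incomplete collections explicit in the sample database, which helps when samples from different sites are later compared.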

At present, two clinical studies within the EU-AIMS project, the high-risk infant sibling study and the accelerated longitudinal study, are planning to collect biological samples from ASD patients, family members and population controls. Both studies have different recruiting centres within Europe (UK, the Netherlands, Sweden and Italy). Thus, standardisation is crucial to ensure comparability between samples collected and processed in different laboratories. A common set of SOPs, applied to every stage of the collection, processing, transportation and storage of biological samples, has been developed (see Supplementary Material for a copy of the SOPs). Each recruitment centre will identify a sample acquisition site equipped for blood withdrawal and sample processing, and staff will be trained to conduct sample acquisition and initial processing according to the common SOPs. In order to establish a protocol that is efficient and sustainable across the different collection sites, a pilot study will be run to test the feasibility of the SOPs and to establish the quality of the samples obtained following our ascertainment procedure.

Data management for biomarker discovery

Data sharing

Biomarker discovery is becoming increasingly collaborative—projects attempt to work towards connecting different types of biomarkers, rather than treating them in isolation (Gustaw-Rothenberg et al. 2010; Shtilbans and Henchcliffe 2012). This necessitates wider collaborations in order to either recruit very large samples, or in other instances to collect very large amounts of data from small-to-medium-sized samples. In either case, biomedical research is becoming more data-driven. Therefore, increased attention is given to data standards and to systems that enable data-rich collaborative projects, as well as to data reuse (Poline et al. 2012). Here we will consider three main relevant types of systems that enable data sharing in biomarker discovery: imaging data management, systems for psychological data entry and management and general purpose biomolecular data management systems. Our experience with data management for a variety of biomedical projects indicates that it is not possible to define a single system setup that would equally well serve different types of studies. Our intention here is to provide a list of pointers that will be useful when setting up a data management solution for a collaborative project. However, first we will look at data standardisation.

Standards

When creating a data management solution for a particular project, it is necessary to consider relevant data standards. This will be particularly important if connection to pre-existing software components for, e.g. data analysis or visualization is envisaged, or there are intentions to share generated and processed data sets with a wider community by submitting data to existing public repositories.

Digital Imaging and Communications in Medicine (DICOM) is a standard that describes the handling of medical imaging data, including a file format and a network communication protocol (Mildenberger et al. 2002). This is a comprehensive standard with a history of almost 30 years, developed and supported by imaging device manufacturers.

Health Level Seven (HL7) refers to a non-profit organization for the development of healthcare informatics standards; this acronym is also typically used when referring to standards developed by this organization (Dolin et al. 2006). These standards are widely employed in information systems that support clinical practice and are especially suitable for interoperability in the area of billing and insurance, but are not well suited for scientific information exchange. One attempt at producing a standard that bridges HL7 and health informatics systems on one hand and the needs of the biomedical research community on the other is the Biomedical Research Integrated Domain Group (BRIDG) Model, which has been mapped to the HL7 Reference Information Model and covers protocol-driven research and associated artefacts (Fridsma et al. 2008). The BRIDG model is a comprehensive initiative; it is therefore also rather complex and is yet to be supported by the wider community and by software.

A more lightweight proposed standard is the XML-based Clinical and Experimental Data Exchange (XCEDE), an XML schema developed to standardise the capture of the metadata hierarchy generated by scientific studies (Gadde et al. 2012). This effort provides generic solutions, but is better supported in the neuroimaging community.

One of the more recent trends in biomarker discovery is the use of genotyping and other high-throughput techniques to obtain data at the biomolecular level. Standardisation work in the molecular biology community has historically been more bottom-up, with grassroots efforts covering smaller domains (Ball et al. 2002). This has led to practical solutions, but also to greater fragmentation. Three main types of standards can be distinguished:

  • ‘Minimal information’ community recommendations that define what metadata should accompany data sets for them to be understandable and useful in further research (Brazma et al. 2001; Deutsch et al. 2008)

  • Data exchange standards, both XML-based (Spellman et al. 2002; Hermjakob et al. 2004) and tab-delimited (Rayner et al. 2006; Sansone et al. 2008)

  • Ontologies (Smith et al. 2007; Malone et al. 2010)

All these standardisation components should work together in creating a usable data exchange solution. See Brazma et al. (2006) for more information on aspects of standards design and use in systems biology.
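How a ‘minimal information’ checklist and a tab-delimited exchange format can work together is sketched below. The required field names are hypothetical placeholders, not the fields of any published guideline, and the functions `missing_fields` and `to_tab_delimited` are illustrative names; only Python's standard `csv` and `io` modules are used.

```python
import csv
import io

# Hypothetical checklist; real 'minimal information' guidelines define their own fields.
REQUIRED_FIELDS = ("sample_id", "organism", "material", "assay_type", "protocol_ref")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required metadata fields that are absent or empty."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def to_tab_delimited(records, fields=REQUIRED_FIELDS):
    """Serialise records as tab-delimited text, refusing incomplete metadata."""
    problems = {}
    for index, record in enumerate(records):
        gaps = missing_fields(record, fields)
        if gaps:
            problems[record.get("sample_id") or f"row {index}"] = gaps
    if problems:
        raise ValueError(f"incomplete metadata: {problems}")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        delimiter="\t",
        extrasaction="ignore",  # silently drop fields outside the checklist
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up example record (illustration only).
example = {
    "sample_id": "S001",
    "organism": "Homo sapiens",
    "material": "plasma",
    "assay_type": "metabolite profiling",
    "protocol_ref": "SOP-03",
}
print(to_tab_delimited([example]))
```

In practice, the values of fields such as `organism` or `assay_type` would be drawn from an ontology rather than entered as free text, which is where the third type of standard listed above comes in.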

Software

Some of the better known open source informatics solutions for managing imaging data include eXtensible Neuroimaging Archive Toolkit (Marcus et al. 2007) and Human Clinical Imaging Database (Ozyurt et al. 2010); here we will not consider commercial solutions, due to lack of transparency. Some of the features to consider when choosing a data management solution include:

  • Modes of data entry: online data entry forms, XML, programmatic access

  • Data management workflow: support for quality control procedures, change tracking

  • Modes of accessing data: query capabilities, e.g. federation across deployments, reports, online image viewing, programmatic access

  • Ease of administration: possibility to incorporate new data types without programming support

  • Richness of metadata capture: protocols, task parameters, demographic and clinical assessment information

Different study types require different approaches to psychological data entry and management systems. There are no widely used generic tools that enable this. In the field of autism research, the Internet System for Assessing Autistic Children (ISAAC) system is a web-based application for administering health research projects (Hollander et al. 2004). It offers security, data sharing, multi-site capability, flexible data access as well as a list of pre-existing assessment forms. ISAAC is a system to be used by a trained professional. Software systems exist that enable cognitive tasks and assessments to be performed at home, e.g. http://www.delosis.com/psytools/overview.html.

From the perspective of general purpose biomolecular data management systems, there are many Laboratory Information Management System (LIMS) implementations, such as HalX (Prilusky et al. 2005), ms-lims (Helsens et al. 2010) and Screensaver (Tolopko et al. 2010), as well as commercial solutions. A LIMS is typically used in a single laboratory setting, serving as an electronic lab book. Emphasis is on the ability to track materials, equipment and data; automated interfacing with laboratory equipment is very important, as is the ability to define and execute workflows.

In joint projects, the notion of collaboration data management tools is becoming more important (Krestyaninova and Tammisto 2012). These tools do not emphasize the tracking aspects of material and data flow, at least not to the level of LIMS, but instead concentrate on providing functionality important for data sharing and interpretation in multi-site collaborative projects: rich and flexible metadata descriptions, management of user access rights and ability to submit data to public repositories once a project is completed and a manuscript submitted. Some of the open source solutions are as follows:

  • SIMBIOMS manages high-throughput assay results and associated metadata; it is easy to configure the metadata template for a new project/technology (Krestyaninova et al. 2009).

  • ISA metadata tracking tools facilitate standards compliant collection, curation, local management and reuse of datasets and include, among other components, a stand-alone client program for gathering and formatting data (Rocca-Serra et al. 2010).

  • Biology-Related Information Storage Kit, in addition to basic data management functionality, also offers data analysis functionality support (Tan et al. 2011).

  • LabKey Server also offers built-in analysis and visualization support via a built-in R environment, as well as some LIMS-like functionality like web-based sample requests (Nelson et al. 2011).

Another approach to data management in collaborative projects is more ad hoc, utilizing general purpose collaboration tools such as Google Drive. Maguire et al. (2013) built a widget that can be used in a Google spreadsheet for accessing ontologies, therefore facilitating uniform metadata descriptions. There are attempts to utilize peer-to-peer networking for the exchange of large data files, e.g. Biotorrents (Langille and Eisen 2010).

Biomarkers for personalized psychopharmacology

Along with earlier and more reliable diagnoses, an additional asset of biomarker use in ASD clinics could be the identification of patients amenable to personalized pharmacological treatment through a number of novel drug therapies currently under clinical trial. Conceptually, biomarkers would initially characterize each autistic patient in terms of the gene or protein networks most implicated specifically in his/her own neurodevelopmental pathology. In recent years, investigators have begun identifying the gene networks encoding proteins functionally linked into biochemical pathways most relevant to autism (van der Zwaag et al. 2009; Anney et al. 2011; Gilman et al. 2011; Voineagu et al. 2011; Ben-David and Shifman 2012; Kou et al. 2012; Lee et al. 2012; Skafidas et al. 2012; Noh et al. 2013; Poelmans et al. 2013). These bioinformatic analyses defining the interactome from large sets of genome-wide genetic or transcriptomic data indeed point towards a relatively limited number of signalling networks contributing to ASD, each to a different extent in different patients. Hence, biomarker analyses unveiling which functional networks are most involved in each single case could then pave the way to individualized pharmacological therapies. Novel pharmacological treatments targeted to the core symptoms of autism and currently under investigation are reviewed in detail in another contribution to this Special Issue (see Table 1 in Vorstman et al. 2013). Work on syndromic forms of autism, such as fragile-X syndrome, Rett syndrome, tuberous sclerosis, neurofibromatosis and treatment-resistant epilepsy, has played a pivotal role in the identification of these therapeutic approaches, including mGluR5 antagonists, GABA-B agonists, IGF-1, mTOR inhibitors and diuretics antagonizing chloride import, among others (Vorstman et al. 2013). 
Though still speculative, this approach holds great promise to move the field of neurodevelopmental disorders from current ‘non-specific’ psychopharmacology to personalized drug therapy.

Conclusions

To date, many autism biomarkers have been proposed (Ecker et al. 2010; Wang et al. 2011; Veenstra-VanderWeele and Blakely 2012), but scientific, ethical, clinical and practical issues still pose a major challenge to their use in clinical practice (Walsh et al. 2011). The two main limitations at this time are (a) the lack of cross-talk between endophenotyping modalities: relatively few studies so far address the degree of correlation between different biomarkers (see Hammock et al. 2012), assess multiple biomarkers or endophenotypes in parallel (see Sacco et al. 2010) or characterize the same patient sample using multiple unbiased -omics methods (Luo et al. 2012); and (b) the fact that the vast majority of biomarker studies contrast ASD patients and controls, leaving familiality and disease specificity unexplored. Future biomarker studies should also include first-degree relatives and children with other neurodevelopmental disorders, such as speech or learning disabilities, ADHD and cognitive impairment.

Despite current limitations, the heuristic potential of biomarker research in autism is enormous. Preventive strategies targeting cardiovascular disease have been dramatically improved by the ground-breaking use of plasma lipoprotein and cholesterol levels as biomarkers for disease risk. Autism is a similarly complex and heterogeneous disorder, characterized by constellations of signs and symptoms displaying variable developmental trajectories and response to treatment in different patients. Despite the large number of genetic and environmental factors likely underlying autism, these factors can be predicted to converge upon a relatively limited number of intracellular biochemical pathways and neurodevelopmental mechanisms which, once tagged and identified using appropriate biomarker panels, may be corrected by administering personalized molecular therapies. Even currently available behavioural interventions, if applied before rather than after the appearance of behavioural abnormalities, could conceivably minimize their severity or even prevent a full-blown autistic disorder, at least in some children (Dawson 2008). Finally, longitudinal studies linking specific biomarker sets to developmental trajectories and treatment response will be critical in translating scientific knowledge into patient management.