Introduction

Autism is a heterogeneous neurodevelopmental disorder characterized by a wide spectrum of social and communication difficulties, repetitive and restricted behaviors, and a variety of clinical symptoms (Antshel et al. 2013; Georgiades et al. 2012; Kohane et al. 2012; Lai et al. 2013; Lenroot and Yeung 2013; Nazeer and Ghaziuddin 2012). Although DSM-5 places all individuals with autism in a single diagnostic category, it has been proposed that classifying autism into more homogeneous subtypes would facilitate research, diagnosis, and treatment of the disorder (Dealberto 2013; Gabis and Pomeroy 2014; Kim et al. 2015; Lane et al. 2014). Previous classification schemes have relied mostly on behavioral and cognitive assessments (American Psychiatric Association 2000; Levy et al. 2009; Lord and Jones 2012) or on a restricted set of clinical measures (Kong et al. 2013; Lane et al. 2014; Lord and Jones 2012; Ramsey et al. 2013), and have had limited success in providing a reliable basis for targeted treatment (Volkmar et al. 2012). Adding genomic and neuroimaging measures, as well as detailed clinical measures from each individual’s medical history, may provide critical new data for developing more useful approaches to classifying autism.

Over the last decade, research into the etiologies of autism has shifted from small-scale studies that examine a limited number of measures in small samples to large databases that combine multiple types of data from much larger samples. Successful examples include the Simons Simplex Collection (Buxbaum et al. 2014; Fischbach and Lord 2010) and the Autism Speaks AGRE database (Geschwind et al. 2001), which have incorporated behavioral and genetic data from several thousand participants and have enabled numerous studies of genotype-phenotype relationships. Other examples include the EU-AIMS LEAP project (Loth et al. 2016, 2014), the National Database for Autism Research (NDAR) (Payakachat et al. 2015), and the ABIDE database (Di Martino et al. 2014), which have incorporated behavioral, genetic, and neuroimaging data from hundreds of participants with autism. These and other large databases have enabled the identification of dozens of new autism susceptibility genes (De Rubeis et al. 2014; Iossifov et al. 2012; Krumm et al. 2015; O’Roak et al. 2011; Sanders et al. 2012), as well as of early neurophysiological (e.g., Courchesne et al. 2011; Wolff et al. 2012) and behavioral (e.g., Chawarska and Shic 2009; Jones and Klin 2013) characteristics associated with the development of autism.

While these large efforts have enabled tremendous progress, they have also revealed considerable heterogeneity among individuals with autism: many reported findings vary substantially from one individual to another [e.g., the ongoing debate regarding the existence of abnormally large brains in autism (Dinstein et al. 2017; Zwaigenbaum et al. 2014)]. Furthermore, most existing databases and collaborations have incorporated relatively limited pre-diagnostic data (e.g., birth records) and usually do not collect sufficient follow-up data to “connect the etiological dots” from early risk factors through biological mechanisms to precise phenotypic manifestations. There is, therefore, strong motivation to construct autism databases that contain both prospective and retrospective data relevant to diverse disciplines (e.g., genetics, clinical history, behavioral assessments, and neuroimaging) from the same individuals. Such information may enable in-depth characterization of specific autism etiologies and greatly improve our ability to classify autism into more homogeneous subtypes that may benefit from targeted treatments.

To address these issues, we have established a hospital-university-based (HUB) autism database at the Soroka University Medical Center (SUMC) and Ben-Gurion University (BGU) of the Negev. SUMC is the only hospital in the Negev, the southern part of Israel, and provides outpatient services, including child development and child psychiatry services, to more than 75% of the Negev population (a catchment area of ~700,000 people). The database was launched in January 2015 and already contains a wide variety of unique clinical and behavioral information about each participating child and his/her family members. These data include standardized behavioral and cognitive measures, socio-demographic data, detailed clinical history from electronic patient records (i.e., a wide variety of prenatal, perinatal, and neonatal clinical measures), and parental questionnaires on sensory sensitivities and sleep quality. We have recently expanded our ongoing data collection to include eye-tracking recordings, overnight EEG recordings, and saliva samples for DNA extraction from a substantial subset of the children. New families are currently referred at a rate of ~16.5 per month. Here, we present initial data from the 188 children who were diagnosed with autism at SUMC during the first 18 months of this initiative. These data illustrate the potential of the Negev HUB database as a valuable resource for interdisciplinary studies of autism, particularly for comparing autism etiologies across ethnically diverse populations.

Methods

Population

Most families in our cohort live in the Negev, are members of Clalit Health Services, and receive most of their medical services at SUMC (located in Beer-Sheva). This population represents ~75% of the ~700,000 residents of the Negev (Israel 2016) and is composed of ~60% Jews and ~40% Bedouin Arabs, two ethnic groups that differ in their genetic background and environmental exposures (e.g., lifestyles and cultural norms). The population of the Negev is relatively stable, such that many multigenerational families receive medical services at SUMC throughout their entire lives.

Recruitment and Clinical Evaluation

Children who are referred to the Child Development Institute (CDI) or to the Preschool Psychiatric Unit (PPU) at SUMC with a suspicion of autism undergo a rigorous clinical assessment (Fig. 1) that includes a comprehensive intake interview about the clinical and socio-demographic background of the child, assessment with the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) (Lord et al. 2000), and a cognitive evaluation using either the Bayley Scales of Infant and Toddler Development, Third Edition (Bayley-III) (Bayley 2006) or the Wechsler Preschool and Primary Scale of Intelligence, Third Edition (WPPSI-III) (Wechsler 1989). Autism is diagnosed by a child psychiatrist or a pediatric neurologist according to DSM-5 criteria (American Psychiatric Association 2013). Children diagnosed with autism are invited to follow-up visits at the clinics every 6–12 months until the age of six. During these visits, their diagnosis is re-evaluated and families are invited to participate in the additional ongoing studies detailed below.

Fig. 1

A flowchart of the evaluation of children and the data collection processes in our cohort. Families are recruited to the study either during the initial diagnosis or during the follow-up visits, which are also used to collect additional data. Currently collected data are presented in solid frames; planned data collection is presented in dashed frames.

Parents or legal guardians of referred children who agree to participate in additional types of data collection beyond the basic clinical evaluation are asked to sign a consent form that allows the research team to contact the family for further data collection. This study was approved by the ethics committee (IRB) responsible for human studies at SUMC.

Data Collection and Storage

The data currently available in the HUB autism database, and the data we plan to collect in the near future, are listed in Supplementary Table S1. Specifically, existing data from all children who were referred to SUMC with a suspicion of autism include socio-demographic data and clinical history from the intake questionnaire, ADOS-2 scores, cognitive test scores, and a summary of the clinical assessment according to DSM-5 criteria. In addition, clinical history from the SUMC patient records is available for the vast majority of these children and their family members. For example, we have retrieved a range of potential prenatal and perinatal risk factors directly from the electronic database of the Obstetrics and Gynecology Department (OGD) at SUMC (the only operative maternity center in the Negev). This includes information about maternal characteristics, pregnancy and perinatal outcomes (e.g., gestational age, birth weight, and Apgar scores), and peripartum maternal outcomes (e.g., hypertensive disorders, hypoxia-related complications, and gestational diabetes). The OGD database is used regularly for research purposes, and its accuracy is ensured through a standardized review of the data by a specialist medical secretary and a consulting obstetrician before coding (Amir et al. 2009).

Families who agree to participate in additional studies (over 70% of the entire sample) are asked to complete several questionnaires, including the Children’s Sleep Habits Questionnaire (CSHQ; Owens et al. 2000) and the Sensory Profile 2 questionnaire for children (Dunn and Westman 1997). Additional data collection includes: (1) a genetic study in which DNA is extracted from saliva samples of the affected child and their parents, collected with Oragene-DNA kits (OG-575 or OG-500); aliquots of the extracted DNA are currently stored at −20 °C for future genetic sequencing; (2) an eye-tracking study in which children watch movies so that their oculomotor control and social preferences can be assessed; and (3) a sleep-EEG and polysomnography study, performed at the SUMC sleep lab, to better understand sleep architecture and early brain function in autism. We are currently expanding these data collection efforts to create a large overlap of measures across as many children as possible. All of these studies have also been approved by the SUMC ethics committee.

All of the data collected from participants in this cohort are organized and stored in a designated secured computerized database that has been certified by the Israeli Ministry of Justice according to the privacy protection law of Israel (Ministry of Justice 1981).

Statistical Analysis

To demonstrate the potential utility of the collected data, we compared selected parameters between Jewish and Bedouin children using independent-samples t tests or Mann–Whitney tests for continuous variables, and chi-square or Fisher’s exact tests for nominal variables. We also used a chi-square test for linear trend (1 degree of freedom) to test for differences in the trend of ordinal variables.
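The 1-degree-of-freedom chi-square test for linear trend mentioned above can be sketched as follows. This is a minimal illustration assuming the classical Armitage form of the statistic for a binary outcome across ordered categories (for example, ADOS modules 1, 2, and 3 scored 0, 1, 2). The text does not say which software or variant was used; some packages use N - 1 instead of N (the "linear-by-linear association" statistic), which gives a slightly different value. The function name `chi2_linear_trend` and the counts in the usage line are hypothetical and are not data from the cohort.

```python
import math


def chi2_linear_trend(cases, totals, scores):
    """Chi-square test for linear trend in proportions (Armitage form, 1 df).

    cases[i]  -- children with the outcome of interest in ordered category i
    totals[i] -- all children in ordered category i
    scores[i] -- numeric score for category i (e.g. 0, 1, 2 for ADOS modules 1-3)
    Returns (statistic, p-value).
    """
    n = sum(totals)
    r = sum(cases)
    p_pooled = r / n  # overall proportion with the outcome
    sum_nx = sum(t * x for t, x in zip(totals, scores))
    sum_nx2 = sum(t * x * x for t, x in zip(totals, scores))
    sum_rx = sum(c * x for c, x in zip(cases, scores))
    # Variance term; it is zero if the scores or the outcome do not vary.
    denominator = p_pooled * (1 - p_pooled) * (sum_nx2 - sum_nx ** 2 / n)
    if denominator == 0:
        raise ValueError("trend test undefined: no variation in scores or outcome")
    statistic = (sum_rx - r * sum_nx / n) ** 2 / denominator
    # Upper tail of a chi-square distribution with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts for illustration only (not data from the cohort).
stat, p = chi2_linear_trend(cases=[2, 5, 8], totals=[10, 10, 10], scores=[0, 1, 2])
print(f"chi2 = {stat:.1f}, p = {p:.4f}")  # chi2 = 7.2, p = 0.0073
```

The p-value relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, which keeps the sketch free of third-party statistics libraries.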

Results

During the first 18 months of the study (January 2015 to June 2016), 296 children [218 Jewish, 76 Bedouin, and 2 of mixed origin (Bedouin father and Jewish mother)] were referred to SUMC with a suspicion of autism (Fig. 2). Autism diagnosis was confirmed in 188 of these children (63.5%), and parents of 133 of these children (70.7%) agreed to participate in some or all of the additional data collection efforts described in the Methods section. Children diagnosed with autism included 154 males and 34 females (a male-to-female ratio of 4.5) and were 38.88 ± 15.82 months old (range 16–98 months) at the time of diagnosis (Table 1).

Fig. 2

Growth of the Negev HUB database over an 18-month period. Families were referred to SUMC at an average rate of ~16.5 per month (range 8–28 participants/month; gray bars), resulting in a total of 296 participants who were evaluated during the first 18 months of our study (continuous black line). Of these, 188 children were diagnosed with autism (dashed black line), and the parents of 133 of these children signed the informed consent. Extrapolation of these data suggests that within 5 years of the study, the database will include at least 444 children with autism and informed consent.

Table 1 Sample characteristics of children with autism at the Negev HUB database

To demonstrate the potential utility of the HUB database for comparing autism etiologies across ethnic groups, we compared several initial characteristics between the Bedouin and Jewish families (Table 1). The rate of confirmed autism diagnoses was significantly higher among Jewish children than among Bedouin children (68.3 vs. 51.3%; p = 0.0077). In addition, Jewish mothers of children with autism were, on average, 5 years older at the child’s birth than Bedouin mothers (31.25 vs. 26.02 years; p < 0.0001) and 2 years older than all Jewish mothers who gave birth at SUMC in the last decade (31.25 vs. 29.22 years; p < 0.0001). In contrast, Bedouin mothers of children with autism were slightly younger than all Bedouin mothers who gave birth at SUMC in the last decade, but this difference was not statistically significant (26.02 vs. 27.15 years; p = 0.367).
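As a worked check, the counts implied by the reported diagnosis rates (149 of 218 Jewish children and 39 of 76 Bedouin children, which together give the 188 confirmed diagnoses) reproduce the reported p = 0.0077 under a Pearson chi-square test without continuity correction. The sketch below assumes that test; the text does not state whether a continuity correction or Fisher’s exact test was used for this particular comparison, and the helper name `pearson_chi2_2x2` is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, p-value with 1 df)."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2))  # upper tail, chi-square with 1 df
    return statistic, p_value


# Rows: Jewish, Bedouin referrals; columns: autism confirmed, not confirmed.
# Counts are inferred from the reported rates (149/218 = 68.3%, 39/76 = 51.3%).
chi2, p = pearson_chi2_2x2(149, 218 - 149, 39, 76 - 39)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 7.09, p = 0.0077
```

The two children of mixed origin are not part of either row, so the table totals 294 rather than 296 referrals.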

ADOS module use differed significantly between Bedouin and Jewish children (p < 0.0001): Bedouin children were more likely than Jewish children to be assessed with module 1 of the ADOS-2 (used with non-verbal children) rather than with modules 2 or 3. Similarly, Bedouin children were more likely than Jewish children to have their cognitive function assessed with the Bayley-III rather than the WPPSI-III (p = 0.009). These differences suggest a higher proportion of non-verbal children among Bedouin children at the time of diagnosis. In addition, Bedouin children had significantly lower cognitive test scores than Jewish children (p = 0.013), despite no significant differences between the two ethnic groups in autism severity or in parent-reported developmental milestones (i.e., age at first crawling, walking, and talking; see Table 1).

Importantly, recruitment rates did not differ significantly between Jewish and Bedouin families, suggesting that future in-depth comparisons across ethnic groups will include similar relative sample sizes.

Discussion

The Negev HUB database contains a wide variety of socio-demographic, behavioral, and clinical measures from a cohort of children with autism who were diagnosed and prospectively followed at SUMC. An important aspect of this cohort is its unique ethnic composition, which includes Jewish and Bedouin families, two populations known to differ in a number of clinical and demographic characteristics (Bilenko et al. 2014; Gotsman et al. 2015; Kridin et al. 2016; Lazarev et al. 2014; Leshem et al. 2015; Shental et al. 2010; Shimony et al. 2009; Smirnov et al. 2016; Treister-Goltzman et al. 2015). Indeed, our preliminary analysis revealed significant differences in maternal age at birth, language capabilities at the time of diagnosis, and cognitive abilities at the time of diagnosis. These differences may stem from disparities in awareness of autism and access to referral clinics, or may reflect ethnic differences in the risk factors and etiologies of autism. We plan to investigate the reasons for these ethnic differences in a number of studies that will use data already available in the HUB database as well as additional types of data that will be collected from these families in the future. The HUB database, therefore, offers a unique opportunity to understand how autism risk factors and etiologies differ across ethnic groups with different genetic backgrounds and environmental exposures. Such understanding is of paramount importance for explaining differences in behavioral and clinical characteristics found across ethnic groups, not only in Israel but also in many other places (e.g. CDC 2014).

An important strength of the HUB database is that ~90% of the children (all of the Bedouin and 88% of the Jewish children) were also born at SUMC and continue to receive most of their clinical services at the same hospital. The availability of the clinical records of the children and their immediate family members allows us to examine multiple pre- and post-diagnostic types of data, ranging from existing MRI scans and EEG exams to blood tests and clinical history, which could be used in studies of risk factors for autism or as biomarkers for specific autism subtypes. A notable example is the computerized OGD database at SUMC, which contains data on all newborn infants. This database has already been used successfully to identify prenatal and perinatal risk factors for a wide variety of conditions (e.g. Kessous et al. 2013a, b; Lebel et al. 2012; Pariente et al. 2013; Ratzon et al. 2011) and is an excellent resource for studying the effect of these factors on autism susceptibility. Data from other units of SUMC, such as the Division of Pediatrics and the Department of Neurology, will be used to search for clinical conditions that are enriched among children with autism and their immediate family members.
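The ~90% figure follows from weighting the two group-specific rates by group size. A minimal sketch, assuming the numbers of diagnosed children per group implied by the diagnosis rates in the Results (149 Jewish and 39 Bedouin children); because 88% is itself a rounded value, the result is approximate. The variable names are illustrative only.

```python
# Assumed counts of diagnosed children, inferred from the reported diagnosis
# rates (149/218 Jewish, 39/76 Bedouin); the birth-at-SUMC rates are from the text.
diagnosed = {"Jewish": 149, "Bedouin": 39}
born_at_sumc_rate = {"Jewish": 0.88, "Bedouin": 1.00}

# Weighted average of the group-specific rates.
born_at_sumc = sum(diagnosed[g] * born_at_sumc_rate[g] for g in diagnosed)
overall_share = born_at_sumc / sum(diagnosed.values())
print(f"{overall_share:.1%}")  # 90.5%
```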

Another notable characteristic of our cohort is the high rate of consanguineous families, especially in the Bedouin population. Consanguineous marriages are extremely common (~40%) among the Bedouin population in southern Israel (Na’amnih et al. 2014). This high consanguinity rate facilitates the identification of rare genetic disorders in this population (e.g. Harel et al. 2004; Landau et al. 2003; Shatzky et al. 2000) and could therefore be an advantage in the search for genetic causes of autism in our database.

About one third of the children in the HUB database who were referred with a suspicion of autism did not receive a final diagnosis of autism (see Fig. 2). Instead, most of these children were diagnosed with other developmental problems (mostly language delay). These children constitute an important comparison group for distinguishing etiologies that are specific to autism from etiologies that are common across multiple developmental disorders. In the future, we plan to expand the HUB database to include follow-up assessments of these non-ASD children to determine how their development differs from that of children diagnosed with autism.

Retrospective examination of existing clinical data at SUMC is possible without parental consent. However, to maximize our ability to capture more comprehensive information from each family (e.g., parental questionnaires, genetic samples, eye-tracking data, and EEG recordings), we have integrated our research team into the clinic, where researchers meet the families as early as their first diagnostic visit. Collection of these additional forms of data is thus carried out throughout the diagnostic process, and families are not asked to make a special effort to participate in the research. This arrangement facilitates parental recruitment (currently >70%), which helps ensure a fair representation of the local Negev population. Our research team also collects longitudinal data prospectively from participating families during the follow-up clinical assessments scheduled every 6–12 months. Behavioral and clinical measures are, therefore, acquired from the children at the earliest possible time point (the initial diagnosis) and during their regular follow-up visits to the clinic. This will enable us to follow the progress of these children across several developmental time points and to assess the effects of different interventions and environmental factors on their development. We acknowledge, however, that some participating families may move out of the Negev region in the future and start using other clinical facilities, in which case our ability to follow the progress of these children will be lost.

We envision that this ongoing project will continue to accumulate longitudinal data over a long period of time. Extrapolating the current autism diagnosis rate to a 5-year period suggests that by December 2019 the database will include data from over 620 families of children with autism. We expect the actual number of families to be larger for the following reasons: (1) the birth rate in this region is continuously growing (Israel 2006–2016); (2) the incidence of autism is continuously rising (Raz et al. 2014); and (3) we anticipate more diagnoses of Bedouin children (who currently account for >50% of the births at SUMC) as awareness of the disorder improves and services become more readily available. We have previously collaborated with the Israeli Ministry of Health to promote autism awareness in the Bedouin community. This effort has successfully increased the proportion of Bedouin children among the children diagnosed with autism at SUMC from nearly 0 to ~20% over a period of 6 years. We anticipate that this proportion will continue to rise in the coming years, until the number of Bedouin children diagnosed with autism accurately reflects their prevalence in the population.
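The 5-year projection is a straight-line extrapolation of the rate observed during the first 18 months. The sketch below assumes a constant monthly rate, which, for the reasons listed above, is likely a lower bound; the function name `linear_projection` is hypothetical. Rounding the monthly consent rate (133 / 18 ≈ 7.39) to 7.4 before multiplying by 60 months gives the 444 children quoted in the caption of Fig. 2.

```python
OBSERVED_MONTHS = 18  # January 2015 to June 2016
TARGET_MONTHS = 60    # 5 years, i.e. through December 2019


def linear_projection(count, observed_months=OBSERVED_MONTHS, target_months=TARGET_MONTHS):
    """Project a count forward assuming the observed monthly rate stays constant."""
    monthly_rate = count / observed_months
    return monthly_rate * target_months


diagnosed = linear_projection(188)  # children diagnosed with autism
consented = linear_projection(133)  # diagnosed children whose parents consented
print(round(diagnosed), round(consented))  # 627 443
```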

The Negev HUB database joins other relatively large autism research databases that have been built in the last several years (e.g. Croen et al. 2012; Fischbach and Lord 2010; Payakachat et al. 2015; Siegel et al. 2015; Stoltenberg et al. 2010). Although at present it has a smaller sample size and fewer variables than other databases, its unique ethnic composition and the availability of comprehensive retrospective and prospective longitudinal data make it a unique and valuable resource for studying autism heterogeneity and identifying specific autism etiologies that will likely benefit from distinct interventions. We strongly believe that the existence of such databases at multiple international sites, covering populations with different genetic backgrounds and environmental exposures, will be crucial for advancing autism research in the near future.