Introduction

Images of the human brain, in form and function, seem to be everywhere these days—on television, in glossy magazines, and on internet blogs worldwide. This is due, in many respects, to the incredible amount of information these images present and the sheer number of brain imaging research studies being performed to spy on the brain in action or at rest, to examine how it is built and wired, and to see what happens when things go wrong. Indeed, neuroimagers routinely collect more study data in a few days than was collected over an entire year just a decade ago. These data are a rich source of information on detailed brain anatomy, the subtle variations in brain activity in response to cognitive stimuli, and complex patterns of inter-regional communication. Taken individually, these various data types would have once formed the basis for entire research programs. Now, with interest not only in multi-modal neuroimaging but also in the collection of co-occurring biological and clinical variables, requiring linkage among geographically distributed researchers, neuroscience programs are rapidly becoming brain-focused analogues of projects in particle physics. The methods by which these data are obtained are themselves contributing to this growth, involving finer spatial and temporal resolution as MR physicists push the limits of what is possible and as brain scientists then rush to meet those limits. It is safe to say that human neuroimaging is now, officially, a “big data” science.

Such examples of large data, with their promise and challenges, have not gone unnoticed. In the US, the National Science Foundation, the National Institutes of Health, the Defense Department, the Energy Department, the Homeland Security Department, as well as the U.S. Geological Survey, have all made commitments toward “big data” programs. The Obama Administration itself has even gotten in on the act. In response to recommendations from the President’s Council of Advisors on Science and Technology, the White House sponsored a meeting bringing together a cross-agency committee to lay out specific actions agencies should take to coordinate and expand the government’s investment in “big data”, totaling $200 million in support (see http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_fact_sheet_final.pdf). Among the examples of “big data” featured at the meeting was—no surprise—human neuroimaging. Additionally, the recent announcement of the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative (http://nih.gov/science/brain/index.htm) forms part of a new Presidential focus aimed at revolutionizing understanding of the human brain. Initiatives surrounding large-scale brain mapping are also underway in Europe (http://www.humanbrainproject.eu/; Frisoni 2010) and examples of large-scale brain data sets have been on full display at recent annual meetings of the Organization for Human Brain Mapping (OHBM; http://www.humanbrainmapping.org) in Beijing, China in 2012 and Seattle, Washington in June 2013.

However, as the richness of brain data sets continues to grow and the push to place them in accessible repositories mounts, many issues must be considered concerning how to handle the data, how to move them from place to place, and how to store, analyze, and share them.

How big is “Big”?

While size is a relative term when it comes to data, medical imaging applied to the brain comes in a variety of forms, each generating differing types and amounts of information about neural structure and/or function. Moreover, in vivo neuroimaging is not unimodal but, rather, remarkably diverse—examining brain form, function, and connectivity, and rapidly improving its ability to resolve finer spatio-temporal scales. Further advancement of magnetic resonance scanner field and gradient strength remains an active area of research (Barry et al. 2011). A brief look at the history of some central parameters for specific forms of functional and structural neuroimaging is illustrative.

Since its inception, the collection of blood oxygenation level dependent (BOLD) functional time series has often been among the data types having the biggest storage footprint. In the 1990s, the earliest BOLD imaging studies used sampling intervals of one volume every 4 s for the modest number of slice planes needed to image the full brain. During the mid-2000s, with advances in multi-channel coil technologies and the emergence of 3 T imaging systems, 2 s intervals became the most frequently used for the same in-plane image size but with finer slice resolution. More recently, improved sampling methods have made it possible to routinely obtain multiple BOLD image volumes of the whole head per second (Feinberg et al. 2010).
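As a rough illustration of what these acquisition trends mean for storage, the following back-of-the-envelope sketch estimates the raw size of a single 10-minute BOLD run; all acquisition parameters are assumed, typical values rather than figures from any particular study:

```python
# Back-of-the-envelope arithmetic with assumed, typical acquisition parameters:
# raw size of one 10-minute BOLD run at 2 bytes per voxel (16-bit integers).
def run_size_gb(nx, ny, n_slices, tr_s, run_min=10, bytes_per_voxel=2):
    n_vols = int(run_min * 60 / tr_s)
    return nx * ny * n_slices * n_vols * bytes_per_voxel / 1e9

print(f"4 s TR,   64 x 64 x 20:   {run_size_gb(64, 64, 20, 4.0):.2f} GB")    # early 1990s-style
print(f"2 s TR,   64 x 64 x 33:   {run_size_gb(64, 64, 33, 2.0):.2f} GB")    # mid-2000s-style
print(f"0.5 s TR, 104 x 104 x 72: {run_size_gb(104, 104, 72, 0.5):.2f} GB")  # multiband-style
```

Under these assumptions, a modern multiband run is nearly two orders of magnitude larger than its early-1990s counterpart, before any derived files are created.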

Likewise, diffusion tensor imaging (DTI) has undergone its own steady progress, both in the time needed to acquire data and in the degree of angular resolution with which to measure patterns of water molecule diffusion. Early diffusion imaging studies measured diffusion along 6 directions, using the diffusion signal to infer neural fiber orientation, quantifying this as a diffusion tensor, and then linking these tensor orientation patterns to estimate neural fiber pathways (Basser et al. 2000). Interest in examining finer degrees of angular difference in fiber orientations and in disentangling crossing fiber pathways led to the use of more diffusion directions through high angular resolution approaches employing 32, 64, or more MR gradient directions (Zhan et al. 2010). More recent approaches decompose the full diffusion spectrum, resolving upwards of 512 fiber directions in approximately the same scan time and volume sizes as the original DTI sequences (Wedeen et al. 2008).

In each example, as the imaging technology has improved, been extended, or made faster, so too has the amount of brain data which can be obtained. Once these improved methodologies have proven robust and dependable, with analytic methods available with which to utilize them, researchers have been quick to adopt them—doubling or tripling the amount of data they can then gather per subject by doing so. Indeed, a simple examination of fMRI articles from representative issues of the field’s touchstone journal, NeuroImage, indicates that since 1995 the amount of data collected has doubled approximately every 26 months (Fig. 1). At this rate, by 2015 the amount of acquired neuroimaging data alone, discounting header information and before more files are generated during data processing and statistical analysis, may exceed an average of 20 GB per published research study. This is likely to be an underestimate of raw dataset sizes since, as noted above, advances in MRI physics are accelerating the pace at which data can be acquired per unit time.
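The projection is easy to reproduce under its own assumptions. Only the 26-month doubling time comes from Fig. 1; the 1995 per-study baseline of roughly 35 MB used below is an assumed figure chosen for illustration:

```python
# Exponential growth at a fixed doubling time of 26 months; the 1995 per-study
# baseline of ~35 MB is assumed for illustration, not taken from Fig. 1 itself.
BASELINE_MB_1995 = 35.0
DOUBLING_MONTHS = 26

def projected_gb(year):
    months_elapsed = (year - 1995) * 12
    return BASELINE_MB_1995 * 2 ** (months_elapsed / DOUBLING_MONTHS) / 1024

for year in (2005, 2010, 2015):
    print(f"{year}: ~{projected_gb(year):.1f} GB of raw data per study")
```

With those assumptions, the projection passes roughly 0.8 GB per study in 2005, 4 GB in 2010, and 20 GB in 2015, consistent with the trend shown in Fig. 1.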

Fig. 1 The amount of acquired neuroimaging data reported in published articles from representative issues of the journal NeuroImage has doubled every 26 months and can be expected to top 20 GB of purely raw data, on average, per study in only a few years. Amassing, curating, storing, and sharing such data from neuroimaging archives presents a growing big data challenge

Individually, the neuroimaging data sets from a single study may not pose major difficulties for processing and analysis using existing algorithms and statistical methods. However, as data sets are amassed into large-scale databases, considerable challenges emerge. How to rectify between-study differences in acquisition rates, resolution, scanning parameters, the presence of image artifacts, and the like takes on importance—even the same make and model of scanner installed at two different sites can generate sufficient differences in image quality to complicate combined analysis. Add to this the increasing interest in gathering study data from across the lifespan or from specific patient groups, concurrently recorded electrophysiological time courses, and comprehensive phenomic meta-data on each study participant, and wrangling, let alone interpreting, these data will not be for the faint of heart.

Big neuroimaging + big genetics = REALLY big data

With the ability to obtain genome-wide sets of single nucleotide polymorphism (SNP) information becoming routine and full genomic sequencing rapidly becoming affordable, interest in linking the influence of genes on the brain is heating up. Soon, exomic-level genetics will be readily at hand to replace SNP-based approaches. At several gigabytes per genome using Next Generation Sequencing (NGS) methods, for major brain imaging studies such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Weiner et al. 2012), with its initially available sample of 832 subjects, one can expect to store multiple petabytes of genetics data alone. More ADNI data are on their way, too. Coupled with the existing and ongoing collection of multimodal neuroimaging data types, this represents some really big data. As the bond between neuroimaging and genomics grows tighter, with both areas growing at incredible rates, disk storage, unique data compression techniques (like those proposed for genomics (Hsi-Yang Fritz et al. 2011); see also http://www.sequencesqueeze.org/), and data processing considerations rapidly become a scientific imperative.

Multisite consortia and data sharing

Along with advances in the ability to obtain data, there has been an increase in the number of multisite consortia like ADNI for examining the healthy and diseased brain. The demand for multiscale data in the investigation of fundamental disease processes has been recognized for several years (Jiang et al. 2008), as has the need for cooperation across centers and even disciplines to integrate and interpret the data (Van Horn and Toga 2009b). Examples of multisite neuroimaging efforts can be found not only in studies of the healthy brain but also in studies of devastating illnesses such as Parkinson’s disease (Evangelou et al. 2009; see also http://ppmi.loni.ucla.edu/), psychiatric disorders (Schumann et al. 2010), and in the mapping of human brain connectivity (Toga et al. 2012). In addition to databases of aging and aging-related diseases, large-scale examples such as the NIH-based National Database for Autism Research (NDAR; Hall et al. 2012) and the Federal Interagency Traumatic Brain Injury Research (FITBIR; Bushnik and Gordon 2012) system exist to gather neuroimaging, genetic, and phenomic data on autism and brain injury, respectively. The various “grass roots” collections of resting-state fMRI data maintained as part of the “1000 Functional Connectomes” project (http://fcon_1000.projects.nitrc.org/) (see Biswal et al. 2010) and the task-based OpenfMRI project (http://www.openfmri.org) (Poldrack et al. 2013) are other notable examples.

What is more, pressures to share these data as openly as possible have put the onus on both data collectors and database curators to store data efficiently and safely while providing them efficiently to anyone who would like to use them. However, inherent in multilaboratory projects are sociologic, legal, and often ethical concerns that must be resolved satisfactorily before they can work effectively or be widely accepted by the scientific community (Beaulieu 2001). While sharing often involves meeting certain expectations on the part of the data collectors, the exchange, storage, and computation on brain imaging data has many advantages, including 1) amortizing the cost of data collection over the widest possible set of researchers, 2) allowing their use in new methods development, 3) promoting data re-analysis and re-purposing, 4) providing new means for collaboration, 5) generating hypotheses, and 6) enabling clever forms of visualization. In other words, archived data can be subjected to novel approaches which can highlight relationships not envisioned by the original data collectors and shed new light on important mechanisms which are then worthy of additional study (Van Horn and Ishai 2007).

Examples of big neuroimaging databases

Despite the ever-present challenges of archiving massive quantities of brain imaging data, prominent examples of resources for storing, sorting, and mining such data exist. Key examples include XNAT Central (https://central.xnat.org/) (see Marcus et al. 2007), which contains data from several thousand subjects and is playing a key role in the data management and informatics of the Human Connectome Project (HCP) (Marcus et al. 2013); the SumsDB effort (http://sumsdb.wustl.edu/sums/index.jsp) for cortical surface-based atlasing of neuroimaging results (Van Essen 2005); and the NIH MRI Study of Normal Brain Development (http://www.bic.mni.mcgill.ca/nihpd/info/data_access.html) (Evans 2006), containing multisite neuroimaging data from children aged 6 to 18 years, accompanied by comprehensive neurological assessments and neuropsychological testing results (Almli et al. 2007). Such resources represent leading efforts among a growing set of online repositories for the storage and sharing of raw and processed neuroimaging data and results.

Speaking from our own experience, the Laboratory of Neuro Imaging (LONI), formerly based at the University of California Los Angeles (UCLA) and now based at the University of Southern California (USC), has served as a repository for single- and multi-site neuroimaging research studies for many years (Toga 2002b). The LONI Image and Data Archive (IDA) provides a comprehensive, interactive, and user-friendly environment for safely archiving, querying, searching, visualizing, tracking, sharing, and disseminating neuroimaging, clinical, and neurocognitive data. All data are stored on redundant servers with daily and weekly on- and off-site backups. The IDA stores data from ADNI and the Michael J. Fox Foundation, among other prominent multi-site neuroimaging initiatives.

Archiving data in the IDA is straightforward, secure, and requires no specialized hardware or software. The IDA automatically facilitates the de-identification and pooling of data from multiple institutions, protecting data from unauthorized access while providing the ability to share data among collaborating investigators. Integration of the LONI Debabeler file format translation engine (Neu et al. 2005) allows users to upload and download image data in a number of common neuroimaging file formats (e.g. DICOM, Analyze, NIfTI, MINC). Once archived, data can be downloaded and/or streamed into automated tools for processing and analysis (Dinov et al. 2009, 2010a).
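For readers unfamiliar with what such format translation entails, the following is a minimal sketch, not of the Debabeler itself, but of the same kind of conversion performed with the open-source nibabel package; the file names are hypothetical:

```python
# Minimal sketch of neuroimaging file-format translation (not the LONI
# Debabeler): rewrite an Analyze 7.5 header/image pair as compressed NIfTI-1.
import nibabel as nib

src = "subject001.hdr"               # hypothetical Analyze 7.5 header (with .img pair)
img = nib.load(src)                  # nibabel infers the input format

# Re-wrap the voxel array and affine in a NIfTI-1 container and write it out.
nifti = nib.Nifti1Image(img.get_fdata(), img.affine)
nib.save(nifti, "subject001.nii.gz")
```

A production translation engine must additionally preserve or map acquisition meta-data (orientation conventions, scanner parameters, subject identifiers) across formats, which is where most of the real complexity lies.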

Presently, the IDA repository contains over 14,000 imaging series from thousands of subjects, with growth averaging ~400 new image series per month and encompassing 130 TB of storage. This includes structural, functional, diffusion, and other MRI-based data types, in addition to studies employing a range of PET ligands. Since the inception of the IDA, data drawn from it have been used in hundreds of research articles published in the peer-reviewed literature.

All in all, resources such as these are amassing brain imaging data at already impressive scales. However, will such archives be ready for still more data as investigators acquire more information through scanning as well as other measures of brain form and function? The next wave for neuroimaging genetic examinations of the brain will certainly test the computational and storage infrastructures of these and other databasing efforts.

The role of cyberinfrastructure

Individual desktop computers are no longer suitable for analyzing potentially petabytes’ worth of brain and genomics data at a time. What is needed is to combine effort across multiple, distributed processing elements and leverage their combined power toward massive-scale analyses of neuroscientific resources. The motivating interest in emerging forms of computing for biomedicine is the coordination of resource sharing and problem solving in dynamic, multi-institutional, spatially dispersed virtual organizations that can gather and exchange data. While the National Science Foundation (NSF) has made major investments in the computer architecture needed for physics, weather, and geological data (e.g. XSEDE, https://www.xsede.org/, and the Open Science Grid, https://www.opensciencegrid.org), the NIH has invested warily (e.g. BIRN, http://www.birncommunity.org/), despite strongly encouraging data re-use from online data repositories. More than just the sharing of files, this would involve direct access to computer resources, software, data, and services, increasingly required by a range of collaborative problem-solving and resource-brokering strategies emerging from industry, science, and engineering. The Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC; http://www.nitrc.org) and the International Neuroinformatics Coordinating Facility (INCF; http://incf.org) have begun to deploy local clusters with Amazon EC2 server technology toward this goal, but a larger effort will be required, involving dedicated processing centers or distributed grids of linked computer centers. Resource availability can be open or highly controlled, with providers and consumers defining clearly and carefully just what is shared, who is allowed to use resources, and the conditions under which use occurs (Van Horn and Toga 2009a). It should not come as a surprise that, as for so many other forms of data-rich science, big computing is an ideal framework for big neuroimaging.

Models for data management and availability

The organizing, annotating, archiving, and distributing of neuroimaging and biomedical data in useable and structured frameworks have become critical elements across a range of neuroscientific efforts (Toga 2002a). With all this brain data flying around and needing someplace to land, several well-known efforts to construct and populate large brain anatomy (Mazziotta et al. 2001), function (Van Horn et al. 2005), and genetics (Saykin et al. 2010) databases have arisen over the years in order to make such data open to a still broader audience of researchers. In fact, for many recent large-scale neuroimaging projects, such as the Human Connectome Project (HCP; www.humanconnectomeproject.org), their existence is predicated on the expectation that the data obtained will be well organized and available to the community for examination and study.

As one might imagine, there are as many data organizational models as there are laboratories gathering the data. Because of the sheer volume of the study data, it is not uncommon for the imaging data to live apart from the meta-data describing them, e.g. on their own disk partition or even on another computer system altogether. This information, in turn, can itself live apart from other data acquired (e.g. demographics, genetics, phenomics). Organizing and linking all of these distinct and possibly physically separated data types can be a major data management challenge.
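In its simplest form, the linkage problem reduces to joining tables that share a subject identifier. The sketch below, with hypothetical file and column names, illustrates the idea and why a report of the mismatches is often as informative as the merged table itself:

```python
# Minimal sketch of linking imaging data stored in one place to subject-level
# meta-data kept elsewhere; file and column names are hypothetical.
import pandas as pd

scans = pd.read_csv("image_manifest.csv")   # e.g. subject_id, modality, file_path
demo = pd.read_csv("demographics.csv")      # e.g. subject_id, age, sex, diagnosis

# Inner join keeps only subjects present in both tables; validate= guards
# against accidental duplication of meta-data rows.
linked = scans.merge(demo, on="subject_id", how="inner", validate="many_to_one")

unmatched = set(scans["subject_id"]) - set(demo["subject_id"])
print(f"{len(linked)} scan records linked; {len(unmatched)} subjects lack meta-data")
```

In practice, the identifiers themselves are rarely this clean, and reconciling subject codes across imaging, clinical, and genetic systems is a substantial part of the curation workload.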

What is more, multiple models exist for making data available to others (Van Horn et al. 2001). In the simplest form, anonymous FTP sites can be used as a place to post datasets, where they may or may not be secure and the intentions of those accessing the data are uncertain. Databases containing summary results are enormously valuable from a meta-analytic perspective (Laird et al. 2009); however, the original neuroimaging data themselves are likely unavailable. The use of results-only resources may also require care due to the potential for factors such as publication bias (Jennings and Van Horn 2012). Federated archiving approaches allow data to remain at the sites that collected them while permitting users at a consortium of participating institutions to search and access each other’s data (Helmer et al. 2011). Centralized approaches also exist, in which a single center is the focal point for databasing and accompanying informatics for one or more multi-site efforts (Neu et al. 2012). These and other models have their own unique challenges for data storage, management, and availability, which can be expected to become more complex as study data sizes grow. Contributions of raw and processed study data to centralized or federated repositories can provide maximal information and utility to subsequent users (Van Horn and Gazzaniga 2005), where the submission of data can be voluntary, a condition of publication, or an agreed upon aspect of multi-site data consortia (Mazziotta et al. 1995; Toga and Crawford 2010; Van Horn and Gazzaniga 2012).

Standards are frequently non-standard

With a diversity of storage models also comes a variety of ways in which study data are managed and meta-data are internally represented. Having common formatting standards for meta-data representation and organization has been an important topic for neuroinformatics over the past decade (Koslow 2000; Helmer et al. 2011). The creation of data standards enables both intra- and inter-disciplinary interaction (Van Horn and Ball 2008), encourages the development of novel software tools for helping to understand relationships within and among data elements, and encourages new investigation of database contents (Neu et al. 2012). Recent efforts from international data sharing working groups (Poline et al. 2012) have made considerable headway in developing new frameworks for organizing study meta-data, drawing from the best parts of extant meta-data frameworks. Best practices for fMRI results reporting have been recommended (Poldrack et al. 2008), though these may naturally need to mature further as emerging analytic approaches come into favor. On the other hand, standards designers need to avoid the feature creep that can result in frameworks which are overly rigid and cannot adapt as new information becomes available and needs to be included. Nevertheless, when fully mature, these modern schemas will significantly improve the description of neuroimaging data sets, encode the provenance associated with data processing (Mackenzie-Graham et al. 2008; McClatchey et al. 2013), and help to populate large-scale archives prospectively, thereby encouraging common analysis frameworks. Such practical standards will be essential for multi-site trials and major neuroimaging initiatives, where data sharing has been expressly mandated by funding agencies.
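To make the notion of encoded provenance concrete, the record below is a purely illustrative example; the field names and processing tool are hypothetical and follow no particular published standard:

```python
# Hypothetical provenance record for a derived image; field names and the
# processing tool are invented for illustration and follow no specific schema.
provenance = {
    "subject_id": "sub-0001",
    "input": "sub-0001_T1w.nii.gz",
    "output": "sub-0001_T1w_brainmask.nii.gz",
    "software": {"name": "ExampleSkullStripper", "version": "2.1.0"},
    "parameters": {"fractional_intensity": 0.5},
    "executed": "2013-06-15T09:30:00Z",
    "host": {"os": "Linux", "arch": "x86_64"},
}
```

Standards efforts differ in vocabulary and serialization, but each aims to capture this same chain from raw data, through specific software versions and parameters, to every derived result.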

Factors governing the utility of “Big data” resources

However, simply having “big data” neuroimaging and placing it online is not an end in itself but only the next step in making those data available for others to explore and mine. Several factors contribute to a database's use and utility: whether it actually contains viable data accompanied by a detailed description of their acquisition (Van Horn and Toga 2009a); whether the database is well-organized and the user interface is easy to navigate; whether the data are derived versions of raw data or the raw data themselves; the manner in which the database addresses the sociological and regulatory issues that can be associated with data sharing; whether the data are fully anonymized and open to anyone who wants them, open only to members of a multi-site consortium, or accessible only after IRB approval has been granted; whether a policy is in place to ensure that requesting authors give proper attribution to the original collectors of the data; and the efficiency of secure data transactions. Such systems must provide flexible methods for describing data and the relationships among various meta-data characteristics (Bug et al. 2008). Moreover, those that have been specifically designed to serve a large and diverse audience with a variety of needs represent the types of databases that can have the greatest benefit to scientists looking to study disease, assess new methods, examine previously published data, or explore new ideas (Van Horn and Gazzaniga 2005; Keator et al. 2013).

Mining data and digging for gold

The successes of molecular biology (Huang et al. 2007), systems biology (Hood et al. 2004), and astrophysics (Gray et al. 2002) infrastructures for archiving and mining data are well known. With the accumulation of neurobiological data into large databases and the availability of compute-cluster-enabled means for large-scale data processing, a new form of discovery-oriented neuroscience is on the horizon. Researchers are increasingly mining vast and disparate collections of data, hunting for unseen patterns that might provide clues to underlying biological mechanisms (Phan et al. 2002; Arnone et al. 2009; Frazier and Hardan 2009) and trends in how studies are conducted (Jennings and Van Horn 2012). The terabytes worth of data provide input for informatics-driven pattern-seeking and other relevant algorithms (Jones and Swindells 2002; Ma et al. 2002; Schutte et al. 2002) which can provide further insight into complex brain processes.

Departing from a purely repository-based approach, a collection of research groups from around the world has adopted what might be considered more of a social networking strategy for data sharing. The Enhancing Neuroimaging Genetics through Meta-Analysis (ENIGMA; enigma.loni.ucla.edu) network brings together researchers interested in imaging genomics to understand brain structure and function based on MRI, DTI, fMRI, and genome-wide association study (GWAS) data. Among the network’s goals are to ensure that promising findings are replicated via member collaborations, in order to satisfy the mandates of most journals; to gain increased statistical power; and to share ideas, algorithms, data, and information on promising findings or methods. The network has provided the means for experts in imaging and genetics, as well as in computer science, mathematics, and statistics, to make significant collaborative contributions to fields from which, without such infrastructure, most of their expertise would have been excluded (Stein et al. 2012). These successes are not necessarily unique, but building upon them and extending them to a wider set of scientific research arenas is an ever-present theme (Persson 2000; Brookes 2001; Altman 2003).

New computer science designed with big data in mind

It might be tempting to think that once all the data have been archived, indexed, and made ready to go, one need only start analyzing them and answers to all our questions about the brain will be revealed. In reality, even examining the contents of an archive to know what data are available to be analyzed requires new, cleverly designed, and user-friendly software tools and novel approaches for exploratory inspection. Such tools are only now beginning to appear (Bowman et al. 2012) and their further development will be essential for dealing with existing as well as the expected size of neuroimaging data sets.

Once a selection of data worthy of further analysis has been identified, a new concern is realized—many software packages for neuroimaging data analysis are ill-suited to very large data sets involving potentially thousands of subjects. Algorithm optimization is rarely a consideration when data sets are small or modest in size, but as data sets grow, memory management becomes an important factor. New mathematics and informatics approaches will be needed to more completely model multi-modal brain imaging data in the context of cortical anatomy, white matter connectivity, and functional activity. These will need to be fast, accurate, and interoperable with other tools so that data processing can be automated as much as possible. Interactive workflow environments for automated data analysis will also be critical for ongoing or retrospective research studies involving complex computations on large multi-dimensional datasets (Dinov et al. 2010b; Gorgolewski et al. 2011). Yet few tools, if any, now exist which enable the joint analysis of both genes and brain imaging data and which are capable of efficiently obtaining results while also achieving the requisite degree of statistical power. Moving forward, software engineers will need to create brilliant and innovative ways to tackle the massive amounts of brain and genomic data. Continuous interactions between neuroimagers, geneticists, software creators, and other biomedical scientists will be essential to develop these new, memory-efficient software algorithms and computational tools.
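As a small illustration of what memory-aware design means in practice, the sketch below (file names hypothetical) computes a mean image from a large 4-D BOLD series without ever holding the full time series in memory, using nibabel's on-disk array proxies:

```python
# Minimal sketch of memory-aware processing: compute the temporal mean of a
# large 4-D BOLD series one volume at a time; file names are hypothetical.
import nibabel as nib
import numpy as np

img = nib.load("large_4d_bold.nii.gz")    # proxy image; voxel data stay on disk
n_vols = img.shape[3]
mean = np.zeros(img.shape[:3], dtype=np.float64)

for t in range(n_vols):
    # Slicing the dataobj proxy reads only one volume from disk per iteration.
    mean += np.asanyarray(img.dataobj[..., t], dtype=np.float64)
mean /= n_vols

nib.save(nib.Nifti1Image(mean, img.affine), "mean_bold.nii.gz")
```

The same chunked-access discipline, generalized to streaming, out-of-core, and distributed settings, is what most existing analysis packages were never designed around and what big data neuroimaging will require.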

Big data in the era of ADNI2, the HCP, the HBP, and BRAIN

As the NIH and other major funding agencies spearhead big-picture, multi-center brain science efforts, the need for big data solutions will grow. The HCP consortia are now releasing large datasets which even the best neuroimaging researchers are struggling to analyze, and more of that data is on the way. ADNI2, the premier neuroimaging data collection effort for understanding the aging brain, will provide imaging, phenomic, and exomic information which we have yet to fully work out how best to analyze. The HBP, involving data collection sites throughout Europe, will likely eclipse that! And while the basis for infrastructure support is also changing, the notion that data and processing will simply be done “in the cloud” is somewhat naïve. The “cloud” must exist somewhere, and system failures have been known to happen. Though commercial solutions such as those offered by Google, Amazon, and Microsoft can be attractive for individual centers, dedicated neuroscience computational databasing and data processing resources remain essential. As the BRAIN Initiative takes shape, it is clear that demonstrable access to data and processing workflow methods, running on remote CPU clusters dedicated to this purpose, is the best way to ensure that we can keep up in the extraction of real findings, as opposed to a mere collection of results.

Conclusions

Neuroimaging research, by its very nature, is data intensive, multimodal, and collaborative—factors which have been instrumental in its success and growth. Indeed, we contend that neuroimaging is an emerging example of discovery-oriented science, wherein patterns of brain structure and activity present across multiple subjects and dozens of studies can be systematically extracted and examined, resulting in new knowledge. Yet the infrastructure needed to support this advancing form of brain research, where data is king, is still maturing. The rapid processing of large quantities of data in this way will lead to new scientific outcomes and patterns of results not envisioned during the examination of each study individually. Patterns may suggest fundamental mechanisms. Confirmed mechanisms add to the knowledge base of neurobiological science and provide the basis for further experimentation and the generation of still more valuable data that can be included in still greater analyses. Greater knowledge about fundamental brain processes then suggests new and testable hypotheses that lead to novel experimentation, the data from which should then be contributed back into a publicly available archive—continuing a healthy and helpful cycle.

The next steps for the development of resources supporting “big data” brain imaging at the exabyte scale will require the further creation of new tools and services for data discovery, integration, analysis, and visualization. Components for discovering data residing in database architectures must be developed (a sort of PubMed for data discovery). Examples already exist, such as the “EB-eye” resource for genomics (http://www.ebi.ac.uk/ebisearch/) and the Neuroscience Information Framework (NIF; http://neuinfo.org) for neuroscience terminologies. Such meta-resources will need to include the contextual information that allows data to be accessed, understood, reused, and their results reproduced. Integrating a broader spectrum of neuroscience data and providing tools for interrogating and visualizing those data will enable investigators to more easily and interactively investigate broader scientific questions.

Beyond just neuroimaging data, architectures for peta-scale biomedical data must be flexible enough to allow integration of additional clinical and biochemical data and analysis results into the database, and must employ tools for interactively interrogating and graphically visualizing database contents (Bowman et al. 2012). Frameworks for storing and making available big neuroimaging data, their standards, and the infrastructure for doing so must be enriched with modern data processing workflow design and execution systems that permit the exchange of processing methodologies between labs, among consortia members, or by independent researchers. Comprehensive mechanisms to gather, organize, and distribute data, results, and information between and among project participants, but also to the scientific community at large, are worth examining, developing, and deploying. For the “big data” science of human brain imaging, now is the time to begin.