Introduction

Diffusion MRI fiber tractography is widely used to map the structural connections of the brain (Conturo et al. 1999; Mori et al. 1999; Basser et al. 2000; Behrens et al. 2003; Parker et al. 2003; Lazar and Alexander 2005). Tractography uses the directionality of water diffusion in brain tissue to estimate neuronal fiber orientation. It then generates “streamlines”, typically by stepping along these orientation fields according to predetermined rules (Mori et al. 1999; Mori and van Zijl 2002). These streamlines represent possible trajectories of the white matter pathways of the brain. They have been used to infer region-to-region connectivity (connectomics) or to identify and extract specific white matter tracts (bundle segmentation). Both applications can additionally be informed by a priori knowledge of the anatomy or trajectories of the pathways (Wakana et al. 2007). For instance, anatomical constraints can be imposed by defining regions-of-interest (ROIs) that constrain the resulting streamlines. Most commonly, “seed” ROIs define where streamlines must start or end, “AND” or “inclusion” ROIs define regions that pathways must pass through, and “NOT” or “exclusion” ROIs define regions that pathways must not contact. These constraints are typically implemented after tracking as a filtering step (Garyfallidis et al. 2018; Zhang et al. 2018; Guevara et al. 2012), but can also be applied during streamline generation (Warrington et al. 2020; Behrens et al. 2007; Catani and de Schotten 2015; Catani et al. 2002). They are most commonly associated with bundle segmentation, i.e., the virtual dissection of specific pathways after seeding throughout the entire brain.

Despite these significant achievements in human brain mapping, the diffusion MRI field has uncovered and documented a number of limitations in the anatomical accuracy of fiber tractography techniques, particularly in recent years. Early validation studies mostly aimed to demonstrate the sensitivity of these techniques. Only more recent studies have highlighted problems with specificity, especially as they relate to connectomics. These studies have convincingly shown a fundamental trade-off between sensitivity (i.e., the ability to detect true connections) and specificity (i.e., the ability to avoid false connections) of tractography techniques (van den Heuvel et al. 2015; Azadbakht et al. 2015; Thomas et al. 2014). They have also shown an overall limited accuracy in estimating both structural connectivity and the spatial extent of pathways in the brain (Thomas et al. 2014; Schilling et al. 2018a; Maier-Hein et al. 2017). These results have been confirmed in simulations, in phantoms, and in a number of animal models. Sensitivity/specificity trade-offs are apparent across a range of tracking algorithms, parameters, and pathways under investigation (Azadbakht et al. 2015; Thomas et al. 2014; Schilling et al. 2018a, 2019; Donahue et al. 2016; Knosche et al. 2015; Dyrby et al. 2007; Dauguet et al. 2007; Calabrese et al. 2015; Aydogan et al. 2018; Neher et al. 2014; Cote et al. 2013). It is now well known that these techniques can suffer not only from overestimation of the extent and connections of pathways (false positives), but also from underestimation of the same (false negatives). One influential study by Thomas et al. (2014) highlights “an inherent limitation in determining long-range anatomical projections based on voxel-averaged estimates of local fiber orientation obtained from DWI data that is unlikely to be overcome by improvements in data acquisition and analysis alone.” Thus, high anatomical accuracy appears to remain an elusive goal for current tractography algorithms and strategies, unless a “revolution” occurs in the additional information provided to tractography algorithms (Maier-Hein et al. 2017, 2019).

However, these limitations have largely been highlighted in validation studies that implemented tractography much as it is performed in connectomics studies. These studies used a relatively “unsupervised” approach with few or no anatomical rules or constraints, and so lacked the advantages of prior information. They therefore represent a lower-bound, or worst-case, scenario for tractography. They may not indicate the anatomical accuracy achievable in bundle segmentation, where filtering and anatomical rules are common (Garyfallidis et al. 2018; Yendiki et al. 2011; Warrington et al. 2020; Wasserthal et al. 2018, 2019; Rheault et al. 2019; Wassermann et al. 2013, 2016). In fact, several early works in this field share a quite optimistic view of the accuracy of tractography (Catani et al. 2002; Lawes et al. 2008; Mori and van Zijl 2007). In addition, “virtual” dissections of individual fiber bundles are qualitatively similar to cadaveric dissections (Hau et al. 2017; Wang et al. 2016; Forkel et al. 2014; Sarubbo et al. 2013). Further, previous studies have made heavy use of constraints to verify anatomical accuracy. Constraints have also been used to support comparative anatomy across species (Jbabdi et al. 2013; Mars et al. 2011, 2016; Safadi et al. 2018) and to confirm the trajectory or cortical origin of white matter bundles (Neubert et al. 2014, 2015; Mars et al. 2012; Innocenti et al. 2017). These studies, and many others incorporating prior information and anatomical constraints (Rheault et al. 2019; Galinsky and Frank Jun 2017; Smith et al. 2012; Galinsky and Frank 2015; Frank and Galinsky 2016), suggest that tractography can accurately reconstruct not only broad pathways but also the topology of smaller bundles within those pathways. However, the sensitivity and specificity achieved when implementing anatomical guidance have not been explicitly quantified.

Along these lines, we hypothesize that overcoming the sensitivity/specificity curse simply (and intuitively) requires anatomical knowledge and anatomically informed rules, as is commonly done in bundle-segmentation studies. These rules constrain where tracks can and cannot go (Rheault et al. 2019). With this in mind, the aim of the current study is to investigate and quantify the upper bounds of current tractography methods. Previous quantitative validation studies have asked how well we can map connections from a given region. We instead ask how well we can extract known bundles and connections of the brain. That is, given detailed (and painstakingly acquired) knowledge of the ground truth pathways (Schmahmann and Pandya 2006), can existing algorithms achieve high anatomical accuracy in segmenting these pathways? We propose, and show, that simple guidance can achieve high sensitivity and high specificity at the same time, provided we know a priori, and constrain, where the pathways start, where they end, and where they do not go. This confirms that bundle segmentation incorporating prior knowledge has the potential to produce highly accurate representations of the desired neural pathways.

To do this, we use the validation dataset originally introduced by Thomas et al. (2014) and subsequently employed in an international tractography challenge (Schilling et al. 2018a). Both studies concluded that alternative or new strategies are needed for mapping the brain’s fiber pathways. Here, we apply tractography methods to this ex vivo macaque brain dataset and compare the results to maps of known axonal projections from previous tracer studies in the macaque (Schmahmann and Pandya 2006). Importantly, we perform virtual dissections of a whole-brain tractogram using the very same detailed tracer maps and the authors’ explicit descriptions. We constrain the streamlines using varying combinations of inclusion and exclusion regions, consistent with common approaches in bundle segmentation. We assess the results using the code and analysis of Thomas et al. (2014), iteratively refining the constraints until both high sensitivity and high specificity are achieved. We let the subject-specific data drive the results but obey anatomical rules with clear landmarks, much as one drives a car by following GPS instructions and road maps.

Results

The aim of the methodology is to mimic the process a clinician, neuroanatomist, or researcher follows when manually delineating a fiber bundle. That is, guidelines are applied and adapted until the streamlines best replicate the ground truth WM anatomy of the pathway of interest (for example, by comparison with neuroanatomy textbooks, prior knowledge, or tractography protocols). We selected the datasets and ground truth pathways from previous studies (Thomas et al. 2014; Schilling et al. 2018a). These consist of the anatomical locations of tracer-labeled regions from anterograde injections in a rhesus macaque, within (A) the precentral gyrus (PCG), corresponding to the foot region of the motor cortex [Case #28 in Schmahmann and Pandya (2006)], and (B) the ventral part of area V4 [Case #21 in Schmahmann and Pandya (2006)]. These are the same injection sites used in Thomas et al. (2014). Tracer-labeled regions were transposed to the same space as the diffusion MRI data (Fig. 1). Agreement between tracer and tractography results was assessed over brain regions manually delineated by the authors of Thomas et al. (2014) (Fig. 1). It was measured in terms of the number of true-positive (TP), false-negative (FN), false-positive (FP), and true-negative (TN) connections. From these, sensitivity [TP/(TP + FN)] and specificity [TN/(TN + FP)] are computed.
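The region-wise scoring described above can be sketched as follows. This is a minimal illustration, not the analysis code of Thomas et al. (2014). It assumes three things: a hypothetical integer label volume in which each manually delineated region has a unique nonzero ID, a binary tracer ground-truth volume, and a binary tractography occupancy volume (voxels visited by at least one streamline). It also assumes, hypothetically, that a region counts as "present" if any of its voxels is set. The function names and the any-voxel rule are illustrative assumptions.

```python
import numpy as np


def region_presence(labels, mask):
    """Map each nonzero region ID in `labels` to whether any of its voxels is set in `mask`."""
    present = {}
    for region_id in np.unique(labels):
        if region_id == 0:  # assumption: label 0 is background, not a scored region
            continue
        present[int(region_id)] = bool(mask[labels == region_id].any())
    return present


def confusion_counts(labels, tracer_mask, tract_mask):
    """Count TP, FN, TN, FP over delineated regions rather than over voxels."""
    truth = region_presence(labels, tracer_mask)
    pred = region_presence(labels, tract_mask)
    tp = sum(truth[r] and pred[r] for r in truth)
    fn = sum(truth[r] and not pred[r] for r in truth)
    tn = sum(not truth[r] and not pred[r] for r in truth)
    fp = sum(pred[r] and not truth[r] for r in truth)
    return tp, fn, tn, fp


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP + FN); specificity = TN/(TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Toy 1D "volume" with four regions (IDs 1-4); real data would be 3D.
labels = np.array([0, 1, 1, 2, 2, 3, 3, 4])
tracer = np.array([0, 1, 0, 1, 0, 0, 0, 0], dtype=bool)  # tracer in regions 1 and 2
tract = np.array([0, 0, 1, 0, 0, 1, 0, 0], dtype=bool)   # streamlines reach regions 1 and 3
counts = confusion_counts(labels, tracer, tract)
print(counts, sensitivity_specificity(*counts))
```

Scoring at the region level rather than the voxel level mirrors the choice discussed later in the text. Because the ground truth is defined only over anatomically meaningful regions, a single stray streamline voxel and a dense bundle count the same once they land in a region.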

Two methods of streamline generation and subsequent pathway delineation were investigated. They represent the approaches and software that the authors (KS and LP) chose in their own anatomical investigations: a manual-based approach and a template-based approach. First, we used manually drawn ROIs (Wakana et al. 2007; Catani and de Schotten 2015), defining by hand where streamlines must go and where they must not go. These hand-drawn regions were typically planes or 2D free-form shapes, often orthogonal to the observed direction of streamline propagation. Inclusion regions were placed in regions specific to the pathway of interest. Streamlines considered false positives were eliminated by placing exclusion regions where such streamlines were visually identified as sharing common territory, most commonly along adjacent white matter bundles or at the sulcal depth of gyri. Example procedures and constraints are shown in Fig. 2 and described in detail in Materials and methods. Second, we used predefined anatomical regions in a macaque template as inclusion and exclusion regions. The template consisted of labels in the form of 3D volumes to be used as regions-of-interest. Example procedures and constraints are shown in Fig. 3 and described in detail in Materials and methods.
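The inclusion/exclusion logic described above can be sketched as a post-tracking filter. This is a minimal sketch, not the software the authors used. It assumes streamlines are given as (N, 3) arrays of voxel coordinates on the same grid as boolean ROI masks. It also assumes a streamline "touches" an ROI if any of its points, rounded to the nearest voxel, falls inside the mask. Checking only sampled points is a simplification: a streamline that steps across a one-voxel-thick plane between samples would be missed, and real tools may handle this differently. The function names are hypothetical.

```python
import numpy as np


def touches(streamline, roi_mask):
    """True if any point of an (N, 3) voxel-coordinate streamline lies inside roi_mask."""
    idx = np.rint(np.asarray(streamline, dtype=float)).astype(int)
    # Drop points that fall outside the image grid before indexing.
    in_grid = np.all((idx >= 0) & (idx < np.array(roi_mask.shape)), axis=1)
    idx = idx[in_grid]
    return bool(roi_mask[idx[:, 0], idx[:, 1], idx[:, 2]].any())


def filter_streamlines(streamlines, include=(), exclude=()):
    """Keep streamlines that touch every inclusion ROI and no exclusion ROI."""
    kept = []
    for sl in streamlines:
        passes_all = all(touches(sl, roi) for roi in include)
        hits_any_excluded = any(touches(sl, roi) for roi in exclude)
        if passes_all and not hits_any_excluded:
            kept.append(sl)
    return kept


# Toy 5x5x5 grid with a one-voxel inclusion ROI and a one-voxel exclusion ROI.
shape = (5, 5, 5)
inclusion = np.zeros(shape, dtype=bool)
inclusion[2, 2, 2] = True
exclusion = np.zeros(shape, dtype=bool)
exclusion[4, 0, 0] = True

sl_a = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2]], dtype=float)  # inclusion only
sl_b = np.array([[2, 2, 2], [3, 1, 0], [4, 0, 0]], dtype=float)  # inclusion and exclusion
sl_c = np.array([[0, 4, 4], [1, 4, 4]], dtype=float)             # misses the inclusion ROI

kept = filter_streamlines([sl_a, sl_b, sl_c], include=[inclusion], exclude=[exclusion])
print(len(kept))
```

Iterative refinement, as described in the text, amounts to re-running such a filter with additional exclusion masks wherever visual inspection reveals false-positive streamlines.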

Fig. 1

Definition of ground truth and analysis. a Tracer substance delineated on individual slices [reproduced from Schmahmann and Pandya (2006)]. For the PCG injection (Case #28), tracer was described and detailed on 14 slices. b Example MRI b0 slice from an approximately similar location. c Tracers were transposed to MRI data, as described in Thomas et al. (2014), and digitized as a binary “ground truth” volume of pathways. d Gray and white matter ROIs were manually delineated on the high-resolution data to assess agreement between tracer and tractography results

Fig. 2

Example procedures and constraints for manual dissection. Bundles, pathways, or groups of streamlines were individually segmented based on a priori anatomical knowledge written and pictured in Schmahmann and Pandya (2006). Injection region (blue), inclusion ROIs (green), exclusion ROIs (red), and streamlines (yellow tubes) are visualized in 2D, with green and red arrows used to highlight hard-to-see inclusion and exclusion regions, respectively, that are either in a plane perpendicular or oblique to the image slice, or those partially obscured by streamlines. Detailed anatomical descriptions and decisions used in the dissection process are given in Materials and methods

Fig. 3

Example procedures and constraints for template-based virtual dissection. Bundles, pathways, or groups of streamlines were individually segmented based on a priori anatomical knowledge written and pictured in Schmahmann and Pandya (2006). Examples are shown for the PCG injection, for striatal and commissural pathways. ROIs are shown as colored volume renderings, and streamlines are colored based on directionality. Red arrows highlight apparent false-positive streamlines that are removed through the use of exclusion regions, and white arrows emphasize the dense core of the bundle matching anatomical descriptions. Detailed anatomical descriptions and decisions used in the dissection process are given in Materials and methods. Briefly, for striatal streamlines, the original streamlines (a) are limited to those connecting to the PT (b), which pass through the CR (c) but still include false positives. These false positives pass through FX, TH, and AM (d) and are eliminated using these as exclusion regions (e), resulting in the final striatal bundle (f). The commissural streamlines pass through the BCC (g); however, many false positives are apparent (h). Using a number of exclusion regions (i) eliminates erroneous streamlines (j), resulting in a commissural streamline trajectory that agrees well with written descriptions (k)

Qualitative results of the final tractograms of connections to the injection region are shown in Fig. 4 for PCG connections and in Fig. 5 for V4 connections. Each figure shows the reference atlas of digitized histological connections (i.e., the ground truth). Alongside it is a roughly anatomically matched MRI slice with tractography streamlines overlaid, for both the manual dissection and the template-based results. The streamlines replicate the major pathways and connections from tracers, but not on an individual axon/streamline basis. Individual streamlines show small inconsistencies. However, at the scale of larger anatomical regions (see Fig. 1d), streamlines exist where expected and do not occupy regions that the tracer does not. Visually, the manual dissections better replicate the ground truth in many regions, owing to the ability to make subject-specific and location-specific inclusion and exclusion decisions.

Fig. 4

Qualitative comparison of tracer and tractography for PCG injection. Tracer digitized on the reference atlas [and reproduced from Schmahmann and Pandya (2006), with permission] is shown alongside the anatomically matched b0 slice with streamlines shown in black (only streamlines within ± 1 slice are displayed) for both the manual-based dissection and the template-based dissection

Fig. 5

Qualitative comparison of tracer and tractography for V4 injection. Tracer digitized on the reference atlas [and reproduced from Schmahmann and Pandya (2006), with permission] is shown alongside the anatomically matched b0 slice with streamlines shown in black (only streamlines within ± 1 slice are displayed) for both the manual-based dissection and the template-based dissection

Fig. 6

Sensitivity and specificity results compared to the previous tractography validation studies. Results of the current study are shown as a filled star for manual dissection and an outlined star for template-based dissection, overlaid on plots and results from Thomas et al. (2014) and Schilling et al. (2018a) (left and right columns, respectively), which use the same data, same ground truth, and same quantitative analysis. ROC curves for PCG connections are shown in the top row, with V4 ROC curves in the bottom row

We quantified accuracy as in Thomas et al. (2014) and Schilling et al. (2018a). For PCG connections, manual dissections yield a sensitivity of 0.949, specificity of 0.956, and Youden index (sensitivity + specificity − 1) of 0.906 (TP = 132, FN = 7, TN = 328, FP = 15). Template-based dissections yield a sensitivity of 0.863, specificity of 0.869, and Youden index of 0.732 (TP = 120, FN = 19, TN = 298, FP = 45). For V4 connections, manual dissections yield a sensitivity of 0.852, specificity of 0.925, and Youden index of 0.777 (TP = 115, FN = 20, TN = 234, FP = 19). Template-based dissections yield a sensitivity of 0.770, specificity of 0.866, and Youden index of 0.636 (TP = 104, FN = 31, TN = 219, FP = 34). These results are plotted on the ROC curves of Thomas et al. (2014) and Schilling et al. (2018a) (Fig. 6a for PCG connections, Fig. 6b for V4 connections). High sensitivity and high specificity are clearly achieved at the same time. The values are much higher than those from both the original investigation (Thomas et al. 2014) and the international community challenge (Schilling et al. 2018a). The highest Youden indices previously observed on this dataset for the PCG injection were 0.59 (Thomas et al. 2014) and 0.56 (Schilling et al. 2018a); for the V4 injection, they were 0.53 and 0.58, respectively. We note that this is not a comparison of algorithms. We, of course, had access to, or direct knowledge of, the ground truth when choosing constraints to improve sensitivity and/or specificity.
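As an arithmetic check, the reported rates can be recomputed directly from the region counts given above. This sketch uses only the definitions stated in the text: sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), and Youden index = sensitivity + specificity − 1. The dictionary and function names are illustrative. The recomputed values agree with the reported ones to within 0.001. For example, the PCG manual sensitivity is 132/139 ≈ 0.9496.

```python
def rates(tp, fn, tn, fp):
    """Sensitivity, specificity, and Youden index from region counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sensitivity + specificity - 1


# (TP, FN, TN, FP) as reported in the Results text.
REPORTED = {
    "PCG manual": (132, 7, 328, 15),
    "PCG template": (120, 19, 298, 45),
    "V4 manual": (115, 20, 234, 19),
    "V4 template": (104, 31, 219, 34),
}

if __name__ == "__main__":
    for name, counts in REPORTED.items():
        sens, spec, youden = rates(*counts)
        print(f"{name:13s} sens={sens:.3f} spec={spec:.3f} J={youden:.3f}")
```

The totals TP + FN and TN + FP are constant within each injection (139 and 343 regions for PCG; 135 and 253 for V4). This is expected, because the two dissection strategies are scored against the same set of ground-truth regions.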

Discussion

In this study, we aim to investigate the upper bounds of tractography performance. If we are given a detailed description of the ground truth, either depicted in a map or written explicitly as a set of rules, and liberty in manual editing of pathways, we ask if it is possible to overcome the sensitivity/specificity limitations of current tracking algorithms and achieve a high anatomical accuracy. We find our answer is ‘yes’—tracking can be highly accurate if we know where streamlines (or pathways) start, where they end, and (maybe most importantly) where they do not go.

The importance of prior knowledge

Importantly, a number of anatomical constraints were needed to achieve this accuracy. Thus, while our answer to the previous question is ‘yes’, it comes with caveats. Our conclusion should be amended to say that current algorithms, in combination with constraints, can achieve both high sensitivity and high specificity. Alternatively, current algorithms “in combination with previous anatomical knowledge” can be highly accurate. This anatomical knowledge is what informs the regional constraints, and we found that both inclusion and exclusion regions were needed in our study. Thus, our results fully agree with the previous literature (Thomas et al. 2014; Schilling et al. 2018a; Maier-Hein et al. 2017): local orientation information alone will not lead to accurate results, and more information is needed. However, we believe that this additional information can, and should, come in the form of existing knowledge of the trajectories of the white matter.

This study links two bodies of work. On one side are existing validation studies emphasizing inherent tractography limitations and ever-present sensitivity/specificity trade-offs (Thomas et al. 2014; Donahue et al. 2016; Knosche et al. 2015; Dyrby et al. 2007; Schilling et al. 2018a, 2019; Cote et al. 2013; Dauguet et al. 2006; Delettre et al. 2019; Ambrosen et al. 2020; Shen et al. 2019). On the other are studies reporting anatomically faithful white matter bundles that visually match independent cadaveric or tracer data (Jbabdi et al. 2013; Mars et al. 2011, 2016; Safadi et al. 2018; Neubert et al. 2014, 2015; Innocenti et al. 2017; Sallet et al. 2013). The sensitivity/specificity trade-off has so far been quantified using algorithms or streamlines generated in a relatively unconstrained manner, without prior knowledge or constraints. Conversely, studies revealing accurate reconstructions of trajectories and connectivity patterns nearly always use inclusion and exclusion criteria. These criteria are chosen and implemented by researchers with expert a priori knowledge of the system or pathways under investigation. The latter studies show similarities in bundle shape, location, and endpoints, but their sensitivity and specificity have not been fully quantified. In this study, we quantitatively confirm that prior knowledge, in this case in the form of regional constraints, improves the anatomical accuracy of tractography. Thus, while false positives and false negatives still exist, the overall accuracy is substantially improved. This suggests that constraints, as commonly employed in bundle-segmentation studies, can indeed produce highly accurate reconstructions.

Relying on local orientation information alone is insufficient to ensure both sensitivity and specificity (Thomas et al. 2014). Here, we show that current tractography algorithms can provide highly accurate maps of the white matter when prior knowledge is used to filter the results. In this case, we chose an algorithm known to be capable of high sensitivity (high TP, depending on thresholding) and used added constraints to improve specificity. Before filtering, the streamlines exhibited high sensitivity and poor specificity (sensitivity = 1, specificity = 0.27, and Youden index = 0.27 for the probabilistic streamlines prior to manual filtering). This is in line with previous findings for similar reconstruction and tractography algorithms (Schilling et al. 2018a, 2019). Once anatomical constraints were employed, overall accuracy was much higher. This emphasizes that the high specificity and sensitivity are due to the exploited anatomical knowledge rather than to the choice of tractography algorithm.

The challenge, then, was to determine how to guide the dissection so as to improve tractography results. Here, we used inclusion and exclusion ROIs, as well as maximum streamline lengths. To improve future tractography results, it is crucial to understand which constraints are necessary (a task beyond the scope of the current study). It is also crucial to understand which additional constraints (e.g., clustering, seeding, and filtering) may be successful. These constraints will almost certainly vary with the algorithm employed and with the system under investigation (i.e., the brain itself and the pathway of interest). In the human brain, they may vary on an individual subject basis, and animal regions will likely not be directly translatable to the human brain (see discussion below). For this reason, it is important to think critically about how these pathways are “defined” anatomically, the nomenclature used to describe them, and how best to replicate them using tractography tools. For example, pathways may be defined as connecting cortical region A to region B, or as a bundle that passes through/over/under region C (Wakana et al. 2007; Catani and de Schotten 2015; Wassermann et al. 2013, 2016; Mori et al. 2008; Landman et al. 2007). In summary, how a pathway is defined may influence the constraints. A consensus in humans is not yet clearly within reach, although efforts (and discussions) are underway (Mandonnet et al. 2018; Panesar and Fernandez-Miranda 2019).
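The text names maximum streamline length as one of the constraints but gives no threshold values in this section. A minimal sketch of such a length filter follows. Length is taken as the summed Euclidean distance between consecutive points, after scaling voxel coordinates by the voxel size. The threshold, the voxel size, and the function names are hypothetical parameters, not values from the study.

```python
import numpy as np


def streamline_length(streamline, voxel_size=(1.0, 1.0, 1.0)):
    """Arc length of an (N, 3) polyline given in voxel coordinates, in the units of voxel_size."""
    pts = np.asarray(streamline, dtype=float) * np.asarray(voxel_size, dtype=float)
    if len(pts) < 2:
        return 0.0
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())


def apply_max_length(streamlines, max_length, voxel_size=(1.0, 1.0, 1.0)):
    """Keep only streamlines whose physical length does not exceed max_length."""
    return [sl for sl in streamlines if streamline_length(sl, voxel_size) <= max_length]


# Hypothetical 0.5 mm isotropic voxels and a 5 mm cap, for illustration only.
short = np.array([[0, 0, 0], [3, 4, 0]])             # 5 voxels -> 2.5 mm
long = np.array([[0, 0, 0], [0, 0, 10], [0, 0, 20]])  # 20 voxels -> 10 mm
kept = apply_max_length([short, long], max_length=5.0, voxel_size=(0.5, 0.5, 0.5))
print(len(kept))
```

A length cap is one way to remove anatomically implausible, long detouring streamlines. Unlike an exclusion ROI, it requires no knowledge of where the detour occurs.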

At first glance, these results may seem intuitive. If we choose a highly sensitive algorithm that connects everything to everything, it seems obvious that we can detect connections to all areas of the brain. The key, however, is ensuring specificity. Streamlines from such algorithms may pass through regions that the true pathways do not, and these are eliminated with exclusion regions. Despite these exclusions, we still achieve high true-positive rates. This is an encouraging result, because removing false positives does not necessarily remove excessive true positives at the same time. That is, streamlines not only connect to the correct regions but also pass through the correct regions along their route. The alternative, brute-force approach would be to connect every seed voxel to every target voxel, through every inclusion voxel. This would guarantee that all true-positive regions can be reached without traversing TN regions. Current algorithms do not do this, yet we achieve successful results with algorithms widely used in the literature.

Challenges in validation, tractography, and “gold” standards

The presented results show that we can achieve high specificity and high sensitivity on a region-by-region basis. However, spurious streamlines clearly remain within both cortical and white matter regions (see Fig. 3). These still contribute to false-positive areas, if they are truly false positive over the defined regions. The results therefore might not fall at the optimal end of the ROC curves if analyzed on a voxel-wise basis. However, the ground truth is defined only over anatomically meaningful gray and white matter regions, not at the scale of voxels. A voxel-wise analysis would be complicated because tracer and tractography were performed on different physical brains. Analysis on the same brain has been performed, but we chose this dataset because it most clearly highlighted the fundamental accuracy limitations of tractography. As described in Thomas et al. (2014), connectivity strength may vary across animals and injections, but the presence or absence of connections is likely to be similar across monkeys.

This highlights the significance, and importantly the limitations, of what we choose to call our gold standards and of the methods used to validate tractography (Dyrby et al. 2018). In this study, we clearly do not match the reference atlas perfectly, although our ROC analysis suggests near-perfect results. When validating, it is important to ask several questions. How much do individual streamlines matter to the analysis? How large should the anatomical white and gray matter regions that designate the ground truth be? How finely do these areas need to be parcellated: entire gyral folds, divisions into gyral crowns, walls, and sulcal fundi, or individual cortical columns? The answers may depend on the intended application of the tractography analysis. Use as a connectivity tool may require more finely detailed anatomical connections and localization. Use as a segmentation tool may only require accuracy at the coarser scale of the bundles themselves, although accuracy in both cases clearly requires prior constraints. However, tractography used as an exploratory analysis (i.e., searching for new pathways, or for connections to regions that are not well characterized) will have limited accuracy on both streamline and region-to-region bases without some prior anatomical knowledge. Its results should be interpreted with care in the absence of strong independent validation (e.g., histological tracers and dissection).

Similarly, another major limitation is that the “gold” standard [chosen by Thomas et al. (2014)] is based on the very same reference used to constrain the tracking, which may bias the results. However, our aim was to investigate whether high sensitivity and specificity are achievable with prior knowledge. In the human brain (or any brain), researchers and clinicians have the same ability to constrain streamlines to where they do or do not want them to go, which makes this a valid approach. Again, we emphasize the conditions of this result: the ground truth was defined previously, and anatomical accuracy was measured as an ROI-based sensitivity/specificity measure. Under these conditions, tractography can be highly accurate, at least for the pathways investigated. Finally, the present study presents a best-case scenario not only in its use of a priori knowledge, but also in the pathways chosen for validation (projection areas of M1 and V4). These are generally major projection systems with large, well-defined projections. These two systems were chosen because Thomas et al. (2014) had carefully delineated them and defined their ground truth, using them as two exemplar, orthogonally oriented systems. Even in this ideal model system, “perfect” sensitivity and specificity were not achieved. False negatives were observed at greater distances from the injection site, along with biases or inaccuracies at exact cortical terminations, in line with previous studies (Donahue et al. 2016; Schilling et al. 2018b; Reveley et al. 2015).

Generalizability

These results raise an important set of open questions regarding generalizability. First, how should anatomical rules be defined so that they generalize not only across subjects, but also across tracking algorithms? The exact set of constraints used here is almost certainly not optimal for all methods of generating streamlines. These rules will clearly differ between bundle-segmentation approaches. Manually placed ROIs offer much more flexibility and freedom. Atlas-based labels are fixed, and their ability to include and/or exclude desired regions depends on how fine-grained the parcellation is. Next, how do these guiding principles change between healthy and diseased individuals? It is critical that any guidance in either segmentation or connectivity analysis generalizes to subjects with anomalous diffusion and structural properties, in both normal and abnormal (tumorous) tissue. Finally, the filtering approach used here relates most directly to bundle segmentation. Which physical, anatomical, or structural priors or rules could generalize to connectomics, reducing invalid connections while preserving valid ones? Major progress in connectomics and bundle segmentation has been made with advanced filtering and/or spatial priors based on anatomy (Warrington et al. 2020; Rheault et al. 2019, 2020; Girard et al. 2014; St-Onge et al. 2018), microstructure (Girard et al. 2017), and the diffusion signal itself (conservation of density) (Daducci et al. 2015; Smith et al. 2013). We believe the next big steps involve multimodal integration of these with orthogonal techniques used to probe the human connectome. Examples include myelin (Alonso-Ortiz et al. 2015; Ganzetti et al. 2014), BOLD contrasts (Galinsky and Frank 2015; 2017; Gore et al. 2019; Ding et al. 2018; Huang et al. 2018), functional imaging (Deslauriers-Gauthier et al. 2019; Galinsky et al. 2018), and quantitative microdissection (Benedictis et al. 2018). Such integration will lead to a better understanding of the fundamental rules governing the structural organization and connectivity of the brain, and to efforts to fully incorporate these rules into tractography algorithms. In essence, all of these facilitate the adoption of rules, for example ways to include, exclude, or generate streamlines as approached in this study. As this study shows quantitatively, such rules can lead to breakthroughs in the anatomical accuracy of tractography.

Finally, validation in animal models does not necessarily validate this methodology (or tractography in general) in humans. These results, and specifically these constraints, are not necessarily immediately generalizable, especially to different pathways, subjects, or pathologies. Additionally, non-anatomical constraints, such as path curvature or anisotropy thresholds, are not immediately translatable from this non-human ex vivo model. In this study, we had the advantage of detailed histological tracings to define our constraints. In humans, there is a tremendous wealth of information from anatomists, gleaned from histological and blunt dissection methods. This knowledge may not constrain tracking to the degree used here, but it should be used on a pathway-by-pathway basis to define and refine constraints. Thus, collaboration between the anatomy and diffusion communities is needed to reach general agreement on defining pathways. A good first step would be describing the locations, areas, or general boundaries where pathways start, where they end, and which regions they do or do not pass through.

Conclusion

Tractography, even when performed on high-quality diffusion MRI data with sophisticated methods, faces an inherent trade-off between sensitivity and specificity (the “sensitivity/specificity curse”), and additional information appears necessary to overcome this limitation. In this work, we show that tractography implemented as a bundle-segmentation technique that incorporates prior knowledge can indeed be highly anatomically accurate. Importantly, this requires detailed knowledge of where pathways go and where they do not go. In this study, this knowledge was translated into constraints in the bundle dissection process, allowing the desired streamlines to be dissected and filtered from a potentially large pool of invalid streamlines. Anatomical constraints that define inclusion/exclusion criteria have been used previously in bundle dissection studies, and we propose that connectomics studies should consider similar constraints guided by known anatomical, developmental, or microstructural rules.

Materials and methods

The aim of the methodology is to replicate the process followed by a clinician, neuroanatomist, or researcher manually delineating a fiber bundle, i.e., applying and adapting guidelines until the streamlines best replicate the ground truth WM anatomy of the pathway of interest (for example, as described in neuroanatomy textbooks, prior knowledge, or tractography protocols). We first describe the ground truth dataset and accuracy assessment, followed by a description of how pathways were created and delineated.

Ground truth and accuracy assessment

Figure 1 displays the datasets and ground truth derivation used in this study; for a detailed description of the histology, we refer the reader to Schmahmann and Pandya (2006), and for the MRI acquisition and delineation, to Thomas et al. (2014). Briefly, the ground truth is based on two anterograde tracer injections within (A) the precentral gyrus (PCG), corresponding to the foot region of the motor cortex [Case #28 in Schmahmann and Pandya (2006)], and (B) the ventral part of area V4 [Case #21 in Schmahmann and Pandya (2006)] of a rhesus macaque; these are the same injection sites used in Thomas et al. (2014). Slides were digitized, and the tracer-labeled territory (i.e., connections to the injection site) was delineated on individual slices of the reference atlas by the authors of Thomas et al. (2014) (Fig. 1a).

MRI data were acquired from an ex vivo rhesus monkey brain, scanned over ~71 h using a 3D diffusion-weighted EPI pulsed-gradient spin-echo (PGSE) sequence (b value = 4800 s/mm², 7 b = 0 volumes, 121 DWIs with directions distributed over a tessellated icosahedral hemisphere). The tracer-labeled regions were transposed to the space of the diffusion data [by the authors of Thomas et al. (2014)] for each MRI slice that was anatomically matched to a histology slice from the reference atlas. An example b = 0 (“b0”) slice from approximately the same anatomical location is shown (Fig. 1b), along with the tracer results in MRI space (Fig. 1c). Finally, gray and white matter ROIs were manually delineated on the high-resolution data (Thomas et al. 2014), and agreement between tracer results and tractography was assessed in terms of the numbers of true-positive (TP), false-negative (FN), false-positive (FP), and true-negative (TN) connections, which are used to compute sensitivity [TP/(TP + FN)] and specificity [TN/(TN + FP)].
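As a sketch of the scoring step, the snippet below assumes that each gray or white matter ROI is scored once as connected or not connected according to the tracer, and once according to the filtered streamlines; it then tallies the four outcomes and the two rates defined above. The function names and toy ROI labels are hypothetical and are not taken from the original analysis pipeline.

```python
def confusion_counts(all_rois, tracer_positive, tract_positive):
    """Tally TP, FN, FP, TN over a fixed set of candidate ROIs.

    tracer_positive: ROIs labeled by the tracer (ground-truth connections).
    tract_positive:  ROIs reached by the filtered streamlines.
    """
    universe = set(all_rois)
    truth = set(tracer_positive) & universe
    pred = set(tract_positive) & universe
    tp = len(truth & pred)              # connected in both
    fn = len(truth - pred)              # missed by tractography
    fp = len(pred - truth)              # reached by tractography only
    tn = len(universe - truth - pred)   # absent in both
    return tp, fn, fp, tn


def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP); NaN if undefined."""
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy example: 10 candidate ROIs, 4 truly connected, 5 reached by streamlines.
counts = confusion_counts(range(10), {0, 1, 2, 3}, {0, 1, 2, 7, 8})
sens, spec = sensitivity_specificity(*counts)
print(counts, round(sens, 3), round(spec, 3))  # (3, 1, 2, 4) 0.75 0.667
```

Adding a streamline that reaches a new, tracer-negative ROI raises FP and lowers specificity without changing sensitivity, which is the trade-off the iterative ROI editing tries to navigate.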

Tractography and pathway delineation

Two different methods of streamline generation and subsequent pathway delineation were investigated, representative of the approaches and software that the authors (KS and LP) use in their own anatomical investigations. First, we used manually drawn ROIs (Wakana et al. 2007; Catani and de Schotten 2015), defining by hand where streamlines must and must not go. Second, we used predefined anatomical regions from a macaque template as inclusion and exclusion regions.

Manual delineation

Local voxel-wise reconstruction and orientation estimation were performed using constrained spherical deconvolution (Tournier et al. 2008) [one of the techniques investigated in both (Thomas et al. 2014) and (Schilling et al. 2018a)] implemented in the MRtrix3 software package (Tournier et al. 2012, 2019). Probabilistic tractography (iFOD2 algorithm) was performed with the software's default parameters, seeding from randomly selected points throughout the brain until 5 million whole-brain streamlines were generated. From this whole-brain set, subsets of pathways from the injection sites were virtually dissected.

Pathways connecting to the dorsal part of area 4 in the PCG were constrained and extracted using both the written descriptions and tracer visualizations from Case #28 of “Fiber Pathways of the Brain” (pages 322–328) (Schmahmann and Pandya 2006), while those connecting to the ventral part of V4 were extracted using the descriptions and visualizations from Case #21. Example delineations for five “pathways” from the PCG, and the corresponding tractography constraints, are described in detail below (and shown in Fig. 2). Importantly, this was an iterative manual process: inclusion and exclusion regions were added, removed, and translated until the streamlines qualitatively matched the ground truth displayed in Figs. 4 and 5, while sensitivity/specificity was continually quantified, until we judged region placement to be near-optimal for this set of streamlines. The labeled regions used to quantify sensitivity/specificity were not used as inclusion/exclusion regions, nor were they used in the manual delineation process.
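The inclusion/exclusion logic applied throughout this process can be summarized in a few lines. The sketch below is a simplified stand-in, not the MRtrix3 implementation: it assumes streamlines are arrays of points in voxel coordinates and ROIs are boolean volumes on the same grid, and it tests only streamline vertices (adequate when the step size is smaller than a voxel). The helper names are hypothetical.

```python
import numpy as np


def visits(streamline, mask):
    """True if any vertex of the streamline lies in a True voxel of mask.

    Assumes streamline is an (N, 3) array in voxel coordinates on the same
    grid as mask; vertices outside the volume are ignored.
    """
    idx = np.rint(np.asarray(streamline, dtype=float)).astype(int)
    in_bounds = np.all((idx >= 0) & (idx < np.array(mask.shape)), axis=1)
    idx = idx[in_bounds]
    if len(idx) == 0:
        return False
    return bool(mask[idx[:, 0], idx[:, 1], idx[:, 2]].any())


def filter_streamlines(streamlines, include=(), exclude=()):
    """Keep streamlines that visit every inclusion ROI and no exclusion ROI."""
    kept = []
    for s in streamlines:
        if all(visits(s, m) for m in include) and not any(visits(s, m) for m in exclude):
            kept.append(s)
    return kept


# Toy volume: a seed voxel, one inclusion voxel, and one exclusion voxel on a line.
shape = (10, 10, 10)
seed = np.zeros(shape, dtype=bool)
seed[1, 5, 5] = True
waypoint = np.zeros(shape, dtype=bool)
waypoint[5, 5, 5] = True
avoid = np.zeros(shape, dtype=bool)
avoid[8, 5, 5] = True


def line(x_end):
    """Straight toy streamline along x with 0.5-voxel steps."""
    xs = np.arange(0.0, x_end + 0.25, 0.5)
    return np.column_stack([xs, np.full_like(xs, 5.0), np.full_like(xs, 5.0)])


tracks = [line(6.0), line(9.0), line(3.0)]
kept = filter_streamlines(tracks, include=[seed, waypoint], exclude=[avoid])
print(len(kept))  # 1 (only the track ending at x = 6 survives)
```

In this sketch the seed (injection) region is treated as just another inclusion ROI, matching how the injection region is used in the delineations described below.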

Local association fibers—rostrally directed fibers (Fig. 2, first row)

Here, Schmahmann and Pandya (2006) describe “Diffuse terminations adjacent to the injection site are seen in area 4”; thus, we use one inclusion ROI (the injection region serves as an inclusion ROI in all examples and is therefore not included in the constraint count), placed on two separate slices rostral to the seed at approximately atlas slices #85 and #89 (Schmahmann and Pandya 2006), together with a maximum streamline length of 6 mm (which we found to balance including additional streamlines against admitting streamlines that extend caudally past the seed).
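The 6 mm ceiling refers to streamline arc length. As a minimal sketch, assuming streamline points are already in millimeters, length is the sum of distances between consecutive points, and the threshold is applied here as an inclusive cut-off (whether the original cut-off was inclusive is an assumption).

```python
import numpy as np


def streamline_length(points_mm):
    """Arc length in mm: sum of Euclidean distances between consecutive points."""
    pts = np.asarray(points_mm, dtype=float)
    if len(pts) < 2:
        return 0.0
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())


def within_length(streamlines_mm, max_length_mm, min_length_mm=0.0):
    """Keep streamlines whose arc length lies in [min_length_mm, max_length_mm]."""
    return [s for s in streamlines_mm
            if min_length_mm <= streamline_length(s) <= max_length_mm]


# A straight 6 mm track (0.5 mm steps) and a bent 3 mm + 4 mm track.
short = np.column_stack([np.arange(0.0, 6.25, 0.5), np.zeros(13), np.zeros(13)])
bent = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [3.0, 4.0, 0.0]])
print([streamline_length(s) for s in (short, bent)])         # [6.0, 7.0]
print(len(within_length([short, bent], max_length_mm=6.0)))  # 1
```

Because length is measured along the path rather than between endpoints, a short threshold like this also removes tortuous streamlines whose endpoints happen to lie close together.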

Long association fibers—rostrally directed fibers (Fig. 2, second row)

Rostral to the injection site, “fibers travel in the white matter… [and] a small contingent of fibers near the inject site gathers at the upper bank and depth of the cingulate sulcus. These fibers terminate in the cortex at the depth of the cingulate sulcus…” in motor area M3 (Schmahmann and Pandya 2006; see their slice #81 for reference). To replicate this, we use two inclusion ROIs, three exclusion ROIs, and a maximum length of 40 mm. The inclusion ROIs (in addition to the injection region, which is also an inclusion ROI) force the pathways to travel through the white matter adjacent and rostral to the seed and to re-enter the cortex at the cingulate sulcus. The exclusion regions remove streamlines that cross the mid-sagittal plane into the other hemisphere, that extend anteriorly after entering the cortex, or that enter or lie adjacent to the striatal bundle.

Commissural fibers (Fig. 2, third row)

The commissural fibers descend into the white matter of the precentral gyrus (Schmahmann and Pandya 2006), and “move medially to enter the corpus callosum, and head towards the opposite hemisphere.” For these fibers, we use one inclusion region at the mid-sagittal slice of the corpus callosum, and three exclusion regions excluding all other interhemispheric connections (i.e., incorrect “jumps” across hemispheres from the superior parietal lobe), fibers entering the cingulate, and fibers that project laterally before moving medially.
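Both the mid-sagittal inclusion region used here and the interhemispheric exclusion regions used for the long association fibers reduce to asking whether a streamline crosses a plane. A minimal sketch, assuming coordinates in millimeters and a plane specified by a point and a normal vector (for the mid-sagittal plane, a point on the midline and the left-right axis); an oblique exclusion slice would simply use a different normal. The function name is hypothetical.

```python
import numpy as np


def crosses_plane(points, plane_point, plane_normal):
    """True if the streamline has vertices strictly on both sides of the plane.

    For a connected polyline this is equivalent to crossing the plane;
    vertices lying exactly on the plane count as touching, not crossing.
    """
    pts = np.asarray(points, dtype=float)
    offsets = pts - np.asarray(plane_point, dtype=float)
    signed = offsets @ np.asarray(plane_normal, dtype=float)
    return bool((signed < 0).any() and (signed > 0).any())


# Mid-sagittal plane assumed at x = 0 with a left-right normal.
midline = dict(plane_point=(0.0, 0.0, 0.0), plane_normal=(1.0, 0.0, 0.0))
commissural = [[-4.0, 1.0, 2.0], [0.5, 1.0, 2.5], [4.0, 1.0, 2.0]]
ipsilateral = [[2.0, 0.0, 0.0], [5.0, 1.0, 0.0]]
print(crosses_plane(commissural, **midline), crosses_plane(ipsilateral, **midline))  # True False
```

The same test serves as an inclusion criterion for commissural fibers and as an exclusion criterion for ipsilateral association fibers; only the sense of the check is inverted.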

Striatal fibers (Fig. 2, fourth row)

The striatal bundle and Muratoff bundle descend from the injection site and terminate in the body and head of the caudate nucleus. Some fibers additionally traverse the dorsal internal capsule to terminate in the putamen (Schmahmann and Pandya 2006). For these systems, we used two large inclusion ROIs (one volume for the putamen and one for the caudate nucleus) but did not require streamlines to pass through both (i.e., they only had to pass through or terminate in one or the other); thus, these could be considered a single region. Additionally, we implemented five exclusion regions to prevent thalamic terminations, interhemispheric fibers (i.e., those continuing after traversing entirely through the caudate), and fibers extending too far laterally or posteriorly.
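The difference between requiring a streamline to reach both target volumes and allowing it to reach either can be made explicit. In the sketch below (hypothetical helper names; voxel coordinates and same-grid boolean masks assumed, with a simple vertex-based test), an "any" test over separate caudate and putamen masks gives the same answer as a single test against their union, which is why the two volumes could be treated as one region.

```python
import numpy as np


def hits(streamline_vox, mask):
    """True if any in-bounds vertex (rounded to the nearest voxel) lies in mask."""
    idx = np.rint(np.asarray(streamline_vox, dtype=float)).astype(int)
    idx = idx[np.all((idx >= 0) & (idx < np.array(mask.shape)), axis=1)]
    return bool(len(idx)) and bool(mask[tuple(idx.T)].any())


def passes(streamline_vox, masks, mode="any"):
    """mode='any': reach at least one mask (OR); mode='all': reach every mask (AND)."""
    results = [hits(streamline_vox, m) for m in masks]
    return any(results) if mode == "any" else all(results)


shape = (6, 6, 6)
caudate = np.zeros(shape, dtype=bool)
caudate[1, 1, 1] = True
putamen = np.zeros(shape, dtype=bool)
putamen[4, 4, 4] = True
union = caudate | putamen

# Toy streamline that reaches the caudate only.
track = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
print(passes(track, [caudate, putamen], "any"), hits(track, union))  # True True
print(passes(track, [caudate, putamen], "all"))                      # False
```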

Subcortical fibers—pontine bundle (Fig. 2, fifth row)

Fibers in the pontine bundle “descend in the central and medial parts of the rostral posterior limb…and enter the cerebral peduncle as they continue into the brainstem” (Schmahmann and Pandya 2006). For these fibers, we included one simple inclusion region (following procedures very similar to that from Wakana et al. (2007) for the corticospinal tract) and included several exclusion ROIs. These exclusion ROIs were drawn on a number of orthogonal and oblique slices to limit pathways that took tortuous trajectories to reach the internal capsule, traveled across hemispheres, or left and re-entered the expected pathways.


For case #28 (PCG), 15 separate bundles, or sets of fibers/streamlines, were extracted: 2 sets of local association fibers (1 rostrally directed and 1 caudally directed), 3 sets of caudally directed long association fibers, 4 sets of rostrally directed long association fibers, 4 sets of commissural and subcortical fibers (1 commissural, 1 terminating in thalamic nuclei, 1 terminating in the subthalamic nucleus, and 1 passing through the cerebral peduncles), and 2 sets of striatal fibers traveling through the putamen and caudate nucleus. For case #21 (V4), 10 sets of fibers were extracted: 2 sets of local association fibers, 2 sets of caudally directed long association fibers, 1 set of rostrally directed long association fibers, 1 set of commissural fibers, and 4 sets of striatal fibers (1 terminating in the genu of the caudate nucleus, 1 in the body and head of the caudate nucleus, 1 terminating in the putamen, and 1 with fibers entering the claustrum).

We note that sub-divisions and classification decisions were based on the written descriptions (Schmahmann and Pandya 2006) and on decisions made during the iterative process, although separate sets of streamlines could likely have been combined, for example by concatenating constraints. This process was highly iterative: regions were removed, added, or edited until pathways reached the desired results. Once a delineation was deemed “acceptable”, the sensitivity/specificity analysis was run, and further corrections were made based on the quantitative results. Approximately 50 h were spent creating and editing ROIs (see the discussion on feasibility in human data and applicability to clinical and research applications).

Template-based delineation

Local voxel-wise reconstruction and orientation estimation were performed using constrained spherical deconvolution (Tournier et al. 2008) implemented in the Dipy software package (Garyfallidis et al. 2014). Probabilistic tractography was performed (LocalTracking algorithm) using software default parameters (step size = 0.5 × voxel size, max length = 800 steps), propagating pathways from randomly selected points throughout the brain until 1 million streamlines were generated.

As with the manually drawn ROIs, pathways connecting to area 4 of the PCG and to V4 were extracted using both the written descriptions and tracer visualizations from Case #28 and Case #21, respectively. However, in this case, we used a template of predefined anatomical regions in a standard atlas space. We chose the PennCHOP macaque template (Feng et al. 2017), which offers a good balance of cortical, subcortical, and white matter ROIs.

Example dissections for two pathways from the PCG, and corresponding constraints, are described in detail below (and shown in Fig. 3). Again, this was an iterative process, typically involving defining the endpoints based on known anatomy, followed by refinement through exclusion regions or forcing pathways to go through specified WM regions.

Striatal fibers

As described in Schmahmann and Pandya (2006), the striatal bundles descend from the injection site, enter the corona radiata and the dorsal aspect of the external capsule, and “terminate in the dorsal segment of the claustrum as well as lateral sectors of the putamen throughout most of its rostrocaudal extent”. Focusing first on putamen streamlines, we select only streamlines with an endpoint in the putamen. Although these streamlines do enter the corona radiata, spurious looping streamlines are apparent; these are removed using six exclusion ROIs (the anterior, posterior, and retrolenticular limbs of the internal capsule, the fornix, the thalamus, and the amygdala) together with a length threshold of 50 mm.
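This selection differs from a simple waypoint test in that it checks streamline endpoints rather than any point along the path. A minimal sketch of the endpoint rule combined with exclusion and length criteria, assuming voxel coordinates on the template grid, an isotropic voxel size `voxel_mm`, and boolean masks derived from the template labels; all helper names, and the voxel size in the toy example, are hypothetical.

```python
import numpy as np


def _voxels(points_vox, shape):
    """Round points to voxel indices and drop those outside the volume."""
    idx = np.rint(np.asarray(points_vox, dtype=float)).astype(int)
    return idx[np.all((idx >= 0) & (idx < np.array(shape)), axis=1)]


def endpoint_in(streamline_vox, mask):
    """True if the first or last vertex lies in mask."""
    ends = _voxels(np.asarray(streamline_vox)[[0, -1]], mask.shape)
    return bool(len(ends)) and bool(mask[tuple(ends.T)].any())


def touches(streamline_vox, mask):
    """True if any vertex lies in mask."""
    idx = _voxels(streamline_vox, mask.shape)
    return bool(len(idx)) and bool(mask[tuple(idx.T)].any())


def length_mm(streamline_vox, voxel_mm):
    """Arc length converted from voxel units to mm (isotropic voxels assumed)."""
    steps = np.diff(np.asarray(streamline_vox, dtype=float), axis=0)
    return float(np.linalg.norm(steps, axis=1).sum()) * voxel_mm


def select_terminating(streamlines_vox, target, exclusions, voxel_mm, max_len_mm):
    """Endpoint in target, no contact with any exclusion mask, length <= max_len_mm."""
    return [s for s in streamlines_vox
            if endpoint_in(s, target)
            and not any(touches(s, m) for m in exclusions)
            and length_mm(s, voxel_mm) <= max_len_mm]


shape = (20, 20, 20)
putamen = np.zeros(shape, dtype=bool)
putamen[10, 10, 10] = True
thalamus = np.zeros(shape, dtype=bool)
thalamus[10, 5, 10] = True

good = np.linspace([2.0, 10.0, 10.0], [10.0, 10.0, 10.0], 17)     # ends in putamen
through = np.linspace([2.0, 10.0, 10.0], [18.0, 10.0, 10.0], 33)  # passes through only
looping = np.linspace([10.0, 2.0, 10.0], [10.0, 10.0, 10.0], 17)  # ends in putamen via thalamus
kept = select_terminating([good, through, looping], putamen, [thalamus],
                          voxel_mm=2.0, max_len_mm=50.0)
print(len(kept))  # 1
```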

Commissural fibers

As in the manual delineation, the commissural fibers enter the corpus callosum and head towards the opposite hemisphere. For these fibers, we use the body of the corpus callosum as an inclusion region, followed by several exclusion regions covering structures these fibers are known not to pass through before crossing hemispheres (cingulum, thalamus, fornix, posterior limb of the internal capsule, extreme capsule, and posterior cingulate gyrus).
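With a template parcellation, inclusion and exclusion regions are selected by label rather than drawn. The sketch below assumes an integer label volume on the same grid as the streamlines and a name-to-label lookup; the label numbers are invented placeholders, not the actual PennCHOP label values, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical label IDs; a real lookup would come from the atlas's label table.
LABELS = {"cc_body": 1, "cingulum": 2, "thalamus": 3, "fornix": 4,
          "plic": 5, "extreme_capsule": 6, "posterior_cingulate_gyrus": 7}


def roi_mask(label_volume, names, lookup=LABELS):
    """Boolean mask of all voxels carrying any of the named labels."""
    return np.isin(label_volume, [lookup[n] for n in names])


def hits(streamline_vox, mask):
    """True if any in-bounds vertex (rounded to the nearest voxel) lies in mask."""
    idx = np.rint(np.asarray(streamline_vox, dtype=float)).astype(int)
    idx = idx[np.all((idx >= 0) & (idx < np.array(mask.shape)), axis=1)]
    return bool(len(idx)) and bool(mask[tuple(idx.T)].any())


def commissural_filter(streamlines_vox, label_volume):
    """Must touch the callosal body; must avoid every listed structure."""
    include = roi_mask(label_volume, ["cc_body"])
    # One union mask is equivalent to testing each exclusion label separately.
    exclude = roi_mask(label_volume, ["cingulum", "thalamus", "fornix", "plic",
                                      "extreme_capsule", "posterior_cingulate_gyrus"])
    return [s for s in streamlines_vox if hits(s, include) and not hits(s, exclude)]


labels = np.zeros((9, 5, 5), dtype=int)
labels[4, 2, 2] = LABELS["cc_body"]
labels[4, 0, 2] = LABELS["cingulum"]
labels[6, 3, 2] = LABELS["thalamus"]

A = np.column_stack([np.arange(9.0), np.full(9, 2.0), np.full(9, 2.0)])  # through CC only
B = np.column_stack([np.arange(9.0), np.zeros(9), np.full(9, 2.0)])      # through cingulum
D = np.array([[0.0, 2.0, 2.0], [4.0, 2.0, 2.0], [6.0, 3.0, 2.0], [8.0, 3.0, 2.0]])  # CC + thalamus
kept = commissural_filter([A, B, D], labels)
print(len(kept))  # 1
```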


For case #28 (PCG), 12 separate bundles, or sets of fibers/streamlines, were extracted (differences from the manual delineations arise from constraints of the template ROI parcellation): 3 sets of local association fibers (1 rostrally directed, 1 caudally directed, and 1 defined by a simple length threshold), 2 sets of caudally directed long association fibers, 1 set of rostrally directed long association fibers, 4 sets of commissural and subcortical fibers (2 commissural, one through the splenium and one through the body of the corpus callosum; 1 passing through the cerebral peduncles; and 1 terminating in the thalamus), 2 sets of striatal fibers (putamen and caudate nucleus), and 1 set of fibers projecting through the anterior corona radiata. For case #21 (V4), 12 sets of fibers were extracted: 4 sets of rostrally directed long association fibers (ending in the occipital gyrus, angular gyrus, inferior temporal gyrus, and middle temporal gyrus), 2 sets of caudally directed long association fibers (occipital gyrus and lingual gyrus), 1 set of commissural fibers (through the body of the corpus callosum), 3 sets of striatal fibers (1 terminating in the caudate nucleus, 1 in the putamen, and 1 in the claustrum), and 1 set of fibers projecting through the extreme capsule.