Abstract
The vast expansion from mossy fibers to cerebellar granule cells (GrC) produces a neural representation that supports functions including associative and internal model learning. This motif is shared by other cerebellum-like structures and has inspired numerous theoretical models. Less attention has been paid to structures immediately presynaptic to GrC layers, whose architecture can be described as a ‘bottleneck’ and whose function is not understood. We therefore develop a theory of cerebellum-like structures in conjunction with their afferent pathways that predicts the role of the pontine relay to cerebellum and the glomerular organization of the insect antennal lobe. We highlight a new computational distinction between clustered and distributed neuronal representations that is reflected in the anatomy of these two brain structures. Our theory also reconciles recent observations of correlated GrC activity with theories of nonlinear mixing. More generally, it shows that structured compression followed by random expansion is an efficient architecture for flexible computation.
Code availability
All the simulations and analyses were performed using custom code written in Python (https://www.python.org), and can be downloaded at https://www.columbia.edu/~spm2176/code/muscinelli_2023.zip.
References
Yamins, D. L. K. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 19, 356–365 (2016).
Bell, C. C., Han, V. & Sawtell, N. B. Cerebellum-like structures and their implications for cerebellar function. Annu. Rev. Neurosci. 31, 1–24 (2008).
Marr, D. A theory of cerebellar cortex. J. Physiol. 202, 437–470 (1969).
Babadi, B. & Sompolinsky, H. Sparseness and expansion in sensory representations. Neuron 83, 1213–1226 (2014).
Litwin-Kumar, A., Harris, K. D., Axel, R., Sompolinsky, H. & Abbott, L. F. Optimal degrees of synaptic connectivity. Neuron 93, 1153–1164.e7 (2017).
Cayco-Gajic, N. A. & Silver, R. A. Re-evaluating circuit mechanisms underlying pattern separation. Neuron 101, 584–602 (2019).
Brodal, P. & Bjaalie, J. G. Organization of the pontine nuclei. Neurosci. Res. 13, 83–118 (1992).
Chen, W. R. & Shepherd, G. M. The olfactory glomerulus: a cortical module with specific functions. J. Neurocytol. 34, 353–360 (2005).
Bhandawat, V., Olsen, S. R., Gouwens, N. W., Schlief, M. L. & Wilson, R. I. Sensory processing in the Drosophila antennal lobe increases reliability and separability of ensemble odor representations. Nat. Neurosci. 10, 1474–1482 (2007).
Olsen, S. R. & Wilson, R. I. Lateral presynaptic inhibition mediates gain control in an olfactory circuit. Nature 452, 956–960 (2008).
Olsen, S. R., Bhandawat, V. & Wilson, R. I. Divisive normalization in olfactory population codes. Neuron 66, 287–299 (2010).
Guo, J.-Z. et al. Disrupting cortico-cerebellar communication impairs dexterity. eLife 10, e65906 (2021).
Wagner, M. J. et al. Shared cortex-cerebellum dynamics in the execution and learning of a motor task. Cell 177, 669–682.e24 (2019).
Vosshall, L. B., Wong, A. M. & Axel, R. An olfactory sensory map in the fly brain. Cell 102, 147–159 (2000).
Marin, E. C., Jefferis, G. S. X. E., Komiyama, T., Zhu, H. & Luo, L. Representation of the glomerular olfactory map in the Drosophila brain. Cell 109, 243–255 (2002).
Wong, A. M., Wang, J. W. & Axel, R. Spatial representation of the glomerular map in the Drosophila protocerebrum. Cell 109, 229–241 (2002).
Berck, M. E. et al. The wiring diagram of a glomerular olfactory system. eLife 5, e14859 (2016).
Bates, A. S. et al. Complete connectomic reconstruction of olfactory projection neurons in the fly brain. Curr. Biol. 30, 3183–3199.e6 (2020).
Chadderton, P., Margrie, T. W. & Häusser, M. Integration of quanta in cerebellar granule cells during sensory processing. Nature 428, 856–860 (2004).
Ito, I., Ong, R. C.-Y., Raman, B. & Stopfer, M. Sparse odor representation and olfactory learning. Nat. Neurosci. 11, 1177–1184 (2008).
Kolkman, K. E., McElvain, L. E. & du Lac, S. Diverse precerebellar neurons share similar intrinsic excitability. J. Neurosci. 31, 16665–16674 (2011).
Shenoy, K. V., Sahani, M. & Churchland, M. M. Cortical control of arm movements: a dynamical systems perspective. Annu. Rev. Neurosci. 36, 337–359 (2013).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Caron, S. J. C., Ruta, V., Abbott, L. F. & Axel, R. Random convergence of olfactory inputs in the Drosophila mushroom body. Nature 497, 113–117 (2013).
Gruntman, E. & Turner, G. C. Integration of the olfactory code across dendritic claws of single mushroom body neurons. Nat. Neurosci. 16, 1821–1829 (2013).
Hallem, E. A. & Carlson, J. R. Coding of odors by a receptor repertoire. Cell 125, 143–160 (2006).
Friedrich, R. W. & Wiechert, M. T. Neuronal circuits and computations: pattern decorrelation in the olfactory bulb. FEBS Lett. 588, 2504–2513 (2014).
Schlegel, P. et al. Information flow, cell types and stereotypy in a full olfactory connectome. eLife 10, e66018 (2021).
Peters, A. J., Lee, J., Hedrick, N. G., O’Neil, K. & Komiyama, T. Reorganization of corticospinal output during motor learning. Nat. Neurosci. 20, 1133–1141 (2017).
Wolpert, D. M., Miall, R. C. & Kawato, M. Internal models in the cerebellum. Trends Cogn. Sci. 2, 338–347 (1998).
Russo, A. A. et al. Motor cortex embeds muscle-like commands in an untangled population response. Neuron 97, 953–966.e8 (2018).
Saxena, S., Russo, A. A., Cunningham, J. & Churchland, M. M. Motor cortex activity across movement speeds is predicted by network-level strategies for generating muscle activity. eLife 11, e67620 (2022).
Gallego, J. A., Perich, M. G., Chowdhury, R. H., Solla, S. A. & Miller, L. E. Long-term stability of cortical population dynamics underlying consistent behavior. Nat. Neurosci. 23, 260–270 (2020).
Oja, E. Simplified neuron model as a principal component analyzer. J. Math. Biol. 15, 267–273 (1982).
Pehlevan, C. & Chklovskii, D. B. Optimization theory of Hebbian/anti-Hebbian networks for PCA and whitening. In 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA 1458–1465 (Allerton, 2015).
Schwarz, C. & Thier, P. Binding of signals relevant for action: towards a hypothesis of the functional role of the pontine nuclei. Trends Neurosci. 22, 443–451 (1999).
Pehlevan, C., Hu, T. & Chklovskii, D. B. A Hebbian/anti-Hebbian neural network for linear subspace learning: a derivation from multidimensional scaling of streaming data. Neural Comput. 27, 1461–1495 (2015).
Barak, O., Rigotti, M. & Fusi, S. The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. J. Neurosci. 33, 3844–3856 (2013).
Ganguli, S. & Sompolinsky, H. Compressed sensing, sparsity, and dimensionality in neuronal information processing and data analysis. Annu. Rev. Neurosci. 35, 485–508 (2012).
Barlow, H. B. in Sensory Communication (ed. Rosenblith, W. A.) 216–234 (MIT Press, 1961).
Atick, J. J. Could information theory provide an ecological theory of sensory processing? Netw. Comput. Neural Syst. 3, 213–251 (1992).
Simoncelli, E. P. Vision and the statistics of the visual environment. Curr. Opin. Neurobiol. 13, 144–149 (2003).
Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 37, 233–243 (1991).
Benna, M. K. & Fusi, S. Place cells may simply be memory cells: memory compression leads to spatial tuning and history dependence. Proc. Natl Acad. Sci. USA 118, e2018422118 (2021).
Baldi, P. & Hornik, K. Neural networks and principal component analysis: learning from examples without local minima. Neural Netw. 2, 53–58 (1989).
Apps, R. & Garwicz, M. Anatomical and physiological foundations of cerebellar information processing. Nat. Rev. Neurosci. 6, 297–311 (2005).
Oscarsson, O. Functional organization of the spino- and cuneocerebellar tracts. Physiol. Rev. 45, 495–522 (1965).
Kennedy, A. et al. A temporal basis for predicting the sensory consequences of motor commands in an electric fish. Nat. Neurosci. 17, 416–422 (2014).
Bratton, B. & Bastian, J. Descending control of electroreception. II. Properties of nucleus praeeminentialis neurons projecting directly to the electrosensory lateral line lobe. J. Neurosci. 10, 1241–1253 (1990).
Kazama, H. & Wilson, R. I. Origins of correlated activity in an olfactory circuit. Nat. Neurosci. 12, 1136–1144 (2009).
Chapochnikov, N. M., Pehlevan, C. & Chklovskii, D. B. Normative and mechanistic model of an adaptive circuit for efficient encoding and feature extraction. Proc. Natl Acad. Sci. USA 120, e21174841 (2023).
Kebschull, J. M. et al. Cerebellar nuclei evolved by repeatedly duplicating a conserved cell-type set. Science 370, eabd5059 (2020).
Barbosa, J., Proville, R., Rodgers, C. C., Ostojic, S. & Boubenec, Y. Flexible selection of task-relevant features through across-area population gating. Preprint at bioRxiv https://doi.org/10.1101/2022.07.21.500962 (2022).
Leergaard, T. B. & Bjaalie, J. G. Topography of the complete corticopontine projection: from experiments to principal maps. Front. Neurosci. 1, 211–223 (2007).
Kratochwil, C. F., Maheshwari, U. & Rijli, F. M. The long journey of pontine nuclei neurons: from rhombic lip to cortico-ponto-cerebellar circuitry. Front. Neural Circuits https://doi.org/10.3389/fncir.2017.00033 (2017).
Mihailoff, G. A., Lee, H., Watt, C. B. & Yates, R. Projections to the basilar pontine nuclei from face sensory and motor regions of the cerebral cortex in the rat. J. Comp. Neurol. 237, 251–263 (1985).
Lanore, F., Cayco-Gajic, N. A., Gurnani, H., Coyle, D. & Silver, R. A. Cerebellar granule cell axons support high-dimensional representations. Nat. Neurosci. 24, 1142–1150 (2021).
Xie, M., Muscinelli, S., Harris, K. D. & Litwin-Kumar, A. Task-dependent optimal representations for cerebellar learning. Preprint at bioRxiv https://doi.org/10.1101/2022.08.15.504040 (2022).
Stewart, G. W. The efficient generation of random orthogonal matrices with an application to condition estimators. SIAM J. Numer. Anal. 17, 403–409 (1980).
Abbott, L. F., Rajan, K. & Sompolinsky, H. in The Dynamic Brain: An Exploration of Neuronal Variability and its Functional Significance (eds Ding, M. & Glanzman, D.) 65–82 (Oxford Academic, 2011).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at arXiv https://doi.org/10.48550/arXiv.1412.6980 (2017).
Fagg, A., Sitkoff, N., Barto, A. & Houk, J. Cerebellar learning for control of a two-link arm in muscle space. In Proc. of International Conference on Robotics and Automation, Albuquerque, NM, USA, Vol. 3, 2638–2644 (IEEE, 1997).
Acknowledgements
We would like to thank M. Xie, A. Hantman, B. Sauerbrei, J. Kadmon and R. Warren for helpful discussions and comments. We would also like to thank L.F. Abbott, N. Sawtell, M. Beiran, K. Lakshminarasimhan and N.A. Cayco-Gajic for their comments on the manuscript. The Wagner laboratory is supported by the NINDS Intramural Research Program. A.L.-K. and S.P.M. were supported by the Gatsby Charitable Foundation, National Science Foundation award DBI-1707398, and the Simons Collaboration on the Global Brain. S.P.M. was also supported by the Swartz Foundation. A.L.-K. was also supported by the Burroughs Wellcome Foundation, the McKnight Endowment Fund and NIH award R01EB029858. We acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR030893-01.
Author information
Authors and Affiliations
Contributions
S.P.M. and A.L.-K. conceived the study. S.P.M. performed simulations and analyses. M.J.W. performed the experiments and provided the data. S.P.M., M.J.W. and A.L.-K. wrote the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Learned compression is not beneficial when the input representation is unstructured.
a: Performance over learning when the compression weights are trained using error backpropagation. Parameters are the same as in Fig. 2a. The solid line and shaded areas indicate the mean and standard deviation of the fraction of errors across network realizations. b: Left: fraction of errors for different network architectures when the input representation consists of random, uncorrelated Gaussian patterns, as in previous work4,5. Single-step expansion performs significantly better than learned compression (two-sided Welch’s t-test, n = 10, t = 4.82, p = 2.4 ⋅ 10⁻⁴), presumably due to incomplete convergence of gradient descent, and comparably to whitening compression. Parameters: N = D = P = 500, M = 2000, f = 0.1, σ = 0.1. Right: same as the left panel, but with Nc = N/2 instead of Nc = N. Single-step expansion performs significantly better than learned compression (two-sided Welch’s t-test, n = 10, t = 26.8, p = 1.3 ⋅ 10⁻¹⁵). The box boundary extends from the first to the third quartile of the data. The whiskers extend from the box by 1.5 times the inter-quartile range. The horizontal line indicates the median. Parameters: N = D = P = 500, M = 2000, f = 0.1, σ = 0.1. In both the left and right panels, the task-relevant input PC eigenvalues were set to not decay (p = 0), in contrast to previous figures, to consider a fully unstructured input representation.
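For concreteness, the following is a minimal sketch of a random classification benchmark of the kind described above, assuming random uncorrelated Gaussian patterns, a random expansion with a global threshold set to coding level f, and a least-squares readout; the variable names, sizes, and readout choice are illustrative, not the authors’ exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, P, f, sigma = 200, 800, 200, 0.1, 0.1  # smaller than the paper's values, for speed

X = rng.standard_normal((P, N))               # random, uncorrelated Gaussian patterns
labels = rng.choice([-1.0, 1.0], size=P)      # random binary labels

J = rng.standard_normal((M, N)) / np.sqrt(N)  # random expansion weights
H = X @ J.T
theta = np.quantile(H, 1 - f)                 # global threshold giving coding level f

def phi(h):
    """Rectified expansion-layer response."""
    return np.maximum(h - theta, 0.0)

M_rep = phi(H)
w = np.linalg.lstsq(M_rep, labels, rcond=None)[0]   # least-squares readout

X_noisy = X + sigma * rng.standard_normal(X.shape)  # noisy test patterns
pred = np.sign(phi(X_noisy @ J.T) @ w)
print("fraction of errors:", np.mean(pred != labels))
```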
Extended Data Fig. 2 Sign-constrained compression for clustered and distributed representations.
a: Distribution of the excitatory compression weights that maximize \(\mathrm{SNR}_{\mathrm{c}} \propto \dim(c)(1-\Delta_{\mathrm{c}})^{2}\), in the presence of a distributed input representation. b: Standard deviation of the out-degree of the input for the same compression matrix as in a, averaged across 10 realizations (red dashed line). The gray histogram represents the distribution of the same quantity for a compression matrix with the same sparsity but shuffled entries. c, d: Performance of a network with purely excitatory compression in the presence of a distributed input representation. Solid lines and shaded areas indicate the mean and standard deviation of the fraction of errors across network realizations, respectively. Parameters are the same as in Fig. 3e. c: Fraction of errors on a random classification task as a function of the redundancy in the input representation N/D. d: For fixed N/D = 10, network performance for different network architectures, as in Fig. 2a. ‘Excitatory’ indicates a network whose compression weights are trained to maximize the Hebbian SNR at the compression layer, that is, \(\mathrm{SNR}_{\mathrm{c}} \propto \dim(c)(1-\Delta_{\mathrm{c}})^{2}\), while ‘unconstrained’ indicates a network trained on the same objective but without sign constraints on the weights. Excitatory and optimal compression are not statistically different (n = 10). The training procedure is the same as that used in Fig. 2a. The box boundary extends from the first to the third quartile of the data. The whiskers extend from the box by 1.5 times the inter-quartile range. The horizontal line indicates the median. e, f: Increasing input redundancy yields a smaller benefit when considering clustered input representations. All the parameters are the same as in c, d, except for the type of input representation. e: Same as c, but for a clustered input representation. f: Same as d, but for a clustered input representation. Purely excitatory compression does not achieve the performance of whitening (two-sided Welch’s t-test, t = 10.615, p = 2.54 ⋅ 10⁻¹¹, n = 10) nor of unconstrained compression trained with the same objective (two-sided Welch’s t-test, t = 8.563, p = 9.19 ⋅ 10⁻⁸, n = 10). In panels c, e the shaded regions indicate the standard deviation across 10 network realizations.
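The objective above combines an effective dimension with a noise penalty. Below is a minimal sketch of how such a quantity can be evaluated for a candidate compression matrix, assuming the participation-ratio definition of dimension and taking Δc as a per-unit noise-to-signal variance ratio; the latter is an assumption for illustration, since the paper derives Δc from its specific noise model.

```python
import numpy as np

rng = np.random.default_rng(1)
N, Nc, D = 500, 50, 50

# structured input covariance: D task-relevant directions with decaying power
A = np.linalg.qr(rng.standard_normal((N, D)))[0]   # orthonormal embedding (N x D)
eigs = np.arange(1, D + 1, dtype=float) ** -1.0    # decaying PC spectrum
C_x = A @ np.diag(eigs) @ A.T

G = rng.standard_normal((Nc, N)) / np.sqrt(N)      # candidate compression matrix
C_c = G @ C_x @ G.T                                # compression-layer covariance

lam = np.linalg.eigvalsh(C_c)
dim_c = lam.sum() ** 2 / np.sum(lam ** 2)          # participation-ratio dimension

noise_var = 1e-3                                   # isotropic compression-layer noise
delta_c = noise_var / np.mean(np.diag(C_c))        # illustrative definition of Delta_c
print("SNR_c objective:", dim_c * (1.0 - delta_c) ** 2)
```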
Extended Data Fig. 3 Realistic properties of odor receptor responses.
a: Covariance of single odor receptor responses, computed from the Hallem–Carlson dataset26, sorted according to the response variances. b: Histogram of off-diagonal terms in the covariance matrix in a (red), compared to a shuffle distribution (blue) obtained by shuffling the responses to different odorants for a given odor receptor. c: Mean of the off-diagonal elements of the data covariance matrix (red dashed line), compared to the histogram of the same mean for the shuffled responses as in b (blue). The mean of the original data is significantly larger than the mean of the shuffle distribution (permutation test, p < 10⁻⁴). d: Geometrical representation of tuning vectors that are aligned (yellow) versus not aligned (black) with principal components (gray), corresponding to clustered and distributed compression layer representations, respectively. e: Dimension expansion dim(m)/dim(x) at the expansion layer plotted against the in-degree of expansion layer neurons K. f: Same as e, but showing the fraction of errors on a random classification task instead of the dimension. g: Same as e, right, but showing the noise at the expansion layer instead of the dimension. In panels e–g, the solid lines and shaded areas indicate the mean and standard error of the mean across network realizations, respectively. Network parameters: N = 1000, M = 2000, Nc = D = P = 50, p = 1, f = 0.1, and σ = 0.1.
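The shuffle test in panels b, c can be sketched as follows, assuming odorant responses are permuted independently within each receptor column; the synthetic data here merely stand in for the Hallem–Carlson responses and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_odors, n_receptors = 110, 24                   # Hallem-Carlson dimensions
R = rng.standard_normal((n_odors, n_receptors))  # placeholder for the real responses
R += 0.3 * rng.standard_normal((n_odors, 1))     # shared component -> positive correlations

def mean_offdiag(X):
    """Mean off-diagonal element of the receptor-by-receptor covariance."""
    C = np.cov(X, rowvar=False)
    return C[~np.eye(C.shape[0], dtype=bool)].mean()

observed = mean_offdiag(R)

# null: permute odorant responses independently within each receptor column
n_perm = 1000
null = np.empty(n_perm)
for i in range(n_perm):
    shuffled = np.column_stack([rng.permutation(R[:, j]) for j in range(n_receptors)])
    null[i] = mean_offdiag(shuffled)

p = (np.sum(null >= observed) + 1) / (n_perm + 1)
print(f"observed mean = {observed:.3f}, permutation p = {p:.4f}")
```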
Extended Data Fig. 4 Effect of architectural parameters on the effectiveness of Hebbian plasticity.
a: Dependence of the network performance on Nc. Notice that performance saturates for relatively large values of Nc. b–d: The non-monotonic behavior of the network performance with L is robust to changes in Nc (b), N (c) and M (d). The optimal L increases moderately with N and appears to saturate for N > 500. e: Left: schematic of the setup in which compression weights are learned with Hebbian plasticity. Right: resulting mean squared overlaps between the rows of the compression matrix and the principal components, as a function of PC index. f: Same as e, but when compression weights are learned using Hebbian and anti-Hebbian learning rules in the presence of recurrent inhibition. We used the learning rule proposed in ref. 35 (see their Eq. (18)) to learn the compression weights. This learning scheme updates both the feedforward (excitatory/inhibitory) and the recurrent (inhibitory only) weights to introduce competition among compression layer units, enabling the extraction of sub-leading PCs. Notice that the decay is slower than without recurrent inhibition, indicating that several PCs are estimated considerably better, especially for large L. Unless otherwise stated, parameters were N = 500, Nc = 250, M = 5000, f = 0.1, D = P = 50, σ = 0.5, p = 0.1.
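Panel e uses plain Hebbian plasticity, which for a single linear unit recovers only the leading principal component (Oja’s rule). A minimal sketch of this baseline, with synthetic inputs dominated by a single direction:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, eta = 50, 20_000, 1e-3

# inputs dominated by a single direction u
u = rng.standard_normal(N)
u /= np.linalg.norm(u)
X = rng.standard_normal((T, N)) + 3.0 * np.outer(rng.standard_normal(T), u)

w = rng.standard_normal(N) / np.sqrt(N)
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)   # Oja's rule: Hebbian term with implicit normalization

print("overlap with leading PC:", abs(w @ u))  # approaches 1
```

Because every unit trained this way converges to the same leading PC, extracting sub-leading PCs requires the competition introduced by recurrent inhibition, as in panel f.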
Extended Data Fig. 5 Learning a forward model of a two-joint arm.
a: Performance on the forward model task is non-monotonic in the pontine in-degree L. We plot the MSE on the forward model task as a function of L for the network with and without feedback from the DCN. The best L is of the same order as that found for the classification task in Fig. 6a. We set σ = 1, while all the other parameters are the same as in Fig. 6e. The solid lines and shaded areas indicate the mean and standard deviation of the MSE across network realizations, respectively. b: DCN feedback leads to higher overlap of compression weights with signal principal components. We define the overlap of the weights onto unit i of the compression layer with the jth PC as \(\mathrm{overlap}_{ij}=\sum_{k=1}^{N}G_{ik}A_{kj}\), where G is the compression matrix learned without (left) or with (right) the feedback from the DCN, while A is the embedding matrix of the task-relevant components (blue) or task-irrelevant components (red). The violin plot shows the mean and distribution of the overlaps across compression layer units. We set σ = 1.8 and L = 50, while all the other parameters are the same as in Fig. 6e. In the violin plots, the whiskers indicate the entire data range, and the horizontal line indicates the median of the distribution. c: Performance on the forward model task while the compression weights are adjusted using our modified version of Oja’s rule in the presence of feedback from the DCN, for two different levels of input noise and two target dimensions. All the other parameters are the same as in Fig. 6e.
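A minimal sketch of the overlap computation in panel b, assuming orthonormal embedding matrices for the task-relevant and task-irrelevant components and a random placeholder for the learned compression matrix G:

```python
import numpy as np

rng = np.random.default_rng(4)
N, Nc, D_task, D_irr = 500, 50, 10, 40

A_task = np.linalg.qr(rng.standard_normal((N, D_task)))[0]  # task-relevant embedding
A_irr = np.linalg.qr(rng.standard_normal((N, D_irr)))[0]    # task-irrelevant embedding
G = rng.standard_normal((Nc, N)) / np.sqrt(N)               # placeholder compression matrix

overlap_task = G @ A_task   # overlap_ij = sum_k G_ik A_kj
overlap_irr = G @ A_irr

print("mean |overlap|, task-relevant:", np.abs(overlap_task).mean())
print("mean |overlap|, task-irrelevant:", np.abs(overlap_irr).mean())
```

With a random G the two distributions coincide; compression weights shaped by DCN feedback would show larger overlaps with the task-relevant columns (blue) than with the task-irrelevant ones (red).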
Extended Data Fig. 6 Dimension and noise contributions to local decorrelation performance.
a, b: Dimension (a) and noise (b) contributions to the performance shown in Fig. 8b, using the same parameters. c, d: Dimension (c) and noise (d) contributions to the performance shown in Fig. 8c, using the same parameters. e-g: Dimension (e) and noise (f) contributions to the performance (g), for the antennal lobe architecture, as a function of the in-degree of Kenyon cells K. Input was generated using a clustered representation. The green dashed line indicates the value obtained with optimal compression. The parameters were chosen to be consistent with the insect olfactory system anatomy, that is D = Nc = 50, N = 1000, M = 2000, p = 1, f = 0.1, σ = 1, P = 100. Note that when K ≥ 8, the local decorrelation strategy requires more synapses than the optimal compression one, for which K = 7 and L = 20. h, i: Dimension (h) and noise (i) contributions to the performance shown in Fig. 8d, using the same parameters. For all panels, the shaded areas indicate the standard deviation across network realizations.
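To make the decomposition concrete, here is a sketch of how the two contributions can be estimated empirically for a random expansion, assuming the participation ratio as the dimension measure and an illustrative definition of Δm as a noise-to-signal squared-distance ratio (the paper’s precise normalization may differ):

```python
import numpy as np

rng = np.random.default_rng(5)
Nc, M, P, f, sigma = 50, 500, 100, 0.1, 1.0

C = rng.standard_normal((P, Nc))                 # compression-layer patterns
J = rng.standard_normal((M, Nc)) / np.sqrt(Nc)   # random expansion weights
H = C @ J.T
theta = np.quantile(H, 1 - f)                    # threshold giving coding level f
M_rep = np.maximum(H - theta, 0.0)

lam = np.linalg.eigvalsh(np.cov(M_rep, rowvar=False))
dim_m = lam.sum() ** 2 / np.sum(lam ** 2)        # dimension contribution

# noise contribution: squared distance between noisy and clean responses,
# normalized by the typical squared distance between distinct patterns
M_noisy = np.maximum((C + sigma * rng.standard_normal(C.shape)) @ J.T - theta, 0.0)
noise_dist = np.mean(np.sum((M_noisy - M_rep) ** 2, axis=1))
signal_dist = np.mean([np.sum((M_rep[i] - M_rep[j]) ** 2)
                       for i in range(P) for j in range(i)])
print(f"dim(m) = {dim_m:.1f}, Delta_m (illustrative) = {noise_dist / signal_dist:.2f}")
```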
Extended Data Fig. 7 Effect of nonlinearities at the compression layer.
To achieve performance with nonlinear compression layer units comparable to that of linear units, we set Nc = 250. To maximize the dimension of the compression layer after the nonlinearity, we also introduced a random rotation of the optimal compression matrix (see Methods 5). a: Dimension of the compression layer representation for linear versus nonlinear (ReLU) compression. For ReLU compression, the nonlinearity is applied after random (left), PC-aligned (center), and whitening compression (right). b: Same as a, but showing the noise strength at the compression layer Δc. c: Same as a, but showing the fraction of errors in the random classification task. In panels a–c, the box boundary extends from the first to the third quartile of the data. The whiskers extend from the box by 1.5 times the inter-quartile range. The horizontal line indicates the median. d: Fraction of errors over training when the compression weights are trained using gradient descent and the compression layer units are nonlinear (ReLU). For comparison, the horizontal dashed lines indicate the performance of networks with linear compression layer units. The solid lines indicate the mean over 10 network realizations and the shading indicates the standard deviation across network realizations. e: Performance at convergence for the same networks as in d. For all panels, parameters were N = D = P = 500, Nc = 250, M = 2000, f = 0.1, fc = 0.3, and σ = 0.1.
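A sketch of the construction described above, assuming whitening compression onto the top Nc input PCs followed by a random orthogonal rotation before the ReLU; the whitening construction and sample sizes are illustrative, not the exact Methods procedure.

```python
import numpy as np

rng = np.random.default_rng(6)
N, Nc, n_samples = 500, 250, 2000

# correlated inputs and a whitening compression onto the top Nc PCs
X = rng.standard_normal((n_samples, N)) @ (rng.standard_normal((N, N)) / np.sqrt(N))
U, S, _ = np.linalg.svd(np.cov(X, rowvar=False))
G_white = np.diag(1.0 / np.sqrt(S[:Nc])) @ U[:, :Nc].T   # (Nc x N)

# Haar-random orthogonal rotation applied before the ReLU, spreading variance
# evenly across compression-layer units
Q, R = np.linalg.qr(rng.standard_normal((Nc, Nc)))
Q *= np.sign(np.diag(R))
G_rot = Q @ G_white

c = np.maximum(G_rot @ X.T, 0.0)   # nonlinear (ReLU) compression-layer responses
print("fraction of active units:", np.mean(c > 0))
```

Without the rotation, a ReLU applied directly to whitened PC projections silences roughly half of the informative components; the rotation mixes them so that the rectification discards far less variance.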
Extended Data Fig. 8 Expansion layer dimension and noise strength depend on compression layer dimension and noise strength.
a: Dimension of the expansion layer representation as a function of the dimension of the compression layer representation. The compression layer representation was distributed, and its dimension was varied by changing p between 0 and 1. b: Noise strength Δm at the expansion layer as a function of the noise strength at the compression layer. Noise was additive, Gaussian, and isotropic at the compression layer, with standard deviation varying from 0 to 0.1. In both panels, solid lines show the theoretical result and dots are simulation results, averaged over 10 network realizations. The standard deviation of the numerical simulations is not visible because it is smaller than the marker size. Parameters: Nc = 100, M = 1000, f = 0.1.
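A minimal simulation in the spirit of panel a, assuming a distributed compression layer representation with power-law eigenvalue decay controlled by p and the participation ratio as the dimension measure:

```python
import numpy as np

rng = np.random.default_rng(7)
Nc, M, P, f = 100, 1000, 500, 0.1

def participation_ratio(X):
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return lam.sum() ** 2 / np.sum(lam ** 2)

J = rng.standard_normal((M, Nc)) / np.sqrt(Nc)         # random expansion weights
for p in (0.0, 0.5, 1.0):                              # spectral decay exponent
    eigs = np.arange(1, Nc + 1, dtype=float) ** -p
    C = rng.standard_normal((P, Nc)) * np.sqrt(eigs)   # distributed representation
    H = C @ J.T
    M_rep = np.maximum(H - np.quantile(H, 1 - f), 0.0)
    print(f"p = {p}: dim(c) = {participation_ratio(C):.1f}, "
          f"dim(m) = {participation_ratio(M_rep):.1f}")
```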
Supplementary information
Supplementary Information
Supplementary Modeling Note.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Muscinelli, S.P., Wagner, M.J. & Litwin-Kumar, A. Optimal routing to cerebellum-like structures. Nat Neurosci 26, 1630–1641 (2023). https://doi.org/10.1038/s41593-023-01403-7