Abstract
The vast expansion from mossy fibers to cerebellar granule cells (GrC) produces a neural representation that supports functions including associative and internal model learning. This motif is shared by other cerebellum-like structures and has inspired numerous theoretical models. Less attention has been paid to structures immediately presynaptic to GrC layers, whose architecture can be described as a ‘bottleneck’ and whose function is not understood. We therefore develop a theory of cerebellum-like structures in conjunction with their afferent pathways that predicts the role of the pontine relay to cerebellum and the glomerular organization of the insect antennal lobe. We highlight a new computational distinction between clustered and distributed neuronal representations that is reflected in the anatomy of these two brain structures. Our theory also reconciles recent observations of correlated GrC activity with theories of nonlinear mixing. More generally, it shows that structured compression followed by random expansion is an efficient architecture for flexible computation.
Code availability
All the simulations and analyses were performed using custom code written in Python (https://www.python.org), and can be downloaded at https://www.columbia.edu/~spm2176/code/muscinelli_2023.zip.
References
Yamins, D. L. K. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 19, 356–365 (2016).
Bell, C. C., Han, V. & Sawtell, N. B. Cerebellum-like structures and their implications for cerebellar function. Annu. Rev. Neurosci. 31, 1–24 (2008).
Marr, D. A theory of cerebellar cortex. J. Physiol. 202, 437–470 (1969).
Babadi, B. & Sompolinsky, H. Sparseness and expansion in sensory representations. Neuron 83, 1213–1226 (2014).
Litwin-Kumar, A., Harris, K. D., Axel, R., Sompolinsky, H. & Abbott, L. F. Optimal degrees of synaptic connectivity. Neuron 93, 1153–1164.e7 (2017).
Cayco-Gajic, N. A. & Silver, R. A. Re-evaluating circuit mechanisms underlying pattern separation. Neuron 101, 584–602 (2019).
Brodal, P. & Bjaalie, J. G. Organization of the pontine nuclei. Neurosci. Res. 13, 83–118 (1992).
Chen, W. R. & Shepherd, G. M. The olfactory glomerulus: a cortical module with specific functions. J. Neurocytol. 34, 353–360 (2005).
Bhandawat, V., Olsen, S. R., Gouwens, N. W., Schlief, M. L. & Wilson, R. I. Sensory processing in the Drosophila antennal lobe increases reliability and separability of ensemble odor representations. Nat. Neurosci. 10, 1474–1482 (2007).
Olsen, S. R. & Wilson, R. I. Lateral presynaptic inhibition mediates gain control in an olfactory circuit. Nature 452, 956–960 (2008).
Olsen, S. R., Bhandawat, V. & Wilson, R. I. Divisive normalization in olfactory population codes. Neuron 66, 287–299 (2010).
Guo, J.-Z. et al. Disrupting cortico-cerebellar communication impairs dexterity. eLife 10, e65906 (2021).
Wagner, M. J. et al. Shared cortex-cerebellum dynamics in the execution and learning of a motor task. Cell 177, 669–682.e24 (2019).
Vosshall, L. B., Wong, A. M. & Axel, R. An olfactory sensory map in the fly brain. Cell 102, 147–159 (2000).
Marin, E. C., Jefferis, G. S. X. E., Komiyama, T., Zhu, H. & Luo, L. Representation of the glomerular olfactory map in the Drosophila brain. Cell 109, 243–255 (2002).
Wong, A. M., Wang, J. W. & Axel, R. Spatial representation of the glomerular map in the Drosophila protocerebrum. Cell 109, 229–241 (2002).
Berck, M. E. et al. The wiring diagram of a glomerular olfactory system. eLife 5, e14859 (2016).
Bates, A. S. et al. Complete connectomic reconstruction of olfactory projection neurons in the fly brain. Curr. Biol. 30, 3183–3199.e6 (2020).
Chadderton, P., Margrie, T. W. & Häusser, M. Integration of quanta in cerebellar granule cells during sensory processing. Nature 428, 856–860 (2004).
Ito, I., Ong, R. C.-Y., Raman, B. & Stopfer, M. Sparse odor representation and olfactory learning. Nat. Neurosci. 11, 1177–1184 (2008).
Kolkman, K. E., McElvain, L. E. & du Lac, S. Diverse precerebellar neurons share similar intrinsic excitability. J. Neurosci. 31, 16665–16674 (2011).
Shenoy, K. V., Sahani, M. & Churchland, M. M. Cortical control of arm movements: a dynamical systems perspective. Annu. Rev. Neurosci. 36, 337–359 (2013).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Caron, S. J. C., Ruta, V., Abbott, L. F. & Axel, R. Random convergence of olfactory inputs in the Drosophila mushroom body. Nature 497, 113–117 (2013).
Gruntman, E. & Turner, G. C. Integration of the olfactory code across dendritic claws of single mushroom body neurons. Nat. Neurosci. 16, 1821–1829 (2013).
Hallem, E. A. & Carlson, J. R. Coding of odors by a receptor repertoire. Cell 125, 143–160 (2006).
Friedrich, R. W. & Wiechert, M. T. Neuronal circuits and computations: pattern decorrelation in the olfactory bulb. FEBS Lett. 588, 2504–2513 (2014).
Schlegel, P. et al. Information flow, cell types and stereotypy in a full olfactory connectome. eLife 10, e66018 (2021).
Peters, A. J., Lee, J., Hedrick, N. G., O’Neil, K. & Komiyama, T. Reorganization of corticospinal output during motor learning. Nat. Neurosci. 20, 1133–1141 (2017).
Wolpert, D. M., Miall, R. C. & Kawato, M. Internal models in the cerebellum. Trends Cogn. Sci. 2, 338–347 (1998).
Russo, A. A. et al. Motor cortex embeds muscle-like commands in an untangled population response. Neuron 97, 953–966.e8 (2018).
Saxena, S., Russo, A. A., Cunningham, J. & Churchland, M. M. Motor cortex activity across movement speeds is predicted by network-level strategies for generating muscle activity. eLife 11, e67620 (2022).
Gallego, J. A., Perich, M. G., Chowdhury, R. H., Solla, S. A. & Miller, L. E. Long-term stability of cortical population dynamics underlying consistent behavior. Nat. Neurosci. 23, 260–270 (2020).
Oja, E. Simplified neuron model as a principal component analyzer. J. Math. Biol. 15, 267–273 (1982).
Pehlevan, C. & Chklovskii, D. B. Optimization theory of Hebbian/anti-Hebbian networks for PCA and whitening. In 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA 1458–1465 (Allerton, 2015).
Schwarz, C. & Thier, P. Binding of signals relevant for action: towards a hypothesis of the functional role of the pontine nuclei. Trends Neurosci. 22, 443–451 (1999).
Pehlevan, C., Hu, T. & Chklovskii, D. B. A Hebbian/anti-Hebbian neural network for linear subspace learning: a derivation from multidimensional scaling of streaming data. Neural Comput. 27, 1461–1495 (2015).
Barak, O., Rigotti, M. & Fusi, S. The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. J. Neurosci. 33, 3844–3856 (2013).
Ganguli, S. & Sompolinsky, H. Compressed sensing, sparsity, and dimensionality in neuronal information processing and data analysis. Annu. Rev. Neurosci. 35, 485–508 (2012).
Barlow, H. B. in Sensory Communication (ed. Rosenblith, W. A.) 216–234 (MIT Press, 1961).
Atick, J. J. Could information theory provide an ecological theory of sensory processing? Netw. Comput. Neural Syst. 3, 213–251 (1992).
Simoncelli, E. P. Vision and the statistics of the visual environment. Curr. Opin. Neurobiol. 13, 144–149 (2003).
Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 37, 233–243 (1991).
Benna, M. K. & Fusi, S. Place cells may simply be memory cells: memory compression leads to spatial tuning and history dependence. Proc. Natl Acad. Sci. USA 118, e2018422118 (2021).
Baldi, P. & Hornik, K. Neural networks and principal component analysis: learning from examples without local minima. Neural Netw. 2, 53–58 (1989).
Apps, R. & Garwicz, M. Anatomical and physiological foundations of cerebellar information processing. Nat. Rev. Neurosci. 6, 297–311 (2005).
Oscarsson, O. Functional organization of the spino- and cuneocerebellar tracts. Physiol. Rev. 45, 495–522 (1965).
Kennedy, A. et al. A temporal basis for predicting the sensory consequences of motor commands in an electric fish. Nat. Neurosci. 17, 416–422 (2014).
Bratton, B. & Bastian, J. Descending control of electroreception. II. Properties of nucleus praeeminentialis neurons projecting directly to the electrosensory lateral line lobe. J. Neurosci. 10, 1241–1253 (1990).
Kazama, H. & Wilson, R. I. Origins of correlated activity in an olfactory circuit. Nat. Neurosci. 12, 1136–1144 (2009).
Chapochnikov, N. M., Pehlevan, C. & Chklovskii, D. B. Normative and mechanistic model of an adaptive circuit for efficient encoding and feature extraction. Proc. Natl Acad. Sci. USA 120, e21174841 (2023).
Kebschull, J. M. et al. Cerebellar nuclei evolved by repeatedly duplicating a conserved cell-type set. Science 370, eabd5059 (2020).
Barbosa, J., Proville, R., Rodgers, C. C., Ostojic, S. & Boubenec, Y. Flexible selection of task-relevant features through across-area population gating. Preprint at bioRxiv https://doi.org/10.1101/2022.07.21.500962 (2022).
Leergaard, T. B. & Bjaalie, J. G. Topography of the complete corticopontine projection: from experiments to principal maps. Front. Neurosci. 1, 211–223 (2007).
Kratochwil, C. F., Maheshwari, U. & Rijli, F. M. The long journey of pontine nuclei neurons: from rhombic lip to cortico-ponto-cerebellar circuitry. Front. Neural Circuits https://doi.org/10.3389/fncir.2017.00033 (2017).
Mihailoff, G. A., Lee, H., Watt, C. B. & Yates, R. Projections to the basilar pontine nuclei from face sensory and motor regions of the cerebral cortex in the rat. J. Comp. Neurol. 237, 251–263 (1985).
Lanore, F., Cayco-Gajic, N. A., Gurnani, H., Coyle, D. & Silver, R. A. Cerebellar granule cell axons support high-dimensional representations. Nat. Neurosci. 24, 1142–1150 (2021).
Xie, M., Muscinelli, S., Harris, K. D. & Litwin-Kumar, A. Task-dependent optimal representations for cerebellar learning. Preprint at bioRxiv https://doi.org/10.1101/2022.08.15.504040 (2022).
Stewart, G. W. The efficient generation of random orthogonal matrices with an application to condition estimators. SIAM J. Numer. Anal. 17, 403–409 (1980).
Abbott, L. F., Rajan, K. & Sompolinsky, H. in The Dynamic Brain: An Exploration of Neuronal Variability and its Functional Significance (eds Ding, M. & Glanzman, D.) 65–82 (Oxford Academic, 2011).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at arXiv https://doi.org/10.48550/arXiv.1412.6980 (2017).
Fagg, A., Sitkoff, N., Barto, A. & Houk, J. Cerebellar learning for control of a two-link arm in muscle space. In Proc. of International Conference on Robotics and Automation, Albuquerque, NM, USA, Vol. 3, 2638–2644 (IEEE, 1997).
Acknowledgements
We would like to thank M. Xie, A. Hantman, B. Sauerbrei, J. Kadmon and R. Warren for helpful discussions and comments. We would also like to thank L.F. Abbott, N. Sawtell, M. Beiran, K. Lakshminarasimhan and N.A. Cayco-Gajic for their comments on the manuscript. The Wagner laboratory is supported by the NINDS Intramural Research Program. A.L.-K. and S.P.M. were supported by the Gatsby Charitable Foundation, National Science Foundation award DBI-1707398, and the Simons Collaboration on the Global Brain. S.P.M. was also supported by the Swartz Foundation. A.L.-K. was also supported by the Burroughs Wellcome Foundation, the McKnight Endowment Fund and NIH award R01EB029858. We acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR030893-01.
Author information
Authors and Affiliations
Contributions
S.P.M. and A.L.-K. conceived the study. S.P.M. performed simulations and analyses. M.J.W. performed the experiments and provided the data. S.P.M., M.J.W. and A.L.-K. wrote the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Learned compression is not beneficial when the input representation is unstructured.
a: Performance over learning when the compression weights are trained using error backpropagation. Parameters are the same as in Fig. 2a. The solid line and shaded areas indicate the mean and standard deviation of the fraction of errors across network realizations. b: Left: fraction of errors for different network architectures when the input representation consists of random, uncorrelated Gaussian patterns, as in previous work4,5. Single-step expansion performs significantly better than learned compression (two-sided Welch’s t-test, n = 10, t = 4.82, p = 2.4 ⋅ 10⁻⁴), presumably due to incomplete convergence of gradient descent, and comparably to whitening compression. Parameters: N = D = P = 500, M = 2000, f = 0.1, σ = 0.1. Right: same as the left panel, but with Nc = N/2 instead of Nc = N. Single-step expansion performs significantly better than learned compression (two-sided Welch’s t-test, n = 10, t = 26.8, p = 1.3 ⋅ 10⁻¹⁵). The box boundary extends from the first to the third quartile of the data. The whiskers extend from the box by 1.5 times the inter-quartile range. The horizontal line indicates the median. Parameters: N = D = P = 500, M = 2000, f = 0.1, σ = 0.1. In both the left and right panels, the task-relevant input PC eigenvalues were set to not decay (p = 0), in contrast to previous figures, to consider a fully unstructured input representation.
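For concreteness, the following is a minimal sketch of a random classification benchmark of the kind described above, assuming random uncorrelated Gaussian patterns, a random expansion with a global threshold set to coding level f, and a least-squares readout; the variable names, sizes, and readout choice are illustrative, not the authors’ exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, P, f, sigma = 200, 800, 200, 0.1, 0.1  # smaller than the paper's values, for speed

X = rng.standard_normal((P, N))               # random, uncorrelated Gaussian patterns
labels = rng.choice([-1.0, 1.0], size=P)      # random binary labels

J = rng.standard_normal((M, N)) / np.sqrt(N)  # random expansion weights
H = X @ J.T
theta = np.quantile(H, 1 - f)                 # global threshold giving coding level f

def phi(h):
    """Rectified expansion-layer response."""
    return np.maximum(h - theta, 0.0)

M_rep = phi(H)
w = np.linalg.lstsq(M_rep, labels, rcond=None)[0]   # least-squares readout

X_noisy = X + sigma * rng.standard_normal(X.shape)  # noisy test patterns
pred = np.sign(phi(X_noisy @ J.T) @ w)
print("fraction of errors:", np.mean(pred != labels))
```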
Extended Data Fig. 2 Sign-constrained compression for clustered and distributed representations.
a: Distribution of the excitatory compression weights that maximize \(\mathrm{SNR}_{\mathrm{c}} \propto \dim(c)(1-\Delta_{\mathrm{c}})^{2}\), in the presence of a distributed input representation. b: Standard deviation of the out-degree of the input for the same compression matrix as in a, averaged across 10 realizations (red dashed line). The gray histogram represents the distribution of the same quantity for a compression matrix with the same sparsity but shuffled entries. c, d: Performance of a network with purely excitatory compression in the presence of a distributed input representation. Solid lines and shaded areas indicate the mean and standard deviation of the fraction of errors across network realizations, respectively. Parameters are the same as in Fig. 3e. c: Fraction of errors on a random classification task as a function of the redundancy in the input representation N/D. d: For fixed N/D = 10, network performance for different network architectures, as in Fig. 2a. ‘Excitatory’ indicates a network whose compression weights are trained to maximize the Hebbian SNR at the compression layer, that is, \(\mathrm{SNR}_{\mathrm{c}} \propto \dim(c)(1-\Delta_{\mathrm{c}})^{2}\), while ‘unconstrained’ indicates a network trained on the same objective but without sign constraints on the weights. Excitatory and optimal compression are not statistically different (n = 10). The training procedure is the same as that used in Fig. 2a. The box boundary extends from the first to the third quartile of the data. The whiskers extend from the box by 1.5 times the inter-quartile range. The horizontal line indicates the median. e, f: Increasing input redundancy yields a smaller benefit when considering clustered input representations. All the parameters are the same as in c, d, except for the type of input representation. e: Same as c, but for a clustered input representation. f: Same as d, but for a clustered input representation. Purely excitatory compression does not achieve the performance of whitening (two-sided Welch’s t-test, t = 10.615, p = 2.54 ⋅ 10⁻¹¹, n = 10) nor of unconstrained compression trained with the same objective (two-sided Welch’s t-test, t = 8.563, p = 9.19 ⋅ 10⁻⁸, n = 10). In panels c, e the shaded regions indicate the standard deviation across 10 network realizations.
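The objective above combines an effective dimension with a noise penalty. Below is a minimal sketch of how such a quantity can be evaluated for a candidate compression matrix, assuming the participation-ratio definition of dimension and taking Δc as a per-unit noise-to-signal variance ratio; the latter is an assumption for illustration, since the paper derives Δc from its specific noise model.

```python
import numpy as np

rng = np.random.default_rng(1)
N, Nc, D = 500, 50, 50

# structured input covariance: D task-relevant directions with decaying power
A = np.linalg.qr(rng.standard_normal((N, D)))[0]   # orthonormal embedding (N x D)
eigs = np.arange(1, D + 1, dtype=float) ** -1.0    # decaying PC spectrum
C_x = A @ np.diag(eigs) @ A.T

G = rng.standard_normal((Nc, N)) / np.sqrt(N)      # candidate compression matrix
C_c = G @ C_x @ G.T                                # compression-layer covariance

lam = np.linalg.eigvalsh(C_c)
dim_c = lam.sum() ** 2 / np.sum(lam ** 2)          # participation-ratio dimension

noise_var = 1e-3                                   # isotropic compression-layer noise
delta_c = noise_var / np.mean(np.diag(C_c))        # illustrative definition of Delta_c
print("SNR_c objective:", dim_c * (1.0 - delta_c) ** 2)
```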
Extended Data Fig. 3 Realistic properties of odor receptor responses.
a: Covariance of single odor receptor responses, computed from the Hallem–Carlson dataset26, sorted according to the response variances. b: Histogram of off-diagonal terms in the covariance matrix in a (red), compared to a shuffle distribution (blue) obtained by shuffling the responses to different odorants for a given odor receptor. c: Mean of the off-diagonal elements of the data covariance matrix (red dashed line), compared to the histogram of the same mean for the shuffled responses as in b (blue). The mean of the original data is significantly larger than the mean of the shuffle distribution (permutation test, p < 10⁻⁴). d: Geometrical representation of tuning vectors that are aligned (yellow) versus not aligned (black) with principal components (gray), corresponding to clustered and distributed compression layer representations, respectively. e: Dimension expansion dim(m)/dim(x) at the expansion layer plotted against the in-degree of expansion layer neurons K. f: Same as e, but showing the fraction of errors on a random classification task instead of the dimension. g: Same as e, right, but showing the noise at the expansion layer instead of the dimension. In panels e–g, the solid lines and shaded areas indicate the mean and standard error of the mean across network realizations, respectively. Network parameters: N = 1000, M = 2000, Nc = D = P = 50, p = 1, f = 0.1, and σ = 0.1.
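The shuffle test in panels b, c can be sketched as follows, assuming odorant responses are permuted independently within each receptor column; the synthetic data here merely stand in for the Hallem–Carlson responses and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_odors, n_receptors = 110, 24                   # Hallem-Carlson dimensions
R = rng.standard_normal((n_odors, n_receptors))  # placeholder for the real responses
R += 0.3 * rng.standard_normal((n_odors, 1))     # shared component -> positive correlations

def mean_offdiag(X):
    """Mean off-diagonal element of the receptor-by-receptor covariance."""
    C = np.cov(X, rowvar=False)
    return C[~np.eye(C.shape[0], dtype=bool)].mean()

observed = mean_offdiag(R)

# null: permute odorant responses independently within each receptor column
n_perm = 1000
null = np.empty(n_perm)
for i in range(n_perm):
    shuffled = np.column_stack([rng.permutation(R[:, j]) for j in range(n_receptors)])
    null[i] = mean_offdiag(shuffled)

p = (np.sum(null >= observed) + 1) / (n_perm + 1)
print(f"observed mean = {observed:.3f}, permutation p = {p:.4f}")
```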
Extended Data Fig. 4 Effect of architectural parameters on the effectiveness of Hebbian plasticity.
a: Dependence of the network performance on Nc. Notice that performance saturates for relatively large values of Nc. b–d: The non-monotonic behavior of the network performance with L is robust to changes in Nc (b), N (c) and M (d). The optimal L increases moderately with N and appears to saturate for N > 500. e: Left: schematic of the setup in which compression weights are learned with Hebbian plasticity. Right: resulting mean squared overlaps between the rows of the compression matrix and the principal components, as a function of PC index. f: Same as e, but when compression weights are learned using Hebbian and anti-Hebbian learning rules in the presence of recurrent inhibition. We used the learning rule proposed in ref. 35 (see their Eq. (18)) to learn the compression weights. This learning scheme updates both the feedforward (excitatory/inhibitory) and the recurrent (inhibitory only) weights to introduce competition among compression layer units, enabling the extraction of sub-leading PCs. Notice that the decay is slower than without recurrent inhibition, indicating that several PCs are estimated considerably better, especially for large L. Unless otherwise stated, parameters were N = 500, Nc = 250, M = 5000, f = 0.1, D = P = 50, σ = 0.5, p = 0.1.
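Panel e uses plain Hebbian plasticity, which for a single linear unit recovers only the leading principal component (Oja’s rule). A minimal sketch of this baseline, with synthetic inputs dominated by a single direction:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, eta = 50, 20_000, 1e-3

# inputs dominated by a single direction u
u = rng.standard_normal(N)
u /= np.linalg.norm(u)
X = rng.standard_normal((T, N)) + 3.0 * np.outer(rng.standard_normal(T), u)

w = rng.standard_normal(N) / np.sqrt(N)
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)   # Oja's rule: Hebbian term with implicit normalization

print("overlap with leading PC:", abs(w @ u))  # approaches 1
```

Because every unit trained this way converges to the same leading PC, extracting sub-leading PCs requires the competition introduced by recurrent inhibition, as in panel f.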
Extended Data Fig. 5 Learning a forward model of a two-joint arm.
a: Performance on the forward model task is non-monotonic in the pontine in-degree L. We plot the MSE on the forward model task as a function of L for the network with and without feedback from the DCN. The best L is of the same order as that found for the classification task in Fig. 6a. We set σ = 1, while all the other parameters are the same as in Fig. 6e. The solid lines and shaded areas indicate the mean and standard deviation of the MSE across network realizations, respectively. b: DCN feedback leads to higher overlap of compression weights with signal principal components. We define the overlap of the weights onto unit i of the compression layer with the jth PC as \(\mathrm{overlap}_{ij}=\sum_{k=1}^{N}G_{ik}A_{kj}\), where G is the compression matrix learned without (left) or with (right) the feedback from the DCN, while A is the embedding matrix of the task-relevant components (blue) or task-irrelevant components (red). The violin plot shows the mean and distribution of the overlaps across compression layer units. We set σ = 1.8 and L = 50, while all the other parameters are the same as in Fig. 6e. In the violin plots, the whiskers indicate the entire data range, and the horizontal line indicates the median of the distribution. c: Performance on the forward model task while the compression weights are adjusted using our modified version of Oja’s rule in the presence of feedback from the DCN, for two different levels of input noise and two target dimensions. All the other parameters are the same as in Fig. 6e.
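A minimal sketch of the overlap computation in panel b, assuming orthonormal embedding matrices for the task-relevant and task-irrelevant components and a random placeholder for the learned compression matrix G:

```python
import numpy as np

rng = np.random.default_rng(4)
N, Nc, D_task, D_irr = 500, 50, 10, 40

A_task = np.linalg.qr(rng.standard_normal((N, D_task)))[0]  # task-relevant embedding
A_irr = np.linalg.qr(rng.standard_normal((N, D_irr)))[0]    # task-irrelevant embedding
G = rng.standard_normal((Nc, N)) / np.sqrt(N)               # placeholder compression matrix

overlap_task = G @ A_task   # overlap_ij = sum_k G_ik A_kj
overlap_irr = G @ A_irr

print("mean |overlap|, task-relevant:", np.abs(overlap_task).mean())
print("mean |overlap|, task-irrelevant:", np.abs(overlap_irr).mean())
```

With a random G the two distributions coincide; compression weights shaped by DCN feedback would show larger overlaps with the task-relevant columns (blue) than with the task-irrelevant ones (red).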
Extended Data Fig. 6 Dimension and noise contributions to local decorrelation performance.
a, b: Dimension (a) and noise (b) contributions to the performance shown in Fig. 8b, using the same parameters. c, d: Dimension (c) and noise (d) contributions to the performance shown in Fig. 8c, using the same parameters. e-g: Dimension (e) and noise (f) contributions to the performance (g), for the antennal lobe architecture, as a function of the in-degree of Kenyon cells K. Input was generated using a clustered representation. The green dashed line indicates the value obtained with optimal compression. The parameters were chosen to be consistent with the insect olfactory system anatomy, that is D = Nc = 50, N = 1000, M = 2000, p = 1, f = 0.1, σ = 1, P = 100. Note that when K ≥ 8, the local decorrelation strategy requires more synapses than the optimal compression one, for which K = 7 and L = 20. h, i: Dimension (h) and noise (i) contributions to the performance shown in Fig. 8d, using the same parameters. For all panels, the shaded areas indicate the standard deviation across network realizations.
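To make the decomposition concrete, here is a sketch of how the two contributions can be estimated empirically for a random expansion, assuming the participation ratio as the dimension measure and an illustrative definition of Δm as a noise-to-signal squared-distance ratio (the paper’s precise normalization may differ):

```python
import numpy as np

rng = np.random.default_rng(5)
Nc, M, P, f, sigma = 50, 500, 100, 0.1, 1.0

C = rng.standard_normal((P, Nc))                 # compression-layer patterns
J = rng.standard_normal((M, Nc)) / np.sqrt(Nc)   # random expansion weights
H = C @ J.T
theta = np.quantile(H, 1 - f)                    # threshold giving coding level f
M_rep = np.maximum(H - theta, 0.0)

lam = np.linalg.eigvalsh(np.cov(M_rep, rowvar=False))
dim_m = lam.sum() ** 2 / np.sum(lam ** 2)        # dimension contribution

# noise contribution: squared distance between noisy and clean responses,
# normalized by the typical squared distance between distinct patterns
M_noisy = np.maximum((C + sigma * rng.standard_normal(C.shape)) @ J.T - theta, 0.0)
noise_dist = np.mean(np.sum((M_noisy - M_rep) ** 2, axis=1))
signal_dist = np.mean([np.sum((M_rep[i] - M_rep[j]) ** 2)
                       for i in range(P) for j in range(i)])
print(f"dim(m) = {dim_m:.1f}, Delta_m (illustrative) = {noise_dist / signal_dist:.2f}")
```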
Extended Data Fig. 7 Effect of nonlinearities at the compression layer.
To achieve performance with nonlinear compression layer units comparable to that of linear units, we set Nc = 250. To maximize the dimension of the compression layer after the nonlinearity, we also introduced a random rotation of the optimal compression matrix (see Methods 5). a: Dimension of the compression layer representation for linear versus nonlinear (ReLU) compression. For ReLU compression, the nonlinearity is applied after random (left), PC-aligned (center), and whitening compression (right). b: Same as a, but showing the noise strength at the compression layer Δc. c: Same as a, but showing the fraction of errors in the random classification task. In panels a–c, the box boundary extends from the first to the third quartile of the data. The whiskers extend from the box by 1.5 times the inter-quartile range. The horizontal line indicates the median. d: Fraction of errors over training when the compression weights are trained using gradient descent and the compression layer units are nonlinear (ReLU). For comparison, the horizontal dashed lines indicate the performance of networks with linear compression layer units. The solid lines indicate the mean over 10 network realizations and the shading indicates the standard deviation across network realizations. e: Performance at convergence for the same networks as in d. For all panels, parameters were N = D = P = 500, Nc = 250, M = 2000, f = 0.1, fc = 0.3, and σ = 0.1.
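A sketch of the construction described above, assuming whitening compression onto the top Nc input PCs followed by a random orthogonal rotation before the ReLU; the whitening construction and sample sizes are illustrative, not the exact Methods procedure.

```python
import numpy as np

rng = np.random.default_rng(6)
N, Nc, n_samples = 500, 250, 2000

# correlated inputs and a whitening compression onto the top Nc PCs
X = rng.standard_normal((n_samples, N)) @ (rng.standard_normal((N, N)) / np.sqrt(N))
U, S, _ = np.linalg.svd(np.cov(X, rowvar=False))
G_white = np.diag(1.0 / np.sqrt(S[:Nc])) @ U[:, :Nc].T   # (Nc x N)

# Haar-random orthogonal rotation applied before the ReLU, spreading variance
# evenly across compression-layer units
Q, R = np.linalg.qr(rng.standard_normal((Nc, Nc)))
Q *= np.sign(np.diag(R))
G_rot = Q @ G_white

c = np.maximum(G_rot @ X.T, 0.0)   # nonlinear (ReLU) compression-layer responses
print("fraction of active units:", np.mean(c > 0))
```

Without the rotation, a ReLU applied directly to whitened PC projections silences roughly half of the informative components; the rotation mixes them so that the rectification discards far less variance.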
Extended Data Fig. 8 Expansion layer dimension and noise strength depend on compression layer dimension and noise strength.
a: Dimension of the expansion layer representation as a function of the dimension of the compression layer representation. The compression layer representation was distributed, and its dimension was varied by changing p between 0 and 1. b: Noise strength Δm at the expansion layer as a function of the noise strength at the compression layer. Noise was additive, Gaussian, and isotropic at the compression layer, with standard deviation varying from 0 to 0.1. In both panels, solid lines show the theoretical result and dots are simulation results, averaged over 10 network realizations. The standard deviation of the numerical simulations is not visible because it is smaller than the marker size. Parameters: Nc = 100, M = 1000, f = 0.1.
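A minimal simulation in the spirit of panel a, assuming a distributed compression layer representation with power-law eigenvalue decay controlled by p and the participation ratio as the dimension measure:

```python
import numpy as np

rng = np.random.default_rng(7)
Nc, M, P, f = 100, 1000, 500, 0.1

def participation_ratio(X):
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return lam.sum() ** 2 / np.sum(lam ** 2)

J = rng.standard_normal((M, Nc)) / np.sqrt(Nc)         # random expansion weights
for p in (0.0, 0.5, 1.0):                              # spectral decay exponent
    eigs = np.arange(1, Nc + 1, dtype=float) ** -p
    C = rng.standard_normal((P, Nc)) * np.sqrt(eigs)   # distributed representation
    H = C @ J.T
    M_rep = np.maximum(H - np.quantile(H, 1 - f), 0.0)
    print(f"p = {p}: dim(c) = {participation_ratio(C):.1f}, "
          f"dim(m) = {participation_ratio(M_rep):.1f}")
```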
Supplementary information
Supplementary Information
Supplementary Modeling Note.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Muscinelli, S.P., Wagner, M.J. & Litwin-Kumar, A. Optimal routing to cerebellum-like structures. Nat Neurosci 26, 1630–1641 (2023). https://doi.org/10.1038/s41593-023-01403-7