
MadMiner: Machine Learning-Based Inference for Particle Physics


Abstract

Precision measurements at the LHC often require analyzing high-dimensional event data for subtle kinematic signatures, which is challenging for established analysis methods. Recently, a powerful family of multivariate inference techniques that leverage both matrix element information and machine learning has been developed. This approach neither requires the reduction of high-dimensional data to summary statistics nor any simplifications to the underlying physics or detector response. In this paper, we introduce MadMiner, a Python module that streamlines the steps involved in this procedure. Wrapping around MadGraph5_aMC and Pythia 8, it supports almost any physics process and model. To aid phenomenological studies, the tool also wraps around Delphes 3, though it is extendable to a full Geant4-based detector simulation. We demonstrate the use of MadMiner in an example analysis of dimension-six operators in ttH production, finding that the new techniques substantially increase the sensitivity to new physics.



Notes

  1. The problem of likelihood-free inference, the inference techniques discussed here, and MadMiner apply just as well in a Bayesian setting; see for instance Ref. [56].

  2. Note that this approach is similar in spirit to the Matrix Element Method, which also uses parton-level likelihoods and aims to estimate \(r(x | \theta_0, \theta_1)\) by calculating approximate versions of the integral in Eq. (3). But unlike the Matrix Element Method, our machine learning-based approach supports realistic shower and detector simulations and can be evaluated very efficiently.

  3. In fact, the score vector is a generalization of the concept of Optimal Observables [27,28,29] from the parton level to the full statistical model including shower and detector simulation.

  4. The Fisher information defines a metric on the parameter space, giving rise to the field of information geometry [9, 73, 74]. In that formalism, we can also define “global” distances measured along geodesics, which are equivalent to the expected log likelihood ratio even beyond the local approximation of small \(\Delta \theta\) [75]. A minimal numerical illustration of the score and this local Fisher distance follows these notes.

  5. Fundamentally, the presented inference techniques also support new physics effects that affect e.g. the probabilities of shower splittings, but this is currently not supported in MadMiner.

  6. Similarly, important phase-space regions can also be identified using the log likelihood ratio directly [105,106,107].
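As a concrete illustration of notes 3 and 4, the following sketch computes the score and Fisher information in a toy model where the likelihood is tractable. The one-dimensional Gaussian dynamics are an assumption made purely for illustration and are not part of MadMiner:

```python
import numpy as np

# Toy model: x ~ N(theta, sigma^2), so log p(x|theta) = -(x - theta)^2 / (2 sigma^2) + const.
# Score:              t(x|theta) = d/dtheta log p(x|theta) = (x - theta) / sigma^2
# Fisher information: I(theta)   = E[t^2] = 1 / sigma^2

rng = np.random.default_rng(42)
sigma, theta = 1.5, 0.0
x = rng.normal(theta, sigma, size=100_000)

score = (x - theta) / sigma**2

# Monte Carlo estimate of the Fisher information vs. the analytic value
i_mc = np.mean(score**2)
i_true = 1.0 / sigma**2
print(f"Fisher information: MC {i_mc:.4f} vs analytic {i_true:.4f}")

# Local distance between theta and theta + dtheta in the Fisher metric,
# which quantifies the expected sensitivity to small parameter shifts
dtheta = 0.1
d_local = np.sqrt(dtheta * i_true * dtheta)
print(f"Local Fisher distance for dtheta={dtheta}: {d_local:.4f}")
```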

References

  1. Brehmer J, Cranmer K, Espejo I, Kling F, Louppe G, Pavez J (2019) Effective LHC measurements with matrix elements and machine learning. arXiv:1906.01578

  2. Cranmer KS (2001) Kernel estimation in high-energy physics. Comput Phys Commun 136:198

  3. Cranmer K, Lewis G, Moneta L, Shibata A, Verkerke W (ROOT Collaboration) (2012) HistFactory: a tool for creating statistical models for use with RooFit and RooStats

  4. Frate M, Cranmer K, Kalia S, Vandenberg-Rodes A, Whiteson D (2017) Modeling smooth backgrounds and generic localized signals with Gaussian processes. arXiv:1709.05681

  5. Rubin DB (1984) Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann Stat 12(4):1151

  6. Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian computation in population genetics. Genetics 162(4):2025

  7. Alsing J, Wandelt B, Feeney S (2018) Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology. arXiv:1801.01497

  8. Charnock T, Lavaux G, Wandelt BD (2018) Automatic physical inference with information maximizing neural networks. Phys Rev D 97(8):083004

  9. Brehmer J, Cranmer K, Kling F, Plehn T (2017) Better Higgs boson measurements through information geometry. Phys Rev D 95(7):073002

  10. Brehmer J, Kling F, Plehn T, Tait TMP (2018) Better Higgs-CP tests through information geometry. Phys Rev D 97(9):095017

  11. Kondo K (1988) Dynamical likelihood method for reconstruction of events with missing momentum. I. Method and toy models. J Phys Soc Jpn 57:4126

  12. Abazov VM et al (D0 Collaboration) (2004) A precision measurement of the mass of the top quark. Nature 429:638

  13. Artoisenet P, Mattelaer O (2008) MadWeight: automatic event reweighting with matrix elements. PoS CHARGED2008:025

  14. Gao Y, Gritsan AV, Guo Z, Melnikov K, Schulze M, Tran NV (2010) Spin determination of single-produced resonances at hadron colliders. Phys Rev D 81:075022

  15. Alwall J, Freitas A, Mattelaer O (2011) The matrix element method and QCD radiation. Phys Rev D 83:074010

  16. Bolognesi S, Gao Y, Gritsan AV et al (2012) On the spin and parity of a single-produced resonance at the LHC. Phys Rev D 86:095031

  17. Avery P et al (2013) Precision studies of the Higgs boson decay channel \(H \rightarrow ZZ \rightarrow 4l\) with MEKD. Phys Rev D 87(5):055006

  18. Andersen JR, Englert C, Spannowsky M (2013) Extracting precise Higgs couplings by using the matrix element method. Phys Rev D 87(1):015019

  19. Campbell JM, Ellis RK, Giele WT, Williams C (2013) Finding the Higgs boson in decays to \(Z \gamma\) using the matrix element method at next-to-leading order. Phys Rev D 87(7):073005

  20. Artoisenet P, de Aquino P, Maltoni F, Mattelaer O (2013) Unravelling \(t\overline{t}h\) via the matrix element method. Phys Rev Lett 111(9):091802

  21. Gainer JS, Lykken J, Matchev KT, Mrenna S, Park M (2013) The matrix element method: past, present, and future. In: Proceedings of the community summer study on the future of U.S. particle physics: Snowmass on the Mississippi (CSS2013), Minneapolis, MN, USA, 29 July–6 August 2013. arXiv:1307.3546

  22. Schouten D, DeAbreu A, Stelzer B (2015) Accelerated matrix element method with parallel computing. Comput Phys Commun 192:54

  23. Martini T, Uwer P (2015) Extending the matrix element method beyond the Born approximation: calculating event weights at next-to-leading order accuracy. JHEP 09:083

  24. Gritsan AV, Röntsch R, Schulze M, Xiao M (2016) Constraining anomalous Higgs boson couplings to the heavy flavor fermions using matrix element techniques. Phys Rev D 94(5):055023

  25. Martini T, Uwer P (2017) The matrix element method at next-to-leading order QCD for hadronic collisions: single top-quark production at the LHC as an example application. arXiv:1712.04527

  26. Kraus M, Martini T, Uwer P (2019) Predicting event weights at next-to-leading order QCD for jet events defined by \(2\rightarrow 1\) jet algorithms. arXiv:1901.08008

  27. Atwood D, Soni A (1992) Analysis for magnetic moment and electric dipole moment form factors of the top quark via \(e^+ e^- \rightarrow t \bar{t}\). Phys Rev D 45:2405

  28. Davier M, Duflot L, Le Diberder F, Rouge A (1993) The optimal method for the measurement of tau polarization. Phys Lett B 306:411

  29. Diehl M, Nachtmann O (1994) Optimal observables for the measurement of three gauge boson couplings in \(e^+ e^- \rightarrow W^+ W^-\). Z Phys C 62:397

  30. Soper DE, Spannowsky M (2011) Finding physics signals with shower deconstruction. Phys Rev D 84:074002

  31. Soper DE, Spannowsky M (2013) Finding top quarks with shower deconstruction. Phys Rev D 87:054012

  32. Soper DE, Spannowsky M (2014) Finding physics signals with event deconstruction. Phys Rev D 89(9):094005

  33. Englert C, Mattelaer O, Spannowsky M (2016) Measuring the Higgs-bottom coupling in weak boson fusion. Phys Lett B 756:103

  34. Fan Y, Nott DJ, Sisson SA (2012) Approximate Bayesian computation via regression density estimation. arXiv:1212.1479

  35. Dinh L, Krueger D, Bengio Y (2014) NICE: non-linear independent components estimation. arXiv:1410.8516

  36. Germain M, Gregor K, Murray I, Larochelle H (2015) MADE: masked autoencoder for distribution estimation. arXiv:1502.03509

  37. Cranmer K, Pavez J, Louppe G (2015) Approximating likelihood ratios with calibrated discriminative classifiers. arXiv:1506.02169

  38. Cranmer K, Louppe G (2016) Unifying generative models and exact likelihood-free inference with conditional bijections. J Brief Ideas

  39. Louppe G, Cranmer K, Pavez J (2016) carl: a likelihood-free inference toolbox. J Open Source Softw 1(1):11

  40. Dinh L, Sohl-Dickstein J, Bengio S (2016) Density estimation using Real NVP. arXiv:1605.08803

  41. Papamakarios G, Murray I (2016) Fast \(\epsilon\)-free inference of simulation models with Bayesian conditional density estimation. arXiv:1605.06376

  42. Dutta R, Corander J, Kaski S, Gutmann MU (2016) Likelihood-free inference by ratio estimation. arXiv:1611.10242

  43. Uria B, Côté M-A, Gregor K, Murray I, Larochelle H (2016) Neural autoregressive distribution estimation. arXiv:1605.02226

  44. Gutmann MU, Dutta R, Kaski S, Corander J (2017) Likelihood-free inference via classification. Stat Comput 1–15

  45. Tran D, Ranganath R, Blei DM (2017) Hierarchical implicit models and likelihood-free variational inference. arXiv:1702.08896

  46. Louppe G, Cranmer K (2017) Adversarial variational optimization of non-differentiable simulators. arXiv:1707.07113

  47. Papamakarios G, Pavlakou T, Murray I (2017) Masked autoregressive flow for density estimation. arXiv:1705.07057

  48. Lueckmann J-M, Goncalves PJ, Bassetto G, Öcal K, Nonnenmacher M, Macke JH (2017) Flexible statistical inference for mechanistic models of neural dynamics. arXiv:1711.01861

  49. Huang C-W, Krueger D, Lacoste A, Courville A (2018) Neural autoregressive flows. arXiv:1804.00779

  50. Papamakarios G, Sterratt DC, Murray I (2018) Sequential neural likelihood: fast likelihood-free inference with autoregressive flows. arXiv:1805.07226

  51. Lueckmann J-M, Bassetto G, Karaletsos T, Macke JH (2018) Likelihood-free inference with emulator networks. arXiv:1805.09294

  52. Chen TQ, Rubanova Y, Bettencourt J, Duvenaud DK (2018) Neural ordinary differential equations. arXiv:1806.07366

  53. Kingma DP, Dhariwal P (2018) Glow: generative flow with invertible 1x1 convolutions. arXiv:1807.03039

  54. Grathwohl W, Chen RTQ, Bettencourt J, Sutskever I, Duvenaud D (2018) FFJORD: free-form continuous dynamics for scalable reversible generative models. arXiv:1810.01367

  55. Dinev T, Gutmann MU (2018) Dynamic likelihood-free inference via ratio estimation (DIRE). arXiv:1810.09899

  56. Hermans J, Begy V, Louppe G (2019) Likelihood-free MCMC with approximate likelihood ratios. arXiv:1903.04057

  57. Alsing J, Charnock T, Feeney S, Wandelt B (2019) Fast likelihood-free cosmology with neural density estimators and active learning. arXiv:1903.00007

  58. Greenberg DS, Nonnenmacher M, Macke JH (2019) Automatic posterior transformation for likelihood-free inference. arXiv:1905.07488

  59. Brehmer J, Louppe G, Pavez J, Cranmer K (2018) Mining gold from implicit models to improve likelihood-free inference. arXiv:1805.12244

  60. Brehmer J, Cranmer K, Louppe G, Pavez J (2018) Constraining effective field theories with machine learning. Phys Rev Lett 121(11):111801

  61. Brehmer J, Cranmer K, Louppe G, Pavez J (2018) A guide to constraining effective field theories with machine learning. Phys Rev D 98(5):052004

  62. Stoye M, Brehmer J, Louppe G, Pavez J, Cranmer K (2018) Likelihood-free inference with an improved cross-entropy estimator. arXiv:1808.00973

  63. Alwall J, Frederix R, Frixione S et al (2014) The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations. JHEP 07:079

  64. Sjöstrand T, Mrenna S, Skands PZ (2008) A brief introduction to PYTHIA 8.1. Comput Phys Commun 178:852

  65. de Favereau J, Delaere C, Demin P et al (DELPHES 3) (2014) DELPHES 3, a modular framework for fast simulation of a generic collider experiment. JHEP 02:057

  66. Agostinelli S et al (GEANT4) (2003) GEANT4: a simulation toolkit. Nucl Instrum Meth A 506:250

  67. Cranmer K (2015) Practical statistics for the LHC. In: Proceedings, 2011 European School of High-Energy Physics (ESHEP 2011), Cheile Gradistei, Romania, 7–20 September 2011, pp 267–308. arXiv:1503.07622

  68. Baldi P, Cranmer K, Faucett T, Sadowski P, Whiteson D (2016) Parameterized neural networks for high-energy physics. Eur Phys J C 76(5):235

  69. Wilks SS (1938) The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann Math Stat 9(1):60

  70. Wald A (1943) Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans Am Math Soc 54(3):426

  71. Cowan G, Cranmer K, Gross E, Vitells O (2011) Asymptotic formulae for likelihood-based tests of new physics. Eur Phys J C 71:1554 (Erratum: Eur Phys J C 73:2501, 2013)

  72. Alsing J, Wandelt B (2018) Generalized massive optimal data compression. Mon Not R Astron Soc 476(1):L60

  73. Efron B (1975) Defining the curvature of a statistical problem (with applications to second order efficiency). Ann Stat 3(6):1189

  74. Amari S-I (1982) Differential geometry of curved exponential families: curvatures and information loss. Ann Stat 10(2):357

  75. Brehmer J (2017) New ideas for effective Higgs measurements. Ph.D. thesis, U. Heidelberg. http://www.thphys.uni-heidelberg.de/~plehn/includes/theses/brehmer_d.pdf

  76. Radhakrishna Rao C (1945) Information and the accuracy attainable in the estimation of statistical parameters. Bull Calcutta Math Soc 37:81

  77. Cramér H (1946) Mathematical methods of statistics. Princeton University Press. ISBN 0691080046

  78. Edwards TDP, Weniger C (2018) A fresh approach to forecasting in astroparticle physics and dark matter searches. JCAP 1802(02):021

  79. Degrande C, Duhr C, Fuks B, Grellscheid D, Mattelaer O, Reiter T (2012) UFO, the Universal FeynRules Output. Comput Phys Commun 183:1201

  80. Mattelaer O (2016) On the maximal use of Monte Carlo samples: re-weighting events at NLO accuracy. Eur Phys J C 76(12):674

  81. Aad G et al (ATLAS) (2015) A morphing technique for signal modelling in a multidimensional space of coupling parameters. Physics note ATL-PHYS-PUB-2015-047. http://cds.cern.ch/record/2066980

  82. Alsing J, Wandelt B (2019) Nuisance hardened data compression for fast likelihood-free inference. arXiv:1903.01473

  83. Heinrich L, Feickert M, Stark G, Turra R, Forde J (2018) diana-hep/pyhf v0.0.15. https://doi.org/10.5281/zenodo.1464139

  84. Frederix R, Frixione S, Hirschi V, Maltoni F, Pittau R, Torrielli P (2012) Four-lepton production at hadron colliders: aMC@NLO predictions with theoretical uncertainties. JHEP 02:099

  85. Paszke A, Gross S, Chintala S et al (2017) Automatic differentiation in PyTorch. In: NIPS-W

  86. Qian N (1999) On the momentum term in gradient descent learning algorithms. Neural Netw 12(1):145

  87. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980

  88. Reddi SJ, Kale S, Kumar S (2018) On the convergence of Adam and beyond. In: International conference on learning representations

  89. Lakshminarayanan B, Pritzel A, Blundell C (2016) Simple and scalable predictive uncertainty estimation using deep ensembles. arXiv:1612.01474

  90. Brehmer J, Kling F, Espejo I, Cranmer K (2019) MadMiner code repository. https://doi.org/10.5281/zenodo.1489147

  91. Brehmer J, Kling F, Espejo I, Cranmer K (2019) MadMiner technical documentation. https://madminer.readthedocs.io/en/latest/

  92. Espejo I, Brehmer J, Cranmer K (2019) MadMiner Docker repositories. https://hub.docker.com/u/madminertool

  93. Šimko T, Heinrich L, Hirvonsalo H, Kousidis D, Rodríguez D (2018) REANA: a system for reusable research data analyses. Technical report CERN-IT-2018-003, CERN, Geneva. https://cds.cern.ch/record/2652340

  94. Espejo I, Brehmer J, Kling F, Cranmer K (2019) MadMiner REANA deployment. https://github.com/irinaespejo/workflow-madminer

  95. The HDF Group (2000–2010) Hierarchical data format version 5. http://www.hdfgroup.org/HDF5

  96. Dobbs M, Hansen JB (2001) The HepMC C++ Monte Carlo event record for high energy physics. Comput Phys Commun 134:41

  97. Rodrigues E, Marinangeli M, Pollack B et al (2019) scikit-hep/scikit-hep: scikit-hep-0.5.1. https://doi.org/10.5281/zenodo.3234683

  98. Oliphant T (2006) NumPy: a guide to NumPy. Trelgol Publishing, USA. http://www.numpy.org/

  99. Butterworth J et al (2016) PDF4LHC recommendations for LHC Run II. J Phys G 43:023001

  100. de Florian D et al (LHC Higgs Cross Section Working Group) (2016) Handbook of LHC Higgs cross sections: 4. Deciphering the nature of the Higgs sector. arXiv:1610.07922

  101. Giudice GF, Grojean C, Pomarol A, Rattazzi R (2007) The strongly-interacting light Higgs. JHEP 06:045

  102. Alloul A, Fuks B, Sanz V (2014) Phenomenology of the Higgs effective Lagrangian via FeynRules. JHEP 04:110

  103. Maltoni F, Vryonidou E, Zhang C (2016) Higgs production in association with a top-antitop pair in the standard model effective field theory at NLO in QCD. JHEP 10:123

  104. Cepeda M et al (Physics of the HL-LHC Working Group) (2019) Higgs physics at the HL-LHC and HE-LHC. arXiv:1902.00134

  105. Plehn T, Schichtel P, Wiegand D (2014) Where boosted significances come from. Phys Rev D 89(5):054002

  106. Kling F, Plehn T, Schichtel P (2017) Maximizing the significance in Higgs boson pair analyses. Phys Rev D 95(3):035026

  107. Gonçalves D, Han T, Kling F, Plehn T, Takeuchi M (2018) Higgs boson pair production at future hadron colliders: from kinematics to dynamics. Phys Rev D 97(11):113004

  108. Merkel D (2014) Docker: lightweight Linux containers for consistent development and deployment. Linux J 2014:239

  109. Kluyver T, Ragan-Kelley B, Pérez F et al (2016) Jupyter notebooks: a publishing format for reproducible computational workflows. In: ELPUB

  110. Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9(3):90

  111. Heinrich L (2018) lukasheinrich/pylhe v0.0.4. https://doi.org/10.5281/zenodo.1217032

  112. Sjöstrand T, Ask S, Christiansen JR et al (2015) An introduction to PYTHIA 8.2. Comput Phys Commun 191:159

  113. Van Rossum G, Drake FL Jr (1995) Python tutorial. Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands

  114. Rodrigues E (2019) The Scikit-HEP project. In: 23rd international conference on computing in high energy and nuclear physics (CHEP 2018), Sofia, Bulgaria, 9–13 July 2018. arXiv:1905.00002

  115. Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825

  116. Pivarski J, Das P, Smirnov D et al (2019) scikit-hep/uproot: 3.7.2. https://doi.org/10.5281/zenodo.3256257

  117. Heinrich L, Cranmer K (2017) diana-hep/yadage v0.12.13. https://doi.org/10.5281/zenodo.1001816

Acknowledgements

We would like to thank Zubair Bhatti, Lukas Heinrich, Alexander Held, and Samuel Homiller for their important contributions to the development of MadMiner. We are grateful to Joakim Olsson for his help with the ttH data generation. We also thank Pablo de Castro, Sally Dawson, Gilles Louppe, Olivier Mattelaer, Duccio Pappadopulo, Michael Peskin, Tilman Plehn, Josh Ruderman, and Leonora Vesterbacka for fruitful discussions. Last but not least, we are grateful to the authors and maintainers of many open-source software packages, including Delphes 3 [65], Docker [108], Jupyter notebooks [109], MadGraph5_aMC [63], Matplotlib [110], NumPy [98], pylhe [111], Pythia 8 [112], Python [113], PyTorch [85], REANA [93], scikit-hep [114], scikit-learn [115], uproot [116], and yadage [117]. This work was supported by the U.S. National Science Foundation (NSF) under the awards ACI-1450310, OAC-1836650, and OAC-1841471, and through the NYU IT High Performance Computing resources, services, and staff expertise. JB and KC are grateful for the support of the Moore–Sloan data science environment at NYU. KC is also supported through the NSF grant PHY-1505463, while FK is supported by NSF grant PHY-1620638 and U.S. Department of Energy grant DE-AC02-76SF00515.

Author information

Correspondence to Johann Brehmer.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Frequently Asked Questions

Here we collect frequently asked questions, in the hope of clearing up some common misconceptions:

  • Does the whole event history not change when I change parameters?

    No. In probabilistic processes such as those at the LHC, any given event history is typically compatible with different values of the theory parameters, but might be more or less likely. By “event history” we mean the entire evolution of a simulated particle collision, ranging from the initial-state and final-state elementary particles through the parton shower and detector interactions to observables. The joint likelihood ratio and joint score quantify how much more or less likely one particular such evolution of a simulated event becomes when the theory parameters are varied.
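    To make this concrete, here is a minimal toy sketch of the joint likelihood ratio and joint score for one fixed event history, assuming a hypothetical two-stage simulator (a parton-level quantity z followed by detector smearing into an observable x); this toy is purely illustrative and not MadMiner's actual simulation chain:

```python
import numpy as np
from scipy.stats import norm

# Toy "simulator": parton-level z ~ N(theta, 1), detector smearing x = z + N(0, 1).
# For one fixed history (z, x), the detector factor p(x|z) is the same for every
# theta, so it cancels in the joint likelihood ratio:
#   r(x, z | theta0, theta1) = p(z|theta0) p(x|z) / (p(z|theta1) p(x|z))
#                            = p(z|theta0) / p(z|theta1)

rng = np.random.default_rng(0)
theta0, theta1 = 0.5, 0.0

z = rng.normal(theta1, 1.0)      # one simulated event history, generated at theta1
x = z + rng.normal(0.0, 1.0)     # the observable left after detector smearing

joint_ratio = norm.pdf(z, loc=theta0) / norm.pdf(z, loc=theta1)
joint_score = z - theta0         # d/dtheta log N(z | theta, 1), evaluated at theta0

print(f"history z={z:.3f}, x={x:.3f}: joint ratio {joint_ratio:.3f}, joint score {joint_score:.3f}")
```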

  • If the network is trained on parton-level matrix element information, how does it learn about the effect of shower and detector?

    It is true that the “labels” that the networks are trained on, the joint likelihood ratio and joint score, are based on parton-level information. However, the inputs into the neural network are observables based on the full simulation chain, after the parton shower, detector effects, and the reconstruction of observables. It was shown in Refs. [59,60,61] that the joint likelihood ratio and joint score are unbiased, but noisy, estimators of the true likelihood ratio and true score (including shower and detector effects). A network trained in the right way will therefore learn the effect of shower and detector. We illustrate this mechanism in a one-dimensional problem in Sect. 5.1.
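    This unbiasedness can be checked numerically in the same Gaussian toy as above: averaging the joint likelihood ratio over many simulated histories whose observable x falls in a narrow bin reproduces the true, detector-level likelihood ratio, even though no single joint ratio equals it. A minimal sketch under the same assumed toy dynamics:

```python
import numpy as np
from scipy.stats import norm

# Toy: z ~ N(theta, 1), x = z + N(0, 1), hence marginally x ~ N(theta, sqrt(2)).
# The joint ratio r(x, z) is a noisy but unbiased estimator of the true ratio r(x):
# its average over events generated at theta1 with x in a narrow bin converges to
# p(x|theta0) / p(x|theta1).

rng = np.random.default_rng(1)
theta0, theta1, n = 0.5, 0.0, 1_000_000

z = rng.normal(theta1, 1.0, size=n)
x = z + rng.normal(0.0, 1.0, size=n)
joint_ratio = norm.pdf(z, loc=theta0) / norm.pdf(z, loc=theta1)

for lo, hi in [(-0.1, 0.1), (0.9, 1.1)]:
    mask = (x > lo) & (x < hi)
    mc = joint_ratio[mask].mean()
    x_mid = 0.5 * (lo + hi)
    true = norm.pdf(x_mid, theta0, np.sqrt(2)) / norm.pdf(x_mid, theta1, np.sqrt(2))
    print(f"x in ({lo}, {hi}): <r(x,z)> = {mc:.3f} vs true r(x) = {true:.3f}")
```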

  • Can this approach be used for signal-background classification?

    Yes. In the simplest case, where the signal and background hypotheses do not depend on any additional parameters, the Carl, Rolr, or Alice techniques can be used to learn the probability of an individual event being signal or background. If there are parameters of interest such as a signal strength or the mass of a resonance, the score becomes useful and techniques such as Sally, Rascal, Cascal, and Alices can be more powerful.

    The techniques that use the joint likelihood ratio or score require less training data when the signal and background processes populate the same phase-space regions. If this is not the case, these methods still apply, but will not offer an advantage over the traditional training of binary classifiers.
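    As an illustration of how such a parameterized estimator is trained, here is a sketch following the API names used in the MadMiner documentation around this release; the file paths are placeholders for training data exported by MadMiner's sample augmentation step, and exact argument names may differ between MadMiner versions:

```python
# Sketch following the MadMiner documentation (v0.6-era API); treat the argument
# names and file paths below as assumptions, not a definitive recipe.
from madminer.ml import ParameterizedRatioEstimator

estimator = ParameterizedRatioEstimator(n_hidden=(100, 100))
estimator.train(
    method="alices",                   # loss using the joint likelihood ratio and joint score
    x="samples/x_train.npy",           # observables after shower and detector simulation
    y="samples/y_train.npy",           # label: event drawn from numerator or denominator
    theta="samples/theta0_train.npy",  # parameter points the estimator is conditioned on
    r_xz="samples/r_xz_train.npy",     # joint likelihood ratio from the matrix elements
    t_xz="samples/t_xz_train.npy",     # joint score from the matrix elements
    n_epochs=20,
)
estimator.save("models/alices")        # weights and settings for later evaluation
```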

  • What if the simulations do not describe the physics accurately?

    No simulator is perfect, but many of the techniques used for incorporating systematic uncertainties from mismodeling in the case of multivariate classifiers can also be used in this setting. For instance, the effect of mismodeling can often be corrected with simple scale factors, and the residual uncertainty can be incorporated with nuisance parameters; MadMiner can handle such systematic uncertainties as discussed above. If only particular phase-space regions are problematic, for instance those with low-energy jets, we recommend excluding these phase-space regions with suitable selection cuts. If the kinematic distributions are trusted, but the overall normalization is less well known, a data-driven normalization can be used.

    Of course, there is no silver bullet, and if the simulation code is not trustworthy at all in a particular process and the uncertainty cannot be quantified with nuisance parameters, these methods (and many more traditional analysis methods) will not provide accurate results.

  • Is the neural network a black box?

    Neural networks are often criticized for their lack of explainability. It is true that the internal structure of the network is not directly interpretable, but in MadMiner, what the network is trying to learn is clearly connected to the matrix element. In practical terms, one of the challenges is to verify that a network has been trained successfully. For that purpose, many cross-checks and diagnostic tools are available (a numerical illustration of one such check follows this list):

    • checking the loss function on a separate validation sample;

    • training multiple network instances with independent random seeds, as discussed above;

    • checking the expectation values of the score and likelihood ratio against their known true values, see Ref. [61];

    • varying the reference hypothesis in the likelihood ratio, see Ref. [61];

    • training classifiers to distinguish data reweighted with the estimated likelihood ratio from data generated at a new parameter point, see Ref. [61];

    • validating the inference techniques in low-dimensional problems with histograms, see Sect. 5.1;

    • validating the inference techniques on a parton-level scenario with tractable likelihood function, see Sect. 5.2; and

    • checking the asymptotic distribution of the likelihood ratio against Wilks’ theorem [69,70,71].
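    The expectation-value check in particular is easy to state precisely: since the numerator hypothesis integrates to one, the estimated likelihood ratio evaluated on events drawn from the reference hypothesis \(\theta_1\) must average to one. A framework-free sketch, with the exact Gaussian ratio standing in for a trained network:

```python
import numpy as np
from scipy.stats import norm

# Identity: E_{x ~ p(x|theta1)}[ p(x|theta0) / p(x|theta1) ] = 1.
# An estimator whose average over a theta1 reference sample deviates
# significantly from 1 is miscalibrated, whatever its architecture.

def estimated_ratio(x, theta0=0.5, theta1=0.0):
    """Stand-in for a trained network; here the exact ratio of two Gaussians."""
    return norm.pdf(x, theta0, np.sqrt(2)) / norm.pdf(x, theta1, np.sqrt(2))

rng = np.random.default_rng(2)
x_ref = rng.normal(0.0, np.sqrt(2), size=1_000_000)  # reference sample at theta1

mean_r = estimated_ratio(x_ref).mean()
print(f"<r_hat> over the theta1 sample: {mean_r:.4f} (should be close to 1)")
```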

    Finally, when limits are set based on the Neyman construction with toy experiments (rather than using the asymptotic properties of the likelihood ratio), there is a coverage guarantee: the exclusion contours constructed in this way will not exclude the true point more often than the confidence level. No matter how wrong the likelihood, likelihood ratio, or score function estimated by the neural network is, the final limits might lose statistical power, but will never be too optimistic.
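    This coverage guarantee can be demonstrated in a toy study: even a deliberately poor test statistic, once its distribution is calibrated with toy experiments, rejects the true parameter point at no more than the nominal rate. All choices below (the statistic, the sample sizes) are hypothetical:

```python
import numpy as np

# Toy Neyman construction: observable x ~ N(theta, 1), n_events per "experiment".
# We deliberately use a poor statistic (nonlinear, information-losing) to show
# that calibrating it with toys costs power, not coverage, at theta_true.

rng = np.random.default_rng(3)
theta_true, n_events, n_toys, alpha = 1.0, 50, 20_000, 0.05

def poor_statistic(samples):
    # A badly chosen statistic: distorted mean; it loses power, not coverage.
    return np.tanh(samples.mean(axis=-1) ** 3)

# Step 1: calibrate the statistic's distribution at theta_true with toys
toys = rng.normal(theta_true, 1.0, size=(n_toys, n_events))
lo, hi = np.quantile(poor_statistic(toys), [alpha / 2, 1 - alpha / 2])

# Step 2: run independent pseudo-experiments at theta_true and count exclusions
pseudo = rng.normal(theta_true, 1.0, size=(n_toys, n_events))
t_obs = poor_statistic(pseudo)
excluded = np.mean((t_obs < lo) | (t_obs > hi))
print(f"true point excluded in {excluded:.3f} of experiments (nominal rate {alpha})")
```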

  • Are you trying to replace PhD students with a machine?

    As a preemptive safety measure against scientists being made redundant by automated inference algorithms, we have implemented a number of bugs in MadMiner. It will take skilled physicists to find them, ensuring safe jobs for a while. More seriously, just as MadGraph automated the process of generating events for an arbitrary hard scattering process, MadMiner aims to contribute to the automation of several steps in the inference chain. Both developments enhance the productivity of physicists.

About this article

Cite this article

Brehmer, J., Kling, F., Espejo, I. et al. MadMiner: Machine Learning-Based Inference for Particle Physics. Comput Softw Big Sci 4, 3 (2020). https://doi.org/10.1007/s41781-020-0035-2
