Deep learning tools can incorporate all of the available information into a search for new particles, thus making the best use of the available data. This paper reviews how to optimally integrate ...information with deep learning and explicitly describes the corresponding sources of uncertainty. Simple illustrative examples show how these concepts can be applied in practice.
Jet substructure has emerged to play a central role at the Large Hadron Collider (LHC), where it has provided numerous innovative new ways to search for new physics and to probe the Standard Model in ...extreme regions of phase space. In this article we provide a comprehensive review of state of the art theoretical and machine learning developments in jet substructure. This article is meant both as a pedagogical introduction, covering the key physical principles underlying the calculation of jet substructure observables, the development of new observables, and cutting edge machine learning techniques for jet substructure, as well as a comprehensive reference for experts. We hope that it will prove a useful introduction to the exciting and rapidly developing field of jet substructure at the LHC.
A
bstract
Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect ...simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.
A variety of techniques have been proposed to train machine learning classifiers that are independent of a given feature. While this can be an essential technique for enabling background estimation, ...it may also be useful for reducing uncertainties. We carefully examine theory uncertainties, which typically do not have a statistical origin. We will provide explicit examples of two-point (fragmentation modeling) and continuous (higher-order corrections) uncertainties where decorrelating significantly reduces the apparent uncertainty while the true uncertainty is much larger. These results suggest that caution should be taken when using decorrelation for these types of uncertainties as long as we do not have a complete decomposition into statistically meaningful components.
A
bstract
As most target final states for searches and measurements at the Large Hadron Collider have a particular quark/gluon composition, tools for distinguishing quark- from gluon-initiated jets ...can be very powerful. In addition to the difficulty of the classification task, quark-versus-gluon jet tagging is challenging to calibrate. The difficulty arises from the topology dependence of quark-versus-gluon jet tagging: since quarks and gluons have net quantum chromodynamic color charge while only colorless hadrons are measured, the radiation pattern inside a jet of a particular type depends on the rest of its environment. Given a definition of a quark or gluon jet, this paper studies the topology dependence of such jets in simulation. A set of phase space regions and jet substructure observables are identified for further comparative studies between generators and eventually in data.
A
bstract
Decays of Higgs boson-like particles into multileptons is a well-motivated process for investigating physics beyond the Standard Model (SM). A unique feature of this final state is the ...precision with which the SM is known. As a result, simulations are used directly to estimate the background. Current searches consider specific models and typically focus on those with a single free parameter to simplify the analysis and interpretation. In this paper, we explore recent proposals for signal model agnostic searches using machine learning in the multilepton final state. These tools can be used to simultaneously search for many models, some of which have no dedicated search at the Large Hadron Collider. We find that the machine learning methods offer broad coverage across parameter space beyond where current searches are sensitive, with a necessary loss of performance compared to dedicated searches by only about one order of magnitude.
Simulations play a key role for inference in collider physics. We explore various approaches for enhancing the precision of simulations using machine learning, including interventions at the end of ...the simulation chain (reweighting), at the beginning of the simulation chain (pre-processing), and connections between the end and beginning (latent space refinement). To clearly illustrate our approaches, we use W + jets matrix element surrogate simulations based on normalizing flows as a prototypical example. First, weights in the data space are derived using machine learning classifiers. Then, we pull back the data-space weights to the latent space to produce unweighted examples and employ the Latent Space Refinement (
Laser
) protocol using Hamiltonian Monte Carlo. An alternative approach is an augmented normalizing flow, which allows for different dimensions in the latent and target spaces. These methods are studied for various pre-processing strategies, including a new and general method for massive particles at hadron colliders that is a tweak on the widely-used
RamboOnDiet
mapping. We find that modified simulations can achieve sub-percent precision across a wide range of phase space.