We extend the use of Classification Without Labels for anomaly detection with a hypothesis test designed to exclude the background-only hypothesis. By testing for statistical independence of the two discriminating dataset regions, we are able to exclude the background-only hypothesis without relying on fixed anomaly-score cuts or extrapolations of background estimates between regions. The method relies on the assumption of conditional independence of anomaly-score features and dataset regions, which can be ensured using existing decorrelation techniques. As a benchmark example, we consider the LHC Olympics dataset, where we show that mutual information represents a suitable test for statistical independence and that our method exhibits excellent and robust performance at different signal fractions, even in the presence of realistic feature correlations.
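The independence test at the core of this approach can be sketched in a few lines of numpy (a toy illustration, not the paper's implementation; all names and numbers are ours): the mutual information between the binned anomaly score and a binary region label vanishes, up to finite-sample bias, when the background-only hypothesis holds, and grows when an injected signal correlates the two.

```python
import numpy as np

def mutual_information(score, region, bins=10):
    """Estimate I(score; region) in nats from a binned joint histogram."""
    joint, _, _ = np.histogram2d(score, region, bins=[bins, 2])
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal over regions
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal over score bins
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

rng = np.random.default_rng(0)
n = 50_000
region = rng.integers(0, 2, size=n)        # binary region label
bkg_score = rng.normal(size=n)             # background-only: independent
mi_indep = mutual_information(bkg_score, region)   # ~ 0 (finite-sample bias)
sig_score = bkg_score + 0.5 * region       # injected signal: correlated
mi_dep = mutual_information(sig_score, region)     # clearly nonzero
```

In practice one would calibrate the finite-sample distribution of the estimator (e.g. with permutations of the region label) before quoting an exclusion.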
Topic model for four-top at the LHC Alvarez, Ezequiel; Lamagna, Federico; Szewc, Manuel
The Journal of High Energy Physics, 01/2020, Volume 2020, Issue 1
Journal Article, Peer reviewed, Open access
Abstract
We study the implementation of a Topic Model algorithm in four-top searches at the LHC as a test probe of a non-ideal system for applying this technique. We study how this Topic Model behaves as its different hypotheses, such as mutual irreducibility and equal distribution in all samples, shift away from being true. The four-top final state at the LHC is relevant not only because it does not fulfill these conditions, but also because it is a difficult and inefficient system to reconstruct, and current Monte Carlo modeling of signal and backgrounds suffers from non-negligible uncertainties. We implement this Topic Model algorithm in the same-sign lepton channel, where S/B is of order one and none of the backgrounds can have more than two b-jets at parton level. We define different mixtures according to the number of b-jets and use the total number of jets to demix. Since only the background has an anchor bin, we find that we can reconstruct the background in the signal region independently of Monte Carlo. We propose to use this information to tune the Monte Carlo in the signal region and then compare the signal prediction with data. We also explore Machine Learning techniques applied to this Topic Model algorithm and find slight improvements as well as potential roads to investigate. Although our findings indicate that the implementation would still be challenging with the full LHC Run 3 data, through this work we pursue ways to reduce the impact of Monte Carlo simulations in four-top searches at the LHC.
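The demixing step at the heart of such a Topic Model can be illustrated with a small numpy toy (not the paper's code; the templates, fractions, and bin contents are invented, and, unlike the four-top case, both toy topics are given anchor bins so that both come out exactly):

```python
import numpy as np

# Toy jet-multiplicity templates. Each topic has an anchor bin, i.e. a bin
# where the other topic vanishes -- the condition under which demixing
# recovers that topic exactly.
p_B = np.array([0.30, 0.35, 0.20, 0.10, 0.05, 0.00])  # background (anchor: bin 0)
p_S = np.array([0.00, 0.05, 0.15, 0.25, 0.30, 0.25])  # signal (anchor: bin 5)

f1, f2 = 0.6, 0.1                 # signal fractions in the two mixtures
p1 = f1 * p_S + (1 - f1) * p_B    # signal-rich mixture (many b-jets)
p2 = f2 * p_S + (1 - f2) * p_B    # background-rich mixture (few b-jets)

def demix(pa, pb):
    """Subtract as much of pb from pa as positivity allows."""
    ratio = np.where(pb > 0, pa / np.where(pb > 0, pb, 1.0), np.inf)
    kappa = ratio.min()
    return (pa - kappa * pb) / (1 - kappa)

T_signal = demix(p1, p2)      # exact because the background has an anchor bin
T_background = demix(p2, p1)  # exact because the toy signal also has one
```

When only the background has an anchor bin, as in the four-top final state, only the corresponding subtraction is exact and the other topic is recovered only approximately, which is precisely the regime the paper studies.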
Abstract
During the last years ATLAS and CMS have reported a number of slight to mild discrepancies in signatures of multileptons plus $b$-jets in analyses such as $t\bar{t}H$, $t\bar{t}W^\pm$, $t\bar{t}Z$ and $t\bar{t}t\bar{t}$. Among them, a recent ATLAS result on $t\bar{t}H$ production has also reported an excess in the charge asymmetry in the same-sign dilepton channel with two or more $b$-tagged jets. Motivated by these tantalizing discrepancies, we study a phenomenological New Physics model consisting of a $Z'$ boson that couples to up-type quarks via right-handed currents: $t_R\gamma^\mu\bar{t}_R$, $t_R\gamma^\mu\bar{c}_R$, and $t_R\gamma^\mu\bar{u}_R$. The latter vertex allows us to translate the charge asymmetry of the LHC initial-state protons to a final state with top quarks which, decaying to a positive lepton and a $b$-jet, provides a crucial contribution to some of the observed discrepancies. Through an analysis at detector level, we select the region in the parameter space of our model that best reproduces the data in the aforementioned $t\bar{t}H$ study and in a recent ATLAS $t\bar{t}t\bar{t}$ search. We find that our model provides a better fit to the experimental data than the Standard Model for a New Physics scale of approximately 500 GeV, and with a hierarchical coupling of the $Z'$ boson that favours the top quark and the presence of FCNC currents. In order to estimate the LHC sensitivity to this signal, we design a broadband search featuring many kinematic regions with different signal-to-background ratios, and perform a global analysis. We also define signal-enhanced regions and study observables that could further distinguish signal from background. We find that the region in parameter space of our model that best fits the analysed data could be probed with a significance exceeding 3 standard deviations with just the full Run 2 dataset.
A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging, and aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a set of simulated collider events. Participants in these Olympics have developed their methods using an R&D dataset and then tested them on black boxes: datasets with an unknown anomaly (or not). Methods made use of modern machine learning tools and were based on unsupervised learning (autoencoders, generative adversarial networks, normalizing flows), weakly supervised learning, and semi-supervised learning. This paper will review the LHC Olympics 2020 challenge, including an overview of the competition, a description of methods deployed in the competition, lessons learned from the experience, and implications for data analyses with future datasets as well as future colliders.
A composite pNGB leptoquark at the LHC Alvarez, Ezequiel; Da Rold, Leandro; Juste, Aurelio ...
The Journal of High Energy Physics, 12/2018, Volume 2018, Issue 12
Journal Article, Peer reviewed, Open access
Abstract
The measurements of $R_{K^{(*)}}$ and $R_{D^{(*)}}$ by the BaBar, Belle and LHCb collaborations could be showing a hint of lepton flavor universality violation that can be accommodated by the presence of suitable leptoquarks at the TeV scale. We consider an effective description, with leptoquarks arising as composite pseudo Nambu-Goldstone bosons, as well as anarchic partial compositeness of the SM fermions. Considering the $R_{K^{(*)}}$ anomaly within this framework, we study pair production of $S_3 \sim (\bar{3},3)_{1/3}$ at the LHC. We focus on the component $S_3^{1/3}$ of the triplet, which decays predominantly into $t\tau$ and $b\nu$, and study the bounds from existing searches at $\sqrt{s}=13$ TeV at the LHC. We find that sbottom searches in the $b\bar{b} + E_T^{\mathrm{miss}}$ final state best explore the region in parameter space preferred by our model and currently exclude $S_3^{1/3}$ masses up to ∼1 TeV. Additional searches, considering the $t\tau$ and $t\mu$ decay modes, are required to probe the full physical parameter space. In this paper we also recast existing studies on direct leptoquark searches in the $t\tau t\tau$ channel and SM $t\bar{t}t\bar{t}$ searches, and obtain the regions in parameter space currently excluded. Practically the whole physical parameter space is currently excluded for masses up to ∼0.8 TeV, which could be extended up to ∼1 TeV with the full Run 3 dataset. We conclude that pair production searches for this leptoquark can benefit from considering the final state $t\tau b + E_T^{\mathrm{miss}}$, where the largest branching ratio is expected. We appraise that future explorations of leptoquarks explaining the B-anomalies with masses beyond the TeV scale should also consider single and non-resonant production in order to extend the mass reach.
Machine-learning techniques have become fundamental in high-energy physics and, for new physics searches, it is crucial to know their performance in terms of experimental sensitivity, understood as the statistical significance of the signal-plus-background hypothesis over the background-only one. We present here a simple method that combines the power of current machine-learning techniques to handle high-dimensional data with the likelihood-based inference tests used in traditional analyses, which allows us to estimate the sensitivity for both discovery and exclusion limits through a single parameter of interest, the signal strength. Based on supervised learning techniques, it can perform well with high-dimensional data where traditional techniques cannot. We apply the method first to a toy model, so we can explore its potential, and then to an LHC study of new physics particles in dijet final states. Taking as the optimal statistical significance the one we would obtain if the true generative functions were known, we show that our method provides a better approximation than the usual naive counting experiments.
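The likelihood-based half of such a method can be sketched as follows, assuming (as is common in these analyses) that the classifier output is binned and the discovery significance is taken from the asymptotic profile-likelihood (Asimov) formula; the function name and toy numbers are ours, not the paper's:

```python
import numpy as np

def discovery_significance(n_obs, b):
    """Asymptotic (Asimov) discovery significance from a binned
    profile-likelihood ratio: Z = sqrt(2 * sum_i [n_i ln(n_i/b_i) - (n_i - b_i)])."""
    n_obs = np.asarray(n_obs, dtype=float)
    b = np.asarray(b, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(n_obs > 0.0,
                         n_obs * np.log(n_obs / b) - (n_obs - b),
                         b)  # the n -> 0 limit of the bracket is b
    return float(np.sqrt(2.0 * terms.sum()))

# Toy classifier-output histogram: most of the signal lands in the last
# (most signal-like) bin, so binning beats a single counting experiment.
b = np.array([80.0, 15.0, 5.0])
s = np.array([1.0, 2.0, 7.0])
Z_binned = discovery_significance(b + s, b)   # Asimov data n = s + b
Z_counting = discovery_significance(np.array([b.sum() + s.sum()]),
                                    np.array([b.sum()]))
```

The gain of `Z_binned` over `Z_counting` is exactly what a well-trained classifier buys: it concentrates the signal into a few high-purity bins of its output.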
We propose an extension of the existing experimental strategy for measuring branching fractions of top quark decays, targeting specifically $t \to j_q W$, where $j_q$ is a light-quark jet. The improved strategy uses orthogonal $b$- and $q$-taggers, and adds a new observable, the number of light-quark-tagged jets, to the already commonly used observable, the fraction of $b$-tagged jets in an event. Careful inclusion of the additional complementary observable significantly increases the expected statistical power of the analysis, with the possibility of excluding $|V_{tb}|=1$ at 95% C.L. at the HL-LHC, and of directly accessing the standard model value of $|V_{td}|^2+|V_{ts}|^2$.
Recognizing hadronically decaying top-quark jets in a sample of jets, or even estimating their total fraction in the sample, is an important step in many LHC searches for Standard Model and Beyond the Standard Model physics. Although outstanding top-tagger algorithms exist, their construction and their expected performance rely on Monte Carlo simulations, which may induce potential biases. For these reasons we develop two simple unsupervised top-tagger algorithms based on performing Bayesian inference on a mixture model. In one of them we use as the observed variable a new geometrically-based observable $\tilde{A}_3$, and in the other we consider the more traditional $\tau_3/\tau_2$ $N$-subjettiness ratio, which yields a better performance. As expected, we find that the unsupervised tagger performance is below that of existing supervised taggers, reaching an expected Area Under the Curve (AUC) of ∼0.80-0.81 and accuracies of about 69%-75% over the full range of sample purity. However, these performances are more robust to possible biases in the Monte Carlo than their supervised counterparts. Our findings are a step towards exploring and considering simpler and unbiased taggers.
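A minimal version of the mixture-model idea can be sketched with expectation-maximization standing in for the paper's Bayesian inference, and a toy two-Gaussian stand-in for the $\tau_3/\tau_2$ observable (all distributions and numbers here are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, frac_top = 20_000, 0.3
is_top = rng.random(n) < frac_top
# Toy stand-in for tau3/tau2: top jets peak low, QCD jets peak high.
x = np.where(is_top, rng.normal(0.45, 0.10, n), rng.normal(0.80, 0.12, n))

def norm_pdf(v, mu, sig):
    return np.exp(-0.5 * ((v - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

# EM for a two-Gaussian mixture: infer the top fraction without any labels.
pi, mu1, s1, mu2, s2 = 0.5, 0.4, 0.2, 0.9, 0.2   # crude initialization
for _ in range(200):
    num = pi * norm_pdf(x, mu1, s1)
    den = num + (1 - pi) * norm_pdf(x, mu2, s2)
    g = num / den                     # responsibility of the "top" component
    pi = g.mean()
    mu1, mu2 = np.average(x, weights=g), np.average(x, weights=1 - g)
    s1 = np.sqrt(np.average((x - mu1) ** 2, weights=g))
    s2 = np.sqrt(np.average((x - mu2) ** 2, weights=1 - g))
# pi now estimates the sample's top fraction (true value 0.3), and the
# per-jet responsibilities g act as an unsupervised tagger score.
```

The Bayesian treatment in the paper additionally puts priors on the component parameters, which matters when the components overlap strongly or the purity is extreme.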
This work reports on a method for uncertainty estimation in simulated collider-event predictions. The method is based on a Monte Carlo veto algorithm, and extends previous work on uncertainty estimates in parton showers by including uncertainty estimates for the Lund string-fragmentation model. The method is advantageous from the perspective of simulation costs: a single ensemble of generated events can be reinterpreted as though it had been obtained using a different set of input parameters, with each event now accompanied by a corresponding weight. This allows for a robust exploration of the uncertainties arising from the choice of input model parameters, without the need to rerun the full simulation pipeline for each input parameter choice. Such explorations are important when determining the sensitivities of precision physics measurements. Accompanying code is available at https://gitlab.com/uchep/mlhad-weights-validation.
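The reinterpret-by-weights idea can be illustrated with a deliberately simplified importance-reweighting toy (the actual method derives its weights from the veto algorithm inside the shower and fragmentation chain, which this sketch does not attempt; the exponential model and parameter values are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
lam0, lam1 = 1.0, 1.3            # baseline and alternative model parameter
x = rng.exponential(1.0 / lam0, size=200_000)   # generate once, with lam0

# Reinterpret the same events under lam1 via per-event likelihood ratios,
# instead of rerunning the generator with the new parameter.
w = (lam1 * np.exp(-lam1 * x)) / (lam0 * np.exp(-lam0 * x))

mean_w = w.mean()                           # ~ 1 (normalization preserved)
mean_reweighted = np.average(x, weights=w)  # ~ 1/lam1, the lam1 expectation
```

Any observable can then be histogrammed once with the weights for each parameter choice, which is where the simulation-cost saving comes from.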
Learning Latent Jet Structure Dillon, Barry M.; Faroughy, Darius A.; Kamenik, Jernej F. ...
Symmetry (Basel), 07/2021, Volume 13, Issue 7
Journal Article, Peer reviewed, Open access
We summarize our recent work on how to infer jet formation processes directly from substructure data using generative statistical models. We recount in detail how to cast measurements of jet substructure observables in terms of Bayesian mixed-membership models, in particular Latent Dirichlet Allocation. Using a mixed sample of QCD and boosted $t\bar{t}$ jet events and focusing on the primary Lund plane observable basis for event measurements, we show how using educated priors on the latent distributions allows one to infer the underlying physical processes in a semi-supervised way.