Search for new physics phenomena in events with two leptons and missing transverse energy using
machine learning
Michaël Etienne Arlandoo
Thesis submitted for the degree of Master in Subatomic Physics
60 credits
Department of Physics
Faculty of mathematics and natural sciences
UNIVERSITY OF OSLO
© 2019 Michaël Etienne Arlandoo
Search for new physics phenomena in events with two leptons and missing transverse energy using machine learning
http://www.duo.uio.no/
Printed: Reprosentralen, University of Oslo
Abstract
In this thesis, we investigate the use of machine learning methods to see if we can improve the sensitivity in the search for new physics phenomena at the Large Hadron Collider compared to the conventional analysis. The specific process analyzed is the pair-production of charginos leading to a final state with opposite-sign leptons and missing transverse energy. The two machine learning algorithms used are the logistic regression and boosted decision trees.
Logistic regression performed poorly, as expected, but the boosted decision trees performed better than the conventional method in the inclusive signal regions. In addition, a systematic method has been designed to compare the different machine learning algorithms by requiring them to output the same decision function.
Acknowledgements
First, I would like to thank my main supervisor Farid Ould-Saada for giving me the opportunity to delve into a subject that was, prior to this thesis, completely unknown to me. Thank you Farid for your support and help during the writing of this thesis and for making sure that I feel at home in the HEPP group. I am eternally grateful to my co-supervisor Eirik Gramstad for his invaluable help, patience, ingenuity and also for his nerd jokes. Thank you Farid and Eirik! It has been a pleasure learning from and working with both of you.
I would also like to thank everyone in the HEPP group for their support and for always being nice to me. You have all made this journey a pleasant one. I have greatly enjoyed our quiz sessions. A special thanks goes to Knut for his incredible patience in helping me debug my codes. I would also like to thank my fellow master students Simon, Oda and Helen for their support.
I would also like to thank my brother Gilbert for his moral and financial support.
Last but not least a great thanks goes to my mother who has always prioritized the education and freedom of her children.
Contents
Introduction
Notations and Conventions
1 The Standard Model
1.1 Particle content of the Standard Model
1.1.1 Fermions
1.1.2 The bosons of the Standard Model
1.2 Symmetries and conserved quantities
1.3 Formalism
1.3.1 Yang-Mills theory
1.3.2 Symmetry Breaking
1.4 Quantum Chromodynamics
1.5 The Electroweak Theory
1.5.1 Chirality and the fermion sector
1.5.2 The gauge sector
1.5.3 Masses of the particles
2 Beyond the Standard Model
2.1 Limitations of the Standard Model
2.2 A brief overview of Supersymmetry
2.2.1 Motivation and introduction
2.2.2 Charginos and Neutralinos
3 Proton-Proton Colliders
3.1 Kinematics
3.1.1 Two-particle collision
3.1.2 Collision products
3.2 pp interactions
3.2.1 Partons
3.2.2 Hadronization and jets
3.3 Luminosity and pile-up
4 The Large Hadron Collider and the ATLAS Detector
4.1 The Large Hadron Collider
4.2 The ATLAS detector
4.2.1 Inner detector
4.2.2 Calorimeters
4.2.3 Muon spectrometer
4.2.4 Magnet system
4.2.5 The trigger system
5 Machine Learning Methods
5.1 General introduction
5.2 Logistic Regression
5.3 Decision trees
5.3.1 Boosted decision trees
6 Analysis strategy
6.1 Data and simulated samples
6.1.1 Data set
6.1.2 Simulated samples
6.2 Object reconstruction and identification
6.3 Pre-selection
6.4 Statistical interpretation
6.4.1 Significance
6.4.2 Exclusion
6.5 Signal regions for cut-and-count analysis
6.6 ML-based analysis
7 Analysis results
7.1 Results from conventional analysis
7.2 Results from ML-based analysis
7.3 Comparison between ML and the conventional method and ...
Summary and Conclusions
Bibliography
Introduction
Despite its unprecedented success at explaining observed subatomic phenomena, the Standard Model still fails to address issues such as the nature of dark matter and dark energy. New physics theories that extend the Standard Model have emerged as attempts to address these problems.
The biggest remaining issue is to confirm these new theories experimentally.
The fact that these theories have not yet been confirmed does not necessarily mean that they are wrong; it is also highly probable that our current analysis methods are not very sensitive to the new physics scenarios.
The most conventional analysis method is the cut-and-count analysis, where a series of constraints (or cuts) is applied to variables such as transverse momentum and invariant mass to increase sensitivity. The main issue with this standard approach is that, most of the time, it is not clear which cuts to apply.
With the emergence of machine learning (multivariate) methods, which are nowadays a “hot” topic amongst data scientists, it is hoped that the performance of the search for new physics at the Large Hadron Collider will be improved. The advantage of the machine learning methods is that, apart from the pre-selection of events, no additional input is given to the algorithm, which must itself work out, in one step, which variables to cut on. These multivariate methods have proved useful in [2], where the authors apply deep learning methods to search for exotic particles. They used deep neural networks and showed that they classify signal and background samples with great precision.
The analysis in this thesis is motivated by [2] and we use two different machine learning methods to investigate the production of neutralinos from a simplified supersymmetric extension of the Standard Model. In addition, a recent paper [4] by the ATLAS collaboration has performed the standard analysis on the same process. This provides a way to cross-check the standard analysis, which is then compared to the machine learning methods.
Chapter 1 introduces the Standard Model of particle physics and chapter 2 gives a brief overview of supersymmetry with emphasis on its particle content and not on formalism. The process targeted by the analysis is also introduced in this chapter. Chapter 3 introduces the kinematic variables and the phenomenology of proton-proton collisions. Chapter 4 is a brief description of the experimental setup at the Large Hadron Collider and the ATLAS detector. Chapter 5 gives a mountaintop view of the two machine learning algorithms that are used in the analysis. Chapter 6 gives details on the analysis procedures and the terminology used in experimental particle physics. Finally, in chapter 7, the results of the analysis (sensitivity studies) are presented and discussed.
Notations and Conventions
The notations and conventions used in this thesis are listed below:
• Natural units: We use natural units defined by $\hbar = c = 1$, where $\hbar$ is the reduced Planck constant and $c$ is the speed of light. This implies that energy, momentum and mass have the same unit, which is the electronvolt (eV).
• Minkowski space (spacetime): It is denoted as $\mathbb{R}^{3,1}$ and the metric is $\eta_{\mu\nu} = \mathrm{diag}(1,-1,-1,-1)$.
• Vectors: Three-vectors are written with an arrow over a symbol, e.g. $\vec{x} = (x, y, z)$. Four-vectors are written as $a \equiv a^\mu = (a^0, \vec{a})$. For $x \in \mathbb{R}^{3,1}$, $x^0 = t$ where $t$ is time. One-forms are written as $a_\mu = \eta_{\mu\nu} a^\nu$.
• Special unitary group: It is denoted as $SU(N)$ and is defined as $SU(N) = \{U \in \mathbb{C}^{N\times N} \mid U U^\dagger = I,\ \det U = 1\}$.
• Pauli matrices: They are the generators of $SU(2)$ and are defined as
\[
\sigma^1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\sigma^2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
\sigma^3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]
• Dirac matrices: The gamma matrices $\gamma^\mu$ satisfy the Clifford algebra $\{\gamma^\mu, \gamma^\nu\} = 2\eta^{\mu\nu}$. In the chiral representation,
\[
\gamma^\mu = \begin{pmatrix} 0 & \sigma^\mu \\ \bar{\sigma}^\mu & 0 \end{pmatrix},
\]
where $\sigma^\mu = (1, \sigma^i)$ and $\bar{\sigma}^\mu = (1, -\sigma^i)$. The fifth gamma matrix is defined as $\gamma^5 = i\gamma^0\gamma^1\gamma^2\gamma^3$.
• Slash notation: $\slashed{A} = \gamma^\mu A_\mu$.
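The Dirac-matrix convention can be checked numerically. The following is a minimal sketch in plain Python, assuming the block form of the chiral representation with vanishing diagonal blocks; the helper names (`matmul`, `block`, `clifford_holds`, `gamma5`) are illustrative and not part of the thesis.

```python
# Pure-Python check of {gamma^mu, gamma^nu} = 2 eta^{mu nu} in the chiral representation.

def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def block(tl, tr, bl, br):
    """Assemble a 4x4 matrix from four 2x2 blocks."""
    return [tl[i] + tr[i] for i in range(2)] + [bl[i] + br[i] for i in range(2)]

ZERO = [[0, 0], [0, 0]]
ONE = [[1, 0], [0, 1]]
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

SIGMA = [ONE] + PAULI                                               # sigma^mu = (1, sigma^i)
SIGMA_BAR = [ONE] + [[[-x for x in row] for row in s] for s in PAULI]  # (1, -sigma^i)
GAMMA = [block(ZERO, SIGMA[mu], SIGMA_BAR[mu], ZERO) for mu in range(4)]
ETA = [1, -1, -1, -1]  # diagonal of the Minkowski metric

def clifford_holds(tol=1e-12):
    """Return True if all sixteen anticommutators match 2 * eta^{mu nu} * identity."""
    for mu in range(4):
        for nu in range(4):
            ab, ba = matmul(GAMMA[mu], GAMMA[nu]), matmul(GAMMA[nu], GAMMA[mu])
            for i in range(4):
                for j in range(4):
                    target = 2 * ETA[mu] if (mu == nu and i == j) else 0
                    if abs(ab[i][j] + ba[i][j] - target) > tol:
                        return False
    return True

def gamma5():
    """gamma^5 = i gamma^0 gamma^1 gamma^2 gamma^3."""
    prod = matmul(matmul(matmul(GAMMA[0], GAMMA[1]), GAMMA[2]), GAMMA[3])
    return [[1j * x for x in row] for row in prod]

if __name__ == "__main__":
    print("Clifford algebra satisfied:", clifford_holds())
```

In this representation $\gamma^5$ comes out diagonal, which is what makes the split of a Dirac spinor into left- and right-handed components explicit.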
Chapter 1
The Standard Model
This chapter gives a mountaintop view of the Standard Model (SM) of particle physics, restricted to the aspects that are relevant to the processes investigated in this thesis. Everything that is left out here can be found in particle physics and quantum field theory (QFT) textbooks such as [13] and [11].
1.1 Particle content of the Standard Model
The SM is a gauge theory with full gauge group $SU(3)_C \times SU(2)_L \times U(1)_Y$, where $C$, $L$ and $Y$ are physical labels which stand for color, left-handed and hypercharge respectively; their meanings will be discussed later. The SM gives a precise description of electromagnetic, weak and strong phenomena.
In this section, we present particles from an experimentalist’s point of view.
All particles in the Standard Model are classified according to their quantum numbers, the main one being the quantum mechanical spin, which divides them into two sets: fermions and bosons.
1.1.1 Fermions
Fermions are particles with half-integer spin (1/2, 3/2, 5/2, ...) in units of the reduced Planck constant $\hbar$. When we refer to fermions, we also include antiparticles, which have the same spin and mass as their corresponding particles but opposite electric charge and additive quantum numbers (e.g. lepton number). Fermions are subdivided into quarks and leptons which, in the Standard Model, are considered elementary particles, that is, they do not have any substructure.
Quarks
Quarks are the only fermions that participate in the strong interactions because they carry, in addition to electric charge, the color charge, which is responsible for the strong force. There are six different types (flavors) of quarks, namely up (u), down (d), charm (c), strange (s), top (t) and bottom (b). They exist in three colors: red (r), blue (b) and green (g). The anti-quarks carry the anti-colors $\bar{r}$, $\bar{b}$ and $\bar{g}$. Apart from the top quark, which has a lifetime of $\sim 10^{-25}$ s, the quarks exist only as colorless bound states (hadrons) and have never been observed as free particles. The two colorless combinations of quarks are:
• Baryons: Bound states of three quarks or three antiquarks. Colorless combinations of r, g and b for baryons are for example $rgb$ and $\bar{r}\bar{b}\bar{g}$. The proton (uud) and the neutron (udd) are famous examples of baryons.
• Mesons: Bound states of a quark and an anti-quark. The colorless combinations of r, g and b for mesons are $r\bar{r}$, $g\bar{g}$ and $b\bar{b}$.
Leptons
The leptons are the fermions that do not participate in the strong interactions because they do not carry color charge. They exist as three generations, each of which consists of a charged lepton $l \in \{e, \mu, \tau\}$ and the associated neutrino $\nu_l$. The electron is the lightest and the only stable charged lepton, while the others decay via the weak interaction.
The neutrinos are electrically neutral and interact only via the weak interaction, which makes them hard to detect. They are quite mysterious particles: for example, in the Standard Model the neutrinos are massless, but it is known from neutrino oscillations¹ that at least two of them should be massive.
¹ Neutrino oscillation is a phenomenon in which a neutrino changes flavor, e.g. $\nu_e$ can change to $\nu_\mu$. This is possible only if the difference between the masses is non-zero [13].
1.1.2 The bosons of the Standard Model
It is well known that the SM describes the electromagnetic, strong and weak interactions but not gravity. A non-trivial result from perturbative quantum field theory is that all these interactions are mediated by particles known as the gauge bosons which are spin-1 particles. They are:
• The massless photon (γ) for electromagnetism.
• The massive W± and Z bosons for the weak interactions.
• The massless gluons for the strong interactions.
It is important to note that these particles are not only mediators but can be produced copiously at particle colliders.
The only spin-0 elementary particle in the SM is the Higgs boson, which is responsible for the mass generation of the particles in the electroweak sector. It was a missing piece of the puzzle until it was discovered at the Large Hadron Collider in 2012.
The mediator of gravity is called the graviton and is a spin-2 particle. It remains hypothetical and has not yet been discovered experimentally.
1.2 Symmetries and conserved quantities
A conserved quantity always arises from the invariance of a given theory (action) under a symmetry transformation. The set of transformations forms a group and may be continuous or discrete. For example, invariance of all relativistic field theories under spacetime translations (which form a subgroup of the full Poincaré group) leads to the conservation of four-momentum.
In the Standard Model, all the interactions are invariant under $CPT$ transformations. $C$ corresponds to charge conjugation, which replaces a particle by its antiparticle, $P$ is the parity transformation ($\vec{x} \to -\vec{x}$) and $T$ is time reversal ($t \to -t$). Strong and electromagnetic interactions are invariant under the separate transformations $C$, $P$, $T$ and $CP$, but this is not the case for the weak interactions.
The lepton number ($L$) is a conserved quantity which is a consequence of a global² $U(1)$ symmetry. Each generation of leptons is assigned its own lepton number: $L = 1$ for the charged lepton and its neutrino, and $L = -1$ for their antiparticles. All other particles, which are not leptons, are assigned $L = 0$. All interactions conserve lepton number; the only exception is neutrino oscillations, which violate the individual generation lepton numbers. The effects of neutrino oscillations are significant at large distances.
² A transformation is said to be global if it is not a function of $x \in \mathbb{R}^{3,1}$.
The baryon number is also a conserved quantity associated with a global $U(1)$ symmetry. The baryons are assigned $B = 1$, the anti-baryons $B = -1$ and all the rest $B = 0$, including the mesons. This implies that a quark has $B = 1/3$ while an anti-quark has $B = -1/3$.
1.3 Formalism
In this section we introduce the two main ingredients in the construction of the Standard Model.
1.3.1 Yang-Mills theory
Yang-Mills (YM) theory is a gauge theory where the gauge group is a non-abelian Lie group. The relevant group here is $SU(N)$. Consider the spinor (fermionic) field $\psi : \mathbb{R}^{3,1} \to \mathbb{C}^N$ and the Lie-algebra-valued field $A_\mu : \mathbb{R}^{3,1} \to \mathbb{C}^{N\times N}$, which under $SU(N)$ transform as
\[
\psi \to U\psi, \qquad
A_\mu \to U A_\mu U^\dagger + \frac{i}{g}(\partial_\mu U)U^\dagger, \tag{1.1}
\]
where $g \in \mathbb{R}$ is called the coupling constant and $U \in SU(N)$. The latter can be written as
\[
U = \exp(-ig\alpha^a T^a), \qquad a = 1, ..., N^2-1, \tag{1.2}
\]
where $\alpha^a : \mathbb{R}^{3,1} \to \mathbb{R}$ and $\{T^a\}$ are the generators of $SU(N)$, which obey the Lie algebra
\[
[T^a, T^b] = i f^{abc} T^c. \tag{1.3}
\]
Here, $f^{abc}$ are the structure constants of $SU(N)$. Another important property is
\[
\mathrm{tr}(T^a T^b) = \frac{1}{2}\delta^{ab}. \tag{1.4}
\]
$\{T^a\}$ forms a basis for any Lie-algebra-valued field, therefore we can write
\[
A_\mu = A^a_\mu T^a. \tag{1.5}
\]
The $\{A^a_\mu\}$ are referred to as the gauge fields, which are interpreted as the gauge bosons of the theory; there are $N^2-1$ of them.
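The generator properties above can be made concrete for the smallest non-abelian case. The sketch below assumes $N = 2$, where $T^a = \sigma^a/2$ and the structure constants are the Levi-Civita symbol; the function names are illustrative only.

```python
# Check [T^a, T^b] = i f^{abc} T^c and tr(T^a T^b) = delta^{ab} / 2 for SU(2).

PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
T = [[[entry / 2 for entry in row] for row in sigma] for sigma in PAULI]  # T^a = sigma^a / 2

def matmul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def levi_civita(a, b, c):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) // 2

def lie_algebra_holds(tol=1e-12):
    """Compare each commutator with i * epsilon^{abc} T^c entry by entry."""
    for a in range(3):
        for b in range(3):
            ab, ba = matmul(T[a], T[b]), matmul(T[b], T[a])
            for i in range(2):
                for j in range(2):
                    rhs = sum(1j * levi_civita(a, b, c) * T[c][i][j] for c in range(3))
                    if abs(ab[i][j] - ba[i][j] - rhs) > tol:
                        return False
    return True

def trace_normalisation_holds(tol=1e-12):
    """Check tr(T^a T^b) = 1/2 for a == b and 0 otherwise."""
    for a in range(3):
        for b in range(3):
            ab = matmul(T[a], T[b])
            if abs(ab[0][0] + ab[1][1] - (0.5 if a == b else 0.0)) > tol:
                return False
    return True

if __name__ == "__main__":
    print(lie_algebra_holds(), trace_normalisation_holds())
```

For $SU(2)$ there are $N^2 - 1 = 3$ generators, matching the three gauge fields that appear in the symmetry-breaking discussion later in this chapter.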
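The covariant derivative $D_\mu$ is defined as
\[
D_\mu = \partial_\mu + igA_\mu \equiv \partial_\mu + igA^a_\mu T^a. \tag{1.6}
\]
Next, we define the field strength $F_{\mu\nu} = F^a_{\mu\nu} T^a$ as
\[
\begin{aligned}
F_{\mu\nu} &= -\frac{i}{g}[D_\mu, D_\nu] = \partial_\mu A_\nu - \partial_\nu A_\mu + ig[A_\mu, A_\nu], \\
F^a_{\mu\nu} &= \partial_\mu A^a_\nu - \partial_\nu A^a_\mu - g f^{abc} A^b_\mu A^c_\nu.
\end{aligned} \tag{1.7}
\]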
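The component form of the field strength follows from the commutator term in one step. A short derivation, using only the Lie algebra of the generators, the expansion of $A_\mu$ in the basis $\{T^a\}$ and the total antisymmetry of $f^{abc}$:

```latex
\begin{aligned}
ig[A_\mu, A_\nu]
  &= ig\,A^b_\mu A^c_\nu\,[T^b, T^c]
   = ig\,A^b_\mu A^c_\nu\,\bigl(i f^{bca} T^a\bigr)
   = -g f^{abc} A^b_\mu A^c_\nu\, T^a, \\
F^a_{\mu\nu}
  &= \partial_\mu A^a_\nu - \partial_\nu A^a_\mu - g f^{abc} A^b_\mu A^c_\nu .
\end{aligned}
```

The last term is quadratic in the gauge fields, which is the origin of the gauge-boson self-interactions in a non-abelian theory.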
The YM Lagrangian $\mathcal{L}_{\text{YM}}$ is constructed with the requirement that it is invariant under a local $SU(N)$ transformation and it is written as
\[
\mathcal{L}_{\text{YM}} = \bar{\psi}(i\slashed{D} - m)\psi - \frac{1}{2}\mathrm{tr}(F_{\mu\nu}F^{\mu\nu}), \tag{1.8}
\]
where $m$ is the mass of the fermionic field $\psi$. There are no mass terms for the gauge fields as they would break gauge invariance.
The fermion-boson interaction term hidden in the covariant derivative is $\mathcal{L}_{\text{int}} \supset -g\bar{\psi}\slashed{A}\psi$. The most striking difference between YM theory and an abelian gauge theory (e.g. quantum electrodynamics) is that the kinetic term $-\frac{1}{2}\mathrm{tr}(F_{\mu\nu}F^{\mu\nu})$ contains cubic and quartic self-interactions of the gauge bosons; this is a direct consequence of $[A_\mu, A_\nu] \neq 0$. The interactions in YM theory are shown diagrammatically in Figure 1.1.
It is also worth mentioning that the coupling constant is universal in the sense that the vertex rules in YM theory contain only $g$. This is a direct consequence of the fact that $SU(N)$ is a simple group. The situation is different for the electroweak theory, whose gauge group is a product of two factors, so that a second coupling constant needs to be introduced.
1.3.2 Symmetry Breaking
As mentioned before, the gauge fields in YM theory are massless to preserve gauge invariance. We know that some of the gauge bosons in the SM are massive, therefore we need a mechanism that gives them mass without breaking the gauge invariance of the original Lagrangian. We consider only local $SU(2)$ symmetry breaking with a Higgs doublet as it is the most relevant one for future discussions.
Figure 1.1: Interaction vertices in YM theory: (a) fermion-boson interaction, (b) cubic self-interactions, (c) quartic self-interactions.
We introduce a complex scalar field doublet $\phi : \mathbb{R}^{3,1} \to \mathbb{C}^2$ which can be written as
\[
\phi = \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix}. \tag{1.9}
\]
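Now consider the model Lagrangian
\[
\mathcal{L} = -\frac{1}{2}\mathrm{tr}(F_{\mu\nu}F^{\mu\nu}) + (D_\mu\phi)^\dagger(D^\mu\phi) - \frac{\lambda}{2}\left(\phi^\dagger\phi - \frac{v^2}{2}\right)^2, \tag{1.10}
\]
where $v, \lambda \in \mathbb{R}$. We fix the unitary gauge here by requiring $\mathrm{Re}\,\phi_1 = \mathrm{Im}\,\phi_1 = \mathrm{Im}\,\phi_2 = 0$ and $\mathrm{Re}\,\phi_2 = \phi_r/\sqrt{2}$. Hence, we can rewrite $\phi(x)$ as
\[
\phi(x) = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ \phi_r(x) \end{pmatrix}. \tag{1.11}
\]
The field $\phi(x)$ is assumed to take a non-zero vacuum expectation value (VEV) $\langle\phi\rangle$ written as
\[
\langle\phi\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ v \end{pmatrix}. \tag{1.12}
\]
We expand around the VEV as follows:
\[
\phi(x) = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ v + h(x) \end{pmatrix}, \tag{1.13}
\]
where $h(x)$ is the physical Higgs field. Substituting the expansion around the VEV in the original model Lagrangian, it can be shown that
\[
(D_\mu\phi)^\dagger(D^\mu\phi) \supset \frac{g^2v^2}{8}\sum_{a=1}^{3}(A^a_\mu)^2. \tag{1.14}
\]
This implies that all the gauge fields have acquired the same mass, $m_A^2 = g^2v^2/4$. In the Standard Model, the gauge group to be broken is $SU(2)\times U(1)$ (section 1.5.2) and in that case the symmetry breaking is referred to as the Brout-Englert-Higgs (BEH) mechanism.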
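The coefficient of the gauge-boson mass term can be cross-checked numerically. The sketch below evaluates $|g A^a T^a \langle\phi\rangle|^2$ for a fixed space-time index, assuming $T^a = \sigma^a/2$ and a VEV in the lower component; the numerical values of $g$, $v$ and $A^a$ are arbitrary test inputs, not physical parameters.

```python
import math

PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

def mass_term(g, v, A):
    """Return |g A^a T^a <phi>|^2 with <phi> = (0, v / sqrt(2)) and T^a = sigma^a / 2."""
    phi = [0.0, v / math.sqrt(2)]
    # Matrix M = g * sum_a A^a sigma^a / 2 acting on the doublet.
    M = [[g * sum(A[a] * PAULI[a][i][j] for a in range(3)) / 2 for j in range(2)]
         for i in range(2)]
    M_phi = [sum(M[i][j] * phi[j] for j in range(2)) for i in range(2)]
    return sum(abs(z) ** 2 for z in M_phi)

def expected_mass_term(g, v, A):
    """Closed form (g^2 v^2 / 8) * sum_a (A^a)^2."""
    return g ** 2 * v ** 2 / 8 * sum(a * a for a in A)

if __name__ == "__main__":
    g, v, A = 0.7, 2.0, [0.3, -1.2, 2.0]  # arbitrary illustrative numbers
    print(mass_term(g, v, A), expected_mass_term(g, v, A))
```

Comparing the closed form with the canonical mass term $\frac{1}{2}m_A^2 (A^a_\mu)^2$ gives $m_A = gv/2$ for each of the three gauge fields.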
1.4 Quantum Chromodynamics
Quantum Chromodynamics (QCD) is a pure³ YM theory with gauge group $SU(3)_C$. It is the fundamental theory of strong interactions, where the fermionic fields are the quarks and the eight gauge bosons are the gluons.
³ The term “pure” is used here to emphasize the fact that the quark fields in QCD are mass eigenstates, which is not the case for a chiral theory where a fermion mass term would break gauge invariance.
The three-dimensional space in which the gauge transformation takes place is called the color space, hence the label $C$ in $SU(3)_C$. As stated before, there are six types (flavors) of quarks. The quark field is a color triplet which transforms under the fundamental representation of $SU(3)_C$ and it is written as
\[
q_f = (q_f^r, q_f^b, q_f^g)^T, \tag{1.15}
\]
where $f$ represents the flavor index and $r$, $b$ and $g$ are color indices which stand for “red”, “blue” and “green” respectively (they can also be labelled as 1, 2 and 3). The $A_\mu$ in the previous section is replaced by $G_\mu$ and $g$ by $g_s$, the strong coupling constant. Also, for $SU(3)$,
\[
T^a = \frac{\lambda^a}{2}, \tag{1.16}
\]
where $\lambda^a$ are the Gell-Mann matrices. The QCD Lagrangian is written as
\[
\mathcal{L}_{\text{QCD}} = \sum_f \bar{q}_f(i\slashed{D} - m_f)q_f - \frac{1}{2}\mathrm{tr}(F_{\mu\nu}F^{\mu\nu}). \tag{1.17}
\]
As mentioned before, the gluons carry color charge, in contrast to quantum electrodynamics, where the photon does not carry the $U(1)$ (electric) charge.
1.5 The Electroweak Theory
Here we present the electroweak theory, also known as the Glashow-Weinberg-Salam (GWS) theory. The construction of this theory is less straightforward than that of QED and QCD⁴.
⁴ Although the QCD Lagrangian looks deceptively simple, it is a highly non-trivial subject with a tremendous amount of subtleties such as color confinement, non-perturbative QCD and lattice QCD.
1.5.1 Chirality and the fermion sector
The Dirac spinor $\psi$ can be decomposed in terms of Weyl spinors $\psi_L$ and $\psi_R$:
\[
\psi = \begin{pmatrix} \psi_L \\ \psi_R \end{pmatrix}. \tag{1.18}
\]
$\psi_L$ and $\psi_R$ are known as left-handed (LH) and right-handed (RH) chirality states respectively.
It is an experimental fact that the $W$ and $Z$ bosons couple differently to LH and RH fermions. The $W$ boson couples only to LH fermions, whereas the $Z$ boson couples to both LH and RH fermions in an asymmetric way. It is known that the leptons and quarks in the SM exist as doublets and that there are three generations of them. These doublets are in turn split into left-handed and right-handed components according to their weak isospin $I_W$. The neutrinos are the only fermions in the SM which exist only as LH states; the rest exist both as LH and RH states. The LH doublets are assigned $I_W = \frac{1}{2}$ and are written as
\[
\psi_L = \begin{pmatrix} \nu_l \\ l^- \end{pmatrix}_L \quad \text{and} \quad \begin{pmatrix} u \\ d \end{pmatrix}_L, \tag{1.19}
\]
where $l$ stands for lepton and $\nu_l$ for the corresponding neutrino, $u$ for up-type quark and $d$ for down-type quark. The RH states have $I_W = 0$, which means that they are singlets under $SU(2)_L$, and are written as
\[
\psi_R = l^-_R, \quad u_R \quad \text{and} \quad d_R. \tag{1.20}
\]
The weak hypercharge $Y$ is related to the electric charge $Q$ and the third component of weak isospin $I_W^3$ via
\[
Q = \frac{Y}{2} + I_W^3, \qquad I_W^3 = -I_W, -I_W + 1, ..., I_W. \tag{1.21}
\]
The chirality of the fermions implies that a Dirac mass term would break gauge invariance and therefore cannot be present in the action of the theory.
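As an illustration of the charge relation $Q = Y/2 + I_W^3$, the sketch below tabulates the first-generation fermions. The hypercharge values are not quoted in the text; they are the standard assignments in this normalization and are stated here as assumptions, as are the dictionary keys.

```python
from fractions import Fraction as F

def electric_charge(hypercharge, isospin3):
    """Q = Y/2 + I_W^3, evaluated exactly with rational numbers."""
    return hypercharge / 2 + isospin3

# (Y, I_W^3) per chirality state; assumed standard values for the Q = Y/2 + I3 convention.
FIRST_GENERATION = {
    "nu_L": (F(-1), F(1, 2)),
    "e_L": (F(-1), F(-1, 2)),
    "e_R": (F(-2), F(0)),
    "u_L": (F(1, 3), F(1, 2)),
    "d_L": (F(1, 3), F(-1, 2)),
    "u_R": (F(4, 3), F(0)),
    "d_R": (F(-2, 3), F(0)),
}

def charges():
    """Map each state name to its electric charge in units of the proton charge."""
    return {name: electric_charge(y, i3) for name, (y, i3) in FIRST_GENERATION.items()}

if __name__ == "__main__":
    for name, q in charges().items():
        print(f"{name}: Q = {q}")
```

Both members of a left-handed doublet share the same hypercharge and differ only in $I_W^3$, while the left- and right-handed components of the same fermion end up with the same electric charge despite different $(Y, I_W^3)$.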
1.5.2 The gauge sector
As mentioned before, the full electroweak gauge group is $SU(2)_L \times U(1)_Y$. The set of $SU(2)_L$ gauge fields is $\{W^a_\mu\}$, the coupling constant is $g$ and the generators are the Pauli matrices $\{\sigma^a\}$. For the $U(1)_Y$ part, the gauge field is $B_\mu$ and the coupling is $g'$. The electroweak covariant derivative is written as
\[
D_\mu = \partial_\mu + igI_W W^a_\mu\sigma^a + ig'\frac{Y}{2}B_\mu. \tag{1.22}
\]
The physical gauge fields for the photon, $Z$ and $W^\pm$ bosons are defined respectively as
\[
\begin{aligned}
A_\mu &= B_\mu\cos\theta_W + W^3_\mu\sin\theta_W, \\
Z_\mu &= -B_\mu\sin\theta_W + W^3_\mu\cos\theta_W, \\
W^\pm_\mu &= \frac{1}{\sqrt{2}}(W^1_\mu \mp iW^2_\mu),
\end{aligned} \tag{1.23}
\]
where $\theta_W$ is the weak mixing angle, a constant determined experimentally. It is related to the coupling constants by
\[
\theta_W = \tan^{-1}\frac{g'}{g}. \tag{1.24}
\]
1.5.3 Masses of the particles
Gauge bosons
At low energy, the $SU(2)_L \times U(1)_Y$ symmetry is broken via the BEH mechanism down to $U(1)_{\text{EM}}$, the gauge group of the electromagnetic interactions. The same Higgs doublet as in section 1.3 is considered, with hypercharge $Y = 1$ in the convention of equation (1.21), but the gauge fields are now the photon, the $Z$ boson and the $W^\pm$ bosons. After the symmetry breaking, it is found that the $W$ and $Z$ bosons acquire the following respective masses:
\[
m_W = \frac{1}{2}gv \quad \text{and} \quad m_Z = \frac{1}{2}v\sqrt{g^2 + g'^2}. \tag{1.25}
\]
These two masses are related by
\[
m_W = m_Z\cos\theta_W. \tag{1.26}
\]
After symmetry breaking, the photon remains massless and does not couple to the Higgs field $h(x)$. It is the mediator of electromagnetic interactions.
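The mass relations can be evaluated directly. The following sketch uses rough illustrative inputs ($g \approx 0.65$, $g' \approx 0.35$, $v \approx 246$ GeV); these numbers are assumptions chosen for the example and are not taken from the thesis.

```python
import math

def weak_mixing_angle(g, g_prime):
    """theta_W = arctan(g' / g)."""
    return math.atan2(g_prime, g)

def boson_masses(g, g_prime, v):
    """Return (m_W, m_Z) = (g v / 2, v sqrt(g^2 + g'^2) / 2)."""
    return 0.5 * g * v, 0.5 * v * math.hypot(g, g_prime)

if __name__ == "__main__":
    g, g_prime, v = 0.65, 0.35, 246.0  # illustrative values; v in GeV
    m_w, m_z = boson_masses(g, g_prime, v)
    theta = weak_mixing_angle(g, g_prime)
    print(f"m_W = {m_w:.1f} GeV, m_Z = {m_z:.1f} GeV")
    print(f"m_Z * cos(theta_W) = {m_z * math.cos(theta):.1f} GeV")
```

With these inputs both masses come out at the scale of 80 to 90 GeV, and the relation $m_W = m_Z\cos\theta_W$ holds identically because $\cos\theta_W = g/\sqrt{g^2 + g'^2}$.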
Fermions
As stated before, a direct mass term cannot be added for the fermions if gauge invariance is to be preserved. This is a consequence of the chiral nature of the theory. Instead, the fermion masses arise from their Yukawa couplings to the Higgs scalars $\phi_i$ with $i = 1, 2$.
The bilinears $\psi_L^\dagger\psi_R$ and $\psi_R^\dagger\psi_L$ are $SU(2)$ doublets, therefore they may couple to the Higgs scalars $\phi_i^*$ or $\phi_i$. The Lagrangian for the Yukawa couplings of the leptons to the Higgs scalar is
\[
\mathcal{L}_{\text{Yuk}} = -\sum_{l=e,\mu,\tau} g_l\,\phi_i^{*}\,\psi_R^\dagger(l)\,\psi_{Li}(\nu_l, l) + \text{h.c.}, \tag{1.27}
\]
where $g_l$ is the coupling of the respective lepton to the Higgs scalar and h.c. denotes the hermitian conjugate. For the quarks, the Lagrangian is
\[
\mathcal{L}_{\text{Yuk}} = -\sum_{u,d}\left[ g_d\,\phi_i^{*}\,\psi_R^\dagger(d)\,\psi_{Li}(u, d) - g_u\,\epsilon_{ij}\,\phi_i^{*}\,\psi_R^\dagger(u)\,\psi_{Lj}(u, d) \right] + \text{h.c.} \tag{1.28}
\]
Here, $\epsilon_{ij}$ is the totally anti-symmetric tensor of rank 2, and $g_d$ and $g_u$ are the couplings of the down-type and up-type quarks respectively. When $\phi$ acquires a non-zero VEV, the mass of the lepton is
\[
m_l = \frac{g_l v}{\sqrt{2}} \tag{1.29}
\]
and the masses of the quarks are
\[
m_u = \frac{g_u v}{\sqrt{2}} \quad \text{and} \quad m_d = \frac{g_d v}{\sqrt{2}}. \tag{1.30}
\]
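The mass relation $m = g v/\sqrt{2}$ can be inverted to estimate the size of a Yukawa coupling from a measured mass. The sketch below assumes $v \approx 246$ GeV and approximate electron and top-quark masses; these numerical inputs are illustrative assumptions, not values quoted in the thesis.

```python
import math

VEV_GEV = 246.0  # assumed Higgs vacuum expectation value in GeV

def fermion_mass(coupling, v=VEV_GEV):
    """m = g v / sqrt(2)."""
    return coupling * v / math.sqrt(2)

def yukawa_coupling(mass, v=VEV_GEV):
    """Invert the mass relation: g = sqrt(2) m / v."""
    return math.sqrt(2) * mass / v

if __name__ == "__main__":
    approximate_masses_gev = {"electron": 0.000511, "top": 173.0}  # approximate inputs
    for name, mass in approximate_masses_gev.items():
        print(f"{name}: g = {yukawa_coupling(mass):.3g}")
```

The couplings span many orders of magnitude, with the top-quark coupling of order one; the model does not predict these values, which is one reason the Yukawa sector contributes so many free parameters.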
Chapter 2
Beyond the Standard Model
In this chapter we peep through the door of new physics beyond the Standard Model (BSM), more specifically supersymmetry (SUSY). The process to be investigated in this thesis is also introduced. The discussions will be mostly qualitative and the intricate formalism and phenomenology of SUSY will be omitted.
2.1 Limitations of the Standard Model
Despite its success, the Standard Model is quite messy and fails to address a few issues such as:
• Why is QCD a pure Yang-Mills theory while GWS is a chiral theory, that is, why is the gauge group the way it is?
• Where do the ad-hoc Yukawa couplings to fermions come from?
• Why is there a plethora of free parameters that can only be determined experimentally?
• What are dark matter and dark energy?
BSM theories try to solve these puzzles; some examples are technicolor, supersymmetry, extra-dimensional models, grand unified theories (GUTs) and string theory. Some of them are unification schemes: GUTs try to unify the three gauge couplings, whose curves on a plot of coupling strength versus energy scale do not meet in the SM, while string theory is more ambitious, as it tries to combine all physics phenomena including gravity into one framework and also proposes a solution to some or all of the puzzles mentioned above.
Dark matter is somewhat mysterious and its existence has been inferred from astrophysical observations. It is believed, but not certain, that dark matter is an elementary particle. One possibility is that dark matter is a weakly interacting massive particle (WIMP), that is, a particle whose mass and interactions lie at the scale of the weak interaction. At the TeV scale, the WIMP can be associated with the lightest supersymmetric partner (neutralino) in supersymmetric models with R-parity conservation. R-parity is a multiplicative quantum number defined as
\[
P_R = (-1)^{3(B-L)+2s}, \tag{2.1}
\]
where $B$ is the baryon number, $L$ is the lepton number and $s$ is the spin.
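The R-parity formula is easy to evaluate for a few particles. The sketch below uses exact rational arithmetic for the quantum numbers; the function name and the example particles are illustrative choices.

```python
from fractions import Fraction

def r_parity(baryon_number, lepton_number, spin):
    """Return P_R = (-1)^(3(B - L) + 2s) as +1 or -1."""
    exponent = 3 * (Fraction(baryon_number) - Fraction(lepton_number)) + 2 * Fraction(spin)
    if exponent.denominator != 1:
        raise ValueError("3(B - L) + 2s must be an integer")
    return 1 if exponent.numerator % 2 == 0 else -1

if __name__ == "__main__":
    half, third = Fraction(1, 2), Fraction(1, 3)
    examples = {
        "electron": (0, 1, half),
        "selectron": (0, 1, 0),
        "up quark": (third, 0, half),
        "up squark": (third, 0, 0),
        "photon": (0, 0, 1),
        "neutralino": (0, 0, half),
    }
    for name, (b, l, s) in examples.items():
        print(f"{name}: P_R = {r_parity(b, l, s):+d}")
```

SM particles come out with $P_R = +1$ and their superpartners with $P_R = -1$, since a superpartner differs from its SM particle only by half a unit of spin. If R-parity is conserved, a state with $P_R = -1$ cannot decay into SM particles alone.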
2.2 A brief overview of Supersymmetry
2.2.1 Motivation and introduction
It is well known from quantum field theory that, beyond tree level, infinities start to show up. They are taken care of by a procedure called renormalization, which is arguably one of the deepest and most important topics in physics. For example, a calculation of the electron self-energy in QED, which corresponds to the diagram in figure 2.1(a), shows that the correction to the electron mass at one-loop order is [11]
\[
\delta m = \frac{3\alpha m_0}{2\pi}\ln\frac{\Lambda}{m_0}, \tag{2.2}
\]
where $\alpha$ is the fine-structure constant, $m_0 = m_0(\Lambda)$ is the bare mass and $\Lambda$ is the ultra-violet (UV) cut-off beyond which the theory is not valid. If we for example take $\Lambda \sim m_Z$ and $m_0 \sim$ MeV, the mass correction is very small. This is a consequence of the fact that $\Lambda$ appears only in the logarithm.
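To see how mild the logarithmic dependence is, the relative correction $\delta m / m_0$ can be evaluated numerically. The inputs below ($\alpha \approx 1/137$, $\Lambda \approx 91.19$ GeV, $m_0 \approx 0.511$ MeV) are approximate values assumed for illustration.

```python
import math

ALPHA = 1 / 137.036  # approximate fine-structure constant

def relative_mass_correction(cutoff, bare_mass, alpha=ALPHA):
    """delta m / m0 = (3 alpha / 2 pi) * ln(cutoff / m0); both arguments in the same unit."""
    return 3 * alpha / (2 * math.pi) * math.log(cutoff / bare_mass)

if __name__ == "__main__":
    m0 = 0.511e-3  # GeV, approximate electron mass
    for cutoff in (91.19, 1.0e3, 1.0e16):  # GeV: roughly m_Z, 1 TeV, and a very high scale
        print(f"cutoff = {cutoff:.3g} GeV: dm/m0 = {relative_mass_correction(cutoff, m0):.3f}")
```

Even when the cut-off is raised by many orders of magnitude, the correction stays a modest fraction of the bare mass, which is what "appears only in the logarithm" means in practice.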
However, in calculations involving the renormalization of the mass of a scalar field, quadratic divergences arise. For example, when calculating the self-energy of the scalar in Yukawa theory (interaction term $\mathcal{L}_{\text{int}} = -g\phi\bar{\psi}\psi$), which corresponds to the diagram in figure 2.1(b), the one-loop correction to the scalar field’s mass squared is [11]
\[
\delta m^2 = -\frac{g^2}{8\pi^2}\Lambda^2. \tag{2.3}
\]
Figure 2.1: Radiative corrections: (a) electron self-energy in QED, (b) scalar self-energy.
This implies that scalar fields have “poor” renormalization properties. Unfortunately, the SM contains a scalar field, namely the Higgs boson. This leads to an issue known as the Higgs/hierarchy problem, which is related to the renormalization properties of the scalar field. The Higgs boson couples to all elementary particles (including itself) except the photon and gluons. This implies that more than one diagram contributes to its mass correction. The coupling constant is proportional to the mass of the particle of interest. Since the top quark is the heaviest of all elementary particles, it contributes the most to the radiative correction of the Higgs mass.
The divergence in $\delta m^2$ is negative for a fermion loop. This suggests that we can introduce a new theory with a symmetry between fermions and bosons in such a way that a boson loop, which contributes positively, cancels the negative fermion loop. This requires the boson to have the same properties as the fermion except for the spin, which differs by 1/2. This leads to supersymmetry, which solves the part of the hierarchy problem associated with loop corrections.
The idea behind SUSY is that a superpartner (or sparticle) is introduced for every SM particle. The superpartners of fermions are called sfermions. For example, the spin-0 superpartner of a lepton $l$ is a slepton $\tilde{l}$. Similarly, squarks are the superpartners of quarks.
For gauge bosons, the superpartners are called gauginos, such as gluinos ($\tilde{G}$), winos ($\tilde{W}$) and binos ($\tilde{B}$), which are spin-1/2 particles. SUSY has an extended Higgs sector and the Higgs bosons also have spin-1/2 superpartners, which are called higgsinos. The higgsinos exist as neutral and charged particles.
There exists a formal way to transform fermions into bosons and vice versa; this transformation is known as a supersymmetry transformation. The fermion and its boson superpartner are different states of the same supermultiplet.
The fact that no superpartner has ever been observed implies that SUSY must be a broken symmetry, with a symmetry breaking mechanism that makes the superpartners too massive to have been detected so far. This also causes the loop cancellations to be imperfect, but if the symmetry breaking scale is small enough (TeV scale), the disparity can be kept small enough that the radiative corrections to the mass of the Higgs remain small.
The mechanism for SUSY breaking is still a mystery.
The common feature in all SUSY theories is that the superpartners do not decay into known particles. If they exist, the final decay product should be the lightest SUSY particle (LSP) which has not been detected yet. The LSP is a possible candidate for the dark matter WIMP mentioned previously.
Detection of the superpartners is one of the major hopes of the Large Hadron Collider.
2.2.2 Charginos and Neutralinos
The charged and neutral higgsinos mix with winos and binos to form charginos $\{\tilde{\chi}^\pm_1, \tilde{\chi}^\pm_2\}$ and neutralinos $\{\tilde{\chi}^0_1, \tilde{\chi}^0_2, \tilde{\chi}^0_3, \tilde{\chi}^0_4\}$ respectively. They are expected to be produced with relatively high rates at colliders through the decays of squarks and sleptons, for example [8]
\[
\tilde{q} \to q\tilde{\chi}^0_2, \qquad \tilde{\chi}^0_2 \to l^\pm\tilde{l}^\mp, \qquad \tilde{l}^\pm \to l^\pm\tilde{\chi}^0_1. \tag{2.4}
\]
The $\tilde{\chi}^0_1$ is expected to be very weakly interacting and to escape detection, so that it shows up as missing transverse energy. To get information on the spectrum of sparticles, detailed studies are performed, for example, on the distribution of the number of jets, the number of same-sign and opposite-sign leptons, multiple leptons and lepton flavors. It is important to mention that the cross-sections for sparticle production depend strongly on their masses, in our particular case those of $\tilde{\chi}^\pm_1$ and $\tilde{\chi}^0_1$, or more specifically on the mass splitting defined as the difference between the masses of the chargino and the neutralino: $\Delta m = m(\tilde{\chi}^\pm_1) - m(\tilde{\chi}^0_1)$.
Charginos and neutralinos can also be produced directly, but at a lower rate, by a few processes such as Drell-Yan. This leads us to the process investigated in this thesis. The analysis targets the direct production of $\tilde{\chi}^+_1\tilde{\chi}^-_1$, each of which decays into an on-shell $W$ boson and a $\tilde{\chi}^0_1$ (the LSP). We consider only the decay of the $W$ into isolated leptons ($e$, $\mu$), giving two leptons with opposite charge and significant missing transverse energy (MET). The MET is expected from both the neutrinos and the LSPs in the final state. The process is illustrated in figure 2.2.
Figure 2.2: SUSY process under investigation
The analysis of this process is quite challenging due to the significant SM background contributions, especially from diboson $WW$ production.
Chapter 3
Proton-Proton Colliders
This chapter introduces the kinematic variables used in this thesis and some phenomenological aspects of proton-proton (pp) collisions. More details can be found in textbooks on collider physics such as [6].
3.1 Kinematics
The starting point is the four-momentum of a particle, which is defined as
\[
p \equiv p^\mu = (E, \vec{p}), \tag{3.1}
\]
where $E$ is the relativistic energy and $\vec{p} = (p_x, p_y, p_z)$ is the three-momentum. They are both related to the rest mass $m$ of the particle as
\[
E = \gamma m \quad \text{and} \quad \vec{p} = \gamma m\vec{v}, \tag{3.2}
\]
where $\vec{v}$ is the velocity and $\gamma = 1/\sqrt{1 - \vec{v}^2}$ with $|\vec{v}| \in [0, 1)$. These relations lead to the following important result:
\[
p \cdot p \equiv p^2 = m^2 \implies E^2 = \vec{p}^2 + m^2. \tag{3.3}
\]
A particle is said to be on-shell if $p^2 - m^2 = 0$ and off-shell (or virtual) if $p^2 - m^2 \neq 0$. The concept of virtual particles is very important in perturbative QFT, where they act as mediators.
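The mass-shell condition can be verified numerically from the definitions of $E$ and $\vec{p}$. This is a minimal sketch in natural units; the function names and the test values (a proton-like mass of 0.938 GeV moving at $0.6c$) are illustrative assumptions.

```python
import math

def four_momentum(mass, velocity):
    """Return (E, (px, py, pz)) for a particle of given rest mass and velocity (c = 1)."""
    v2 = sum(c * c for c in velocity)
    if v2 >= 1:
        raise ValueError("speed must be below 1 in natural units")
    gamma = 1 / math.sqrt(1 - v2)
    return gamma * mass, tuple(gamma * mass * c for c in velocity)

def invariant_mass_squared(p):
    """p^2 = E^2 - |p_vec|^2 with metric signature (+, -, -, -)."""
    energy, vec = p
    return energy * energy - sum(c * c for c in vec)

def is_on_shell(p, mass, tol=1e-9):
    """True if p^2 - m^2 vanishes within the tolerance."""
    return abs(invariant_mass_squared(p) - mass * mass) <= tol

if __name__ == "__main__":
    p = four_momentum(0.938, (0.0, 0.0, 0.6))
    print("E =", p[0], "p_z =", p[1][2], "on-shell:", is_on_shell(p, 0.938))
```

A four-momentum built from a physical velocity is on-shell by construction; a virtual particle corresponds to a four-vector for which the same check fails.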
3.1.1 Two-particle collision
The Center-of-Mass (CM) reference frame of two colliding particles is often the same as the laboratory frame. It is defined as the frame in which the sum of the three-momenta of the particles is zero.
Consider two colliding identical particles (1 and 2) whose four-momenta are $p_1$ and $p_2$. By definition, in the CM frame, $\vec{p}_1 = -\vec{p}_2$ such that $E = E_1 = E_2$. The CM energy is defined as $E_{CM} = E_1 + E_2 = 2E$ and the Mandelstam variable $s$ is defined as
\[ s = (p_1 + p_2)^2 = E_{CM}^2 \implies \sqrt{s} = E_{CM}. \tag{3.4} \]
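As a quick numerical check of (3.3) and (3.4), the following minimal sketch builds the four-momenta of two head-on protons and evaluates $\sqrt{s}$. The beam momentum of 6500 GeV, the approximate proton mass and the helper names are illustrative assumptions, not values or code taken from the analysis.

```python
import math

def minkowski_square(p):
    """Return p.p = E^2 - |p_vec|^2 for a four-momentum (E, px, py, pz), natural units."""
    E, px, py, pz = p
    return E**2 - (px**2 + py**2 + pz**2)

def four_momentum(mass, pz):
    """On-shell four-momentum of a particle moving along the z-axis, eq. (3.3)."""
    E = math.sqrt(mass**2 + pz**2)
    return (E, 0.0, 0.0, pz)

M_PROTON = 0.938  # GeV, approximate proton mass (assumption)
P_BEAM = 6500.0   # GeV, illustrative beam momentum (assumption)

p1 = four_momentum(M_PROTON, +P_BEAM)
p2 = four_momentum(M_PROTON, -P_BEAM)

# In the CM frame the three-momenta cancel, so s = (E1 + E2)^2
total = tuple(a + b for a, b in zip(p1, p2))
sqrt_s = math.sqrt(minkowski_square(total))
print(f"sqrt(s) = {sqrt_s:.1f} GeV")
```

Because the proton mass is tiny compared with the beam momentum, $\sqrt{s}$ is essentially twice the beam momentum.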
3.1.2 Collision products
We now discuss the kinematics of the products in a two-particle collision system. The convention in colliders is to take the $z$-axis as the axis of collision, the positive $x$-axis as pointing towards the center of the accelerator ring and the positive $y$-axis as pointing upwards. The $(x, y)$ plane is called the transverse plane. The three-momentum of a particle can then be written as $\vec{p} = (\vec{p}_T, p_z)$, where $\vec{p}_T = (p_x, p_y)$ is known as the transverse momentum and $p_z$ as the longitudinal momentum. The magnitude of the transverse momentum is given by
\[ p_T = \sqrt{p_x^2 + p_y^2}. \tag{3.5} \]
The transverse energy is defined as
\[ E_T = \sqrt{m^2 + p_T^2} = \sqrt{E^2 - p_z^2}. \tag{3.6} \]
The transverse variables are all invariant under Lorentz boosts in the $z$-direction. It is convenient to use the spherical coordinates $\phi$ and $\theta$, which are the azimuthal and polar angles respectively, such that $p_T = |\vec{p}| \sin\theta$. This is illustrated in figure 3.1.
The rapidity of a particle is related to its energy and longitudinal momentum by
\[ y = \frac{1}{2} \ln\left(\frac{E + p_z}{E - p_z}\right). \tag{3.7} \]
At high momenta (relativistic limit), $|\vec{p}| \gg m$ such that $E - p_z \approx 2|\vec{p}| \sin^2(\theta/2)$ and $E + p_z \approx 2|\vec{p}| \cos^2(\theta/2)$, and the rapidity can be approximated by the pseudorapidity ($\eta$) as
\[ y \approx -\ln\left(\tan\frac{\theta}{2}\right) \equiv \eta. \tag{3.8} \]
Figure 3.1: Collision plane. (a) Transverse plane. (b) Longitudinal plane.
The pseudorapidity is more convenient experimentally because (i) it does not involve the mass of the particle, which in most cases is unknown, and (ii) of the occurrence of a plateau: the particle multiplicity per unit rapidity is nearly constant. Also for reasons of convenience, we define the distance between two particles (1 and 2) in the $(\eta, \phi)$ space as
\[ \Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}, \tag{3.9} \]
where $\Delta\eta = \eta_1 - \eta_2$ and $\Delta\phi = \phi_1 - \phi_2$. For experimental purposes, the basic kinematic variables are $m$, $p_T$, $\eta$ and $\phi$. Next we define other kinematic variables which are used in the analysis targeted in this thesis.
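The relations (3.7) to (3.9) can be illustrated with a minimal sketch. The function names are hypothetical, and wrapping $\Delta\phi$ into $(-\pi, \pi]$ is an assumption added here (it is common practice so that two directions on either side of $\phi = \pm\pi$ are treated as close), not something stated in the text.

```python
import math

def rapidity(E, pz):
    """Rapidity y of eq. (3.7)."""
    return 0.5 * math.log((E + pz) / (E - pz))

def pseudorapidity(px, py, pz):
    """Pseudorapidity eta of eq. (3.8); undefined exactly along the beam axis."""
    p = math.sqrt(px**2 + py**2 + pz**2)
    theta = math.acos(pz / p)  # polar angle measured from the z-axis
    return -math.log(math.tan(theta / 2.0))

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi] (assumed convention)."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Distance in (eta, phi) space, eq. (3.9)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

# A nearly massless particle (electron-like mass in GeV): y and eta almost coincide
px, py, pz, m = 30.0, 0.0, 40.0, 0.000511
E = math.sqrt(m**2 + px**2 + py**2 + pz**2)
print(rapidity(E, pz), pseudorapidity(px, py, pz))
```

For the example momentum, $\tan(\theta/2) = 1/3$, so both quantities are close to $\ln 3$.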
From the conservation of total momentum, we know that $\sum_i \vec{p}_{T,i} = 0$, where the sum runs over all the particles produced. However, not all particles are detectable, which implies that we can split the sum as
\[ \sum_i \vec{p}_{T,i} = \vec{p}_T^{\,miss} + \sum \vec{p}_T, \tag{3.10} \]
where the sum on the right-hand side is over all the particles which can be detected. This leads to the definition of the missing transverse momentum:
\[ \vec{p}_T^{\,miss} = -\sum \vec{p}_T, \tag{3.11} \]
where the sum is again over all the particles which can be detected. We also define the missing transverse energy (MET) as $E_T^{miss} = |\vec{p}_T^{\,miss}|$.
In practice, when the MET is reconstructed, there are some contributions from unwanted signals that cannot be calibrated and identified. For this reason, it is useful to introduce the MET significance $S$ which, in the relativistic limit ($E_T = |\vec{p}_T|$), is defined as
\[ S = \frac{E_T^{miss}}{\sqrt{\sum E_T}}. \tag{3.12} \]
A high value of the significance indicates that the event is more likely to contain desired objects (e.g. neutralinos) that escape direct detection.
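The definitions (3.11) and (3.12) amount to a vector sum over the visible objects. A minimal sketch, assuming each visible object is given as a hypothetical `(px, py)` pair in GeV and treated as massless so that $E_T = |\vec{p}_T|$:

```python
import math

def missing_pt(visible):
    """Missing transverse momentum of eq. (3.11): minus the sum of visible pT vectors."""
    return (-sum(px for px, _ in visible), -sum(py for _, py in visible))

def met(visible):
    """Missing transverse energy, the magnitude of the missing pT vector."""
    return math.hypot(*missing_pt(visible))

def met_significance(visible):
    """MET significance of eq. (3.12) in the relativistic limit E_T = |pT|."""
    sum_et = sum(math.hypot(px, py) for px, py in visible)
    return met(visible) / math.sqrt(sum_et)

# Two visible objects (illustrative numbers, in GeV)
event = [(30.0, 0.0), (0.0, 40.0)]
print(missing_pt(event), met(event), met_significance(event))
```

Note that with this definition $S$ carries units of $\sqrt{\text{GeV}}$; the scalar sum in the denominator sets the scale of the expected mismeasurement.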
Another important variable is the invariant mass of two particles (1 and 2), which is defined by $m_{12}^2 = (p_1 + p_2)^2$ and, in the relativistic limit, is written in terms of the basic variables as
\[ m_{12}^2 = 2 p_{T1} p_{T2} \left[\cosh(\eta_1 - \eta_2) - \cos(\phi_1 - \phi_2)\right]. \tag{3.13} \]
Some particles cannot be detected directly (e.g. neutrinos) but can be inferred from the transverse mass. Suppose a particle decays into particles 1 and 2 where 2 is the invisible one, then the transverse mass is given by
\[ m_T = \sqrt{m_1^2 + m_2^2 + 2\left(E_{T1} E_{T2} - \vec{p}_{T1} \cdot \vec{p}_{T2}\right)}. \tag{3.14} \]
In the relativistic limit ($E_T = |\vec{p}_T|$), the transverse mass can be written as
\[ m_T = \sqrt{2 E_{T1} E_{T2} (1 - \cos\Delta\phi)}, \tag{3.15} \]
where $\Delta\phi$ is the angle between the two particles with transverse momenta $\vec{p}_{T1}$ and $\vec{p}_{T2}$. The maximum value of $m_T$ is the mass of the decaying particle.
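A minimal sketch of (3.13) and (3.15) in the massless limit is given below. The right-hand side of (3.13) is read as the squared invariant mass, as dimensional analysis requires; the function names and the example values are illustrative assumptions.

```python
import math

def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles from (pT, eta, phi), cf. eq. (3.13)."""
    m_squared = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m_squared, 0.0))  # guard against tiny negative rounding errors

def transverse_mass(et1, phi1, et2, phi2):
    """Transverse mass in the relativistic limit, eq. (3.15)."""
    mt_squared = 2.0 * et1 * et2 * (1.0 - math.cos(phi1 - phi2))
    return math.sqrt(max(mt_squared, 0.0))

# Two back-to-back central leptons with pT = 45 GeV each
print(invariant_mass(45.0, 0.0, 0.0, 45.0, 0.0, math.pi))
# A lepton recoiling against the missing transverse momentum
print(transverse_mass(40.0, 0.0, 40.0, math.pi))
```

Both quantities vanish for collinear massless particles and are largest for a back-to-back configuration, which is why they are sensitive to the mass of a decaying parent.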
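In our particular SUSY scenario introduced in the previous chapter ($\tilde{\chi}^+_1 \tilde{\chi}^-_1 \to W W \tilde{\chi}^0_1 \tilde{\chi}^0_1 \to l\nu\tilde{\chi}^0_1\, l\nu\tilde{\chi}^0_1$), there is more than one invisible particle and a more suitable kinematic variable is the stransverse mass $m_{T2}$, defined as
\[ m_{T2}(l, l, E_T^{miss}) = \min_{\vec{q}_{T1} + \vec{q}_{T2} = \vec{p}_T^{\,miss}} \left\{ \max\left[ m_T(\vec{p}_{T1}, \vec{q}_{T1}),\; m_T(\vec{p}_{T2}, \vec{q}_{T2}) \right] \right\}, \tag{3.16} \]
where $\vec{p}_{T1}$ and $\vec{p}_{T2}$ are the transverse momenta of the two leptons and $\vec{p}_T^{\,miss} = \vec{q}_{T1} + \vec{q}_{T2}$ contains all the information about the invisible particles. The minimization is performed over all the possible configurations of $\vec{q}_T$, that is, over all possible values of $\vec{q}_{T1}$ and $\vec{q}_{T2}$ that add up to $\vec{p}_T^{\,miss}$. The stransverse mass is very useful because, in the end, all we know about the neutralinos is $\vec{p}_T^{\,miss}$.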
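To make the minimization in (3.16) concrete, here is a brute-force sketch that scans trial splittings of the missing transverse momentum on a grid. It assumes massless leptons and massless invisible particles, so that (3.15) applies, and the grid size and range are arbitrary choices. This is only an illustration of the definition; it is not the algorithm used in the analysis, and dedicated numerical methods are far more accurate and efficient.

```python
import numpy as np

def mt_massless(pt_vis, qt_inv):
    """Transverse mass of eq. (3.15) for 2D transverse vectors (last axis = x, y)."""
    p_mag = np.hypot(pt_vis[..., 0], pt_vis[..., 1])
    q_mag = np.hypot(qt_inv[..., 0], qt_inv[..., 1])
    dot = pt_vis[..., 0] * qt_inv[..., 0] + pt_vis[..., 1] * qt_inv[..., 1]
    return np.sqrt(np.clip(2.0 * (p_mag * q_mag - dot), 0.0, None))

def mt2_grid(pt1, pt2, ptmiss, n=201):
    """Grid approximation of mT2: scan q_T1, with q_T2 fixed by q_T1 + q_T2 = ptmiss."""
    pt1, pt2, ptmiss = (np.asarray(v, dtype=float) for v in (pt1, pt2, ptmiss))
    # Scan range chosen heuristically from the largest momentum scale in the event
    half_width = 2.0 * max(np.hypot(*pt1), np.hypot(*pt2), np.hypot(*ptmiss), 1.0)
    axis = np.linspace(-half_width, half_width, n)
    qx, qy = np.meshgrid(axis, axis)
    q1 = np.stack([qx, qy], axis=-1)  # trial values of q_T1, shape (n, n, 2)
    q2 = ptmiss - q1                  # q_T2 follows from the constraint
    larger = np.maximum(mt_massless(pt1, q1), mt_massless(pt2, q2))
    return float(larger.min())

# Illustrative event: two leptons and missing transverse momentum (GeV)
print(mt2_grid(pt1=(40.0, 0.0), pt2=(0.0, 30.0), ptmiss=(-40.0, -30.0)))
```

The grid result can only overestimate the true minimum, and its accuracy is limited by the grid spacing.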
3.2 pp interactions
Proton-proton collisions can be classified into elastic, diffractive and non-diffractive processes. Elastic and diffractive processes fall into the domain of non-perturbative QCD, which is a highly non-trivial subject. They are low-$p_T$ phenomena in which the particles are produced mostly along the beam line, and they are therefore not interesting for our current analysis. Non-diffractive processes can be studied using perturbation theory and are appropriate for our purposes because the particles produced in the targeted analysis have high $p_T$. Non-diffractive processes involving high momentum transfer are also known as hard scattering processes.
3.2.1 Partons
The proton is not elementary but is made of quarks and gluons, which are collectively known as partons. A pp collision is actually a parton-parton collision and each parton carries only a fraction of the proton's total momentum.
Consider two relativistic protons colliding along the $z$-axis such that their four-momenta are $p_1 = (E, 0, 0, E)$ and $p_2 = (E, 0, 0, -E)$. We assume that the two colliding partons have negligible three-momentum components in the $(x, y)$ plane such that their momenta are $\vec{q}_1 = (0, 0, q_1)$ and $\vec{q}_2 = (0, 0, q_2)$. The momentum fractions carried by the partons are defined as
\[ x_1 = \frac{|q_1|}{E} \quad \text{and} \quad x_2 = \frac{|q_2|}{E}, \tag{3.17} \]
such that their four-momenta can be written as
\[ Q_1 = x_1 (E, 0, 0, E) \quad \text{and} \quad Q_2 = x_2 (E, 0, 0, -E). \tag{3.18} \]
A particle produced in the collision has a maximum rest mass $M$ given by $M^2 = (Q_1 + Q_2)^2 = 4 x_1 x_2 E^2 = x_1 x_2 s$. The momentum distribution between the partons is described by the parton distribution function (PDF). This function can only be determined experimentally, usually in deep inelastic scattering (DIS), and is basically a probability density function denoted as $f(x)$. More details on partons and PDFs can be found in [13].
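The relation between the momentum fractions and the energy available in the parton-parton collision can be checked numerically. The sketch below uses hypothetical momentum fractions and an illustrative $\sqrt{s}$ of 13000 GeV; the function names are not from the thesis.

```python
import math

def parton_four_momenta(x1, x2, beam_energy):
    """Four-momenta Q1 and Q2 of eq. (3.18) for massless, collinear partons."""
    q1 = (x1 * beam_energy, 0.0, 0.0, x1 * beam_energy)
    q2 = (x2 * beam_energy, 0.0, 0.0, -x2 * beam_energy)
    return q1, q2

def partonic_cm_energy(x1, x2, sqrt_s):
    """Maximum mass that can be produced: sqrt(x1 * x2 * s)."""
    if not (0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0):
        raise ValueError("momentum fractions must lie in [0, 1]")
    return math.sqrt(x1 * x2) * sqrt_s

SQRT_S = 13000.0  # GeV, illustrative value
x1, x2 = 0.1, 0.04  # hypothetical momentum fractions

q1, q2 = parton_four_momenta(x1, x2, SQRT_S / 2.0)
E_tot, pz_tot = q1[0] + q2[0], q1[3] + q2[3]
mass_from_vectors = math.sqrt(E_tot**2 - pz_tot**2)  # (Q1 + Q2)^2 computed directly

print(mass_from_vectors, partonic_cm_energy(x1, x2, SQRT_S))
```

When $x_1 \neq x_2$ the partonic system has a net longitudinal momentum, which is why the parton-parton CM frame is generally boosted along $z$ with respect to the laboratory frame.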
3.2.2 Hadronization and jets
As mentioned previously, quarks and gluons are not observed as free particles but only within color singlets. When one tries to separate the quarks of a colorless object, the color field acquires enough energy to create new partons. Quarks radiate gluons, which in turn create quark-antiquark pairs, and this process continues until the energy is low enough (less than the QCD scale $\Lambda_{QCD}$) to form hadrons. This process is known as hadronization. It is not yet an established subject and hence not fully understood.
In pp collisions, many hadrons are created as end products and they are referred to as hadronic showers. In particle detectors, the hadronic showers are observed in narrow cones which are known as jets. It is important to note that jets are not only the products of hard scattering but also come from initial state and final state radiation (radiation of gluons from quarks and antiquarks). A proper treatment of hadronization and jets can be made only in QCD.
3.3 Luminosity and pile-up
The event rate $R$ (number of events per unit time, $dN/dt$) of a given scattering process is related to the cross-section¹ $\sigma$ by $R = L\sigma$, where $L$ is the luminosity, a parameter completely determined by the properties of the colliding beams (assuming a Gaussian profile) [13]:
\[ L = \frac{n_1 n_2}{4\pi\sigma_x\sigma_y}\, f. \tag{3.19} \]
In an accelerator, the particles are grouped into bunches that are brought into collision. Here, $n_1$ and $n_2$ are the numbers of particles per bunch, $\sigma_x$ and $\sigma_y$ are the transverse widths of the beams and $f$ is the rate at which the bunches cross. The number of events is given by integrating the rate over time:
\[ N = \int R\, dt = \sigma \int L(t)\, dt. \tag{3.20} \]
The integrated luminosity is a measure of the amount of data collected over a period of time and has units of inverse cross-section, usually inverse femtobarns (fb$^{-1}$).
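The two relations (3.19) and (3.20) can be turned into a small calculation. In the sketch below, the beam parameters, the cross-section and the integrated luminosity are illustrative assumptions chosen only to give realistic orders of magnitude; they are not the values used in this thesis.

```python
import math

def luminosity(n1, n2, sigma_x_cm, sigma_y_cm, crossing_rate_hz):
    """Instantaneous luminosity of eq. (3.19) in cm^-2 s^-1 for Gaussian beams."""
    return crossing_rate_hz * n1 * n2 / (4.0 * math.pi * sigma_x_cm * sigma_y_cm)

def expected_events(cross_section_fb, integrated_luminosity_fb_inv):
    """Number of events N = sigma * (integrated luminosity), eq. (3.20)."""
    return cross_section_fb * integrated_luminosity_fb_inv

# Illustrative beam parameters (assumptions): 1e11 protons per bunch,
# transverse beam widths of 17 micrometres, 40 MHz bunch-crossing rate
lumi = luminosity(n1=1.0e11, n2=1.0e11,
                  sigma_x_cm=1.7e-3, sigma_y_cm=1.7e-3,
                  crossing_rate_hz=4.0e7)
print(f"L = {lumi:.2e} cm^-2 s^-1")

# Hypothetical process with a 100 fb cross-section in 10 fb^-1 of data
n_events = expected_events(cross_section_fb=100.0, integrated_luminosity_fb_inv=10.0)
print(f"N = {n_events:.0f} events")
```

Expressing the cross-section in fb and the integrated luminosity in fb$^{-1}$ makes the product dimensionless, so no unit conversion is needed in the second function.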
¹ The differential cross-section is a measure of the differential (quantum) probability of a process to occur. The total cross-section is obtained by integration over a given solid angle.
Pile-up is the phenomenon that occurs when there is more than one collision during a bunch crossing. These additional collisions do not lead to hard scattering, and very accurate measurements² are required to reconstruct the primary vertex where the interaction of interest took place. One example of the difficulties that pile-up poses is that it produces additional jets, which have to be taken into account before doing any analysis.
² Measurements with a minimum bias trigger (without too much filtering) are performed to be able to reconstruct all the vertices and identify the primary vertex.
Chapter 4
The Large Hadron Collider and the ATLAS Detector
This chapter gives a brief description of the experimental setup at the Large Hadron Collider (LHC) and of the ATLAS detector, from which all the data used in this thesis are obtained. The discussion is purely descriptive and history is omitted. All the (updated) details on the experimental setup can be found on the CERN website. The organization hosting the high energy physics experiment of interest to us is CERN, the European Organization for Nuclear Research, whose acronym derives from its original French name, Conseil Européen pour la Recherche Nucléaire. It is located on the border between Switzerland (Geneva) and France. Thanks to CERN, light has been shed on several aspects of high energy physics through the discovery of particles such as the W and Z bosons and the Higgs boson.
4.1 The Large Hadron Collider
The LHC is the world's most powerful particle accelerator and has reached, at the time of writing, a maximum center-of-mass energy of 13 TeV in proton-proton collisions. It is circular with a circumference of 27 km and is situated 100 m underground.
The two particle beams are accelerated in opposite directions around the ring and are made to cross each other inside the detectors. The beams are guided around the ring by superconducting magnets which are cooled down to −271.3 °C by liquid helium. The beams are steered by 1232 dipole magnets and focused by 392 quadrupole magnets. A schematic diagram of the accelerator complex is given in figure 4.1.
Figure 4.1: CERN accelerator complex [9]
Single protons, prepared by the ionization of hydrogen gas, are injected into the LINAC 2 accelerator and are accelerated to 50 MeV. They are then fed into the Proton Synchrotron Booster (PSB), which accelerates them further to 1.4 GeV. The beam energy is increased further to 26 GeV in the Proton Synchrotron (PS), where the protons are also arranged into bunches. The beam then goes through the Super Proton Synchrotron (SPS), where it reaches an energy of 450 GeV. Finally, the beam enters the LHC, where each beam is accelerated to 6.5 TeV, giving the desired center-of-mass energy of 13 TeV.
Finely tuned radio frequency (RF) cavities¹ inside the LHC are used to:
• Accelerate the particles so that the ones with lower (higher) energy are accelerated (decelerated) and those with the desired energy remain untouched,
• Ensure that the particles stay in bunches.
The maximum number of bunches at which the LHC operates is 2808, the number of protons in each bunch is $\sim 10^{11}$ and the bunch spacing is 25 ns. This corresponds to a design luminosity of $10^{34}$ cm$^{-2}$ s$^{-1}$.
¹ An RF cavity is a metallic chamber that confines electromagnetic fields, oscillating at 400 MHz in the case of the LHC.
The four main experiments at the LHC are:
• ATLAS (A Toroidal LHC ApparatuS): more details will be given in the next section.
• CMS (Compact Muon Solenoid): like ATLAS, it is a multipurpose detector which searches for new physics such as SUSY and also performs high-precision measurements of the Higgs boson. It also records and studies heavy-ion collisions.
• ALICE (A Large Ion Collider Experiment): it focuses on heavy-ion collisions (lead-lead and lead-proton) to study the quark-gluon plasma (QGP), a phase of matter in which the quarks are free and which is believed to have existed right after the Big Bang (around 1 µs).
• LHCb (Large Hadron Collider beauty): it studies CP-violating processes involving b-quarks to understand the asymmetry between matter and anti-matter in the universe.
4.2 The ATLAS detector
ATLAS [5] is a cylindrical detector designed to capture most of the products of the collisions and therefore covers a solid angle of nearly $4\pi$. The central part is called the barrel ($|\eta| \lesssim 2.5$) and the two end parts are called end-caps ($|\eta| \gtrsim 2$). The layout of the ATLAS detector is shown in figure 4.2. The detector consists of five main components: an inner detector (ID), calorimeters, a muon spectrometer (MS) system, a magnet system and the trigger and data acquisition (DAQ) system. These components are discussed in some detail in the next subsections.
4.2.1 Inner detector
The pixel detector, the semiconductor tracker and the transition radiation tracker make up the inner detector. It is surrounded by a solenoid magnet with a field strength of 2 T at the center and has a measurement coverage of $|\eta| < 2.5$. The main purpose of the ID is to measure the tracks of charged particles and the event vertices. The magnetic field allows the determination of the charge and momentum of the particles.
Figure 4.2: The ATLAS detector
The momentum measurements in the ID have a resolution² of $\sigma_{p_T}/p_T = 0.05\%\, p_T \oplus 1\%$ (with $p_T$ in GeV), where $a \oplus b := \sqrt{a^2 + b^2}$. The resolution gives a measure of the performance of a given apparatus.
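The quadrature sum $\oplus$ means that the larger of the two terms dominates the resolution. A minimal sketch of the ID momentum resolution quoted above, assuming that $p_T$ is expressed in GeV (the function names are hypothetical):

```python
import math

def quadrature_sum(a, b):
    """The operation a (+) b := sqrt(a^2 + b^2) used in resolution formulas."""
    return math.hypot(a, b)

def id_pt_resolution(pt_gev):
    """Fractional pT resolution of the ID: 0.05% * pT (+) 1%, pT assumed in GeV."""
    return quadrature_sum(0.0005 * pt_gev, 0.01)

for pt in (5.0, 20.0, 100.0, 500.0):
    print(f"pT = {pt:6.1f} GeV -> sigma/pT = {100.0 * id_pt_resolution(pt):.2f} %")
```

With these coefficients the two terms are equal at $p_T = 20$ GeV; below that the constant 1% term dominates, and above it the resolution degrades roughly linearly with $p_T$.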
4.2.2 Calorimeters
Calorimeters stop particles and measure their energies. Like the ID, they are divided into a barrel part and two end-caps. In ATLAS, there are two types of calorimeters, namely the electromagnetic calorimeter (ECal) and the hadronic calorimeter (HCal).
² The resolution of a measurable quantity $x$ is the fractional uncertainty given by $\sigma_x/x$.
Electromagnetic calorimeter
The main task of the ECal is to stop electrons, positrons and photons. However, charged hadrons and photons from neutral pion decays also deposit energy in the ECal. It is made of layers of lead and liquid argon. Lead is the passive medium in which most of the interactions occur, while argon is the active medium where the atoms are ionized and the resulting electrons are collected on electrodes. The ECal has an energy resolution of $\sigma_E/E = 10\%/\sqrt{E} \oplus 0.7\%$.
Hadronic calorimeter
As its name suggests, the HCal is used for the detection of hadrons. It consists of three different subsystems, where the first two share the same technology as the ECal:
1. The hadronic end-cap (HEC) is located right outside the end-cap of the ECal; copper is used as the passive medium while the active medium is argon. It has an energy resolution of $\sigma_E/E = 50\%/\sqrt{E} \oplus 3\%$.
2. The forward calorimeter (FCal) consists of three basic units where the first one is used for electromagnetic measurements while the other two are intended for hadronic interactions. It has the same energy resolution as the HEC.
3. The tile calorimeter is located outside the ECal and consists of alternating layers of steel (passive medium) and plastic scintillator tiles (active medium). The energy resolution of the tile calorimeter is $\sigma_E/E = 100\%/\sqrt{E} \oplus 10\%$.
4.2.3 Muon spectrometer
In the MS, precision measurements are performed on the muons that exit the calorimeters. The layout of the MS is the same as that of the other components, with a barrel part and two end-caps. The momentum resolution of the MS is $\sigma_{p_T}/p_T = 10\%$ at $p_T = 1$ TeV.
4.2.4 Magnet system
Besides the solenoid magnet in the ID, the ATLAS magnet system consists of one barrel toroid (magnet) and two end-cap toroids. They have the same task as the solenoid magnet in the ID, namely to bend the trajectories of charged particles (here the muons) so that their charge and momentum can be determined. The magnetic field strengths of the barrel toroid and the end-cap toroids are 0.5 T and 1 T respectively.
4.2.5 The trigger system
The trigger system is, roughly speaking, a filter of events which ensures that interesting events are kept for data analysis. This filtering is necessary because the limited data storage capacity allows events to be recorded at a rate of only about 1 kHz, while the event rate is 40 MHz.
The trigger system is classified into two levels namely the Level-1 (L1) trigger and the High-Level Trigger (HLT).
The L1 trigger is hardware based and is part of the electronics of the detector. It has been designed to keep events (from the MS and the calorimeters) with high-$p_T$ leptons, photons, jets and hadronically decaying tau leptons, as well as events with high missing transverse energy and large transverse mass. It reduces the event rate to about 100 kHz and defines regions of interest (RoI) based on the $(\phi, \eta)$ coordinates of the “interesting” objects. The maximum decision time for the L1 trigger is 25 ns and the decision is sent to the read-out system (ROS) within 2.5 µs.
The HLT is software based and accesses the ROS to further narrow down the event rate to 1 kHz. It is based on the region of interest defined by the L1 trigger and has an average processing time of 0.2 s per event. Offline processing of data is made from all the events that pass the HLT.
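The rates quoted above fix how strongly each trigger level has to reject events. The short sketch below does this arithmetic; the last estimate, the average number of events being processed simultaneously in the HLT, assumes a steady state (input rate times average processing time) and is an illustration rather than a statement about the actual HLT farm.

```python
BUNCH_CROSSING_RATE_HZ = 40e6  # event rate before the trigger, as quoted in the text
L1_OUTPUT_RATE_HZ = 100e3      # rate after the Level-1 trigger
HLT_OUTPUT_RATE_HZ = 1e3       # rate after the High-Level Trigger
HLT_TIME_PER_EVENT_S = 0.2     # average HLT processing time per event

def rejection_factor(rate_in_hz, rate_out_hz):
    """Average number of input events per accepted event."""
    return rate_in_hz / rate_out_hz

l1_rejection = rejection_factor(BUNCH_CROSSING_RATE_HZ, L1_OUTPUT_RATE_HZ)
hlt_rejection = rejection_factor(L1_OUTPUT_RATE_HZ, HLT_OUTPUT_RATE_HZ)
total_rejection = l1_rejection * hlt_rejection

# Steady-state estimate: events in flight = input rate * time spent per event
events_in_flight = L1_OUTPUT_RATE_HZ * HLT_TIME_PER_EVENT_S

print(f"L1 keeps 1 event in {l1_rejection:.0f}")
print(f"HLT keeps 1 event in {hlt_rejection:.0f}")
print(f"Overall: 1 event in {total_rejection:.0f}")
print(f"Events processed concurrently in the HLT: about {events_in_flight:.0f}")
```

The concurrency estimate shows why the HLT has to run as highly parallel software: an average of 0.2 s per event is only compatible with a 100 kHz input rate if many events are processed at the same time.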
Chapter 5
Machine Learning Methods
This chapter reviews the two machine learning (ML) methods used in this thesis, namely logistic regression and boosted decision trees (BDT). The material presented here is heavily based on the excellent notes prepared and provided by James Catmore (University of Oslo, Department of High Energy Physics) and on the 2018 Geilo Winter School course notes on machine learning. For more details on the topics presented in this chapter or on deep learning in general, the reader is referred to, for example, [14].
5.1 General introduction
Machine learning is a computing tool used in the analysis of big data with a large number of variables, such as the data from collider experiments. It is based on statistical inference and is used to perform a specific task effectively without using explicit instructions. The strength of ML techniques lies in their ability to extract patterns from a data set and to make predictions on new data based on what has been learned.
The only machine learning approach we will consider is supervised learning, where the algorithm learns from a set of labeled data called the training sample in order to make predictions on a data set whose labels are withheld from it, also known as the testing sample. The performance of a machine learning algorithm is determined by comparing its predictions on the testing sample with the true labels.
Although we will use the algorithms as a tool, it is important to understand the workings behind the “black-box” so that we can perform modifications on the different features according to our problem.
We start with a general regression problem, which we will then translate to binary classifiers. Our input data in this case is the set of values of the parameters belonging to a feature hyperspace. We denote each value of the (independent) parameter as¹ $x^{(i)}$, where $i = 1, ..., m$ and $x = (x_1, ..., x_n) \in H$.
We define the feature hyperspace $H$ as the set of all the parameters for the system under consideration:
\[ H = \{x_1, x_2, ..., x_n\}. \tag{5.1} \]
The output (dependent) variable is written as $y^{(i)} = y(x^{(i)})$. Now we want to model the output; this model is called a hypothesis and we denote it as $h_\theta(x^{(i)})$.
The hypothesis is exactly the function that a learning algorithm uses for prediction and is characterized by a set of parameters $\{\theta_1, ..., \theta_l\}$. Suppose that the hypothesis deviates from the data by $\epsilon^{(i)}$:
\[ h_\theta(x^{(i)}) = y^{(i)} + \epsilon^{(i)}. \tag{5.2} \]
In any regression analysis, we seek a set of parameters $\theta = \{\theta_1, ..., \theta_l\}$ that minimizes the loss function $L(\theta)$, i.e. we want to find $\min_\theta L(\theta)$. For example, the loss function for linear regression is
\[ L(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2. \tag{5.3} \]
Although we have considered the linear regression case for simplicity, the discussion may easily be generalized to more complex regression algorithms.
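The loss function (5.3) can be evaluated in a few lines. The sketch below assumes a linear hypothesis with an explicit intercept term, $h_\theta(x) = \theta_0 + \theta_1 x_1 + ... + \theta_n x_n$; the intercept, the function names and the toy data are assumptions made for illustration.

```python
import numpy as np

def hypothesis(theta, X):
    """Linear hypothesis; a column of ones is prepended for the intercept theta_0."""
    X = np.asarray(X, dtype=float)
    X_b = np.column_stack([np.ones(len(X)), X])  # shape (m, n + 1)
    return X_b @ np.asarray(theta, dtype=float)

def loss(theta, X, y):
    """Mean squared error loss of eq. (5.3), including the factor 1/(2m)."""
    residuals = hypothesis(theta, X) - np.asarray(y, dtype=float)
    return float(residuals @ residuals) / (2 * len(residuals))

# Toy training set with m = 3 entries and n = 1 feature, generated from y = 1 + 2x
X = [[0.0], [1.0], [2.0]]
y = [1.0, 3.0, 5.0]

print(loss([1.0, 2.0], X, y))  # parameters that reproduce the data exactly
print(loss([0.0, 0.0], X, y))  # a poor choice of parameters
```

The loss vanishes only when the hypothesis reproduces every training entry, and grows quadratically with the deviations $\epsilon^{(i)}$ of (5.2).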
Gradient descent method of minimization
In the gradient descent method, we start with some $\theta$ and keep changing it to reduce $L(\theta)$ until we end up (hopefully) at a minimum. Formally, the (batch) gradient descent algorithm is
\[ \theta_{i+1} = \theta_i - \alpha \nabla_\theta L(\theta_i), \tag{5.4} \]
where the subscript $i$ here labels the iteration, $\nabla_\theta = (\partial/\partial\theta_1, ..., \partial/\partial\theta_l)$ and $\alpha \in \mathbb{R}$ is the learning rate, which gives a measure of the size of the step we take towards the minimum. We want $\alpha$ small enough to converge towards the minimum but large enough for rapid convergence; it should therefore be chosen judiciously in the machine learning algorithms.
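A minimal sketch of the batch update (5.4) for the linear regression loss (5.3) is given below. The intercept term, the learning rate, the number of iterations and the toy data are illustrative assumptions; in particular, the learning rate was picked by hand to be small enough for this data set.

```python
import numpy as np

def gradient(theta, X_b, y):
    """Gradient of the loss (5.3) for a linear hypothesis: X^T (X theta - y) / m."""
    return X_b.T @ (X_b @ theta - y) / len(y)

def gradient_descent(X, y, alpha=0.1, n_iter=2000):
    """Batch gradient descent, eq. (5.4), starting from theta = 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_b = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iter):
        # Every step uses the full training set, hence "batch"
        theta = theta - alpha * gradient(theta, X_b, y)
    return theta

# Toy data generated from y = 1 + 2x, so the minimum is at theta = (1, 2)
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta_fit = gradient_descent(X, y)
print(theta_fit)
```

For this quadratic loss the iteration converges only if $\alpha$ is smaller than 2 divided by the largest eigenvalue of $X^T X / m$; a larger value makes the updates overshoot and diverge, which is the trade-off described above.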
¹ Note that the superscript index does not refer to different features (parameters) but to the entry in a given training set. To denote different features in a feature hyperspace, we use a subscript.