Search for new physics phenomena in events with two leptons and missing transverse energy using machine learning


Michaël Etienne Arlandoo

Thesis submitted for the degree of Master in Subatomic Physics

60 credits

Department of Physics

Faculty of mathematics and natural sciences

UNIVERSITY OF OSLO


© 2019 Michaël Etienne Arlandoo


http://www.duo.uio.no/

Printed: Reprosentralen, University of Oslo


Abstract

In this thesis, we investigate whether machine learning methods can improve the sensitivity of the search for new physics phenomena at the Large Hadron Collider compared to the conventional analysis. The specific process analyzed is the pair production of charginos leading to a final state with two opposite-sign leptons and missing transverse energy. The two machine learning algorithms used are logistic regression and boosted decision trees.

The performance of the logistic regression was poor, as expected, but the boosted decision tree performed better than the conventional method in the inclusive signal regions. In addition, a systematic method has been designed to compare the different machine learning algorithms by requiring them to output the same decision function.


Acknowledgements

First, I would like to thank my main supervisor Farid Ould-Saada for giving me the opportunity to delve into a subject that was, prior to this thesis, completely unknown to me. Thank you Farid for your support and help during the writing of this thesis and for making sure that I feel at home in the HEPP group. I am eternally grateful to my co-supervisor Eirik Gramstad for his invaluable help, patience, ingenuity and also for his nerd jokes. Thank you Farid and Eirik! It has been a pleasure learning from and working with both of you.

I would also like to thank everyone in the HEPP group for their support and for always being nice to me. You have all made this journey a pleasant one. I have greatly enjoyed our quiz sessions. A special thanks goes to Knut for his incredible patience in helping me debug my codes. I would also like to thank my fellow master students Simon, Oda and Helen for their support.

I would also like to thank my brother Gilbert for his moral and financial support.

Last but not least a great thanks goes to my mother who has always prioritized the education and freedom of her children.


Introduction

Despite its unprecedented success at explaining observed subatomic phenomena, the Standard Model still fails to address issues such as the nature of dark matter and dark energy. New physics theories which are extensions of the Standard Model have emerged as an attempt to address these problems.

The biggest remaining issue is to confirm these new theories experimentally.

The fact that these theories have not been confirmed yet does not necessarily mean that they are wrong; it is also highly probable that our current analysis methods are not very sensitive to the new physics scenarios.

The most conventional analysis method is the cut-and-count analysis where a series of constraints (or cuts) are applied to the different parameters such as transverse momentum and invariant mass to increase sensitivity. The main issue of this standard approach is that most of the time, it is not clear which cuts to apply.

With the emergence of machine learning (multivariate) methods, which are nowadays a “hot” topic amongst data scientists, it is hoped that the performance of the search for new physics at the Large Hadron Collider will be improved. The advantage of the machine learning methods is that, apart from the pre-selection of events, no additional inputs are fed to the algorithm, so that it has the job of figuring out in one go which variables to cut on. These multivariate methods have proved useful in [2], where the authors apply deep learning methods to search for exotic particles. They used deep neural networks and showed that these classify signal samples and background samples with great precision.

The analysis in this thesis is motivated by [2] and we use two different machine learning methods to investigate the production of neutralinos from a simplified supersymmetric extension of the Standard Model. Also a recent paper [4] by the ATLAS collaboration has performed the standard analysis


on the same process. This provides a way to cross-check the standard analysis which is compared to the machine learning methods.

Chapter 1 introduces the Standard Model of particle physics and chapter 2 gives a brief overview of supersymmetry with emphasis on its particle content and not on formalism. The process targeted by the analysis is also introduced in this chapter. Chapter 3 introduces the kinematic variables and the phenomenology of proton-proton collisions. Chapter 4 is a brief description of the experimental setup at the Large Hadron Collider and the ATLAS detector. Chapter 5 gives a mountaintop view of the two machine learning algorithms that are used in the analysis. Chapter 6 gives details on the analysis procedures and terminology that are used in experimental particle physics. Finally, in chapter 7, the results of the analysis (sensitivity studies) are presented and discussed.


Notations and Conventions

The notations and conventions used in this thesis are listed:

• Natural units: We use natural units defined by $\hbar = c = 1$, where $\hbar$ is the reduced Planck constant and $c$ is the speed of light. This implies that energy, momentum and mass have the same unit, which is the electronvolt (eV).

• Minkowski space (spacetime): It is denoted as $\mathbb{R}^{3,1}$ and the metric is $\eta^{\mu\nu} = \mathrm{diag}(1,-1,-1,-1)$.

• Vectors: Three-vectors are written with an arrow over the symbol, e.g. $\vec{x} = (x, y, z)$. Four-vectors are written as $a = a^\mu = (a^0, \vec{a})$. For $x \in \mathbb{R}^{3,1}$, $x^0 = t$ where $t$ is time. One-forms are written as $a_\mu = \eta_{\mu\nu} a^\nu$.

• Special unitary group: It is denoted as SU(N) and is defined as $SU(N) = \{U \in \mathbb{C}^{N\times N} \mid U U^\dagger = I,\ \det U = 1\}$.

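• Pauli matrices: They are the generators of SU(2) and are defined as

\[ \sigma^1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma^2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma^3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]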

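• Dirac matrices: The gamma matrices $\gamma^\mu$ satisfy the Clifford algebra $\{\gamma^\mu, \gamma^\nu\} = 2\eta^{\mu\nu}$. In the chiral representation,

\[ \gamma^\mu = \begin{pmatrix} 0 & \sigma^\mu \\ \bar{\sigma}^\mu & 0 \end{pmatrix}, \]

where $\sigma^\mu = (1, \sigma^i)$ and $\bar{\sigma}^\mu = (1, -\sigma^i)$. The fifth gamma matrix is defined as $\gamma^5 = i\gamma^0\gamma^1\gamma^2\gamma^3$.

• Slash notation: $\slashed{A} = \gamma^\mu A_\mu$.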
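The conventions above can be checked numerically. The following is a minimal sketch in plain Python, assuming the chiral representation with $\sigma^\mu$ and $\bar{\sigma}^\mu$ in the off-diagonal blocks; the helper names (`matmul`, `block_offdiag`, `clifford_holds`) are illustrative and not part of any library.

```python
# Sketch: verify {gamma^mu, gamma^nu} = 2 eta^{mu nu} in the chiral representation.

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def block_offdiag(upper, lower):
    """Build the 4x4 matrix [[0, upper], [lower, 0]] from two 2x2 blocks."""
    top = [[0, 0] + list(row) for row in upper]
    bottom = [list(row) + [0, 0] for row in lower]
    return top + bottom

identity2 = [[1, 0], [0, 1]]
pauli = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
sigma = [identity2] + pauli                                         # sigma^mu
sigma_bar = [identity2] + [[[-x for x in row] for row in s] for s in pauli]

gamma = [block_offdiag(s, sb) for s, sb in zip(sigma, sigma_bar)]
eta = [1, -1, -1, -1]  # diagonal entries of the Minkowski metric

def clifford_holds(gamma, eta, tol=1e-12):
    """Return True if the anticommutators reproduce 2 * eta * identity."""
    for mu in range(4):
        for nu in range(4):
            ab = matmul(gamma[mu], gamma[nu])
            ba = matmul(gamma[nu], gamma[mu])
            for i in range(4):
                for j in range(4):
                    expected = 2 * eta[mu] if (mu == nu and i == j) else 0
                    if abs(ab[i][j] + ba[i][j] - expected) > tol:
                        return False
    return True

# gamma^5 = i gamma^0 gamma^1 gamma^2 gamma^3
product = matmul(matmul(gamma[0], gamma[1]), matmul(gamma[2], gamma[3]))
gamma5 = [[1j * x for x in row] for row in product]

print("Clifford algebra satisfied:", clifford_holds(gamma, eta))
```

In this representation $\gamma^5$ comes out diagonal, which is what makes the split of a Dirac spinor into left- and right-handed components (used in section 1.5.1) explicit.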


Chapter 1 The Standard Model

This chapter gives a mountaintop view of the Standard Model (SM) of particle physics and the theories beyond the SM which are relevant only to the processes investigated in this thesis. Everything that is left out here can be found in particle physics and quantum field theory (QFT) textbooks such as [13] and [11].

1.1 Particle content of the Standard Model

The SM is a gauge theory with full gauge group $SU(3)_C \times SU(2)_L \times U(1)_Y$, where C, L and Y are physical labels which stand for color, left-handed and hypercharge respectively, and whose meanings will be discussed later. The SM gives a precise description of electromagnetic, weak and strong phenomena.

In this section, we present particles from an experimentalist’s point of view.

All particles in the Standard Model are classified according to their quantum numbers but the main one is the quantum mechanical spin. The two sets of particles are fermions and bosons.

1.1.1 Fermions

Fermions are particles with half-integer spin (1/2, 3/2, 5/2, ...) in units of the reduced Planck constant $\hbar$. When we refer to fermions, we also include antiparticles, which have the same spin and mass as their corresponding particles but opposite electric charge and additive quantum numbers (e.g. lepton number). Fermions are also subdivided into quarks and leptons which, in the


Standard Model, are considered as elementary particles, that is, they do not have any substructure.

Quarks

Quarks are the only fermions that participate in the strong interactions because they carry, in addition to electric charge, the color charge, which is responsible for the strong force. There are six different types (flavors) of quarks, namely up (u), down (d), charm (c), strange (s), top (t) and bottom (b). They exist in three colors: red (r), blue (b) and green (g). The anti-quarks carry the anti-colors $\bar{r}$, $\bar{b}$ and $\bar{g}$. Apart from the top quark, which has a lifetime of ∼ 10⁻²⁵ s and decays before it can form a bound state, the quarks exist only in colorless bound states (hadrons) and have never been observed as free particles. The two colorless combinations of quarks are:

• Baryons: Bound states of three quarks or three antiquarks. Colorless combinations of colors are, for example, $rgb$ and $\bar{r}\bar{g}\bar{b}$. The proton (uud) and the neutron (udd) are famous examples of baryons.

• Mesons: Bound states of a quark and an anti-quark. The colorless combinations for mesons are $r\bar{r}$, $g\bar{g}$ and $b\bar{b}$.

Leptons

The leptons are the fermions that do not participate in the strong interactions because they do not carry color charge. They exist as three generations, each of which consists of a charged lepton $l \in \{e, \mu, \tau\}$ and the associated neutrino $\nu_l$. The electron is the lightest and only stable charged lepton while the others decay via the weak interaction.

The neutrinos are electrically neutral and interact only by the weak interaction and this makes them hard to detect. They are quite mysterious particles, for example, in the Standard Model, the neutrinos are massless but it is known from neutrino oscillations¹ that at least two of them should be massive.

1 Neutrino oscillation is a phenomenon where a neutrino can change flavor, e.g. $\nu_e$ can change to $\nu_\mu$. This is possible only if the difference between the masses is non-zero [13].

1.1.2 The bosons of the Standard Model

It is well known that the SM describes the electromagnetic, strong and weak interactions but not gravity. A non-trivial result from perturbative quantum field theory is that all these interactions are mediated by particles known as the gauge bosons which are spin-1 particles. They are:

• The massless photon (γ) for electromagnetism.

• The massive W^± and Z bosons for the weak interactions.

• The massless gluons for the strong interactions.

It is important to note that these particles are not only mediators but can be produced copiously at particle colliders.

The only spin-0 elementary particle in the SM is the Higgs boson, which is responsible for the mass generation of the particles in the electroweak sector. It was a missing piece of the puzzle until it was discovered at the Large Hadron Collider in 2012.

The mediator of gravity is called the graviton and is a spin-2 particle. It is still at the hypothesis level and has not yet been discovered experimentally.

1.2 Symmetries and conserved quantities

A conserved quantity always arises from the invariance of a given theory (action) under a symmetry transformation. The set of transformations forms a group and may be continuous or discrete. For example, invariance of all relativistic field theories under spacetime translations (a subgroup of the full Poincaré group) leads to the conservation of four-momentum.

In the Standard Model, all the interactions are invariant under CPT transformations. C corresponds to charge conjugation, which replaces a particle by its antiparticle, P is parity ($\vec{x} \to -\vec{x}$) and T is time reversal ($t \to -t$). Strong and electromagnetic interactions are invariant under the separate transformations C, P, T and CP, but this is not the case for the weak interactions.

The lepton number (L) is a conserved quantity which is a consequence of a global² U(1) symmetry. Each generation of leptons has its own lepton number: L = 1 is assigned to the charged lepton and to its neutrino, and L = −1 to their antiparticles. All other particles, which are not leptons, are assigned L = 0. All interactions conserve these lepton numbers; the only exception is neutrino oscillations, which change the lepton flavor.

The effects of neutrino oscillations are significant at large distances.

2 A transformation is said to be global if it is not a function of $x \in \mathbb{R}^{3,1}$.

The baryon number is also a conserved quantity associated with a global U(1) symmetry. The baryons are assigned B = 1, the anti-baryons B = −1 and all the rest B = 0, including the mesons. This implies that a quark has B = 1/3 while an anti-quark has B = −1/3.

1.3 Formalism

In this section we introduce the two main ingredients in the construction of the Standard Model.

1.3.1 Yang-Mills theory

Yang-Mills (YM) theory is a gauge theory where the gauge group is a non-abelian Lie group. The relevant group here is SU(N). Consider the spinor (fermionic) field $\psi : \mathbb{R}^{3,1} \to \mathbb{C}^N$ and the SU(N)-valued field $A_\mu : \mathbb{R}^{3,1} \to \mathbb{C}^{N\times N}$, which under SU(N) transform as

\[ \psi \to U\psi, \qquad A_\mu \to U A_\mu U^\dagger + \frac{i}{g}(\partial_\mu U)U^\dagger, \tag{1.1} \]

where $g \in \mathbb{R}$ is called the coupling constant and $U \in SU(N)$. The latter can be written as

\[ U = \exp(-i g \alpha^a T^a), \qquad a = 1, \dots, N^2-1, \tag{1.2} \]

where $\alpha^a : \mathbb{R}^{3,1} \to \mathbb{R}$ and $\{T^a\}$ are the generators of SU(N), which obey the Lie algebra

\[ [T^a, T^b] = i f^{abc} T^c. \tag{1.3} \]

Here, $f^{abc}$ are the structure constants of SU(N). Another important property is

\[ \mathrm{tr}(T^a T^b) = \frac{1}{2}\delta^{ab}. \tag{1.4} \]

$\{T^a\}$ forms a basis for any SU(N)-valued field, therefore we can write

\[ A_\mu = A^a_\mu T^a. \tag{1.5} \]


The {A^a_µ} are referred to as the gauge fields which are interpreted as the gauge bosons of the theory; there are N²−1 of them.
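As a concrete illustration of the Lie algebra and trace normalization above, the following sketch checks both for the simplest case N = 2, assuming the generators $T^a = \sigma^a/2$ and the structure constants $f^{abc} = \epsilon^{abc}$ (the Levi-Civita symbol). The function names are illustrative only.

```python
# Sketch: check [T^a, T^b] = i eps^{abc} T^c and tr(T^a T^b) = delta^{ab} / 2 for SU(2).

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def levi_civita(a, b, c):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) / 2

pauli = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
# N^2 - 1 = 3 generators for N = 2
T = [[[x / 2 for x in row] for row in s] for s in pauli]

def lie_algebra_holds(T, tol=1e-12):
    for a in range(3):
        for b in range(3):
            lhs = commutator(T[a], T[b])
            for i in range(2):
                for j in range(2):
                    rhs = sum(1j * levi_civita(a, b, c) * T[c][i][j]
                              for c in range(3))
                    if abs(lhs[i][j] - rhs) > tol:
                        return False
    return True

def normalisation_holds(T, tol=1e-12):
    return all(
        abs(trace(matmul(T[a], T[b])) - (0.5 if a == b else 0.0)) < tol
        for a in range(3) for b in range(3)
    )

print("Lie algebra:", lie_algebra_holds(T))
print("Trace normalisation:", normalisation_holds(T))
```

The same check would apply to SU(3) with $T^a = \lambda^a/2$, at the cost of typing in the eight Gell-Mann matrices and their structure constants.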

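The covariant derivative $D_\mu$ is defined as

\[ D_\mu = \partial_\mu + i g A_\mu \equiv \partial_\mu + i g A^a_\mu T^a. \tag{1.6} \]

Next, we define the field strength $F_{\mu\nu} = F^a_{\mu\nu} T^a$ as

\[ \begin{aligned} F_{\mu\nu} &= -\frac{i}{g}[D_\mu, D_\nu] = \partial_\mu A_\nu - \partial_\nu A_\mu + i g [A_\mu, A_\nu], \\ F^a_{\mu\nu} &= \partial_\mu A^a_\nu - \partial_\nu A^a_\mu - g f^{abc} A^b_\mu A^c_\nu. \end{aligned} \tag{1.7} \]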

The YM Lagrangian $\mathcal{L}_{YM}$ is constructed with the requirement that it is invariant under a local SU(N) transformation and it is written as

\[ \mathcal{L}_{YM} = \bar{\psi}(i\slashed{D} - m)\psi - \frac{1}{2}\mathrm{tr}(F_{\mu\nu}F^{\mu\nu}), \tag{1.8} \]

where $m$ is the mass of the fermionic field $\psi$. There are no mass terms for the gauge fields as they would break gauge invariance.

The fermion-boson interaction term hidden in the covariant derivative is $\mathcal{L}_{\mathrm{int}} \supset -g\bar{\psi}\slashed{A}\psi$. The most striking difference between YM theory and an abelian gauge theory (e.g. quantum electrodynamics) is that the kinetic term $-\frac{1}{2}\mathrm{tr}(F_{\mu\nu}F^{\mu\nu})$ contains cubic and quartic self-interactions of the gauge bosons; this is a direct consequence of $[A_\mu, A_\nu] \neq 0$. The interactions in YM theory are shown diagrammatically in Figure 1.1.
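To make the origin of the self-interactions explicit, the kinetic term can be expanded in components. The following is a short worked expansion, assuming the component field strength $F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu - g f^{abc} A^b_\mu A^c_\nu$ that follows from eqs. (1.3) and (1.6), and the trace normalization of eq. (1.4).

```latex
\begin{aligned}
-\tfrac{1}{2}\,\mathrm{tr}\!\left(F_{\mu\nu}F^{\mu\nu}\right)
  &= -\tfrac{1}{4}\,F^a_{\mu\nu}F^{a\,\mu\nu} \\
  &= -\tfrac{1}{4}\left(\partial_\mu A^a_\nu - \partial_\nu A^a_\mu\right)^2
     && \text{(free propagation)} \\
  &\quad + g\,f^{abc}\left(\partial_\mu A^a_\nu\right)A^{b\,\mu}A^{c\,\nu}
     && \text{(cubic vertex, order } g\text{)} \\
  &\quad - \tfrac{1}{4}\,g^2 f^{abc}f^{ade}\,A^b_\mu A^c_\nu A^{d\,\mu}A^{e\,\nu}
     && \text{(quartic vertex, order } g^2\text{)}
\end{aligned}
```

For an abelian group all $f^{abc}$ vanish, so only the first term survives and the gauge boson does not interact with itself.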

It is also worth mentioning that the coupling constant is universal in the sense that the vertex rules in YM theory contain only g. This is a direct consequence of the fact that SU(N) is a simple group. This is different for the electroweak theory, whose gauge group is a product of two factors, so that each factor comes with its own coupling constant.

1.3.2 Symmetry Breaking

As mentioned before, the gauge fields in YM theory are massless to preserve gauge invariance. We know that some of the gauge bosons in the SM are massive, therefore we need a mechanism that gives them mass without breaking the gauge invariance in the original Lagrangian. We consider only


Figure 1.1: Interaction vertices in YM theory: (a) fermion-boson interaction, (b) cubic self-interactions, (c) quartic self-interactions.

local SU(2) symmetry breaking with a Higgs doublet as it is the most relevant one for future discussions. We introduce a complex scalar field doublet $\phi : \mathbb{R}^{3,1} \to \mathbb{C}^2$ which can be written as

\[ \phi = \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix}. \tag{1.9} \]

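Now consider the model Lagrangian

\[ \mathcal{L} = -\frac{1}{2}\mathrm{tr}(F_{\mu\nu}F^{\mu\nu}) + (D_\mu\phi)^\dagger(D^\mu\phi) - \frac{\lambda}{2}\left(\phi^\dagger\phi - \frac{v^2}{2}\right)^2, \tag{1.10} \]

where $v, \lambda \in \mathbb{R}$. We fix the unitary gauge here by requiring $\mathrm{Re}\,\phi_1 = \mathrm{Im}\,\phi_1 = \mathrm{Im}\,\phi_2 = 0$ and $\mathrm{Re}\,\phi_2 = \phi_r/\sqrt{2}$. Hence, we can rewrite $\phi(x)$ as

\[ \phi(x) = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ \phi_r(x) \end{pmatrix}. \tag{1.11} \]

The field $\phi(x)$ is assumed to take a non-zero vacuum expectation value (VEV) $\langle\phi\rangle$ written as

\[ \langle\phi\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ v \end{pmatrix}. \tag{1.12} \]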


We expand around the VEV as follows:

\[ \phi(x) = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ v + h(x) \end{pmatrix}, \tag{1.13} \]

where $h(x)$ is the physical Higgs field. Substituting the expansion around the VEV in the original model Lagrangian, it can be shown that

\[ (D_\mu\phi)^\dagger(D^\mu\phi) \supset \frac{g^2v^2}{8}\sum_{a=1}^{3}(A^a_\mu)^2. \tag{1.14} \]

This implies that all the gauge fields have acquired the same mass $m_A^2 = g^2v^2/4$. In the Standard Model, the gauge group to be broken is $SU(2)\times U(1)$ (section 1.5.2) and in that case the symmetry breaking is referred to as the Brout-Englert-Higgs (BEH) mechanism.
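The coefficient in eq. (1.14) can be verified numerically. The sketch below assumes SU(2) generators $T^a = \sigma^a/2$ and evaluates $|g A^a T^a \langle\phi\rangle|^2$ for one fixed Lorentz component of the gauge fields; the numerical values of `g`, `v` and `A` are arbitrary illustrations, not fitted parameters.

```python
import math

# Sketch: compare |g A^a T^a <phi>|^2 with (g^2 v^2 / 8) * sum_a (A^a)^2.

PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def mass_term_from_vev(A, g, v):
    """Norm squared of g * A^a * (sigma^a / 2) acting on the VEV (0, v/sqrt(2))."""
    matrix = [[sum(g * A[a] * PAULI[a][i][j] / 2 for a in range(3))
               for j in range(2)] for i in range(2)]
    vev = [0.0, v / math.sqrt(2)]
    acted = [sum(matrix[i][j] * vev[j] for j in range(2)) for i in range(2)]
    return sum(abs(x) ** 2 for x in acted)

def mass_term_formula(A, g, v):
    """Right-hand side of eq. (1.14)."""
    return g ** 2 * v ** 2 / 8 * sum(a ** 2 for a in A)

A = (0.3, -1.2, 0.7)   # arbitrary field values (illustrative)
g, v = 0.65, 246.0     # illustrative coupling and VEV in GeV

print(mass_term_from_vev(A, g, v), mass_term_formula(A, g, v))
```

Reading the result as $\frac{1}{2}m_A^2 (A^a_\mu)^2$ reproduces $m_A^2 = g^2v^2/4$ for each of the three gauge fields.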

1.4 Quantum Chromodynamics

Quantum Chromodynamics (QCD) is a pure³ YM theory with gauge group SU(3)_C. It is the fundamental theory of strong interactions where the fermionic fields are the quarks and the eight gauge bosons are the gluons.

The three-dimensional space in which the gauge transformation takes place is called the color space, hence the label C in SU(3)_C. As stated before, there are six types (flavors) of quarks. The quark field is a color triplet which transforms under the fundamental representation of SU(3)_C and it is written as

\[ q_f = (q_f^r, q_f^b, q_f^g)^T, \tag{1.15} \]

where $f$ represents the flavor index, and $r$, $b$ and $g$ are color indices which stand for “red”, “blue” and “green” respectively (they can also be labelled as 1, 2 and 3). The $A_\mu$ in the previous section is replaced by $G_\mu$ and $g$ by $g_s$, the strong coupling constant. Also, for SU(3),

\[ T^a = \frac{\lambda^a}{2}, \tag{1.16} \]

3 The term “pure” is used here to emphasize the fact that the quark fields in QCD are mass eigenstates which is not the case for a chiral theory where a fermion mass term would break gauge invariance.

where $\lambda^a$ are the Gell-Mann matrices. The QCD Lagrangian is written as

\[ \mathcal{L}_{QCD} = \sum_f \bar{q}_f (i\slashed{D} - m_f) q_f - \frac{1}{2}\mathrm{tr}(F_{\mu\nu}F^{\mu\nu}). \tag{1.17} \]

As mentioned before, the gluons carry color charge, which is not the case for quantum electrodynamics, where the photon does not carry the U(1) (electric) charge.

1.5 The Electroweak Theory

Here we present the electroweak theory, also known as the Glashow-Weinberg-Salam (GWS) theory. The construction of this theory is less straightforward than QED and QCD⁴.

1.5.1 Chirality and the fermion sector

The Dirac spinor $\psi$ can actually be decomposed in terms of Weyl spinors $\psi_L$ and $\psi_R$:

\[ \psi = \begin{pmatrix} \psi_L \\ \psi_R \end{pmatrix}. \tag{1.18} \]

ψ_L and ψ_R are known as left-handed (LH) and right-handed (RH) chirality states respectively.

It is an experimental fact that the W and Z bosons couple differently to LH and RH fermions. The W boson couples only to LH fermions whereas the Z boson couples to both LH and RH fermions in an asymmetric way. It is known that the leptons and quarks in the SM exist as doublets and there are three generations of them. These doublets are in turn broken into left-handed and right-handed components according to their weak isospin $I_W$. The neutrinos are the only SM fermions that exist solely as LH states; all the others exist both as LH and RH states. The LH doublets are assigned $I_W = \frac{1}{2}$ and are written as

\[ \psi_L = \begin{pmatrix} \nu_l \\ l^- \end{pmatrix}_L \quad \text{and} \quad \begin{pmatrix} u \\ d \end{pmatrix}_L, \tag{1.19} \]

4 Although the QCD Lagrangian looks deceptively simple, it is a highly non-trivial subject with a tremendous amount of subtleties such as color confinement, non-perturbative QCD and lattice QCD.

where $l$ stands for a lepton and $\nu_l$ for the corresponding neutrino, $u$ for an up-type quark and $d$ for a down-type quark. The RH states have $I_W = 0$, which means that they are singlets under $SU(2)_L$, and are written as

\[ \psi_R = l^-_R, \quad u_R \quad \text{and} \quad d_R. \tag{1.20} \]

The weak hypercharge $Y$ is related to the electric charge $Q$ and the third component of weak isospin $I_W^3$ via

\[ Q = \frac{Y}{2} + I_W^3, \qquad I_W^3 = -I_W, -I_W + 1, \dots, I_W. \tag{1.21} \]

The chirality of the fermions implies that a Dirac mass term would break gauge invariance and therefore cannot be present in the action of the theory.
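As a quick cross-check of eq. (1.21), the sketch below computes the electric charge of the first-generation chirality states. The hypercharge values in the table are not given in the text; they are the standard assignments in the convention $Q = Y/2 + I_W^3$ and are listed here as an assumption.

```python
from fractions import Fraction as F

def electric_charge(hypercharge, isospin3):
    """Q = Y/2 + I_W^3, eq. (1.21)."""
    return hypercharge / 2 + isospin3

# (Y, I_W^3) for first-generation states; values assumed, see lead-in
STATES = {
    "nu_L": (F(-1), F(1, 2)),
    "e_L":  (F(-1), F(-1, 2)),
    "e_R":  (F(-2), F(0)),
    "u_L":  (F(1, 3), F(1, 2)),
    "d_L":  (F(1, 3), F(-1, 2)),
    "u_R":  (F(4, 3), F(0)),
    "d_R":  (F(-2, 3), F(0)),
}

charges = {name: electric_charge(y, i3) for name, (y, i3) in STATES.items()}

for name, q in charges.items():
    print(f"{name:5s} Q = {q}")
```

Both members of a LH doublet share the same hypercharge, while the LH and RH components of the same fermion end up with the same electric charge, as they must.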

1.5.2 The gauge sector

As mentioned before, the full electroweak gauge group is $SU(2)_L \times U(1)_Y$. The set of $SU(2)_L$ gauge fields is $\{W_\mu^a\}$, the coupling constant is $g$ and the generators are the Pauli matrices $\{\sigma^a\}$. For the $U(1)_Y$ part, the gauge field is $B_\mu$ and the coupling is $g'$. The electroweak covariant derivative is written as

\[ D_\mu = \partial_\mu + i g I_W W_\mu^a \sigma^a + i g' \frac{Y}{2} B_\mu. \tag{1.22} \]

The physical gauge fields for the photon, Z and W^± bosons are defined respectively as

\[ \begin{aligned} A_\mu &= B_\mu\cos\theta_W + W_\mu^3\sin\theta_W, \\ Z_\mu &= -B_\mu\sin\theta_W + W_\mu^3\cos\theta_W, \\ W_\mu^\pm &= \frac{1}{\sqrt{2}}\left(W_\mu^1 \mp i W_\mu^2\right), \end{aligned} \tag{1.23} \]

where $\theta_W$ is the weak mixing angle, a constant determined experimentally. It is related to the coupling constants by

\[ \theta_W = \tan^{-1}\frac{g'}{g}. \tag{1.24} \]

1.5.3 Masses of the particles

Gauge bosons

At low energy, the $SU(2)_L \times U(1)_Y$ symmetry is broken via the BEH mechanism down to $U(1)_{EM}$, the gauge group of the electromagnetic interactions. The same Higgs doublet as in section 1.3.2 is considered with hypercharge Y = 1/2, but the gauge fields are now the photon, the Z boson and the W^± bosons. After the symmetry breaking, it is found that the W and Z bosons acquire the following respective masses:

\[ m_W = \frac{1}{2} g v \quad \text{and} \quad m_Z = \frac{1}{2} v \sqrt{g^2 + g'^2}. \tag{1.25} \]

These two masses are related by

\[ m_W = m_Z \cos\theta_W. \tag{1.26} \]

After symmetry breaking, the photon remains massless and does not couple to the Higgs field h(x). It is the mediator of electromagnetic interactions.
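The relations between the couplings, the mixing angle and the boson masses can be illustrated numerically. In the sketch below the values of `g`, `g_prime` and `v` are rough illustrative inputs chosen by us, not measured values quoted from the text; the point is only that $m_W = m_Z\cos\theta_W$ follows identically from the tree-level formulas.

```python
import math

def weak_mixing_angle(g, g_prime):
    """theta_W = arctan(g'/g)."""
    return math.atan2(g_prime, g)

def w_mass(g, v):
    """m_W = g v / 2."""
    return 0.5 * g * v

def z_mass(g, g_prime, v):
    """m_Z = (v / 2) * sqrt(g^2 + g'^2)."""
    return 0.5 * v * math.hypot(g, g_prime)

# Illustrative inputs (assumed): couplings are dimensionless, v in GeV
g, g_prime, v = 0.65, 0.36, 246.0

theta_w = weak_mixing_angle(g, g_prime)
m_w = w_mass(g, v)
m_z = z_mass(g, g_prime, v)

print(f"sin^2(theta_W) = {math.sin(theta_w) ** 2:.3f}")
print(f"m_W = {m_w:.1f} GeV, m_Z = {m_z:.1f} GeV")
print(f"m_Z cos(theta_W) = {m_z * math.cos(theta_w):.1f} GeV")
```

With inputs of this size the masses land in the 80 to 90 GeV range, which is the scale at which the W and Z bosons are observed.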

Fermions

As stated before, a direct mass term cannot be added for the fermions if gauge invariance is to be preserved. This is a consequence of the chiral nature of the theory. Instead, the fermion masses arise from their Yukawa couplings to the Higgs scalars $\phi_i$ with $i = 1, 2$.

The bilinears $\psi_L^\dagger\psi_R$ and $\psi_R^\dagger\psi_L$ are SU(2) doublets, therefore they may couple to the Higgs scalars $\phi_i^*$ or $\phi_i$. The Lagrangian for the Yukawa couplings of the leptons to the Higgs scalar is

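\[ \mathcal{L}_{\mathrm{Yuk}} = -\sum_{l=e,\mu,\tau} g_l\, \phi^{i*}\, \psi_R^\dagger(l)\, \psi_L^i(\nu_l, l) + \text{h.c.}, \tag{1.27} \]

where $g_l$ is the coupling of the respective lepton to the Higgs scalar and h.c. denotes the hermitian conjugate. For the quarks, the Lagrangian is

\[ \mathcal{L}_{\mathrm{Yuk}} = -\sum_{u,d} \left[ g_d\, \phi^{i*}\, \psi_R^\dagger(d)\, \psi_L^i(u, d) - g_u\, \epsilon^{ij} \phi^{i*}\, \psi_R^\dagger(u)\, \psi_L^j(u, d) \right] + \text{h.c.} \tag{1.28} \]

Here, $\epsilon^{ij}$ is the totally anti-symmetric tensor of rank 2, and $g_d$ and $g_u$ are the couplings of the down-type and up-type quarks respectively. When $\phi$ acquires a non-zero VEV, the mass of the lepton is

\[ m_l = \frac{g_l v}{\sqrt{2}} \tag{1.29} \]

and the masses of the quarks are

\[ m_u = \frac{g_u v}{\sqrt{2}} \quad \text{and} \quad m_d = \frac{g_d v}{\sqrt{2}}. \tag{1.30} \]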
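Equations (1.29) and (1.30) can be inverted to see how widely the Yukawa couplings are spread. The sketch below assumes $v \approx 246$ GeV and uses approximate fermion masses in GeV; these numbers are illustrative inputs added here, not values quoted in the text.

```python
import math

V_HIGGS = 246.0  # Higgs VEV in GeV (assumed illustrative value)

def yukawa_coupling(mass_gev, v=V_HIGGS):
    """Invert m = g v / sqrt(2) to get the coupling g."""
    return math.sqrt(2) * mass_gev / v

def fermion_mass(coupling, v=V_HIGGS):
    """m = g v / sqrt(2), eqs. (1.29) and (1.30)."""
    return coupling * v / math.sqrt(2)

# Approximate masses in GeV (illustrative)
MASSES = {"electron": 0.000511, "muon": 0.1057, "tau": 1.777, "top": 173.0}

couplings = {name: yukawa_coupling(m) for name, m in MASSES.items()}

for name, g_f in couplings.items():
    print(f"{name:8s} g = {g_f:.3e}")
```

The couplings span several orders of magnitude, with the top quark coupling close to one. This spread is one aspect of the ad hoc nature of the Yukawa sector discussed in section 2.1.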


Chapter 2 Beyond the Standard Model

In this chapter we peep through the door of new physics beyond the Standard Model (BSM), more specifically supersymmetry (SUSY). The process to be investigated in this thesis is also introduced. The discussions will be mostly qualitative and the intricate formalism and phenomenology of SUSY will be omitted.

2.1 Limitations of the Standard Model

Despite its success, the Standard Model is quite messy and fails to address a few issues such as:

• Why is QCD a pure Yang-Mills theory while GWS is a chiral theory, that is, why is the gauge group the way it is?

• Where do the ad hoc Yukawa couplings to fermions come from?

• Why is there a plethora of free parameters that can only be determined experimentally?

• What are dark matter and dark energy?

BSM theories try to solve these puzzles and some examples are: technicolor, supersymmetry, extra-dimensional models, grand unified theories (GUTs) and string theory. Some of them are unification schemes: GUTs try to unify the three gauge couplings, whose curves of coupling strength versus energy scale do not meet in the SM; string theory is more ambitious, as it tries to combine all physics phenomena including gravity into one framework, and it also proposes a solution to some or all of the mentioned puzzles.

Dark matter is somewhat mysterious and its existence has been inferred from astrophysical observations. It is believed but not certain that dark matter is an elementary particle. One possibility is that dark matter is a weakly interacting massive particle (WIMP), which is a particle with a mass and interactions at the scale of the weak interaction. At the TeV scale, the WIMP can be associated with the lightest supersymmetric partner (neutralino) in supersymmetric models with R-parity conservation. R-parity is a multiplicative quantum number defined as

\[ P_R = (-1)^{3(B-L)+2s}, \tag{2.1} \]

where B is the baryon number, L is the lepton number and s is the spin.
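Equation (2.1) is easy to evaluate for a few particles. The following sketch uses exact fractions so that the exponent stays an integer; the function name `r_parity` and the particle table are illustrative choices.

```python
from fractions import Fraction

def r_parity(baryon_number, lepton_number, spin):
    """P_R = (-1)^(3(B - L) + 2s), eq. (2.1)."""
    exponent = 3 * (Fraction(baryon_number) - Fraction(lepton_number)) + 2 * Fraction(spin)
    if exponent.denominator != 1:
        raise ValueError("3(B - L) + 2s must be an integer")
    return 1 if exponent.numerator % 2 == 0 else -1

# (B, L, s) for a few SM particles and their superpartners
PARTICLES = {
    "electron":   (0, 1, Fraction(1, 2)),
    "selectron":  (0, 1, 0),
    "up quark":   (Fraction(1, 3), 0, Fraction(1, 2)),
    "up squark":  (Fraction(1, 3), 0, 0),
    "W boson":    (0, 0, 1),
    "neutralino": (0, 0, Fraction(1, 2)),
}

parities = {name: r_parity(*qn) for name, qn in PARTICLES.items()}

for name, parity in parities.items():
    print(f"{name:10s} P_R = {parity:+d}")
```

SM particles come out with $P_R = +1$ and their superpartners with $P_R = -1$, because the partner differs only by half a unit of spin. Conservation of a multiplicative $P_R$ is what forbids the lightest sparticle from decaying into SM particles alone.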

2.2 A brief overview of Supersymmetry

2.2.1 Motivation and introduction

It is well known from quantum field theory that beyond tree level, infinities start to show up. They are taken care of by a procedure called renormalization, which is arguably the deepest and most important topic in physics. For example, a calculation of the electron self-energy in QED, which corresponds to the diagram in figure 2.1(a), shows that the correction to the electron mass at one-loop order is [11]

\[ \delta m = \frac{3\alpha m_0}{2\pi} \ln\frac{\Lambda}{m_0}, \tag{2.2} \]

where $\alpha$ is the fine-structure constant, $m_0 = m_0(\Lambda)$ is the bare mass and $\Lambda$ is the ultraviolet (UV) cut-off beyond which the theory is not valid. If we for example take $\Lambda \sim m_Z$ and $m_0 \sim$ MeV, the mass correction is very small.

This is a consequence of the fact that Λ appears only in the log.
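To get a feeling for the size of this logarithmic correction, eq. (2.2) can be evaluated directly. The sketch below assumes $\alpha \approx 1/137$, a cut-off at the Z mass and a bare mass equal to the electron mass; these inputs are illustrative and approximate.

```python
import math

ALPHA = 1 / 137.036        # fine-structure constant (approximate)
M_Z_MEV = 91_190.0         # cut-off Lambda ~ m_Z in MeV (approximate)
M_ELECTRON_MEV = 0.511     # bare mass m_0 ~ electron mass in MeV (approximate)

def relative_mass_correction(cutoff, bare_mass, alpha=ALPHA):
    """delta m / m_0 = (3 alpha / 2 pi) * ln(Lambda / m_0), from eq. (2.2)."""
    return 3 * alpha / (2 * math.pi) * math.log(cutoff / bare_mass)

correction = relative_mass_correction(M_Z_MEV, M_ELECTRON_MEV)
print(f"delta m / m_0 = {correction:.3f}")

# Raising the cut-off by many orders of magnitude changes the result only mildly.
print(f"with Lambda = 1e19 GeV: {relative_mass_correction(1e22, M_ELECTRON_MEV):.3f}")
```

Even when the cut-off is pushed far beyond the electroweak scale, the correction stays a modest fraction of the bare mass, which is the practical meaning of the statement that $\Lambda$ appears only inside the logarithm.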

However, in calculations involving the renormalization of the mass of a scalar field, quadratic divergences arise. For example, when calculating the self-energy of the scalar in Yukawa theory (interaction term $\mathcal{L}_{\mathrm{int}} = -g\phi\bar{\psi}\psi$), which corresponds to the diagram in figure 2.1(b), the one-loop correction to the scalar field’s mass squared is [11]

\[ \delta m^2 = -\frac{g^2}{8\pi^2}\Lambda^2. \tag{2.3} \]

Figure 2.1: Radiative corrections: (a) electron self-energy in QED, (b) scalar self-energy.

This implies that scalar fields have “poor” renormalization properties. Unfortunately, the SM contains a scalar field, the Higgs boson. This leads to an issue known as the Higgs/hierarchy problem, which is related to the renormalization properties of the scalar field. The Higgs boson couples to all elementary particles (including itself) except the photon and gluons. This implies that more than one diagram contributes to its mass correction. The coupling constant is proportional to the mass of the particle of interest. Since the top quark is the heaviest of all elementary particles, it contributes the most to the radiative correction of the Higgs mass.

The divergence in $\delta m^2$ is negative for a fermion loop. This suggests that we can introduce a new theory with a symmetry between fermions and bosons in such a way that a boson loop, which contributes positively, cancels the negative fermion loop. This requires the boson to have the same properties as the fermion except for the spin, which differs by 1/2. This leads to supersymmetry, which solves the part of the hierarchy problem associated with loop corrections.

The idea behind SUSY is that a superpartner (or sparticle) is introduced for every SM particle. The superpartners of fermions are called sfermions.

For example, the spin-0 superpartner of a lepton $l$ is a slepton $\tilde{l}$. Similarly, squarks are the superpartners of quarks.

For gauge bosons, the superpartners are called gauginos, such as gluinos ($\tilde{G}$), winos ($\tilde{W}$) and binos ($\tilde{B}$), which are spin-1/2 particles. SUSY has an extended Higgs sector and the Higgs bosons also have spin-1/2 superpartners, which are called higgsinos. The higgsinos exist as neutral and charged particles.

There exists a formal way to transform from fermions to bosons and vice versa and this transformation is known as a supersymmetry transformation.

The fermion and its boson superpartner are different states of the same supermultiplet.

The fact that no superpartner has ever been observed implies that SUSY must be a broken symmetry and the symmetry breaking mechanism ensures that the superpartners are so massive that they cannot be detected. This fact also causes the loop cancellations to be imperfect but if the symmetry breaking scale is small enough (TeV scale), the disparity can be kept small enough so that the radiative corrections to the mass of the Higgs are small.

The mechanism for SUSY breaking is still a mystery.

A common feature of SUSY theories with R-parity conservation is that the superpartners cannot decay into known particles alone. If they exist, the final decay product should be the lightest SUSY particle (LSP), which has not been detected yet. The LSP is a possible candidate for the dark matter WIMP mentioned previously.

Detection of the superpartners is one of the major hopes of the Large Hadron Collider.

2.2.2 Charginos and Neutralinos

The charged and neutral higgsinos mix with winos and binos to form charginos $\{\tilde{\chi}^\pm_1, \tilde{\chi}^\pm_2\}$ and neutralinos $\{\tilde{\chi}^0_1, \tilde{\chi}^0_2, \tilde{\chi}^0_3, \tilde{\chi}^0_4\}$ respectively. They are expected to be produced at relatively high rates at colliders through the decays of squarks and sleptons, for example [8]

\[ \tilde{q} \to q\tilde{\chi}^0_2, \qquad \tilde{\chi}^0_2 \to l^\pm\tilde{l}^\mp, \qquad \tilde{l}^\pm \to l^\pm\tilde{\chi}^0_1. \tag{2.4} \]

The $\tilde{\chi}^0_1$ is expected to be very weakly interacting and to escape the detector, which implies that it shows up as missing transverse energy. To get information on the spectrum of sparticles, detailed studies are performed, for example on the distribution of the number of jets, the number of same-sign and opposite-sign leptons, multiple leptons and lepton flavors. It is important to mention that the cross-sections for sparticle production depend strongly on their masses, in our particular case those of $\tilde{\chi}^\pm_1$ and $\tilde{\chi}^0_1$, or more specifically on the mass splitting defined as the difference between the masses of the chargino and the neutralino: $\Delta m = m(\tilde{\chi}^\pm_1) - m(\tilde{\chi}^0_1)$.

Charginos and neutralinos can be produced directly but at a lower rate by a few processes such as Drell-Yan. This leads us to the process investigated in this thesis. The analysis targets the direct production of $\tilde{\chi}^+_1\tilde{\chi}^-_1$, which both decay into an on-shell W boson and $\tilde{\chi}^0_1$ (the LSP). We consider the decay of


Figure 2.2: SUSY process under investigation

the W only into isolated leptons (e, µ) with opposite charge and significant missing transverse energy (MET). The MET is expected from both neutrinos and the LSPs in the final states. The process is illustrated pictorially in figure 2.2.

The analysis of this process is quite challenging due to the significant SM background contributions, especially from diboson (WW) production.


Chapter 3 Proton-Proton Colliders

This chapter introduces the kinematic variables used in this thesis and some phenomenological aspects of proton-proton (pp) collisions. More details can be found in textbooks on collider physics such as [6].

3.1 Kinematics

The starting point is the four-momentum of a particle which is defined as

\[ p \equiv p^\mu = (E, \vec{p}), \tag{3.1} \]

where $E$ is the relativistic energy and $\vec{p} = (p_x, p_y, p_z)$ is the three-momentum. They are both related to the rest mass $m$ of the particle as

\[ E = \gamma m \quad \text{and} \quad \vec{p} = \gamma m \vec{v}, \tag{3.2} \]

where $\vec{v}$ is the velocity and $\gamma = 1/\sqrt{1 - \vec{v}^{\,2}}$ with $|\vec{v}| \in [0, 1)$. These relations lead to the following important result:

\[ p \cdot p \equiv p^2 = m^2 \implies E^2 = \vec{p}^{\,2} + m^2. \tag{3.3} \]

A particle is said to be on-shell if $p^2 - m^2 = 0$ and off-shell (or virtual) if $p^2 - m^2 \neq 0$. The concept of virtual particles is very important in perturbative QFT, where they act as mediators.
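The relations (3.1) to (3.3) translate directly into code. The following is a minimal sketch in natural units, with the metric signature (+, −, −, −) used in this thesis; the function names and the muon example are illustrative.

```python
import math

def four_momentum(mass, velocity):
    """Return (E, px, py, pz) for a particle of rest mass `mass` and
    velocity (vx, vy, vz) given in units of c."""
    v_squared = sum(component ** 2 for component in velocity)
    if v_squared >= 1:
        raise ValueError("a massive particle must have |v| < 1")
    gamma = 1 / math.sqrt(1 - v_squared)
    return (gamma * mass,) + tuple(gamma * mass * c for c in velocity)

def minkowski_square(p):
    """p^2 = E^2 - |p_vec|^2."""
    return p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2

def is_on_shell(p, mass, rel_tol=1e-9):
    """True if p^2 = m^2 within a relative tolerance (intended for m > 0)."""
    return math.isclose(minkowski_square(p), mass ** 2, rel_tol=rel_tol)

# Example: a muon (m ~ 0.1057 GeV) moving with v = (0.6, 0, 0.7)
muon = four_momentum(0.1057, (0.6, 0.0, 0.7))
print("E =", muon[0], "on-shell:", is_on_shell(muon, 0.1057))
```

A four-momentum whose square differs from $m^2$, for instance that of an exchanged W boson in a low-energy process, would fail the `is_on_shell` check and corresponds to a virtual particle.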


3.1.1 Two-particle collision

The Center-of-Mass (CM) reference frame of two colliding particles is often the same as the laboratory frame. It is defined as the frame in which the sum of the three-momenta of the particles is zero.

Consider two colliding identical particles (1 and 2) whose four-momenta are $p_1$ and $p_2$. By definition, in the CM frame, $\vec{p}_1 = -\vec{p}_2$ such that $E = E_1 = E_2$. The CM energy is defined as $E_{CM} = E_1 + E_2 = 2E$ and the Mandelstam variable $s$ is defined as

\[ s = (p_1 + p_2)^2 = E_{CM}^2 \implies \sqrt{s} = E_{CM}. \tag{3.4} \]
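Equation (3.4) can be illustrated with two head-on proton beams. The sketch below assumes a beam energy of 6500 GeV per proton and a proton mass of about 0.938 GeV; both are illustrative inputs added here.

```python
import math

PROTON_MASS = 0.938  # GeV (approximate)

def beam_four_momentum(energy, mass, direction):
    """Four-momentum (E, px, py, pz) of a beam particle moving along +z or -z."""
    pz = direction * math.sqrt(energy ** 2 - mass ** 2)
    return (energy, 0.0, 0.0, pz)

def mandelstam_s(p1, p2):
    """s = (p1 + p2)^2 with metric (+, -, -, -)."""
    total = [a + b for a, b in zip(p1, p2)]
    return total[0] ** 2 - total[1] ** 2 - total[2] ** 2 - total[3] ** 2

beam_energy = 6500.0  # GeV per beam (assumed)
p1 = beam_four_momentum(beam_energy, PROTON_MASS, +1)
p2 = beam_four_momentum(beam_energy, PROTON_MASS, -1)

sqrt_s = math.sqrt(mandelstam_s(p1, p2))
print(f"sqrt(s) = {sqrt_s:.1f} GeV")
```

Because the three-momenta cancel exactly, $\sqrt{s}$ is simply twice the beam energy, independently of the proton mass. The quantity relevant for the hard scattering is smaller, since only a fraction of each proton's momentum is carried by the colliding partons.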

3.1.2 Collision products

We now discuss the kinematics of the products in a two-particle collision system. The convention in colliders is to take the z-axis as the axis of collision, the positive x-axis as pointing towards the center of the accelerator ring and the positive y-axis as pointing upwards. The (x, y) plane is called the transverse plane. The three-momentum of a particle can then be denoted as $\vec{p} = (\vec{p}_T, p_z)$, where $\vec{p}_T = (p_x, p_y)$ is known as the transverse momentum and $p_z$ as the longitudinal momentum. The magnitude of the transverse momentum is given by

\[ p_T = \sqrt{p_x^2 + p_y^2}. \tag{3.5} \]

The transverse energy is defined as

\[ E_T = \sqrt{m^2 + p_T^2} = \sqrt{E^2 - p_z^2}. \tag{3.6} \]

The transverse variables are all invariant under Lorentz boosts in the z-direction. It is convenient to use spherical coordinates $\phi$ and $\theta$, which are the azimuthal angle and polar angle respectively, such that $p_T = |\vec{p}|\sin\theta$. This is illustrated in figure 3.1.

Figure 3.1: Collision plane. (a) Transverse plane. (b) Longitudinal plane.

The rapidity of a particle is related to its energy and longitudinal momentum by

$$y = \frac{1}{2} \ln\left(\frac{E + p_z}{E - p_z}\right). \tag{3.7}$$

At high momenta (relativistic limit), $|\vec{p}| \gg m$ such that $E - p_z \approx 2|\vec{p}| \sin^2(\theta/2)$ and $E + p_z \approx 2|\vec{p}| \cos^2(\theta/2)$, the rapidity can be approximated by the pseudorapidity ($\eta$) as

$$y \approx -\ln\left(\tan\frac{\theta}{2}\right) \equiv \eta. \tag{3.8}$$
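To see how the approximation (3.8) behaves, the sketch below compares the rapidity (3.7) with the pseudorapidity for a massless and a massive particle with the same three-momentum. The function names and the numbers are illustrative assumptions; the pseudorapidity is undefined for a particle exactly along the beam axis, which the sketch does not handle.

```python
import math

def rapidity(E, pz):
    """Rapidity, equation (3.7)."""
    return 0.5 * math.log((E + pz) / (E - pz))

def pseudorapidity(px, py, pz):
    """Pseudorapidity, equation (3.8), from the polar angle theta."""
    theta = math.atan2(math.hypot(px, py), pz)
    return -math.log(math.tan(theta / 2.0))

px, py, pz = 3.0, 4.0, 12.0          # hypothetical momentum in GeV, |p| = 13
p_abs = math.sqrt(px**2 + py**2 + pz**2)

eta = pseudorapidity(px, py, pz)
y_massless = rapidity(p_abs, pz)                       # m = 0, so E = |p|
y_massive = rapidity(math.sqrt(p_abs**2 + 5.0**2), pz) # m = 5 GeV

print(eta, y_massless, y_massive)
```

For the massless particle the two quantities coincide, while a non-negligible mass pulls the rapidity below the pseudorapidity.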

The pseudorapidity is more convenient experimentally because (i) it does not depend on the mass of the particle, which in most cases is unknown, and (ii) of the occurrence of a plateau: the particle multiplicity per unit rapidity is nearly constant. Also for reasons of convenience, we define the distance between two particles (1 and 2) in the $(\eta, \phi)$ space as

$$\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}, \tag{3.9}$$

where $\Delta\eta = \eta_1 - \eta_2$ and $\Delta\phi = \phi_1 - \phi_2$. For experimental purposes, the basic kinematic variables are $m$, $p_T$, $\eta$ and $\phi$. Next we define other kinematic variables which are used in the analysis targeted in this thesis.
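A minimal sketch of (3.9) is given below. Since the azimuthal angle is periodic, the sketch wraps the difference of the two angles into the interval (−π, π] before it is squared; this wrapping is an assumption added here (it is common practice but is not stated in the text), and the function names are hypothetical.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi] (assumed convention)."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)   # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Distance in (eta, phi) space, equation (3.9)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

# Two objects on either side of the phi = +-pi boundary are close, not far apart
print(delta_r(0.0, 3.0, 0.0, -3.0))
print(delta_r(1.0, 0.5, -2.0, 0.5))
```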

From the conservation of total momentum, we know that $\sum_i \vec{p}_{T,i} = 0$ where the sum runs over all the particles produced. However, not all particles are detectable, which implies that we can split the sum as

$$\sum_i \vec{p}_{T,i} = \vec{p}_T^{\,\text{miss}} + \sum \vec{p}_T, \tag{3.10}$$

where the sum on the right-hand side is over all the particles which can be detected. This leads to the definition of the missing transverse momentum:

$$\vec{p}_T^{\,\text{miss}} = -\sum \vec{p}_T, \tag{3.11}$$

where the sum is over all the particles which can be detected. We also define the missing transverse energy (MET) as $E_T^{\text{miss}} = |\vec{p}_T^{\,\text{miss}}|$.

In practice, when the MET is reconstructed, there are some contributions from unwanted signals that cannot be calibrated and identified. For this reason, it is useful to introduce the MET significance $S$ which, in the relativistic limit ($E_T = |\vec{p}_T|$), is defined as

$$S = \frac{E_T^{\text{miss}}}{\sqrt{\sum E_T}}. \tag{3.12}$$

A high value of significance indicates that the event is more likely to contain desired objects (e.g. neutralinos) that escape direct detection.
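The definitions (3.11) and (3.12) can be sketched in a few lines. The example below is a simplified illustration that assumes the visible objects are given as a plain list of transverse momentum components in GeV and that the sum of transverse energies in (3.12) runs over the same objects; it does not reproduce the MET reconstruction used in ATLAS, and the function names are hypothetical.

```python
import math

def missing_pt(visible):
    """Missing transverse momentum vector, equation (3.11).

    visible: list of (px, py) for all detected objects.
    """
    sum_px = sum(px for px, _ in visible)
    sum_py = sum(py for _, py in visible)
    return (-sum_px, -sum_py)

def met(visible):
    """Missing transverse energy, the magnitude of the missing p_T vector."""
    return math.hypot(*missing_pt(visible))

def met_significance(visible):
    """MET significance, equation (3.12), using E_T = |p_T| for each object."""
    sum_et = sum(math.hypot(px, py) for px, py in visible)
    return met(visible) / math.sqrt(sum_et)

visible = [(30.0, 0.0), (0.0, 40.0)]   # hypothetical event with two visible objects
print(missing_pt(visible), met(visible), met_significance(visible))
```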

Another important variable is the invariant mass $m_{12}$ of two particles (1 and 2), which is defined through $m_{12}^2 = (p_1 + p_2)^2$ and, for particles with negligible masses, is written in terms of the basic variables as

$$m_{12}^2 = 2 p_{T1} p_{T2} [\cosh(\eta_1 - \eta_2) - \cos(\phi_1 - \phi_2)]. \tag{3.13}$$

Some particles cannot be detected directly (e.g. neutrinos) but are inferred from the transverse mass. Suppose a particle decays into particles 1 and 2 where 2 is the invisible one, then the transverse mass is given by

$$m_T = \sqrt{m_1^2 + m_2^2 + 2(E_{T1} E_{T2} - \vec{p}_{T1} \cdot \vec{p}_{T2})}. \tag{3.14}$$

In the relativistic limit ($E_T = |\vec{p}_T|$), the transverse mass can be written as

$$m_T = \sqrt{2 E_{T1} E_{T2} (1 - \cos\Delta\phi)}, \tag{3.15}$$

where $\Delta\phi$ is the angle between the two particles with transverse momenta $\vec{p}_{T1}$ and $\vec{p}_{T2}$. The maximum value of $m_T$ is the mass of the decaying particle.
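The two mass variables can be evaluated from the basic kinematic variables as sketched below. Relation (3.13) is read here as an expression for the squared invariant mass of two particles with negligible masses, and (3.15) is used for the transverse mass; the function names and the input values are illustrative assumptions.

```python
import math

def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two (approximately massless) particles, from (3.13)."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))   # guard against tiny negative rounding errors

def transverse_mass(et1, et2, dphi):
    """Transverse mass in the relativistic limit, equation (3.15)."""
    return math.sqrt(2.0 * et1 * et2 * (1.0 - math.cos(dphi)))

# Hypothetical back-to-back leptons with equal pseudorapidity
print(invariant_mass(45.6, 0.3, 0.0, 45.6, 0.3, math.pi))

# Hypothetical lepton and missing momentum, back to back in the transverse plane
print(transverse_mass(40.0, 40.0, math.pi))
```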

In our particular SUSY scenario introduced in the previous chapter ($WW \to l\nu\tilde{\chi}_1^0 \, l\nu\tilde{\chi}_1^0$), there is more than one invisible particle and a more suitable kinematic variable is the stransverse mass $m_{T2}$ defined as

$$m_{T2}(l, l, E_T^{\text{miss}}) = \min_{\vec{q}_{T1} + \vec{q}_{T2} = \vec{p}_T^{\,\text{miss}}} \{\max[m_T(\vec{p}_{T1}, \vec{q}_{T1}), m_T(\vec{p}_{T2}, \vec{q}_{T2})]\}, \tag{3.16}$$

where $\vec{p}_{T1}$ and $\vec{p}_{T2}$ are the transverse momenta of the two leptons and $\vec{p}_T^{\,\text{miss}} = \vec{q}_{T1} + \vec{q}_{T2}$ contains all the information about the invisible particles. The minimization is performed over all the possible configurations of $\vec{q}_T$, that is, over all possible values of $\vec{q}_{T1}$ and $\vec{q}_{T2}$ compatible with the measured $\vec{p}_T^{\,\text{miss}}$. The stransverse mass is very useful because, in the end, all we know about the neutralinos is $\vec{p}_T^{\,\text{miss}}$.
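The minimization in (3.16) can be illustrated with a brute-force scan. The sketch below is deliberately naive: it assumes massless leptons and massless invisible particles, scans the first invisible momentum on a coarse square grid and fixes the second one by the constraint that the two add up to the missing transverse momentum. The grid parameters and function names are assumptions, and this is not the algorithm used to compute the stransverse mass in the analysis.

```python
import math

def mt_massless(p, q):
    """Transverse mass (3.15) of a visible p and an invisible q, both (px, py)."""
    et_p, et_q = math.hypot(*p), math.hypot(*q)
    m2 = 2.0 * (et_p * et_q - (p[0] * q[0] + p[1] * q[1]))
    return math.sqrt(max(m2, 0.0))

def mt2_grid(p1, p2, pmiss, half_width=200.0, steps=201):
    """Coarse grid approximation of equation (3.16).

    q1 is scanned over [-half_width, half_width]^2 and q2 = pmiss - q1.
    """
    best = float("inf")
    for i in range(steps):
        q1x = -half_width + 2.0 * half_width * i / (steps - 1)
        for j in range(steps):
            q1y = -half_width + 2.0 * half_width * j / (steps - 1)
            q2 = (pmiss[0] - q1x, pmiss[1] - q1y)
            larger = max(mt_massless(p1, (q1x, q1y)), mt_massless(p2, q2))
            best = min(best, larger)
    return best

# Hypothetical event: two leptons and a missing transverse momentum, in GeV
print(mt2_grid((40.0, 0.0), (-30.0, 10.0), (20.0, 30.0)))
```

A grid scan of this kind only gives an upper estimate of the true minimum, limited by the grid spacing; it is meant to make the structure of the min-max definition explicit rather than to be efficient.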


3.2 pp interactions

Proton-proton collisions can be classified into elastic, diffractive and non-diffractive processes. Elastic and diffractive processes fall into non-perturbative QCD, which is a highly non-trivial subject. They are low-$p_T$ phenomena in which the particles are produced mostly along the beam line; therefore they are not interesting for our current analysis. Non-diffractive processes can be studied using perturbation theory and are appropriate for our purposes because the particles produced in the targeted analysis have high $p_T$. Non-diffractive processes involving high momentum transfer are also known as hard scattering processes.

3.2.1 Partons

The proton is not elementary but is made of quarks and gluons, which are collectively known as partons. A $pp$ collision is actually a parton-parton collision and each parton carries only a fraction of the proton’s total momentum.

Consider two relativistic protons colliding along the $z$-axis such that their four-momenta are $p_1 = (E, 0, 0, E)$ and $p_2 = (E, 0, 0, -E)$. We assume that the two colliding partons have negligible three-momentum components in the $(x, y)$ plane such that the momenta are $\vec{q}_1 = (0, 0, q_1)$ and $\vec{q}_2 = (0, 0, q_2)$. The momentum fractions carried by the partons are defined as

$$x_1 = \frac{|q_1|}{E} \quad \text{and} \quad x_2 = \frac{|q_2|}{E} \tag{3.17}$$

such that their four-momenta can be written as

$$Q_1 = x_1 (E, 0, 0, E) \quad \text{and} \quad Q_2 = x_2 (E, 0, 0, -E). \tag{3.18}$$

A particle produced in the collision has a maximum rest mass $M$ given by $M^2 = (Q_1 + Q_2)^2 = x_1 x_2 s$. The momentum distribution between the partons is described by the parton distribution function (PDF). This function can only be determined experimentally, usually in deep inelastic scattering (DIS), and is essentially a probability density function denoted as $f(x)$. More details on partons and PDFs can be found in [13].
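The statement that the partonic system has an invariant mass squared of $x_1 x_2 s$ follows directly from (3.18) and can be checked numerically. In the sketch below the momentum fractions and the beam energy are arbitrary illustrative values and the function names are hypothetical.

```python
import math

def parton_four_momenta(x1, x2, E):
    """Four-momenta of the two partons, equation (3.18), for beam energy E."""
    Q1 = (x1 * E, 0.0, 0.0, x1 * E)
    Q2 = (x2 * E, 0.0, 0.0, -x2 * E)
    return Q1, Q2

def parton_cm_energy(x1, x2, E):
    """Invariant mass of the parton-parton system, sqrt((Q1 + Q2)^2)."""
    Q1, Q2 = parton_four_momenta(x1, x2, E)
    tot_E, tot_px, tot_py, tot_pz = (a + b for a, b in zip(Q1, Q2))
    return math.sqrt(tot_E**2 - tot_px**2 - tot_py**2 - tot_pz**2)

E_BEAM = 6500.0      # GeV, assumed beam energy, so that sqrt(s) = 13 TeV
x1, x2 = 0.1, 0.05   # hypothetical momentum fractions

print(parton_cm_energy(x1, x2, E_BEAM))          # from the four-momenta
print(math.sqrt(x1 * x2) * 2.0 * E_BEAM)         # sqrt(x1 * x2 * s)
```

Only a fraction of the nominal collision energy is therefore available to produce new particles, and the unequal momentum fractions leave the produced system with a net boost along the beam axis.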

3.2.2 Hadronization and jets

As mentioned previously, quarks and gluons are not observed as free particles but as color singlets. When trying to separate the quarks of a colorless object, the color field acquires enough energy to create new partons. Quarks radiate gluons which in turn create quark-antiquark pairs and this process continues until the energy is low enough (less than the QCD scale $\Lambda_{QCD}$) to form hadrons. This process is known as hadronization and is not yet fully understood.

In $pp$ collisions, many hadrons are created as end products and they are referred to as hadronic showers. In particle detectors, the hadronic showers are observed in narrow cones which are known as jets. It is important to note that jets are not only the products of hard scattering but also come from initial state and final state radiation (radiation of gluons from quarks and antiquarks). A proper treatment of hadronization and jets can be made only in QCD.

3.3 Luminosity and pile-up

The event rate $R$ (number of events per unit time, $dN/dt$) of a given scattering process is related to the cross-section¹ $\sigma$ by $R = L\sigma$, where $L$ is the luminosity, a parameter completely determined by the properties of the colliding beams (assuming a Gaussian profile) [13]:

$$L = \frac{n_1 n_2}{4\pi\sigma_x\sigma_y} f. \tag{3.19}$$

In an accelerator, the particles are grouped into bunches that are brought into collision. Here, $n_1$ and $n_2$ are the numbers of particles per bunch, $\sigma_x$ and $\sigma_y$ are the beam’s transverse widths and $f$ is the rate at which the bunches cross. The number of events is given by integrating the rate over time:

$$N = \int R \, dt = \sigma \int L(t) \, dt. \tag{3.20}$$

The integrated luminosity $\int L(t) \, dt$ is a measure of the amount of data collected over a period of time and has units of inverse cross-section, for example inverse femtobarns (fb⁻¹).
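Equations (3.19) and (3.20) lend themselves to a short numerical illustration. All inputs in the sketch below are assumed, nominal-looking values chosen only to show the orders of magnitude: 1.15 × 10¹¹ protons per bunch, a transverse beam width of 16.7 µm at the interaction point, 2808 bunches and a revolution frequency of 11245 Hz. The cross-section and integrated luminosity in the second part are hypothetical numbers, not results of this thesis.

```python
import math

def luminosity(n1, n2, sigma_x, sigma_y, f):
    """Instantaneous luminosity, equation (3.19).

    With sigma_x, sigma_y in cm and f in Hz the result is in cm^-2 s^-1.
    """
    return f * n1 * n2 / (4.0 * math.pi * sigma_x * sigma_y)

def expected_events(cross_section_fb, integrated_luminosity_inv_fb):
    """Number of events, equation (3.20), for a constant cross-section."""
    return cross_section_fb * integrated_luminosity_inv_fb

# Assumed beam parameters (illustrative only)
N_PER_BUNCH = 1.15e11
SIGMA_BEAM_CM = 16.7e-4                 # 16.7 micrometres expressed in cm
CROSSING_RATE_HZ = 2808 * 11245.0       # bunches times revolution frequency

L_inst = luminosity(N_PER_BUNCH, N_PER_BUNCH, SIGMA_BEAM_CM, SIGMA_BEAM_CM,
                    CROSSING_RATE_HZ)
print(f"{L_inst:.2e} cm^-2 s^-1")

# Hypothetical process with a 2 fb cross-section in 100 fb^-1 of data
print(expected_events(2.0, 100.0))
```

With these inputs the luminosity comes out at the order of 10³⁴ cm⁻²s⁻¹. The second function makes explicit why inverse femtobarns are a convenient unit: a cross-section in fb multiplied by an integrated luminosity in fb⁻¹ gives the expected number of events directly.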

¹ The differential cross-section is a measure of the differential (quantum) probability of a process to occur. The total cross-section is obtained by integration over a given solid angle.


Pile-up is the phenomenon that occurs when more than one collision takes place during a bunch crossing. These additional collisions do not lead to hard scattering and very accurate measurements² are required to reconstruct the primary vertex where the interaction of interest took place. One example of a difficulty posed by pile-up is that it produces additional jets which have to be taken into account before doing any analysis.

² Measurements with a minimum bias trigger (without too much filtering) are performed to be able to reconstruct all the vertices and identify the primary vertex.


Chapter 4 The Large Hadron Collider and the ATLAS Detector

This chapter gives a brief description of the experimental setup at the Large Hadron Collider (LHC) and the ATLAS detector, from which all the data used in this thesis are obtained. The discussion will be descriptive and the history will be omitted. All the (updated) details on the experimental setup can be found on the CERN website. The organization hosting the high energy physics experiment of interest to us is CERN, the European Organization for Nuclear Research, whose acronym derives from the French name Conseil Européen pour la Recherche Nucléaire. It is located on the border between Switzerland (Geneva) and France. Thanks to CERN, light has been shed on several aspects of high energy physics through the discovery of many particles such as the Z and W bosons and the Higgs boson.

4.1 The Large Hadron Collider

The LHC is the world’s most powerful particle accelerator and has reached, at the time of writing, a maximum center-of-mass energy of 13 TeV in proton-proton collisions. It is circular with a circumference of 27 km and is situated 100 m underground.

The two particle beams are accelerated through the ring in opposite directions and are made to cross each other inside the detectors. The beams are guided around the ring by superconducting magnets which are cooled down to −271.3 °C by liquid helium. The beams are steered by 1232 dipole magnets and focused by 392 quadrupole magnets. A schematic diagram of the accelerator complex is given in figure 4.1.

Figure 4.1: CERN accelerator complex [9]

Single protons, prepared by the ionization of hydrogen gas, are injected into the LINAC 2 accelerator and are accelerated to 50 MeV. They are then fed into the Proton Synchrotron Booster (PSB) which accelerates them further to 1.4 GeV. The beam energy is increased further to 26 GeV in the Proton Synchrotron (PS), where the protons are also arranged into bunches. The beam then goes through the Super Proton Synchrotron (SPS), where the energy reaches 450 GeV. Finally, the beam enters the LHC where the energy is increased until the desired center-of-mass energy of 13 TeV is reached, that is 6.5 TeV per beam.

Finely tuned radio frequency (RF) cavities¹ inside the LHC are used to:

• Accelerate the particles so that the ones with lower (higher) energy are accelerated (decelerated) and those with the desired energy remain untouched,

• Ensure that the particles stay in bunches.

The maximum number of bunches at which the LHC operates is 2808, the number of protons in each bunch is ∼10¹¹ and the bunch spacing is 25 ns. This corresponds to a design luminosity of 10³⁴ cm⁻²s⁻¹.

¹ An RF cavity is a metallic chamber that confines electromagnetic fields at 400 MHz in the case of the LHC.

The four main experiments at the LHC are:

• ATLAS (A Toroidal LHC ApparatuS): more details will be given in the next section.

• CMS (Compact Muon Solenoid): like ATLAS, it is a multipurpose detector which searches for new physics such as SUSY and also performs high-precision measurements on the Higgs boson. It also records and studies heavy ion collisions.

• ALICE (A Large Ion Collider Experiment): it focuses on heavy ion collisions (lead-lead and lead-proton) to study the quark-gluon plasma (QGP), a phase of matter where the quarks are free and which is believed to have existed right after the Big Bang (around 1 µs).

• LHCb (Large Hadron Collider beauty): it studies CP-violating processes involving $b$-quarks to understand the asymmetry between matter and anti-matter in the universe.

4.2 The ATLAS detector

ATLAS [5] is a cylindrical detector designed to capture most of the products in collisions and therefore covers a solid angle of nearly $4\pi$. The central part is called the barrel ($|\eta| \lesssim 2.5$) and the two end parts are called end-caps ($|\eta| \gtrsim 2$). The layout of the ATLAS detector is shown in figure 4.2. The detector consists of five main components: an inner detector (ID), calorimeters, a muon spectrometer (MS) system, a magnet system and the trigger and data acquisition (DAQ) system. These components will be discussed in some detail in the next subsections.

4.2.1 Inner detector

The pixel detector along with the semiconductor tracker and transition radiation tracker make up the inner detector. It is surrounded by a solenoid magnet of field strength 2 T at the center. It has a measurement coverage of $|\eta| < 2.5$. The main purpose of the ID is to measure tracks of charged particles and event vertices. The magnetic field allows the determination of charge and momentum of the particles.

Figure 4.2: The ATLAS detector

The momentum measurements in the ID have a resolution² of $\sigma_{p_T}/p_T = 0.05\% \, p_T \oplus 1\%$ (with $p_T$ in GeV), where $a \oplus b := \sqrt{a^2 + b^2}$. The resolution gives a measure of the performance of a given apparatus.
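The quadrature sum $a \oplus b$ used in the resolution formulas is straightforward to evaluate. The sketch below applies it to the ID momentum resolution quoted above, under the assumption that $p_T$ is expressed in GeV; the function names are hypothetical.

```python
import math

def quad_sum(*terms):
    """Quadrature sum a (+) b (+) ... = sqrt(a^2 + b^2 + ...)."""
    return math.sqrt(sum(t * t for t in terms))

def id_pt_resolution(pt_gev):
    """Relative p_T resolution of the ID: 0.05% * p_T (+) 1%, p_T in GeV (assumed)."""
    return quad_sum(0.0005 * pt_gev, 0.01)

for pt in (5.0, 20.0, 100.0, 500.0):
    print(f"pT = {pt:6.1f} GeV  ->  sigma/pT = {100.0 * id_pt_resolution(pt):.2f} %")
```

The constant term dominates at low $p_T$, the two terms are equal at 20 GeV, and the term proportional to $p_T$ takes over at high momentum, where tracks bend less in the magnetic field.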

4.2.2 Calorimeters

Calorimeters stop particles and measure their energies. They are divided into a barrel part and two end-caps like the ID. In ATLAS, there are two types of calorimeters, namely the electromagnetic calorimeter (ECal) and the hadronic calorimeter (HCal).

² The resolution of a measurable quantity $x$ is the fractional uncertainty given by $\sigma_x/x$.


Electromagnetic calorimeter

The main task of the ECal is to stop electrons, positrons and photons. However, charged hadrons and photons from neutral pion decays also deposit energy in the ECal. It is made of layers of lead and liquid argon. Lead is the passive medium in which most of the interactions occur while argon is the active medium where the atoms are ionized and the freed electrons are collected on electrodes. The ECal has an energy resolution of $\sigma_E/E = 10\%/\sqrt{E} \oplus 0.7\%$.
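Unlike the tracker, a calorimeter becomes relatively more precise as the energy grows. The sketch below evaluates the ECal formula quoted above, assuming that $E$ is expressed in GeV, and computes the energy at which the stochastic term $10\%/\sqrt{E}$ equals the constant term; the names used are hypothetical.

```python
import math

STOCHASTIC = 0.10   # 10% / sqrt(E), with E in GeV (assumed unit)
CONSTANT = 0.007    # 0.7%

def ecal_resolution(energy_gev):
    """Relative energy resolution: 10%/sqrt(E) (+) 0.7%, added in quadrature."""
    return math.hypot(STOCHASTIC / math.sqrt(energy_gev), CONSTANT)

# Energy at which both terms contribute equally
crossover_gev = (STOCHASTIC / CONSTANT) ** 2

for energy in (10.0, 50.0, crossover_gev, 1000.0):
    print(f"E = {energy:7.1f} GeV  ->  sigma/E = {100.0 * ecal_resolution(energy):.2f} %")
```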

Hadronic calorimeter

As its name suggests, the HCal is used for the detection of hadrons and consists of three different subsystems where the first two share the same technology as the ECal:

1. The hadronic end-cap (HEC) is located right outside the end-cap of the ECal and copper is used as the passive medium while the active medium is argon. It has an energy resolution of $\sigma_E/E = 50\%/\sqrt{E} \oplus 3\%$.

2. The forward calorimeter (FCal) consists of three basic units where the first one is used for electromagnetic measurements while the other two are intended for hadronic interactions. It has the same energy resolution as the HEC.

3. The tile calorimeter is located outside the ECal and consists of alternating layers of steel (passive medium) and plastic scintillator tiles (active medium). The energy resolution of the tile calorimeter is $\sigma_E/E = 100\%/\sqrt{E} \oplus 10\%$.

4.2.3 Muon spectrometer

In the MS, precision measurements are performed on the muons that exit the calorimeters. The layout of the MS is the same as that of the other components, with a barrel part and two end-caps. The momentum resolution of the MS is $\sigma_{p_T}/p_T = 10\%$ at $p_T = 1$ TeV.

4.2.4 Magnet system

Besides the solenoid magnet in the ID, the ATLAS magnet system consists of one barrel toroid (magnet) and two end-cap toroids. They have the same task as the solenoid magnet in the ID, namely to bend the trajectories of charged particles so that their momenta can be measured. The magnetic field strengths of the barrel toroid and the end-cap toroids are 0.5 T and 1 T respectively.

4.2.5 The trigger system

The trigger system is, roughly speaking, a filter of events which ensures that interesting events are kept for data analysis. This filtering is necessary because events can only be written to storage at a rate of about 1 kHz while the event rate is 40 MHz.

The trigger system is classified into two levels namely the Level-1 (L1) trigger and the High-Level Trigger (HLT).

The L1 trigger is hardware based and is part of the electronics of the detector. It has been designed to keep events (from the MS and the calorimeters) with high p_T leptons, photons, jets and hadronically decaying tau leptons as well as events with high missing transverse energy and large transverse mass. It reduces the event rate to about 100 kHz and defines regions of interest (RoI) based on the (φ, η) coordinates of the “interesting” objects. The maximum decision time for the L1 trigger is 25 ns and the decision is sent to the read-out-system (ROS) within 2.5 µs.

The HLT is software based and accesses the ROS to further narrow down the event rate to 1 kHz. It is based on the region of interest defined by the L1 trigger and has an average processing time of 0.2 s per event. Offline processing of data is made from all the events that pass the HLT.
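The rates quoted above translate into the rejection factors that each trigger level has to achieve. The following sketch is simple arithmetic on the numbers given in the text; the dictionary layout and function name are illustrative choices.

```python
def rejection_factor(rate_in_hz, rate_out_hz):
    """Factor by which a trigger level reduces the event rate."""
    return rate_in_hz / rate_out_hz

# Rates as quoted in the text
RATES_HZ = {"collisions": 40e6, "L1": 100e3, "HLT": 1e3}

l1_rejection = rejection_factor(RATES_HZ["collisions"], RATES_HZ["L1"])
hlt_rejection = rejection_factor(RATES_HZ["L1"], RATES_HZ["HLT"])
total_rejection = rejection_factor(RATES_HZ["collisions"], RATES_HZ["HLT"])

print(l1_rejection, hlt_rejection, total_rejection)
```

In other words, with these rates only about one event in 40 000 is kept for offline processing.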


Chapter 5 Machine Learning Methods

This chapter reviews the two machine learning (ML) methods used in this thesis, namely logistic regression and boosted decision trees (BDT). The material presented here is heavily based on the excellent notes prepared and provided by James Catmore (University of Oslo, Department of High Energy Physics) and on the 2018 Geilo Winter School course notes on machine learning. For more details on the topics presented in this chapter or deep learning in general, the reader is referred to, for example, [14].

5.1 General introduction

Machine learning is a computing tool used in the analysis of big data with a large number of variables, such as the data from collider experiments. It is based on statistical inference and is used to perform a specific task effectively without using explicit instructions. The strength of ML techniques lies in their ability to extract patterns from the data set and to make predictions on new data based on what they have learned.

The only machine learning approach we will consider is supervised learning, where the algorithm learns from a set of labeled data called the training sample in order to make predictions on an unlabeled data set, also known as the testing sample. The performance of a machine learning algorithm is assessed by comparing its predictions with the known labels of a sample it has not been trained on.

Although we will use the algorithms as a tool, it is important to understand the workings behind the “black-box” so that we can perform modifications on the different features according to our problem.


We start with a general regression problem, then we will translate it to binary classifiers. Our input data in this case is the set of values of the parameters belonging to a feature hyperspace. We denote each value of the (independent) parameter as¹ $x^{(i)}$ where $i = 1, ..., m$ and $x = (x_1, ..., x_n) \in H$.

We define the feature hyperspace $H$ as the set of all the parameters for the system under consideration:

$$H = \{x_1, x_2, ..., x_n\}. \tag{5.1}$$

The output (dependent) variable is written as $y^{(i)} = y(x^{(i)})$. Now we want to model the output with a function called a hypothesis, which we denote as $h_\theta(x^{(i)})$.

The hypothesis is exactly the function that a learning algorithm uses for prediction and is characterized by a set of parameters $\{\theta_1, ..., \theta_l\}$. Suppose that the hypothesis deviates from the data by $\epsilon^{(i)}$:

$$h_\theta(x^{(i)}) = y^{(i)} + \epsilon^{(i)}. \tag{5.2}$$

In any regression analysis, we seek a set of parameters $\theta = \{\theta_1, ..., \theta_l\}$ that minimizes the loss function $L(\theta)$, i.e. we want to find $\min_\theta L(\theta)$. For example, the loss function for linear regression is

$$L(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2. \tag{5.3}$$

Although we have considered the linear regression case for simplicity, the discussion may easily be generalized to more complex regression algorithms.
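The loss function (5.3) can be written out explicitly for a small data set. The sketch below assumes a linear hypothesis with an intercept term, $h_\theta(x) = \theta_0 + \theta_1 x_1 + ... + \theta_n x_n$; this specific form, the function names and the toy data are assumptions made for illustration.

```python
def hypothesis(theta, x):
    """Assumed linear hypothesis: theta[0] + sum_j theta[j] * x[j-1]."""
    return theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))

def loss(theta, X, y):
    """Mean squared error loss of equation (5.3) over m training entries."""
    m = len(X)
    return sum((hypothesis(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2.0 * m)

# Toy training sample generated from y = 1 + 2 * x (one feature, m = 3)
X = [[0.0], [1.0], [2.0]]
y = [1.0, 3.0, 5.0]

print(loss([1.0, 2.0], X, y))   # parameters that reproduce the data exactly
print(loss([0.0, 2.0], X, y))   # wrong intercept, every residual equals -1
```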

Gradient descent method of minimization

In the gradient descent method, we start with some $\theta$ and keep changing it to reduce $L(\theta)$ until we end up (hopefully) at a minimum. Formally, the (batch) gradient descent algorithm is

$$\theta_{i+1} = \theta_i - \alpha \nabla_\theta L(\theta_i), \tag{5.4}$$

where the subscript $i$ labels the iteration, $\nabla_\theta = (\partial/\partial\theta_1, ..., \partial/\partial\theta_l)$ and $\alpha \in \mathbb{R}$ is the learning rate, which gives a measure of the size of the step we take towards the minimum. We want $\alpha$ small enough to converge towards the minimum but large enough for rapid convergence; therefore it should be chosen judiciously in the machine learning algorithms.
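A minimal sketch of batch gradient descent for the linear regression loss is shown below. It assumes a linear hypothesis with an intercept, a fixed learning rate and a fixed number of iterations; these choices, together with the function names and the toy data, are illustrative assumptions and not the settings used later in this thesis.

```python
def hypothesis(theta, x):
    """Assumed linear hypothesis: theta[0] + sum_j theta[j] * x[j-1]."""
    return theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))

def gradient(theta, X, y):
    """Gradient of the squared-error loss (1/2m) * sum of squared residuals."""
    m = len(X)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        residual = hypothesis(theta, xi) - yi
        grad[0] += residual / m
        for j, xj in enumerate(xi, start=1):
            grad[j] += residual * xj / m
    return grad

def gradient_descent(X, y, alpha=0.1, n_steps=2000):
    """Batch gradient descent: theta <- theta - alpha * grad L(theta)."""
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(n_steps):
        grad = gradient(theta, X, y)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Toy training sample generated from y = 1 + 2 * x
X = [[0.0], [1.0], [2.0]]
y = [1.0, 3.0, 5.0]
theta_fit = gradient_descent(X, y)
print(theta_fit)   # approaches the generating parameters (1, 2)
```

Every step uses the full training sample, which is what makes this the batch variant. For this toy problem a much larger learning rate makes the iteration diverge, while a much smaller one requires many more steps to reach the same accuracy.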

¹ Note that the superscript index does not refer to different features (parameters) but to the entry in a given training set. To denote different features in a feature hyperspace, we use a subscript.
