Treatment of non-Gaussian noise in invariant mass calculations


Treatment of non-Gaussian errors in invariant mass calculation

Helga Holmestad

[Cover figure: the mass PDF for a single event. Horizontal axis: mass [GeV]; vertical axis: probability density.]

Thesis presented for the Master of Science degree in Experimental Particle Physics

August 2012


Abstract

The Gaussian Sum Filter is a track reconstruction algorithm for treating energy loss by bremsstrahlung, and produces non-Gaussian estimates for the track parameters.

This thesis explores a method of propagating these non-Gaussian errors into a non-Gaussian estimate of the invariant mass. The method is tested to see whether it can improve the invariant mass resolution in ATLAS, and whether it gives a good description of the errors on the invariant mass. The results show that the invariant mass resolution is not improved, but that the description of the errors improves substantially.


Acknowledgments

During the year it took me to write this thesis, I have received lots of help and support from many people. First I want to thank my supervisor Are Strandlie and his PhD student Håvard Gjersdal for many hours of help and discussion. I also want to thank Ole Røhne, Silje Raddum, Magnar Bugge, Alex Read and Wolfgang Liebig for always answering my more or less sensible questions. Kyrre Ness Sjøbæk and Håvard Gjersdal have given me a lot of good advice concerning programming, and really helped me develop my programming skills. I really appreciate it, and I can’t understand how you have been able to be so patient and helpful when I have been stressed and grumpy.


Introduction and thesis outline

Elementary particle physics is the study of the most fundamental particles of our world, particles that cannot be split into smaller pieces. Most of these fundamental particles are not stable, but decay immediately after they are produced. These unstable particles can be produced by colliding stable particles, and the purpose of the Large Hadron Collider (LHC) at CERN is to do exactly this, in order to investigate the existence and properties of heavier and unstable particles. Since these interesting particles decay very rapidly, it is impossible to study them directly, and their properties must be inferred from the decay products. To measure the decay products, detectors like the ATLAS detector surround the collision point.

The inner part of ATLAS measures the positions of the particles passing through the detector, and from these position measurements it is possible to estimate their state vector. The state vector fully describes where the particle is and how it is moving at a given point in time, and the full state vector will from here on be called x. The state vector x is built up of a position 3-vector and a momentum 3-vector, and the process of finding the state vector x from the position measurements is called track reconstruction. Very often the most interesting part is the momentum 3-vector, because the momenta of the decay products can be used to reconstruct the mass of the decaying particle. In this thesis the momentum 3-vector is given as q = [φ, θ, q/p], which is convenient to use in a cylindrical detector with a magnetic bending field¹. A description of the ATLAS coordinate system is found in Section 4.1.

This thesis will study reconstruction of invariant mass where the decay products are electrons. The invariant mass is calculated from the momenta q of the electrons, and only information from track reconstruction is used to estimate q.
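As an illustration of how an invariant mass follows from momenta given as q = [φ, θ, q/p], the sketch below converts each track to a four-momentum and combines them. It is a minimal example, not the reconstruction code used in this thesis: the function names are hypothetical, particles are assumed to have unit charge so that p = 1/|q/p|, and momenta are assumed to be in GeV.

```python
import math

ELECTRON_MASS_GEV = 0.000511  # electron mass in GeV (unit convention assumed here)


def four_momentum(phi, theta, q_over_p, mass=ELECTRON_MASS_GEV):
    """Convert q = [phi, theta, q/p] to (E, px, py, pz), assuming unit charge."""
    p = 1.0 / abs(q_over_p)  # momentum magnitude; the sign of q/p carries the charge
    px = p * math.sin(theta) * math.cos(phi)
    py = p * math.sin(theta) * math.sin(phi)
    pz = p * math.cos(theta)
    energy = math.sqrt(p * p + mass * mass)
    return energy, px, py, pz


def invariant_mass(tracks):
    """Invariant mass of a list of (phi, theta, q/p) tracks."""
    e = px = py = pz = 0.0
    for phi, theta, q_over_p in tracks:
        ei, pxi, pyi, pzi = four_momentum(phi, theta, q_over_p)
        e, px, py, pz = e + ei, px + pxi, py + pyi, pz + pzi
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


# Two back-to-back 45.6 GeV electron tracks in the transverse plane.
tracks = [(0.0, math.pi / 2, -1 / 45.6), (math.pi, math.pi / 2, 1 / 45.6)]
mass = invariant_mass(tracks)  # close to 91.2 GeV
```

The example uses single point estimates of q for each track; it does not propagate any information about the shape of the track PDF.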

Many well known particles can decay into electrons, such as the Z boson. The Higgs boson has many possible decay channels, and one of these is decay into two electron/positron pairs. This is called the golden channel, because it gives an easily recognizable signature in the detector.

¹ The most common alternative is the Cartesian version [p_x, p_y, p_z], but q is closer to the cylindrical coordinates used in ATLAS.


For electrons, a track reconstruction algorithm called the Gaussian Sum Filter (GSF) can be used. This algorithm takes into account that the probability density function (PDF) for the energy loss of electrons passing through the material in the detector is non-Gaussian, and produces a non-Gaussian PDF for the state vector x of the electron. This PDF will from here on be called the track PDF. The standard approach is to extract one single estimate for x that is used in further analysis, discarding information about the shape of the track PDF. Details about this procedure are found in Section 6.1. It is however possible to use the full track PDF to make a non-Gaussian PDF for the mass estimate, and this PDF will from here on be called the mass PDF. This thesis will investigate the possibilities of using the mass PDF to extract more information from the reconstructed event.

There are two main objectives. The first is to see whether the mass PDF gives a good representation of the reconstructed mass. The second is to see whether the mass PDF can be used to improve the mass estimate of a decaying particle. How the mass PDF is calculated, and how the performance of this method is tested, is discussed in detail in Chapter 6.

Previous work on the topic of this thesis has been done by Håvard Gjersdal in his master's thesis [16], and to the best of the author’s knowledge this is the only previously written-up work on this topic. That work showed some promising results, and therefore this method of making a mass PDF is interesting.

There are several reasons why the mass PDF approach has to be explored in more detail with the data and tools available today. Most importantly, the amount of Monte Carlo (MC) data analyzed in [16] was too small due to computing problems. That thesis was written in 2008, at a time when the LHC and ATLAS had not yet started to operate and only MC data were available. MC production mimics the detector's response, while the particles created, their types, and the true state vector x of the decay products are known exactly. The fact that the LHC and the ATLAS detector have been operating for some time gives the opportunity to study both real data and MC data. This also impacts MC data, because MC production is calibrated against the real detector. MC data from 2008 was not as well calibrated as it is today, because the ATLAS detector had not been operated or tested as a complete and integrated system. MC data from 2011 is used in this thesis, and should mimic the real performance of the ATLAS detector better than MC data from 2008 did.

The outline of the thesis is as follows: Chapter 2 gives a brief general introduction to particle physics. Chapter 3 explains the main features of how particles interact with matter. Since particle detectors depend on particles interacting with matter, this gives the background for Chapter 4, where the ATLAS detector is presented. Chapter 5 presents track reconstruction in general, and explains the Kalman filter and the GSF, which is an extension of the Kalman filter. This chapter should give the theory needed for understanding the work done in this thesis. Chapter 6 describes how the mass PDF is calculated, and then presents how the data is retrieved and analyzed. Chapter 7 is a preliminary study of the track PDF. Chapters 8, 9, and 10 present the results based on MC data for the processes Higgs → 4e, J/ψ → ee, and Z → ee. Chapter 11 presents some results using real data. Chapter 12 discusses the results and draws the conclusions.


Chapter 2

Introducing particle physics

At CERN an enormous amount of time and effort has been put into building the LHC with all its detectors. Currently much effort is spent in analyzing the data and operating the detectors and the collider. This chapter briefly describes the main goals of the large experiments at the LHC [19]. Then follows a brief description of the accelerator system, including the Large Hadron Collider [2].

2.1 The Standard Model

Throughout history, mankind has been interested in knowing what the fundamental building blocks of our world are. The Standard Model of particle physics is the most successful theory so far, describing all the known fundamental particles and fundamental forces except gravity. Figure 2.1 shows a table with all the known particles in the Standard Model, except the Higgs boson. The Standard Model is a quantum field theory, and it is the quantization of the fields that gives us the particles. The particles in the Standard Model are divided into fermions, which have half-integer spin, and bosons, which have integer spin. The fermions are split into quarks and leptons.

Quarks carry color charge and interact mainly through the strong force. Since the quarks are charged fermions, they also interact through the electromagnetic and the weak force. Quarks have an important property called confinement. In short this means that free quarks cannot exist; they always form hadrons. When the quarks in a hadron are separated, new quarks are created from the vacuum and quickly form new hadrons. This process is called hadronization, and this mechanism can cause one quark-quark collision to produce a whole cascade of hadrons. All leptons interact through the weak force, and only the electrically charged leptons also interact electromagnetically. All the fermions are found in three generations, the second being heavier than the first and the third generation being the heaviest. As a consequence the second and third generation fermions are not stable, and the stable matter in the universe is therefore mainly made up of electrons and up/down quarks.

Figure 2.1: All the known particles in the Standard Model

The definition of a boson is a particle with integer spin. The elementary bosons we know today are force-carrying particles. The electromagnetic force is mediated by the photon, the weak force by the Z and W^±, and the strong force by the gluons. In order to prove their existence, and hence give evidence for the Standard Model, particle accelerators have been built.

In a collider experiment, stable but lighter particles, such as protons or electrons, collide and interact. This interaction can cause other particles to be created, following the predictions of the Standard Model. The mass-energy equivalence makes it possible for heavier particles to be created, if some of the kinetic energy of the colliding particles is converted into mass. Such heavy particles are unstable and decay quickly. The only particles living long enough to be measured in a detector are electrons/positrons, muons, photons, protons and neutrons, some mesons, and neutrinos. It should be mentioned that neutrinos are very difficult to measure as they only interact weakly, and in ATLAS they can only be seen as missing transverse momentum¹. Single quarks cannot be measured directly because of confinement, but are seen indirectly as jets in the detector. A jet is a narrow cone of hadrons produced by the hadronization of quarks and gluons. Short-lived particles which cannot be measured in a detector are studied by measuring their decay products. The Standard Model predicts what the possible decay products should be, and their respective probabilities. The mass of the short-lived particle is then reconstructed by assuming a specific decay channel. More observations than expected at some specific mass then serve as evidence of a short-lived particle. So far all the particles that build up the Standard Model, probably including the Higgs boson, have been found. The new boson discovered in the summer of 2012 is compatible with the Standard Model Higgs, however more data is needed to finally conclude that it is the Standard Model Higgs [9]. The existence of the Higgs boson further validates the Standard Model. The Higgs boson is the quantization of the Higgs field. According to quantum field theory all fields are associated with a particle, hence finding the Higgs particle will validate the existence of the Higgs field. The Higgs field can explain why particles have mass, and all particles get their mass by interacting with this field. The Higgs field resists the acceleration of the particles, which is exactly the same effect usually associated with mass.

¹ For the definition of transverse momentum see Section 4.1.

Even though the Standard Model can explain much about our world, there are still some problems it does not solve, such as the matter-antimatter asymmetry of the universe and dark matter, and it gives no explanation for gravity. Supersymmetry is one of the theories trying to explain these problems. Supersymmetric particles are also searched for in ATLAS and the other detectors at CERN.

2.2 The Large Hadron Collider (LHC)

The LHC is a synchrotron hadron collider, and its main purpose is to make proton-proton collisions, although it also collides heavy ions such as lead. The circumference of the LHC is 27 km, and it can accelerate protons up to an energy of 7 TeV, giving a maximum center of mass energy of 14 TeV. It is the partons, meaning the quarks and gluons inside the protons, that actually collide. This means that the energy in a collision will not be as high as 14 TeV, as the energy is distributed among the partons. Still the energy is high enough to reach the TeV scale, where theorists predict that some of the supersymmetric particles will be found.
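To make the statement about partons concrete, the sketch below uses the standard relation that two partons carrying momentum fractions x1 and x2 of their protons collide with an effective center of mass energy of √(x1·x2) times the proton-proton value, with all masses neglected. The function name and the example fractions are illustrative assumptions, not values taken from this thesis.

```python
import math


def parton_cm_energy(sqrt_s, x1, x2):
    """Effective center of mass energy of two colliding partons.

    sqrt_s : proton-proton center of mass energy
    x1, x2 : momentum fractions carried by the two partons (0 < x <= 1)
    Parton and proton masses are neglected.
    """
    if not (0.0 < x1 <= 1.0 and 0.0 < x2 <= 1.0):
        raise ValueError("momentum fractions must lie in (0, 1]")
    return math.sqrt(x1 * x2) * sqrt_s


# Illustrative fractions only: each parton carries 10% of its proton's momentum.
energy_tev = parton_cm_energy(14.0, 0.1, 0.1)  # about 1.4 TeV
```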

An overview of the accelerator complex is shown in Figure 2.2. As seen from the figure, the protons are accelerated in a chain of several accelerators before entering the LHC. The protons are first extracted from a hydrogen plasma, and are then accelerated by Linac 2. Linac 2 injects into the PS Booster, which is the first synchrotron. This in turn injects into the PS, which injects into the SPS. The SPS serves as an injector to the LHC, and the energy is gradually raised in each of the accelerators in the chain. The LHC has four intersection points where the particles collide, and the detectors are located here. The LHC has four main detectors: the general purpose detectors ATLAS and CMS, LHCb whose main purpose is to study the physics of bottom quarks, and ALICE which mainly investigates heavy ion collisions.


Figure 2.2: Overview of the CERN accelerator complex


Chapter 3

Particles interacting with matter

The ATLAS detector exploits the fact that particles interact with matter to do position and energy measurements of the particles passing through it. This chapter explains how particles interact with matter, in order to give a background for understanding the ATLAS detector, which is described in Chapter 4. The inner detector is described in greater detail there, as it contains the tracking detector providing the position measurements used for track reconstruction. The position measurements in the inner detector are called hits. The ideal for a tracking detector is to measure as many hits as possible, while the particle passing through the detector interacts with as little material as possible, as any interaction might cause the particle to lose energy or change direction. However, there have to be some interactions in order to measure the hits. For tracking to be optimal, the effects of particles interacting with matter have to be understood and taken into consideration in the best possible way.

A particle passing through the detector can lose energy through ionization or radiative energy loss, while multiple scattering changes the direction of the particle. Since there is a magnetic field in the detector, energy loss will also influence the path of the particles. In addition to the interactions mentioned here, there are also strong and weak interactions. Strong interactions happen between hadrons, and are in ATLAS most important in the calorimeters. Weak interactions have a very low cross section, and therefore do not deposit significant amounts of energy in the ATLAS detector.

3.1 The Bethe-Bloch formula for ionization energy loss

All particles, except highly relativistic electrons and positrons, lose energy mainly through ionization energy loss [25]. This happens as the traversing particle ionizes or excites the atoms in the material, making the particle lose energy to the material.

The Bethe-Bloch equation

\[
-\frac{dE}{dx} = 2\pi N_a r_e^2 m_e c^2 \rho \, \frac{Z}{A} \, \frac{z^2}{\beta^2} \left[ \log_e\!\left( \frac{2 m_e \gamma^2 v^2 W_{\max}}{I^2} \right) - 2\beta^2 \right] \tag{3.1}
\]

describes the mean ionization energy loss −dE/dx, where the length x = distance/ρ is normalized to the density of the material. Further, the numerical constant 2πN_a r_e² m_e c² = 0.153 MeV cm²/g, and the rest of the symbols used in the equation are shown in Table 3.1.

Symbol   Definition
r_e      classical electron radius
m_e      electron mass
N_a      Avogadro's number
I        mean excitation potential
Z        atomic number of absorbing material
A        atomic weight of absorbing material
ρ        density of absorbing material
z        charge of incident particle in units of e
β        v/c of the incident particle
γ        1/√(1 − β²)
W_max    maximum energy transfer in a single collision

Table 3.1: Definition of the different symbols used in the Bethe-Bloch formula.
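The following sketch evaluates Equation (3.1) numerically for a heavy charged particle. It rests on several assumptions that are not part of the text above: the constant is taken as 0.1535 MeV cm²/g (quoted as 0.153 in the text), W_max is computed from the standard two-body kinematic expression, no density or shell corrections are applied, and the material values for silicon are approximate textbook numbers. The function names are hypothetical.

```python
import math

K = 0.1535     # 2*pi*N_a*r_e^2*m_e*c^2 in MeV cm^2/g
ME_C2 = 0.511  # electron rest energy in MeV


def w_max(beta_gamma, mass):
    """Maximum energy transfer (MeV) to a free electron in a single collision.

    Standard kinematic expression, assumed here; mass is in MeV.
    """
    gamma = math.sqrt(1.0 + beta_gamma ** 2)
    ratio = ME_C2 / mass
    return 2.0 * ME_C2 * beta_gamma ** 2 / (1.0 + 2.0 * gamma * ratio + ratio ** 2)


def bethe_bloch(p, mass, z, Z, A, rho, I):
    """Mean ionization energy loss -dE/dx in MeV/cm, following Eq. (3.1).

    p, mass and I in MeV; rho in g/cm^3; z is the charge in units of e.
    """
    beta_gamma = p / mass
    beta2 = beta_gamma ** 2 / (1.0 + beta_gamma ** 2)
    # 2*m_e*gamma^2*v^2 equals 2*m_e*c^2*(beta*gamma)^2 in energy units
    log_arg = 2.0 * ME_C2 * beta_gamma ** 2 * w_max(beta_gamma, mass) / I ** 2
    return K * rho * (Z / A) * z ** 2 / beta2 * (math.log(log_arg) - 2.0 * beta2)


# Assumed example: a muon in silicon (Z = 14, A = 28.09, rho = 2.33 g/cm^3, I = 173 eV).
loss_100 = bethe_bloch(100.0, 105.66, 1, 14, 28.09, 2.33, 173e-6)
loss_370 = bethe_bloch(370.0, 105.66, 1, 14, 28.09, 2.33, 173e-6)
```

In this example the loss at 100 MeV momentum is larger than at 370 MeV, where the muon is close to minimum ionizing and loses a few MeV per centimeter.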

Figure 3.1 shows the different types of energy loss for electrons going through lead. It shows that the ionization energy loss per unit length goes down as the energy increases. This is also true for particles other than electrons. Energy loss by ionization is a stochastic process, but the variance is small enough for a deterministic approach to be used [12].

3.2 Energy loss by bremsstrahlung

When charged particles are accelerated strongly enough they can emit photons, causing the particles to lose energy. This process is called bremsstrahlung, and it also happens when charged particles travel through material and are deflected in the electric field around the nuclei. The cross section for this process is proportional to m⁻², where m is the mass of the incoming particle [17, Chapter 27]. Thus energy loss by bremsstrahlung is very important for electrons, which are light, as shown in Figure 3.1. This figure shows that energy loss by bremsstrahlung becomes more important at higher energies, mainly because other kinds of energy loss become less important. Bremsstrahlung can also change the path of the electron, because the photon carries momentum, and the total momentum has to be conserved¹. Further, when there is a magnetic field, the energy loss by itself changes the path of the particle. Bremsstrahlung is a stochastic process, as it involves relatively few interactions when passing through a piece of material².

¹ It is usually assumed that the scattering nucleus is much heavier than the electron and does not exchange momentum, such that it is effectively a 2-body process.

[Figure 3.1: curves for bremsstrahlung, ionization, Møller (e−) scattering, Bhabha (e+) scattering and positron annihilation in lead (Z = 82), for electrons and positrons. Horizontal axis: E (MeV); vertical axes: −(1/E) dE/dx (X0⁻¹) and (cm² g⁻¹).]

Figure 3.1: Fractional energy loss per radiation length passage of electrons through lead, given as a function of electron energy. Figure from [17].

The process is modeled by the Bethe-Heitler probability distribution

\[
f(z) = \frac{(-\log_e z)^{c-1}}{\Gamma(c)}, \tag{3.2}
\]

where c ≡ t/log_e(2), z ≡ E_f/E_i, and t ≡ x/x_0. The variable t is the amount of material traversed in radiation lengths, and Γ is the gamma function. The energy when entering the material is E_i, and the energy after traversing the material is E_f.
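A short numerical sketch of Equation (3.2) is given here. Substituting u = −log_e(z) turns f(z) into a gamma distribution with shape c and unit scale, so energy fractions can be sampled with the standard library. The function names, the random seed and the chosen thickness are illustrative assumptions.

```python
import math
import random


def bethe_heitler_pdf(z, t):
    """Density f(z) of Eq. (3.2) for the energy fraction z = E_f/E_i, 0 < z < 1."""
    c = t / math.log(2.0)
    return (-math.log(z)) ** (c - 1.0) / math.gamma(c)


def sample_energy_fraction(t, rng):
    """Draw z by sampling u = -log(z) from a gamma distribution with shape c, scale 1."""
    c = t / math.log(2.0)
    return math.exp(-rng.gammavariate(c, 1.0))


rng = random.Random(1)
t = 0.1  # assumed thickness: 10% of a radiation length
samples = [sample_energy_fraction(t, rng) for _ in range(200_000)]
mean_z = sum(samples) / len(samples)
```

For this distribution the mean of z equals exp(−t), so the sample mean should lie close to exp(−0.1), even though individual samples are spread over the whole interval between 0 and 1.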

Figure 3.2 shows some examples of the Bethe-Heitler distribution for different material thicknesses. This shows that the Bethe-Heitler distribution is highly non-Gaussian. This is important, as the PDF used to describe energy loss in the Kalman filter is a Gaussian approximation. The highly non-Gaussian shape of the Bethe-Heitler distribution is therefore the main reason why the Gaussian sum filter is introduced for track reconstruction.

3.3 Multiple scattering

Figure 3.2: Examples of the Bethe-Heitler distribution for different material thicknesses. Figure from [16].

² This is different from Bethe-Bloch energy loss, which involves a large number of small interactions, such that the total energy loss has a very small spread.

When a particle passes through material, it can be deflected by the electric field of the nuclei in the material. This effect is called Coulomb scattering, and when the number of scattering processes in a material is larger than about 20, the process can be treated statistically and is called multiple Coulomb scattering. The PDF describing the angle of deflection has almost the shape of a Gaussian, but with longer tails. Usually a good approximation is to use a Gaussian distribution, with mean zero and standard deviation

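\[
\theta_0 = \frac{13.6\,\mathrm{MeV}}{\beta c p} \, z \sqrt{\frac{x}{x_0}} \left[ 1 + 0.038 \log_e\!\left( \frac{x}{x_0} \right) \right], \tag{3.3}
\]

where βc is the speed of the particle, p its momentum, z the charge, and x/x_0 the length the particle travels in radiation lengths [17].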

When track reconstruction is done in ATLAS, multiple scattering is modeled as a Gaussian. It is possible to incorporate these longer tails in the GSF [12], but the gain in accuracy is so small that this is not implemented in the current ATLAS tracking software [7].
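As a worked example of Equation (3.3), the sketch below computes θ₀ in natural units (c = 1, momentum in MeV). The function name and the example values, a 1 GeV electron crossing 1.7% of a radiation length, are assumptions chosen for illustration.

```python
import math


def scattering_angle_width(p, mass, z, x_over_x0):
    """Standard deviation theta_0 (rad) of the Gaussian approximation, Eq. (3.3).

    p and mass in MeV; z is the charge in units of e;
    x_over_x0 is the traversed thickness in radiation lengths.
    """
    beta = p / math.sqrt(p * p + mass * mass)
    log_term = 1.0 + 0.038 * math.log(x_over_x0)
    return (13.6 / (beta * p)) * abs(z) * math.sqrt(x_over_x0) * log_term


# Assumed example: a 1 GeV electron crossing 1.7% of a radiation length.
theta0 = scattering_angle_width(1000.0, 0.511, -1, 0.017)  # about 1.5 mrad
```

Because θ₀ scales as 1/p, doubling the momentum halves the width for a highly relativistic particle.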


Chapter 4

The ATLAS detector

The ATLAS detector is designed to detect the long-lived particles produced by the collisions in the LHC. Many of the particles produced decay immediately to lighter particles, and most of these are measured as they pass through the detector and deposit energy. However, the neutrinos are not measured as they only interact weakly, and thus have a very small probability of interacting with the material in the detector. The only signature indicating that an event contained neutrinos is therefore an unbalanced total transverse momentum in the event.

In Figure 4.1 the ATLAS detector is shown, and the multiple sub-detectors are visible. Closest to the interaction point is the inner detector, which is a tracking detector that measures the path of the charged particles passing through it. Outside the inner detector is a solenoid magnet which gives an axial (z-direction) magnetic field of 2 T, bending the charged particles into a helical path. This enables determination of the momentum of the charged particles from the bending radius. The calorimeters are located outside the inner detector and its magnet system. The purpose of the calorimeters is to stop the particles in order to measure their full energy.
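The relation between bending radius and momentum can be sketched with the standard formula p_T [GeV] ≈ 0.3 · |q| · B [T] · R [m] for a particle of charge q (in units of e) moving perpendicular to a uniform field. The function names below are hypothetical, and the sketch ignores energy loss and field non-uniformity.

```python
SPEED_FACTOR = 0.299792458  # GeV per (tesla * meter) for unit charge


def transverse_momentum_gev(b_tesla, radius_m, charge=1):
    """Transverse momentum in GeV for a track of given bending radius."""
    return SPEED_FACTOR * abs(charge) * b_tesla * radius_m


def bending_radius_m(pt_gev, b_tesla, charge=1):
    """Bending radius in meters for a given transverse momentum."""
    return pt_gev / (SPEED_FACTOR * abs(charge) * b_tesla)


pt = transverse_momentum_gev(2.0, 10.0)  # a 10 m radius in a 2 T field
radius = bending_radius_m(45.6, 2.0)     # radius for a 45.6 GeV track
```

In a 2 T field a 45.6 GeV track has a radius of tens of meters, far larger than the detector itself, so the momentum has to be inferred from a small curvature.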

The only interacting particle not stopped by the calorimeters is the muon, as the muon does not interact strongly and is a minimum ionizing particle. The outermost part of ATLAS is the muon system, consisting of a tracking detector and a toroid magnet which creates a magnetic field bending the muons in a different direction than the inner magnet. The energy of the muons can then be calculated from their momenta.

The sub-detectors have barrel and endcap parts, the barrel being the cylindrical part surrounding the interaction point, and the endcaps the flat parts closing off the top and bottom of the barrel. This is shown in Figure 4.1(b).

This chapter briefly presents the different parts of the ATLAS detector. The number of hits and the resolution in the inner detector are explained in more detail, as these are important for track reconstruction.


(a) Cross section of the ATLAS detector, including some traversing particles.

(b) Overview of the ATLAS detector with sub-detector names.

Figure 4.1: The ATLAS detector.


4.1 The ATLAS coordinate system

The ATLAS coordinate system is shown in Figure 4.2, and enables description of a position inside the detector. In the Cartesian system the z-axis is defined by the beam axis, the positive x-axis points towards the middle of the LHC ring, and the y-axis points upwards [2]. It is often more convenient to use the angular coordinates (R, φ, θ) defined in the figure, where (R, φ) are the normal polar coordinates in the (x, y)-plane, and the polar angle θ is relative to the z-axis.

These angular coordinates are also used when describing momenta. In the inner detector the charged particles are bent in circles in the (R, φ) plane, which is the same as the (x, y) plane. The sum of momenta in this plane (Σ p_t) is approximately zero due to momentum conservation. A non-zero Σ p_t thus indicates that some particle(s) escaped undetected, such as a neutrino. The initial momentum in the z-direction is not zero, as the collisions are generally not symmetric in this direction, and thus only massive particles can produce decay products with high p_t.

[Figure 4.2: axes x (towards the LHC center), y (up) and z (beam); R = √(x² + y²); angles φ and θ.]

Figure 4.2: The ATLAS coordinate system.
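The coordinate definitions above can be written out as a small sketch. It converts a Cartesian vector to (R, φ, θ) and computes the magnitude of the transverse momentum imbalance of a set of momentum vectors. The function names and the example momenta are illustrative assumptions.

```python
import math


def to_angular(x, y, z):
    """Return (R, phi, theta) for a Cartesian position or momentum vector."""
    r = math.hypot(x, y)
    phi = math.atan2(y, x)
    theta = math.atan2(r, z)  # polar angle measured from the +z (beam) axis
    return r, phi, theta


def transverse_imbalance(momenta):
    """Magnitude of the vector sum of the transverse momenta of (px, py, pz) vectors."""
    sum_px = sum(p[0] for p in momenta)
    sum_py = sum(p[1] for p in momenta)
    return math.hypot(sum_px, sum_py)


r, phi, theta = to_angular(3.0, 4.0, 5.0)
imbalance = transverse_imbalance([(30.0, 0.0, 12.0), (-10.0, 5.0, -40.0)])
```

A large value of the imbalance would point to momentum carried away by undetected particles, while the z-components play no role in this sum.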

4.2 The inner detector

The inner detector is the innermost tracking detector, and lies inside the inner solenoid, which gives an axial magnetic field of 2 T. The purpose of the inner detector is to provide many position measurements (hits), which are used to determine the particle trajectories. Figure 4.3 shows the layout of the inner detector, consisting of three sub-detectors. The innermost sub-detector is the pixel detector, which is surrounded by the Semiconductor Tracker (SCT), and outside this is the Transition Radiation Tracker (TRT). Typically the pixel detector provides three hits per particle, the SCT four, and the TRT approximately 36. The outer diameter of the inner detector is 2.1 m and the length is 6.2 m, including endcaps [3]. In general the resolution is much higher in the (R, φ) plane than in the z direction, because bending in this plane is used to determine the transverse momenta of the particles.

Figure 4.3: The layout of the inner detector.

4.2.1 Pixel detector

The pixel detector is a semiconductor tracker, and has the best resolution of the sub-detectors. It consists of three cylinders with radii of 4, 10, and 13 cm. There are also five endcap discs on each end [2]. This usually gives three hits for each particle.

The pixel detector is built using pixelated silicon diodes, and due to its cost this type of detector can only be used in the innermost layers, where it is critical to get the most exact hits. The accuracy of the detector is 10 µm in the (R, φ) plane, and 115 µm in the z direction. The thickness of one layer is about 1.7% of a radiation length, where a radiation length is defined as the length an electron has to travel through material in order to on average lose a fraction 1/e of its energy. This means that the particle on average loses very little of its energy when traversing the pixel detector. Because the pixel detector has very high resolution and is close to the interaction point, it is important for finding secondary vertices.

4.2.2 Semiconductor Tracker (SCT)

The SCT sits outside the pixel detector, and is a semiconductor strip detector. It consists of eight detector layers, mounted pairwise with a small (40 mrad) crossing angle. The measurements from the two layers in each pair are then combined to make one space point, hence the SCT normally gives four space points to be used for track reconstruction. The resolution in the SCT barrel is 17µm in the (R, φ) plane, 580µm in z, and a hit separation of at least 200µm is needed to distinguish two tracks [2].
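The way two strip layers with a small crossing angle yield one space point can be illustrated with an idealized geometry. The model below is an assumption made for illustration and is not taken from the ATLAS software: one layer has strips along z and measures the (R, φ) coordinate directly, while the second layer is rotated by the 40 mrad crossing angle and measures a mixture of the two coordinates. All function names are hypothetical.

```python
import math

STEREO_ANGLE = 0.040  # crossing angle in radians


def strip_measurements(r_phi, z, angle=STEREO_ANGLE):
    """Idealized measurements of an axial layer and a rotated (stereo) layer."""
    u_axial = r_phi
    u_stereo = r_phi * math.cos(angle) + z * math.sin(angle)
    return u_axial, u_stereo


def space_point(u_axial, u_stereo, angle=STEREO_ANGLE):
    """Invert the two strip measurements into an (r_phi, z) space point."""
    r_phi = u_axial
    z = (u_stereo - u_axial * math.cos(angle)) / math.sin(angle)
    return r_phi, z


u1, u2 = strip_measurements(1.5, 20.0)
r_phi, z = space_point(u1, u2)
```

In this model the z coordinate is obtained by dividing by sin(0.040), which is about 0.04, so measurement errors are amplified by roughly a factor of 25 in z. This is qualitatively in line with the much coarser resolution quoted for z than for the (R, φ) plane.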

4.2.3 Transition Radiation Tracker (TRT)

The TRT is the outermost tracking detector, and is built using straw tube detector technology. A straw tube looks like a big drinking straw, with a wire inside and readout connectors at each end. The inside is metallized, and an electric field is set up with the outer wall acting as a cathode and the wire acting as an anode. The straw is filled with xenon gas, and when a charged particle passes through the straw, the gas is ionized and the separated electrons and ions drift in the electric field. This causes a current, which can be read out. Each straw is 4 mm in diameter, and the length is 144 cm or less in the barrel. The TRT barrel mostly provides information in the (R, φ) plane, has an accuracy between 50 and 130 µm, and usually gives 36 hits that can be used for tracking [2].

4.3 The calorimeters

Outside the inner detector and its magnet system the calorimeters are found. The calorimeters measure the energy of the particles by creating an electromagnetic or hadronic cascade. While it is important that the inner detector contains as little material as possible, it is important for a calorimeter to contain enough material to stop the particles, except the muons and neutrinos which are practically unstoppable.

Since hadrons are able to traverse more material than electrons and photons before being stopped, the electromagnetic calorimeter (ECAL) is closer to the interaction point than the hadronic calorimeter (HCAL). The energy resolution ∆E/E in a calorimeter is proportional to E^(−1/2) [19], which means that calorimeters are relatively more precise for high energy particles.
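The scaling of the resolution with energy can be shown with a one-line model, ∆E/E = a/√E. The proportionality constant a used below (10% at 1 GeV) is a hypothetical value chosen only to illustrate the scaling, not a property of the ATLAS calorimeters quoted from this thesis.

```python
import math


def relative_resolution(energy_gev, stochastic_term=0.10):
    """Relative energy resolution dE/E = a / sqrt(E), with E in GeV.

    stochastic_term is the hypothetical constant a, i.e. the resolution at 1 GeV.
    """
    return stochastic_term / math.sqrt(energy_gev)


res_10 = relative_resolution(10.0)
res_100 = relative_resolution(100.0)
```

With this model a tenfold increase in energy improves the relative resolution by a factor √10, from about 3.2% at 10 GeV to 1% at 100 GeV.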

4.3.1 The electromagnetic calorimeter (ECAL)

In the ECAL the aim is to stop all light particles which interact electromagnetically, meaning electrons, positrons and photons. When these particles hit the ECAL, they lose energy predominantly by bremsstrahlung for electrons and pair production for photons. In these processes new electrons and photons are created, which again results in more bremsstrahlung and pair production. This is called an electromagnetic shower. The ECAL is a sampling calorimeter, which means it consists of several layers of absorbers to create electromagnetic showers, with detectors in between. Lead is used to create the electromagnetic showers, and liquid argon is used as the active detector material. As the particles from the shower ionize the liquid argon, the separated electrons and ions drift in an electric field that is applied to the detector, which creates a signal. The total thickness of the ECAL is 22 radiation lengths in the barrel, and 24 in the endcaps. This thickness stops most of the electrons, positrons and photons.

4.3.2 The hadronic calorimeter (HCAL)

The main purpose of the HCAL is to measure the energy of hadrons, which are not stopped in the ECAL. Hadrons are bound states of quarks and interact strongly, and when a hadron hits the HCAL a hadronic shower is created. This happens when the particles hit the nuclei in the material and create new hadrons via hadronization, which is explained in Section 2.1. The HCAL is also a sampling calorimeter. Iron plates are used to create the hadronic showers, while the active detector material between the iron plates is plastic scintillator. When ionizing radiation hits a scintillator, the energy is absorbed and re-emitted as light. To read out the signal a photomultiplier is used.

4.4 Muon spectrometer

Most muons penetrate all the material in the detector, and therefore the outermost part of ATLAS is the muon spectrometer. This identifies muons and measures their momenta. The muon spectrometer consists of a toroidal magnet system and a tracking detector inside the magnets. This bends the muons around an axis perpendicular to the bending axis in the inner detector. From the bending of the tracks and knowledge of the magnetic field, the momentum and energy of the muons can be calculated.

4.5 Trigger system

The ATLAS detector produce an enormous rate of data, and as it is not be possible to process and store all this data, triggers are used to filter for the interesting collisions. When a trigger fires, a message is passed around that this event are to be investigated more closely. There are three trigger levels, and only the events which pass through all three trigger levels are stored for offline analysis. The main goal of the triggers is to select particles and jets with large transverse momenta.

A large transverse momentum indicates that a heavier particle has been produced.

The triggers also look for events with large missing transverse momentum, indicating neutrinos or other undetected particles. The first level trigger, L1, uses only a limited amount of information and defines regions of interest, which are parts of the detector where interesting features have been seen. The L2 trigger mainly uses these regions for further filtering. The L3 trigger, also called the event filter, does a reconstruction of the event before deciding whether to keep the event.

The trigger system reduces the event rate to approximately 200 Hz, down from the collision rate of maximum 40 MHz.


Chapter 5

Track reconstruction

Track reconstruction is the process of translating position measurements (hits) in the detector into estimates of the state vector x and the trajectory of the charged particles that pass through the detector. As already defined, the state vector x is built up of a position 3-vector and a momentum 3-vector. These six terms describe where in space a particle is located, and how it is moving. The goal of track reconstruction is usually to find an estimate for the state vector x as close to the perigee as possible, where the perigee is defined as the point on the track that is closest to the collision point when the track is projected into the x-y plane. Track reconstruction is illustrated in Figure 5.1. Figure 5.1(a) shows only the hits in the detector, and from this it is impossible to say anything about the trajectory of the particles. Figure 5.1(b) shows the reconstructed tracks.

(a) Output from the detector, only hits. (b) The reconstructed tracks from the hits.

Figure 5.1: Illustration of track finding and track fitting. Picture from [15].

Usually track reconstruction is divided into track finding and track fitting. Track finding is to find the set of hits belonging to the same particle, and track fitting is to estimate the particle's state vector x from the found hits.


5.1 Track finding

The processing of the detector readout yields only a set of hits in the detector, as shown in Figure 5.1(a). The first step of track reconstruction is to collect a subset of hits which is assumed to belong to the same charged particle traversing the detector [10]. This process is called track finding. A common method is to first find a track seed by iterating over hit triplets in the pixel detector, checking if the triplet is a valid origin for a track. When a valid triplet is found, the hits are used to estimate the initial track parameters. The track is then propagated through the detector layers using a Kalman filter, which is explained in Section 5.2.2. If enough hits are picked up by the propagated track, these hits are associated with the track and handed to the track fitting algorithms, which make precise estimates of the track parameters.

5.2 Track fitting

Track fitting is the process of estimating the state vector x of a particle by using the hits assumed to belong to that specific particle. This can be done by different methods. In this section, the Least Square Method (LSM), the Kalman filter, and the Gaussian Sum Filter (GSF), which is an extension of the Kalman filter, are explained. These are some of the common methods for track fitting in ATLAS. Until 2012 the Kalman filter and the LSM were the standard methods used by ATLAS, and these two methods give exactly the same estimates for the state vector x at the perigee. From 2012, the GSF is the standard method for fitting electron tracks in ATLAS, while the LSM is used for all other particles [7].

5.2.1 Least square method (LSM)

The simplest and most common method for track fitting is the LSM, which is described in [12]. The idea behind this method is to minimize the sum of the squared distances between the measured hits and the estimated trajectory of the particle.

This trajectory can for instance be a straight line or the arc of a circle, and in 3D it may also be a helix. In a detector, the trajectory is a straight line if there is no magnetic field, and if there is a homogeneous magnetic field the trajectory is a helix.

A very simple 2D example of the LSM is to fit a straight line to measurements. Figure 5.2 shows an example of fitting a straight line to the four measurement points {(1,1), (3,2), (5,5), (8,9)}, all having the same uncertainty in the y-direction. The fitted line is the line that minimizes the sum of the squared distances between the measurements and the line.
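The straight-line example can be reproduced numerically. The following is a minimal sketch assuming NumPy; the variable names are illustrative and not taken from the thesis software. Since all points have the same uncertainty in y, the distances being minimized are the vertical ones.

```python
import numpy as np

# The four measurement points from the text, all with equal uncertainty in y.
x = np.array([1.0, 3.0, 5.0, 8.0])
y = np.array([1.0, 2.0, 5.0, 9.0])

# Design matrix for the straight-line model y = a*x + b.
A = np.column_stack([x, np.ones_like(x)])

# Least-squares solution: minimizes sum((y - (a*x + b))**2).
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# Vertical distances between the measurements and the fitted line.
residuals = y - (a * x + b)
print(f"slope a = {a:.4f}, intercept b = {b:.4f}")
```

For these four points the normal equations give a = 127/107 and b = −85/107, so the fitted line is approximately y = 1.187x − 0.794.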

Figure 5.2: Simple example of the LSM. The straight line is the line that minimizes the squared distances between the measurements and the line.

For track fitting the same idea is applied. First it is assumed that x₀,true is the true state vector at the perigee; the symbol x₀ is used to indicate a state vector at the perigee. From the state vector x₀, the trajectory and the evolution of the particle's state vector x can be found by using the equations of motion while taking energy loss and bending due to the magnetic field into account. The goal is to find the best estimate for x₀,true, and this estimate is called x̃₀. It is assumed that the measurements are made in n different measurement planes corresponding to the detector planes.

The vector-valued function z maps the state vector x₀ into all the measurement planes, such that z(x₀) gives the expected measurements if x₀ is the state vector at the perigee, given the description of the geometry, the magnetic field, and the materials in the detector. The length of z(x₀) is ℓ, where ℓ is the total number of measurements, i.e. the number of measurements in one detector plane multiplied by the number of detector planes. As an example, if there are six detector planes and two-dimensional measurements are provided in each plane, then z(x₀) is a vector with 12 elements giving the 12 expected measurements.

The vector m contains the measurements, and also has length ℓ. The measurement vector m has a covariance matrix, which takes into account the uncertainties due to multiple scattering, the stochastic nature of energy loss, and measurement errors.

The best estimate x̃₀ is found by minimizing the function given in Equation (5.1). This function is the weighted squared difference between the expected measurement vector z(x̃₀) and the actual measurement vector m. The matrix W is the inverse covariance matrix of the measurement vector m.

$$M(\tilde{x}_0) = \left(m - z(\tilde{x}_0)\right)^T W \left(m - z(\tilde{x}_0)\right) \qquad (5.1)$$

It is worth noticing that because the inverse covariance matrix is included, the measurements are weighted. This means that measurements with small uncertainty contribute more to the final estimate than measurements with large uncertainty.

Also position measurements far away from the perigee are weighted down, since the particle has undergone more scattering before these measurements. Finding the state vector x̃₀ that minimizes Equation (5.1) can be done analytically if z is a linear function.

The function z(x₀) is linear if z(x₀) = Ax₀ + c, where A is a matrix and c is a vector. If z(x₀) is linear, the best estimate for x₀,true is

$$\tilde{x}_0 = (A^T W A)^{-1} A^T W (m - c). \qquad (5.2)$$

If z(x₀) is not a linear function, as is frequently the case, one has to make a linear approximation such that the function is expressed in the form z(x₀) ≈ Ax₀ + c. This can be done by a first order Taylor expansion around some expansion point x₀⁰.
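Equation (5.2) translates directly into a few lines of linear algebra. The sketch below assumes NumPy and a toy straight-line track whose state is (intercept, slope); the function name `lsm_estimate` and the toy geometry are illustrative assumptions, not part of the ATLAS software. With equal weights it reproduces the straight-line example of Figure 5.2.

```python
import numpy as np

def lsm_estimate(A, c, m, V):
    """Equation (5.2): x0 = (A^T W A)^-1 A^T W (m - c), with W = V^-1.

    A, c : linear measurement model z(x0) = A @ x0 + c
    m    : measurement vector
    V    : covariance matrix of the measurements
    Returns the estimate and its covariance (A^T W A)^-1.
    """
    W = np.linalg.inv(V)
    AtW = A.T @ W
    cov = np.linalg.inv(AtW @ A)
    x0 = cov @ AtW @ (m - c)
    return x0, cov

# Toy model (assumption): measured position = intercept + slope * s.
s = np.array([1.0, 3.0, 5.0, 8.0])
A = np.column_stack([np.ones_like(s), s])
c = np.zeros(len(s))
m = np.array([1.0, 2.0, 5.0, 9.0])
V = np.eye(len(s))  # equal, uncorrelated uncertainties

x0, cov = lsm_estimate(A, c, m, V)
print("intercept, slope:", x0)
```

Replacing `V` by a matrix with unequal diagonal entries shows the weighting discussed above: measurements with a large variance pull the estimate less.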

The LSM is the optimal method for reconstructing tracks, in the sense that it yields unbiased estimates with minimum variance if the following is fulfilled:

• The measurements are unbiased, i.e. the expectation value of the measurement error is zero (⟨ε_i⟩ = 0).

• The function z is linear.

• All experimental errors are Gaussian distributed. This means the measurement errors are Gaussian distributed, and all errors from energy loss and multiple scattering are Gaussian distributed.

As seen from Equation (5.1), the LSM requires the inverse covariance matrix W to be found. The covariance matrix has dimension ℓ×ℓ, where ℓ is the number of measurements, and the computing time for inverting a matrix grows as O(ℓ³). If the number of measurements is large, this is a problem.

5.2.2 The Kalman filter

This section presents the Kalman filter [12, 13]. The Kalman filter is important for this thesis, as it is the background for the GSF. Generally the Kalman filter is an alternative to the LSM, giving exactly the same estimates for the state vector x at the perigee while avoiding the problem of inverting large matrices if the number of measurements is large.

The track model is the mathematical model for how the state vector x evolves between the detector planes, given the material and the magnetic field. The basic idea of the Kalman filter is to start with an estimate for the state vector x in one detector plane, and use the track model to propagate this estimate to the next detector plane. After propagation, the estimated state vector x is updated using the measurements in the new detector plane.

When describing the method, it is assumed that the measurements are taken in discrete detector planes k < n, where n is the total number of detector planes. The definition of all vectors and matrices used is shown in Table 5.1.

Symbol                  Definition
x_k|j                   The state vector in measurement plane k given data from all planes up to j. In Figure 5.3 the state vector is drawn as an arrow.
C_k|j                   The full covariance matrix in detector plane k given data from all planes up to j.
f_k                     The track model from measurement plane k−1 to k. This function transforms the state vector in measurement plane k−1 to the state vector in plane k.
F_k                     The linearisation of the function f_k around detector plane k. The linearisation is done by a first order Taylor expansion. If the track model is linear, this is the same as f_k.
δ_k                     The process noise from plane k−1 to k. This is the uncertainty in the state vector due to the material between measurement plane k−1 and plane k, and includes uncertainty from energy loss and multiple scattering.
Q(δ_k) = Q_k            The covariance matrix of the process noise.
H_k                     The transformation between the state vector and the measurement in plane k. H_k(x_k|j) transforms the state vector x_k|j into the expected measurement.
ε_k                     The measurement errors in plane k.
C(ε_k) = V_k = W_k⁻¹    The covariance matrix of the measurement errors in plane k.
m_k                     The measurements in detector layer k.

Table 5.1: Explanation of the different symbols used in the Kalman filter.

Figure 5.3: Cartoon illustrating the Kalman filter. (a) Estimate of x in the first detector layer. (b) Extrapolating the estimate of x to the next detector layer. (c) Updating the estimate for x by including the measurement. (d) Extrapolate again. (e) Updating again.

The Kalman filter will now be described with the help of Figure 5.3. In this figure the green circles represent hits in the detector, and the arrow illustrates the estimated track parameters. In this simplified model there is only material in the measurement planes, hence material effects like multiple scattering and energy loss happen only there. The starting point is a prediction for the state vector x₀ and the covariance matrix C₀ in one plane. This starting point is given by the track finder, and is shown in Figure 5.3(a). The covariance matrix is omitted in the figure in order to make the pictures clearer. The estimate is then propagated to the next detector layer, as shown in Figure 5.3(b). This is called the prediction step, and the predicted states are calculated using Equation (5.3). The index convention is as follows: the first index gives the detector surface, while the last index indicates which detector surfaces have been taken into account; hence x_k|k−1 is the prediction of the state x_k using all the measurements up to but not including m_k. As seen from Equation (5.3), the linearization of the track model is used to transport the covariance matrix from one detector plane to the next.

$$\begin{aligned} x_{k|k-1} &= f_k(x_{k-1|k-1}) \\ C_{k|k-1} &= F_k C_{k-1|k-1} F_k^T + Q_k \end{aligned} \qquad (5.3)$$

The next step is updating the estimate for the state vector x by including the measurements in detector surface k. In the updating, the value of x minimizing Equation (5.4) is determined.

$$L(x) = (m_k - H_k x)^T W_k (m_k - H_k x) + (x_{k|k-1} - x)^T C_{k|k-1}^{-1} (x_{k|k-1} - x) \qquad (5.4)$$

It is worth noting that Equation (5.4) is the weighted squared difference between the expected measurement and the actual measurement, plus the weighted squared difference between the updated state and the predicted state. The state vector x that minimizes Equation (5.4) is the updated state and is given by

$$\begin{aligned} x_{k|k} &= C_{k|k}\left[C_{k|k-1}^{-1} x_{k|k-1} + H_k^T W_k m_k\right] \\ C_{k|k} &= \left[C_{k|k-1}^{-1} + H_k^T W_k H_k\right]^{-1}. \end{aligned} \qquad (5.5)$$

The updated state is illustrated in Figure 5.3(c). It is worth noting that the updated state x_k|k is a weighted mean of the measurement and the predicted state. The covariance matrix for the state vector is derived from the expression for x_k|k in Equation (5.5). After some algebra, a more convenient formula for the updating step is obtained, shown in Equation (5.6).

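$$\begin{aligned} x_{k|k} &= x_{k|k-1} + K_k (m_k - H_k x_{k|k-1}) \\ K_k &= C_{k|k-1} H_k^T \left(V_k + H_k C_{k|k-1} H_k^T\right)^{-1} \\ C_{k|k} &= (I - K_k H_k)\, C_{k|k-1} \end{aligned} \qquad (5.6)$$

This is the expression usually found in the literature.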
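One prediction step and one update step can be sketched as follows, assuming NumPy and a linear track model f_k(x) = F x with state (position, slope). The function names and the numerical values are illustrative assumptions. The sketch also checks numerically that the gain-matrix form of Equation (5.6) agrees with the weighted-mean form of Equation (5.5).

```python
import numpy as np

def predict(x, C, F, Q):
    """Equation (5.3) for a linear track model f_k(x) = F @ x."""
    return F @ x, F @ C @ F.T + Q

def update_gain(x_pred, C_pred, m, H, V):
    """Equation (5.6): gain-matrix form of the update."""
    K = C_pred @ H.T @ np.linalg.inv(V + H @ C_pred @ H.T)
    x = x_pred + K @ (m - H @ x_pred)
    C = (np.eye(len(x_pred)) - K @ H) @ C_pred
    return x, C

def update_weighted_mean(x_pred, C_pred, m, H, V):
    """Equation (5.5): weighted-mean form of the update."""
    W = np.linalg.inv(V)
    C_pred_inv = np.linalg.inv(C_pred)
    C = np.linalg.inv(C_pred_inv + H.T @ W @ H)
    x = C @ (C_pred_inv @ x_pred + H.T @ W @ m)
    return x, C

# Toy setup (assumption): planes a distance d apart, only the position is measured.
d = 1.0
F = np.array([[1.0, d], [0.0, 1.0]])   # straight-line propagation
Q = np.diag([0.0, 1e-4])               # process noise on the slope (scattering)
H = np.array([[1.0, 0.0]])             # measurement picks out the position
V = np.array([[0.01]])                 # measurement variance

x_prev = np.array([0.0, 0.1])
C_prev = np.diag([1.0, 0.01])
m = np.array([0.12])

x_pred, C_pred = predict(x_prev, C_prev, F, Q)
x_gain, C_gain = update_gain(x_pred, C_pred, m, H, V)
x_wm, C_wm = update_weighted_mean(x_pred, C_pred, m, H, V)
print(np.allclose(x_gain, x_wm), np.allclose(C_gain, C_wm))
```

The gain form only inverts a matrix with the dimension of the measurement in one plane, which is why it is usually preferred in practice.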

The whole procedure is then repeated, as shown in Figure 5.3(d) and (e), until all the measurement surfaces are taken into account. The state vector x_n|n at the last measurement point takes into account all measurements. In order to use all the information to build the estimated state vector x_k|n in some plane k < n, smoothing can be used. Smoothing means running the Kalman filter backwards, and the formulas for the smoothed states are shown in Equation (5.7).

$$\begin{aligned} x_{k|n} &= x_{k|k} - A_k (x_{k+1|k} - x_{k+1|n}) \\ C_{k|n} &= C_{k|k} - A_k (C_{k+1|k} - C_{k+1|n}) A_k^T \\ A_k &= C_{k|k} F_{k+1}^T C_{k+1|k}^{-1} \end{aligned} \qquad (5.7)$$

When smoothing is done, estimates for the state vector x_k|n are obtained in all surfaces, and these estimates take all measurements into account. The estimated state vector x in one detector surface is equivalent to what one would get by using the LSM to find the estimate in that specific detector surface. This means the Kalman filter with smoothing is the same as doing the LSM n times, where n is the number of measurement planes.
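The equivalence between the smoothed Kalman estimate and the LSM can be checked numerically in a toy setup. The sketch below assumes NumPy, a straight-line track model, no material (Q = 0), and a very wide starting covariance so that the arbitrary starting state has a negligible influence. All names and numbers are illustrative assumptions.

```python
import numpy as np

def kalman_filter_and_smoother(x0, C0, measurements, F, Q, H, V):
    """Forward filter (Equations 5.3 and 5.6) followed by the smoother (Equation 5.7)."""
    x_pred, C_pred, x_filt, C_filt = [], [], [], []
    x, C = x0, C0
    for m in measurements:
        # Prediction step for a linear track model.
        x, C = F @ x, F @ C @ F.T + Q
        x_pred.append(x)
        C_pred.append(C)
        # Update step in gain-matrix form.
        K = C @ H.T @ np.linalg.inv(V + H @ C @ H.T)
        x = x + K @ (m - H @ x)
        C = (np.eye(len(x)) - K @ H) @ C
        x_filt.append(x)
        C_filt.append(C)
    # Backward pass; the last filtered state already uses all measurements.
    x_s, C_s = list(x_filt), list(C_filt)
    for k in range(len(measurements) - 2, -1, -1):
        A = C_filt[k] @ F.T @ np.linalg.inv(C_pred[k + 1])
        x_s[k] = x_filt[k] - A @ (x_pred[k + 1] - x_s[k + 1])
        C_s[k] = C_filt[k] - A @ (C_pred[k + 1] - C_s[k + 1]) @ A.T
    return x_s, C_s

d = 1.0                                   # distance between planes
F = np.array([[1.0, d], [0.0, 1.0]])      # state = (position, slope)
Q = np.zeros((2, 2))                      # assumption: no material effects
H = np.array([[1.0, 0.0]])
V = np.array([[0.1 ** 2]])
ms = [np.array([v]) for v in (0.9, 2.1, 2.9, 4.2, 5.0)]

x_s, C_s = kalman_filter_and_smoother(np.zeros(2), 1e4 * np.eye(2), ms, F, Q, H, V)

# Global LSM straight-line fit, expressed in the first measurement plane.
s = d * np.arange(len(ms))
A_lsm = np.column_stack([np.ones_like(s), s])
lsm, *_ = np.linalg.lstsq(A_lsm, np.concatenate(ms), rcond=None)
print("smoothed state in first plane:", x_s[0])
print("LSM estimate in first plane:  ", lsm)
```

With process noise included, the comparison would require the full correlated covariance matrix of the measurements in the LSM, which is omitted here for brevity.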

Since the Kalman filter is a recursive version of the LSM, the same requirements must be fulfilled for the Kalman filter to be optimal as for the LSM. These requirements are described in Section 5.2.1.

5.2.3 Gaussian sum filter (GSF)

This section describes the Gaussian sum filter (GSF) [12, 5, 14, 4]. When the GSF is used for track fitting, the result is a track PDF estimating the state vector x.

The GSF is based on the Kalman filter, but takes into account that energy loss, scattering, and measurement errors¹ are not described by Gaussian PDFs.

The Kalman filter assumes Gaussian distributed processes. For most particles this is a good enough approximation, but not for electrons. Electrons lose energy predominantly through bremsstrahlung, and this type of energy loss is better modeled by the Bethe-Heitler distribution, shown in Figure 3.2. Approximating the Bethe-Heitler distribution as a single Gaussian is very crude, and the GSF therefore approximates it as a weighted sum of Gaussians instead.

This Gaussian mixture is given as

$$P(z) = \sum_{i=0}^{N_{bh}} g_i \, \mathcal{N}(z; z_i, V_i), \qquad (5.8)$$

where z, the fraction of the energy retained after the material, is defined by z ≡ E_f/E_i, E_i is the energy when entering the material, and E_f is the energy after the particle has gone through the material. The function N(z; z_i, V_i) is a Gaussian function with mean z_i and variance V_i. The weight of the i-th component is g_i. Figure 5.4 illustrates how two Gaussian functions are added into one function closer to the Bethe-Heitler distribution. In ATLAS, the Gaussian mixture used to approximate the Bethe-Heitler distribution is calculated by minimizing the integral of the absolute difference between Equation (5.8) and the Bethe-Heitler distribution it is approximating [5].

¹ In ATLAS, scattering and measurement errors are treated as Gaussian processes; the non-Gaussian treatment of these is therefore omitted here.
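A Gaussian mixture of the form of Equation (5.8) is straightforward to evaluate. The sketch below assumes NumPy and uses two components with made-up weights, means and variances; these numbers are purely illustrative and are not the ATLAS parametrization. It shows how a narrow peak plus a wide component produce a skewed PDF whose mean lies well below its mode.

```python
import numpy as np

def gaussian(z, mean, var):
    """One-dimensional Gaussian density N(z; mean, var)."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_pdf(z, weights, means, variances):
    """Equation (5.8): P(z) = sum_i g_i N(z; z_i, V_i)."""
    z = np.asarray(z, dtype=float)
    return sum(g * gaussian(z, mu, var)
               for g, mu, var in zip(weights, means, variances))

# Hypothetical two-component mixture (illustrative values only).
weights = [0.7, 0.3]
means = [0.95, 0.60]
variances = [0.02 ** 2, 0.20 ** 2]

z = np.linspace(-1.0, 2.0, 30001)
dz = z[1] - z[0]
pdf = mixture_pdf(z, weights, means, variances)

norm = np.sum(pdf) * dz        # numerical integral, should be close to 1
mean_z = np.sum(z * pdf) * dz  # mean of the mixture
mode_z = z[np.argmax(pdf)]     # position of the maximum
print(f"norm = {norm:.6f}, mean = {mean_z:.3f}, mode = {mode_z:.3f}")
```

The mean of a mixture is the weighted mean of the component means, here 0.7·0.95 + 0.3·0.60 = 0.845, while the mode stays close to the narrow peak at 0.95.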

Figure 5.4: Estimating the Bethe-Heitler distribution as a sum of two Gaussians. (a) The two Gaussians separately. (b) Sum of the two Gaussians. The horizontal axis is E_f/E_i.

In the following, Figure 5.5 will be used to explain the GSF, and how the GSF is closely linked to the Kalman filter. In ATLAS a sum of 6 Gaussians is used to model the Bethe-Heitler distribution in each layer of material, but for simplicity only two Gaussian components are used in this explanation. The idea is to do several extrapolations at the same time, where each of the extrapolations is done in exactly the same way as for the Kalman filter. The energy loss in each of the extrapolations uses one of the components from the Gaussian sum in Equation (5.8) modeling the Bethe-Heitler distribution.

As with the Kalman filter, an initial estimate for the state vector x₀ and covariance matrix C₀ is needed, shown in Figure 5.5(a). In this picture one starts with a single estimate for x, but it is also possible to start with an estimate that is a Gaussian mixture. Each of the Gaussian components of Equation (5.8) gives an estimate for the energy loss in the material and hence one propagation to the next detector layer. Each of these new components has a weight equal to the weight of the state vector x before entering the material multiplied by the weight of the Gaussian component giving the energy loss. The prediction step is shown in Figure 5.5(b). After the prediction step, the updating step is done for each of the estimates. This is done for each component using Equation (5.6), exactly as with the Kalman filter, and is illustrated in Figure 5.5(c). The weights of the different state vectors are also updated, as

$$q^i_{k|k} \propto q^i_{k|k-1} \, \mathcal{N}\left(m_k; H_k x^i_{k|k-1}, V_k + H_k C^i_{k|k-1} H_k^T\right). \qquad (5.9)$$

This shows that the weight of a component is updated by multiplying it with the Gaussian function representing the component's predicted measurement, evaluated at the measurement point m_k. The constant of proportionality is such that all the weights in one detector plane sum up to one. Equation (5.9) results in a component being weighted down if it is far away from the measurement, so that the estimated trajectory follows the measurements more closely. The next step is to again extrapolate each of the estimates to the next layer, using each of the components in the Gaussian sum approximation of the Bethe-Heitler distribution. This is shown in Figure 5.5(d). The estimates are then again updated, as shown in Figure 5.5(e).
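The weight update of Equation (5.9) can be sketched as follows, assuming NumPy and assuming that the covariance of the Gaussian is the usual innovation covariance V_k + H_k C H_k^T of each predicted component. The function names and the two-component example are illustrative assumptions.

```python
import numpy as np

def gaussian_density(r, S):
    """Zero-mean multivariate normal density with covariance S, evaluated at r."""
    norm = np.sqrt((2.0 * np.pi) ** len(r) * np.linalg.det(S))
    return np.exp(-0.5 * r @ np.linalg.solve(S, r)) / norm

def update_weights(weights, x_preds, C_preds, m, H, V):
    """Equation (5.9): reweight each predicted component by its agreement with m."""
    likelihoods = np.array([
        gaussian_density(m - H @ x, V + H @ C @ H.T)
        for x, C in zip(x_preds, C_preds)
    ])
    new_weights = np.asarray(weights) * likelihoods
    return new_weights / new_weights.sum()   # normalise so the weights sum to one

# Two predicted components with state (position, slope); only position is measured.
H = np.array([[1.0, 0.0]])
V = np.array([[0.01]])
x_preds = [np.array([1.0, 0.0]), np.array([1.3, 0.0])]
C_preds = [np.diag([0.01, 0.001]), np.diag([0.01, 0.001])]
m = np.array([1.1])

q = update_weights([0.5, 0.5], x_preds, C_preds, m, H, V)
print(q)
```

The component predicting a position of 1.0 is closer to the measurement at 1.1 than the one predicting 1.3, so its weight increases from 0.5 to about 0.68.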

The background for this procedure is that each component entering the material is convolved with the material effects in the detector layer. Therefore the number of components leaving the material is the number of components entering it multiplied by the number of Gaussian components describing the material effects. The convolution of two Gaussian functions is a new Gaussian, and therefore the estimate when leaving the material is again a Gaussian mixture.

The method of always propagating several Kalman filters from each estimate quickly leads to an unmanageable number of estimates. When the number of estimates reaches a maximum, two estimates that are close to each other are merged into one Gaussian estimate, preserving their mean. In ATLAS the maximum number of components is set to 12.
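Merging two components can be sketched as below, assuming NumPy. The text only states that the mean is preserved; the covariance formula used here is the standard moment-matching choice, which also preserves the second moment of the pair, and is an assumption rather than a statement about the ATLAS implementation. How "close" two components are is not specified in the text, so the choice of the pair is left out.

```python
import numpy as np

def merge_components(w1, x1, C1, w2, x2, C2):
    """Merge two weighted Gaussians into one, preserving weight, mean and covariance.

    The covariance is the weighted mean of the two covariances plus a term
    accounting for the separation of the two means (moment matching, assumed).
    """
    w = w1 + w2
    x = (w1 * x1 + w2 * x2) / w
    d = (x1 - x2).reshape(-1, 1)
    C = (w1 * C1 + w2 * C2) / w + (w1 * w2 / w ** 2) * (d @ d.T)
    return w, x, C

# One-dimensional example: weights 0.25 and 0.75, means 0 and 4, unit variances.
w, x, C = merge_components(0.25, np.array([0.0]), np.array([[1.0]]),
                           0.75, np.array([4.0]), np.array([[1.0]]))
print(w, x, C)
```

In this example the merged mean is 3 and the merged variance is 4, equal to the mean and variance of the original two-component mixture.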

Figure 5.5: Cartoon illustrating the GSF. (a) Estimate of x in the first detector layer. (b) Extrapolating the estimate of x to the next detector layer. (c) Updating the estimate for x by including the measurement. (d) Extrapolate again. (e) Updating again.

The background for this thesis is the non-Gaussian estimates that are the output when the GSF is used for track fitting. The form of the final estimate for the state vector x, taking all measurements into account, is given in Equation (5.10). This shows that the estimate for the state vector is a weighted sum of multivariate Gaussians.

$$P\left(x_{k|k}\right) = \sum_{i=1}^{p} g_i \, \mathcal{N}\left(x_{k|k}; x^i_{k|k}, C^i_{k|k}\right) \qquad (5.10)$$

This is the non-Gaussian track PDF. For ATLAS, this means that the track PDF is a weighted sum of up to 12 Gaussians, each having a corresponding weight and covariance matrix.


Chapter 6

A non-Gaussian PDF for the mass estimate

This chapter describes the main goals of and the theory behind the thesis. First, in Section 6.1 the main goals are repeated and explained in more detail. Then in Section 6.2 it is shown how the invariant mass is reconstructed using the track parameters, and how an estimate of the variance of the mass is found using linear error propagation. Section 6.3 describes how the mass PDF is made, and Section 6.4 explains how the mass PDF is analyzed. Section 6.5 describes how the data needed is retrieved.

First, a reminder and clarification of the notation: the state vector x fully describes where the particle is and how it is moving at a given point in time. It is built up of a position 3-vector and a momentum 3-vector, and in this context the most interesting part is the momentum 3-vector, which is used to calculate the invariant mass. In the data used, the momentum 3-vector is given as q = (φ, θ, q/p), and the symbol q indicates that this form is used. When referring to the Cartesian form of the momentum, (p_x, p_y, p_z), the symbol p is used. The ATLAS coordinate system is described in Section 4.1.

6.1 Why use a non-Gaussian mass PDF?

As shown in Chapter 5, the GSF algorithm for track fitting gives a track PDF for the state vector x of the electrons. The track PDF is a weighted sum of multivariate¹ Gaussian estimates for x, as given by Equation (5.10), and each of the single Gaussian estimates building up this sum will be referred to as a component. The standard in ATLAS is to have a maximum of 12 components, making the track PDF a sum of up to 12 multivariate Gaussian estimates. This means that for each electron there are 12 estimates for the state vector x, with 12 corresponding covariance matrices and 12 corresponding weights. The track PDF is then non-Gaussian, reflecting the skewed nature of energy loss by bremsstrahlung. When the track PDF is to be used for further analysis, the normal approach is to collapse it into one single estimate for the state vector x and one single estimate for its covariance matrix. The standard in ATLAS is to use the mode of the track PDF as the estimate for x, and to take the weighted mean of the 12 covariance matrices as the single estimate for the covariance matrix [7]. However, by collapsing the track PDF into one single Gaussian estimate, much information about the shape of the track PDF is lost, in particular information about its skewness. This thesis explores the possibility of using the full track PDFs to make non-Gaussian mass PDFs, and it is checked whether these PDFs represent the reconstructed masses in a good way, and whether they can be used for improving the mass resolution. When reconstructing the invariant mass, only the 3-momenta q from the track PDF are needed.

¹ Non-zero covariance between the track parameters of a single track PDF component.

6.2 Invariant mass reconstruction and linear error propagation

In this section it is shown how the invariant mass can be reconstructed if the track parameters for each electron are given as one single Gaussian estimate. This is how the invariant mass is reconstructed when using the collapsed track PDF, and it will later form the basis for making the full mass PDF.

6.2.1 Invariant mass reconstruction

The mass of a decayed mother particle can be reconstructed by considering energy and momentum conservation, if the 4-momenta (E/c, p_x, p_y, p_z) of the decay products are known. The total energy of the decaying particle is (c ≡ 1)

$$E^2 = m_0^2 + p^2, \qquad (6.1)$$

where m₀ is the rest mass of the particle and p = (p_x, p_y, p_z) is its 3-momentum.

The decay will conserve both energy and momentum, and thus the rest mass of the mother particle can be found as

$$W = \sqrt{E^2 - |p|^2}, \qquad (6.2)$$

where the energy E and momentum p are calculated from the decay products as

$$E = \sum_{i=1}^{n} \sqrt{p_{x_i}^2 + p_{y_i}^2 + p_{z_i}^2} \quad \text{and} \quad |p|^2 = p_x^2 + p_y^2 + p_z^2, \qquad (6.3)$$

and the vector components are given as (particle charges q_i):

$$\begin{aligned} p_x &= \sum_{i=1}^{n} q_i \left(\frac{q}{p}\right)_i^{-1} \cos(\phi_i)\sin(\theta_i) \\ p_y &= \sum_{i=1}^{n} q_i \left(\frac{q}{p}\right)_i^{-1} \sin(\phi_i)\sin(\theta_i) \\ p_z &= \sum_{i=1}^{n} q_i \left(\frac{q}{p}\right)_i^{-1} \cos(\theta_i). \end{aligned} \qquad (6.4)$$

The sums run over all the decay products of the mother particle whose mass is estimated, hence the number of decay products is n. It should be noted that the mass of the electrons is disregarded, because the total energy of the electrons is much larger than their rest energy. The result W of Equation (6.2) is also called the invariant mass, which means that the calculation is independent of the reference frame. This is important because the mother particle is usually not at rest in the lab frame, but can be boosted due to asymmetry in the z-component of the momenta of the colliding partons in the proton-proton collisions.
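Equations (6.2) to (6.4) can be sketched in code as follows, assuming NumPy, massless decay products, and q/p given in units of GeV⁻¹ with charges of ±1. The function names and the units are assumptions made for illustration and may differ from the conventions of the data used in the thesis.

```python
import numpy as np

def momentum_vector(phi, theta, q_over_p, charge):
    """Cartesian 3-momentum of one track from (phi, theta, q/p), as in Equation (6.4)."""
    p = charge / q_over_p   # magnitude of the momentum
    return p * np.array([np.cos(phi) * np.sin(theta),
                         np.sin(phi) * np.sin(theta),
                         np.cos(theta)])

def invariant_mass(tracks):
    """Equations (6.2) and (6.3) for massless decay products.

    tracks: iterable of (phi, theta, q_over_p, charge) tuples.
    """
    momenta = np.array([momentum_vector(*t) for t in tracks])
    energy = np.sum(np.linalg.norm(momenta, axis=1))   # massless: E_i = |p_i|
    p_total = momenta.sum(axis=0)
    # Guard against tiny negative values caused by rounding.
    return np.sqrt(max(energy ** 2 - p_total @ p_total, 0.0))

# Two back-to-back electrons of 45.6 GeV each in the transverse plane.
back_to_back = [(0.0, np.pi / 2, -1.0 / 45.6, -1.0),
                (np.pi, np.pi / 2, 1.0 / 45.6, 1.0)]
mass_z = invariant_mass(back_to_back)

# Two 50 GeV electrons with an opening angle of 90 degrees.
right_angle = [(0.0, np.pi / 2, 1.0 / 50.0, 1.0),
               (np.pi / 2, np.pi / 2, -1.0 / 50.0, -1.0)]
mass_90 = invariant_mass(right_angle)
print(mass_z, mass_90)
```

For two massless particles the result reduces to W² = 2 p₁ p₂ (1 − cos α), where α is the opening angle, which gives 91.2 GeV for the first pair and 50√2 ≈ 70.7 GeV for the second.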

6.2.2 Linear error propagation

The uncertainty in the reconstructed mass is estimated by transforming the uncertainty of the momenta q of the decay products into a variance for the mass estimate, and this transformation is done by linear error propagation. The uncertainty in the momenta q of the decay products is given by the covariance matrix

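$$C = \begin{pmatrix} C_{1,1} & C_{1,2} & \cdots & \cdots & C_{1,n} \\ C_{2,1} & C_{2,2} & \ddots & & \vdots \\ \vdots & \ddots & C_{i,i} & \ddots & \vdots \\ \vdots & & \ddots & C_{n-1,n-1} & C_{n-1,n} \\ C_{n,1} & \cdots & \cdots & C_{n,n-1} & C_{n,n} \end{pmatrix}, \qquad (6.5)$$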

where the on-diagonal sub-matrices are given as

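$$C_{i,i} = \begin{pmatrix} \sigma^2_{\phi_i} & \mathrm{cov}(\phi_i, \theta_i) & \mathrm{cov}(\phi_i, (q/p)_i) \\ \mathrm{cov}(\theta_i, \phi_i) & \sigma^2_{\theta_i} & \mathrm{cov}(\theta_i, (q/p)_i) \\ \mathrm{cov}((q/p)_i, \phi_i) & \mathrm{cov}((q/p)_i, \theta_i) & \sigma^2_{(q/p)_i} \end{pmatrix}, \qquad (6.6)$$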

and the off-diagonal sub-matrices are C_{i,j≠i} = 0. The number of decay products is n.