Tailoring Deformable Models to Extract Meaningful Metrics and Landmarks from 3D Echocardiograms



Håkon Strand Bølviken

Tailoring Deformable Models to Extract Meaningful Metrics and Landmarks from 3D Echocardiograms


Thesis submitted for the degree of Philosophiae Doctor

Department of Informatics

The Faculty of Mathematics and Natural Sciences



Series of dissertations submitted to the

Faculty of Mathematics and Natural Sciences, University of Oslo No. 2598

ISSN 1501-7710

All rights reserved. No part of this publication may be reproduced or transmitted, in any form or by any means, without permission.

Cover: UiO.

Print production: Graphics Center, University of Oslo.



This thesis is submitted to the Department of Informatics at the University of Oslo in partial fulfillment of the requirements for the degree of Philosophiae Doctor (PhD). The research has been a part of the INIUS project, and was funded as an innovation PhD with a focus on cooperation with private industry, in this case GE Vingmed Ultrasound. The main supervisor was Professor Eigil Samset at GE Vingmed Ultrasound and the Department of Informatics, University of Oslo. Co-supervisors were Fredrik Orderud from 2017 to 2021, Jørn Bersvendsen from 2017 to 2018, Sten Roar Snare from 2018 to the end of the project, and Federico Veronesi from 2021 to the end of the project.


I would like to thank the University of Oslo for financing this PhD, and the Department of Informatics for giving me the chance to pursue this research and for giving me an exciting place to work with many people to talk to and get new ideas from.

I would also like to thank GE Vingmed Ultrasound for the chance to get some experience with work in the industry. The collaboration between academia and industry has been interesting and a great learning experience.

My main advisor Eigil Samset has been a great source of advice and feedback on the articles, and of help with how best to move forward with the PhD. His steady hand was helpful in determining what best to do. Fredrik Orderud was helpful in understanding the RCTL software that made up the backbone of the PhD, in discussing the implementation of the changes I made, and in explaining how the industrial software pipeline works. Jørn Bersvendsen was helpful in giving me literature, advice and ideas for the early projects, particularly at the beginning of the PhD when I found myself rather lost. Sten Roar Snare was kind enough to join as a supervisor when Jørn left, and helped with the design of the projects, as well as with practical issues with GE software and hardware.

Olivier Gerard was also quite helpful with the design of and article writing on the TEE standard views article.

I would like to thank Pål Brekke at Rikshospitalet for providing clinical input and performing RV contouring.

Particular thanks go to Federico Veronesi, who has been extremely willing to help throughout the thesis. He always took the time to help with software issues and project designs. Without him this PhD might not have been finished. He also took over as supervisor at the very end.

Finally, I would like to thank my friends and family, who have been a great source of relief and have stopped me from letting work consume my life these past years.

Håkon Strand Bølviken Oslo, January 2023



Cardiac ultrasound, or echocardiography, is essential in modern cardiology and is seen as a part of the regular physical examination by many cardiologists. It is cost-effective, safe and relatively easy to use. Real-time 3D echocardiography has emerged as an important method for imaging patients with cardiovascular disease, although it is perceived as more complex to perform than regular 2D echocardiography.

With an aging population follows an increase in patients with cardiovascular disease. This puts more strain on hospitals in general, resulting in a demand for tools that can perform accurate analysis of echocardiographic images with ease and automation. One underpinning problem which has relevance for many metrics of cardiac morphology and function is the segmentation of the images.

This thesis focuses on tailoring dynamic anatomical models of the heart so that they can be accurately fitted to ultrasound images in a way that allows for practical analysis. This was done by modifying and generalizing a well-established method for such applications: a Doo-Sabin subdivision surface with control points updated dynamically using a Kalman filter. This method has the advantage of utilizing both prior knowledge about heart anatomy and movement and details of the observed heart structure from the 3D image. The thesis covers three different applications and customizations of this general framework.

The first was to create a more accurate segmentation of the right ventricle by developing a new representation of the subdivision surface, better suited for capturing the complex shape of the right ventricle of the heart. The importance of accurate assessment of the right ventricle has received increased attention. We compared the estimated ventricle volumes to a ground truth, with satisfactory results.

Second, we developed a method to automatically derive anatomical landmarks and context from transesophageal echocardiographic (TEE) images, to allow for automatic reformatting of 3D TEE images into standard 2D views. This would allow clinicians to easily find image planes within a 3D volume for improved visualization and analysis. We were able to demonstrate real-time performance without loss of accuracy compared to other state-of-the-art methods.

Lastly, we developed a comprehensive geometric model that describes all four chambers of the heart, allowing for simultaneous segmentation of all visible chambers, as well as tracking of key anatomical landmarks. The method provided acceptable accuracy compared to single-chamber methods while providing the advantage of multi-chamber tracking and segmentation.

Overall, this thesis contributes to the field of echocardiographic segmentation with shape modeling that has practical applications in estimating chamber volumes, ventricular function and chamber-chamber interactions, as well as in peri-procedural workflow improvements.


List of Papers

Paper I

Håkon Strand Bølviken, Jørn Bersvendsen, Fredrik Orderud, Sten Roar Snare, Pål Brekke, Eigil Samset “Two Methods for Modified Doo-Sabin Modeling of Non-Smooth Surfaces - Applied to Right Ventricle Modelling”. In: Journal of Medical Imaging. Vol. 7, no. 6 (2020), pp. 1–17. DOI: 10.1117/1.JMI.7.6.067001.

Paper II

Håkon Strand Bølviken, Olivier Gerard, Federico Veronesi, Eigil Samset

“Automatic Alignment of Standard Views for Transesophageal Echocardiographic Images”. In: Journal of Medical Imaging. Vol. 9, no. 5 (2022), DOI:


Paper III

Håkon Strand Bølviken, Federico Veronesi, Eigil Samset “Simultaneous Segmentation of all Four Chambers in Cardiac Ultrasound Images”. In: Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization (2022). DOI: 10.1080/21681163.2022.2073913.



Preface
Abstract
List of Papers
Contents

1 Introduction
1.1 Background
1.2 Aims of Study
1.3 Structure of the Thesis
1.4 Context of Study

2 Background
2.1 Cardiac Shape and Function
2.2 Imaging Modalities
2.3 Segmentation
2.4 Kalman Filters
2.5 Segmentation Framework Overview
2.6 Comparison of Segmentation Methods
2.7 Other Frameworks in Use

3 Summary of Presented Works
3.1 Two Methods for Modified Doo-Sabin Modeling of Non-Smooth Surfaces - Applied to Right Ventricle Modelling
3.2 Automatic Alignment of Standard Views for Transesophageal Echocardiographic Images
3.3 Simultaneous Segmentation of all Four Chambers in Cardiac Ultrasound Images

4 Discussion
4.1 Contributions
4.2 Choice of Methods
4.3 Validation of Algorithms
4.4 Evaluation of the Kalman Filter Approach to Segmentation
4.5 Future Improvements and Applications

5 Conclusions

Bibliography

Papers

I Two Methods for Modified Doo-Sabin Modeling of Non-Smooth Surfaces - Applied to Right Ventricle Modelling
I.1 Introduction
I.2 Methods
I.3 Results
I.4 Discussion
I.5 Conclusion
I.6 Biographies
I.7 Disclosures
I.8 Acknowledgments
Bibliography

II Automatic Alignment of Standard Views for Transesophageal Echocardiographic Images
II.1 Introduction
II.2 Materials and Methods
II.3 Results
II.4 Discussion
II.5 Conclusions
II.6 Acknowledgements
II.7 Conflict of Interest
Bibliography

III Simultaneous Segmentation of all Four Chambers in Cardiac Ultrasound Images
III.1 Introduction
III.2 Materials and Methods
III.3 Results
III.4 Discussion
III.5 Conclusions
Bibliography


Chapter 1

Introduction


1.1 Background

1.1.1 Medical Context

According to the European Society of Cardiology, around 45 % of European deaths, and 32 % of worldwide deaths, are caused by cardiovascular disease (CVD) [108]. In Europe, more than 4 million people die from CVD each year, 1.4 million of them before the age of 75. The most common CVDs are coronary heart disease and stroke. Congenital heart disease [107] is also a major issue, affecting 1 in 100 newborns. Sixty years ago only 20 % of those with congenital heart disease survived to adulthood, with heart failure and endocarditis, an inflammation of the endocardium, among the main causes of death. Thanks to new medical technology, the percentage of survivors has now increased to 90 %.


Another type of CVD is valvular dysfunction [81]. Valvular dysfunction can involve scarring or leakage of the valves, which reduces the heart’s ability to pump blood. Rheumatic heart disease is a major cause of valvular disease in developing countries.

The effects of CVD are not limited to death: CVD can lead to long-term problems for patients and is a burden on healthcare systems around the world. Many types of heart problems can be effectively treated and managed if accurate information about the heart can be provided. This is, for instance, the case for several valve-based heart problems and interventions [78][55]. Accurate imaging, and the analysis of the resulting images, is therefore an important field of study in medicine.

1.1.2 Medical Imaging

Medical imaging is a key method for evaluating a patient’s heart. There are many modalities for cardiac imaging, including computed tomography (CT), angiography, positron emission tomography (PET), magnetic resonance imaging (MRI), and ultrasound. MRI uses magnetic fields, and measures the response of hydrogen atoms to these fields, giving accurate images of the heart [23].

CT uses a series of X-rays from all angles in a circle around the body to construct images. Different tissues absorb different amounts of X-rays, and images can be constructed by measuring this absorption. The resulting images are highly accurate, but require exposing the patient to radiation.

Angiography is a type of X-ray imaging of the blood vessels, where a dye is injected into the bloodstream to enhance the contrast. This gives detailed images of the vessels.


PET scans involve the injection of a radioactive agent. Depending on the exact agent it will disperse in the body in different ways, where the agent can be traced by measuring the radiation, giving information about, for instance, glucose uptake.

The modalities mentioned above are all very useful in several applications, but they all require heavy equipment and transporting the patient, and several of the methods also expose the patient to radiation.

Cardiac ultrasound, or echocardiography, is in many cases the first-line imaging modality due to its ease of use, low cost, portability and safety, as it does not use ionizing radiation. An ultrasound probe can be small enough to carry in your pocket. Ultrasound works by sending sound waves into the body and measuring the echoes produced when the waves encounter heterogeneities in the tissue, as well as the time it takes for each echo to come back. Classically, echocardiography was limited to 2D cross-sections of the heart, but in the last decades real-time 3D echocardiography has become a viable tool. Ultrasound does, however, suffer from lower image quality than MRI or CT. In addition, there is typically more noise, and there are limits related to the acoustic window. It also requires a skilled operator to place the probe in optimal positions.

There are several types of probes used for acquiring ultrasound images. Transthoracic echocardiography (TTE) is the typical method, where a probe is placed on the chest. This is an easy way of acquiring an echocardiographic image, but the ribs can limit the field of view and fat can limit the penetration of the ultrasound waves. Transesophageal echocardiography (TEE) is a different form of ultrasound cardiovascular imaging [47] where a probe is passed into the patient’s esophagus, and ultrasound images are recorded from behind the heart. This has several advantages in terms of unobstructed acquisition, but there are fewer methods available for the automatic identification and delineation of structures in TEE images. TEE is also more uncomfortable for the patient.

2D echocardiography has advantages in terms of image resolution. 3D images are in theory ideal for capturing the complex structure of the heart, and tools for analyzing these images give the clinician the necessary information on how the heart is functioning and its morphology.

1.1.3 Medical Image Processing in Echo

A central problem in medical image processing is image segmentation. In the context of echocardiographic imaging, segmentation refers to determining the geometrical delineation of structures like ventricles or valves in the image. The output is often a model describing the structure. The resulting model can be used for a variety of purposes, like measuring the volume of the structure or locating important landmarks. Studies on segmentation have mainly focused on the left ventricle due to its critical role as the chamber responsible for pumping blood throughout the body [82]. However, more attention has recently been given to the right ventricle (RV) [34], and for a complete analysis of heart function all chambers need to be evaluated [113][92][37]. The left and right atriums have also been segmented, often together with the left or right ventricles [26][49][109]. A few reports describe all chambers being segmented together [90][117][116][54].

Many approaches exist to segment the heart chambers, including atlas-based segmentation [8] and machine learning approaches like U-Nets [96]. One of the approaches, which is the basis for the methods presented in this thesis, is a Kalman filter-based algorithm first described by Orderud [86]. This approach gives a very general framework and can be adjusted for different use cases. The Kalman filter allows the researcher to build knowledge about the system directly into the design, but typically requires an initial guess, and the quality of the final segmentation depends on the quality of that guess. Finding a good starting guess is an open problem, and it can be difficult if there is a large spread in the possible true values. Depending on the filter and its initial values, a Kalman filter can output unnatural values.

1.2 Aims of Study

The main goals of this study were to improve workflow and diagnostic accuracy in 3D echocardiography by detecting landmarks and extracting clinical measures. In particular, the focus was on developing methods for accurate segmentation of the right ventricle, for determining the position of the mitral valve and aortic outflow tract in TEE images, and for the simultaneous segmentation of all four cardiac chambers.

The main proposition of this thesis is that the framework described by Orderud and Bersvendsen can be applied to new practical applications by expanding on the framework and inventing new variations of it. These expansions will be justified by studying the practical applications where they can be of use.

This study will expand on the model to allow it to function in a larger variety of cases. These expansions include changing the deformable model that is fitted to the image in order to allow it to adapt to new segmentation problems. The expansions also include changing the algorithm that fits the model to the image, allowing it to properly fit the model in cases where the regular algorithm would have problems. This will be useful for the field, both by providing new tools for image analysis in general and by providing results for the specific cases, which allows for comparisons between the algorithm used in this thesis and other tools.

For these purposes, several modifications were made to the algorithm originally created by Orderud and Bersvendsen. One of these modifications was to change the algorithm to be better able to model sharp corners. While most of the heart has smooth, rounded corners, there are parts of the right ventricle that have sharp edges, and a surface model better suited for sharpness could improve accuracy in those areas and in general allow the model to segment a larger variety of shapes. This required a novel change to the Doo-Sabin algorithm [32], expanding the possible shapes that it, and the algorithm framework in general, can take on.

In order to make the use and analysis of TEE images easier, an algorithm that could automatically find the salient features of the image, in particular the mitral valve and the aortic outflow tract, was created. This was used to construct standard views of the image. A standard view is an agreed-upon positioning of a 2D slice through a 3D cardiac image, chosen so that the slice shows the structures of interest in a way that is familiar to the reader. However, there are difficulties with TEE images, the biggest being the large variation in where the aorta is in the image. The traditional Kalman filter would struggle with this sort of large variation in possible values, so creating a modified Kalman filter capable of searching through the large number of possible aorta placements was one of the aims.

Another aim of this PhD was to create a four-chamber model capable of segmenting the chambers simultaneously. This would allow for a more complete analysis of the heart, as well as for the study of the dynamic interactions between the chambers.

In summary, this thesis aims to:

• Allow for a sharp, accurate model of the RV and test this on real images.

• Create an algorithm for automatic reformatting of 3D TEE images into standard views by finding the aorta and mitral valve.

• Create a full four-chamber model of the heart capable of accurately segmenting the four chambers in an ultrasound image with minimal input from the user.

1.3 Structure of the Thesis

The thesis will begin with an introduction, which will cover a brief explanation of the heart and its function, and some modalities for imaging the heart. It will also make a brief comparison of those modalities to ultrasound, the main modality of the rest of the thesis. After that, several methods of segmenting ultrasound images will be described, and attention will be given to segmentation frameworks in use in commercial products today. We will also cover what we believe are the strengths and weaknesses of these frameworks.

We will then describe each of the papers of the thesis. These papers all have the commonality that the framework by Orderud and Bersvendsen is applied to new use cases, specifically finding landmarks and volumetric metrics in echocardiographic images. This is done by modifying and expanding the framework’s capabilities. Comparisons of the results to other literature will be made, and the value of the contributions of this thesis will be discussed based on that.

1.4 Context of Study

This work has been a part of the INIUS (Intelligent Interventional Ultrasound Scanner) project at the University of Oslo, an innovation cluster focused on creating new tools for intelligent image-guided treatments. It was an innovation PhD, meaning cooperation between industry and academia for the purpose of inventing new methods to solve problems in society. The focus was on applied research that is both academically relevant and can be commercialized. Due to this, the main focus of the PhD has been clinically applicable solutions rather than theoretical contributions.

One-fourth of the PhD period was spent working at GE Healthcare. This consisted of fixing bugs in their software and integrating novel algorithms described in this thesis with their commercial software. The sharp Doo-Sabin models have been integrated into the software, as have other parts of the algorithms described in the thesis. GE software has been the basis of the algorithms used in this thesis, and it has also been used for validation in two papers.


Chapter 2

Background


This chapter will give a short introduction to the topics that are important background for this thesis. These topics include an introduction to the heart and its function, ultrasound imaging of the heart, and segmentation of cardiac ultrasound images. A discussion of state-of-the-art and existing research along with a discussion of current methods is also given.

2.1 Cardiac Shape and Function

The heart is an important muscular organ whose role is to pump blood throughout the body. The heart consists of four chambers: the left ventricle (LV), the right ventricle (RV), the left atrium (LA) and the right atrium (RA). In addition, there are several valves that prevent the blood from flowing backward.

The left and right atriums are connected to their respective ventricles through valves that allow blood to flow in only one direction. The ventricles are separated by a wall called the septum. Figure 2.1 shows a diagram of the heart portraying the basic anatomy.

The walls of the heart are made up of three layers: the endocardium, which is in direct contact with the blood, the myocardium, which is the muscle that performs the main contraction and the pericardium, which is a tissue surrounding the heart. The myocardium is considered to be nearly incompressible[105].

Figure 2.1: Diagram of the heart. Used under Creative Commons Attribution Share Alike 3.0, courtesy of ZooFari.


Figure 2.2: Ultrasound image of the RV extracted from a 3D image. Parts of the LV can be seen to the right in the image. The RV wraps around the LV, giving it a crescent shape from this angle. This angle is what you would see when viewing from the apex of the right ventricle, known as a short-axis view.

2.1.1 The Left and Right Ventricles

The left and right ventricles are quite different in form and function. The left ventricle has a shape like a bullet, with a flat base and a contracting apex. It is fairly symmetric around the axis between the base and the apex. The right ventricle has a more complex geometry that wraps around the left ventricle and gives a crescent-looking shape [100] [34]. Figure 2.2 shows an ultrasound image of the RV.

The right heart pumps de-oxygenated blood to the lungs so it can take up oxygen, and the left heart pumps oxygen-filled blood through the body. The pumping happens in two stages: the systole and the diastole. Systole is the stage where the ventricles contract, meaning blood is pumped out of the ventricles. The point at which the volume is lowest is called end systole. Diastole is when blood flows into the ventricles from the atriums, increasing the ventricle volume. The point at which the volume is highest is called end diastole.

Since the left ventricle is the chamber pumping blood throughout the body, LV disease is often more serious than RV disease [14][46][82]. Due to this, the LV has historically received the most attention, but the clinical importance of RV size and function as a predictor for cardiac disease has become increasingly understood in recent years [34]. RV performance has been shown to have prognostic and therapeutic consequences in a variety of heart diseases, from arrhythmogenic cardiomyopathy to pulmonary hypertension and left ventricular failure [75][46]. Because of this, the amount of research on the RV is increasing.

2.1.2 Measuring Cardiac Function

Examinations of cardiac function usually involve evaluating the overall ability of the heart to pump blood, called global function. For the ventricles, two common measures of global function are the stroke volume and ejection fraction. Labeling the end diastolic volume, defined as the maximum volume during a cardiac cycle, as EDV and the end systolic volume, or the minimum volume during a cardiac cycle, as ESV, the stroke volume (SV) is EDV - ESV, giving the total amount of blood pumped out of the heart. Ejection fraction (EF) is the percentage of the maximum volume of the chamber that is pumped out. In short:

$$\mathrm{SV} = \mathrm{EDV} - \mathrm{ESV}$$

$$\mathrm{EF} = 100 \cdot \frac{\mathrm{SV}}{\mathrm{EDV}}$$

Ejection fraction is considered a good metric for global function, as many heart diseases lead to lowered heart contraction [63].
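As a minimal sketch of how these two measures follow from a segmented volume curve, the Python snippet below takes the chamber volume at each frame of one cardiac cycle and applies the definitions above. The function names and the volume values are hypothetical and chosen for illustration only; they are not taken from the thesis software or from measured data.

```python
from typing import Dict, Sequence


def stroke_volume(edv_ml: float, esv_ml: float) -> float:
    """Stroke volume SV = EDV - ESV, in millilitres."""
    if esv_ml > edv_ml:
        raise ValueError("ESV cannot exceed EDV")
    return edv_ml - esv_ml


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction EF = 100 * SV / EDV, in percent."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    return 100.0 * stroke_volume(edv_ml, esv_ml) / edv_ml


def global_function(volumes_ml: Sequence[float]) -> Dict[str, float]:
    """Derive EDV, ESV, SV and EF from a volume curve over one cardiac cycle."""
    edv = max(volumes_ml)  # end diastolic volume: maximum over the cycle
    esv = min(volumes_ml)  # end systolic volume: minimum over the cycle
    return {
        "EDV": edv,
        "ESV": esv,
        "SV": stroke_volume(edv, esv),
        "EF": ejection_fraction(edv, esv),
    }


# Hypothetical volume curve in ml (illustrative values, not measured data).
cycle = [120.0, 105.0, 80.0, 60.0, 50.0, 70.0, 100.0, 118.0]
metrics = global_function(cycle)
print(metrics)
```

With this illustrative curve, EDV is 120 ml and ESV is 50 ml, so SV is 70 ml and EF is roughly 58 %.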

2.2 Imaging Modalities

Imaging modalities refer to the different ways to produce an image of a patient’s heart. Cardiac imaging is a vital means of gaining information about the patient’s heart function and is often necessary for an appropriate diagnosis.

Several modalities exist, including magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET) and echocardiography, also called cardiac ultrasound [91][104].

This thesis focuses on the analysis of cardiac ultrasound images, but this section is dedicated to giving an overview of the different imaging modalities used for cardiac imaging.

2.2.1 Magnetic Resonance Imaging

MRI works by creating a strong magnetic field that interacts with the magnetic moments of individual hydrogen nuclei, i.e. protons [58]. This forces the hydrogen nuclei to align with the magnetic field. When a radiofrequency pulse is applied, the hydrogen nuclei are excited, and once that pulse is turned off they return to their previous state and emit a signal that can be detected. The signal depends on the type of tissue the hydrogen atom is in. Since the human body contains a lot of water and other hydrogen-containing tissue, this makes it possible to distinguish between different types of tissue and to construct images [104].

Images produced through MRI typically have high resolution, and the process is safe for the patient. However, MRI is costly and can be problematic for patients with metal implants, such as pacemakers. It also requires the patient to be placed inside an MRI machine.

Cardiac Magnetic Resonance Imaging (CMR) has increased in popularity as a research subject in the past decades, with the number of accepted abstracts increasing more than 700 % from 1998 to 2018 [65]. CMR is used for a variety of purposes, including inflammatory heart diseases and congenital heart disease. In addition, a number of novel imaging techniques have been developed, expanding the clinical uses [61].

An example of these techniques is 4D Flow CMR [33], a method for measuring blood flow and flow velocity. This gives additional information to the clinician. The typical resolution for this method is between 1.5 and 3 mm, and acquisition time is typically between 5 and 25 minutes. This is useful for blood flow analysis, like determining regions of high or low blood velocity.

2.2.2 X-Rays

X-rays are a form of electromagnetic radiation with a wavelength of roughly 0.01 to 10 nm, short enough that they can partially pass through the body. As X-rays pass through the body, they are absorbed by the body tissue, and on the other side a detector can measure how much radiation was absorbed in total along each straight line from the source of the X-ray. X-rays are attenuated differently by different types of tissue, bone and air, but a limitation of traditional X-ray scans is that they give a 2D projection and do not indicate the depth at which the different tissues lie [104]. They also expose the patient to ionizing radiation.
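The loss of depth information in a single projection can be illustrated with the Beer-Lambert attenuation law, under which the transmitted fraction along a ray depends only on the sum of attenuation coefficient times thickness over the materials it crosses. The Python sketch below is a simplified illustration assuming a monochromatic beam and no scatter; the attenuation coefficients are made-up placeholder values, not real tissue data, and the function name is hypothetical.

```python
import math
from typing import Iterable, Tuple


def transmitted_fraction(segments: Iterable[Tuple[float, float]]) -> float:
    """Fraction I/I0 of X-ray intensity reaching the detector along one ray.

    Each segment is (attenuation coefficient in 1/cm, thickness in cm).
    Beer-Lambert law: I/I0 = exp(-sum(mu_i * dx_i)).
    """
    return math.exp(-sum(mu * dx for mu, dx in segments))


# Two rays crossing the same materials in opposite order.
# The coefficients are placeholders chosen for illustration only.
ray_a = [(0.2, 3.0), (0.5, 1.0)]  # weakly absorbing layer first, then dense layer
ray_b = [(0.5, 1.0), (0.2, 3.0)]  # dense layer first, then weakly absorbing layer

fraction_a = transmitted_fraction(ray_a)
fraction_b = transmitted_fraction(ray_b)
print(fraction_a, fraction_b)
```

Both rays give the same detector reading, because the sum in the exponent does not depend on the order of the layers. This is why a single projection cannot tell at which depth a structure lies, and why measurements from many angles are needed to recover a 3D image.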

Computed tomography uses an X-ray source that rotates around the patient, with detectors that capture the radiation after it has passed through the patient. These measurements can then be used to mathematically construct a 3D image with a very high degree of accuracy. CT equipment has to be stationary and it is expensive to use. In addition, it exposes the body to more ionizing radiation than regular X-rays.

CT has been established as a useful tool for identifying people with or at risk of having coronary heart disease. It can also evaluate coronary calcium scores and calcified plaque[16], but the patient’s radiation exposure must be taken into account[15].

Angiography [51] is a form of X-ray imaging where dye is injected into the bloodstream, which creates greater contrast in the image. This makes it easier to study details of the arteries.

2.2.3 Positron Emission Tomography

During a Positron Emission Tomography (PET) examination, a radioactive agent is injected into the patient. A tracer in the agent determines how the liquid moves in the body, for instance whether it is taken up by tissues. The radiation from the agent is measured, and a PET image is constructed. This means that PET only shows the location of the radioactive material, and another image, like CT, is often acquired at the same time in order to provide context about where the organs are in relation to the radioactive material [104].

PET is highly versatile depending on what tracer is used. If the tracer is taken up by tissues, it might show regions of high metabolism. A different use case is to measure myocardial blood flow, that is to say, blood flow through heart muscles[59].

The positron in positron emission tomography refers to a particular type of radiation, but similar methods are used with other types of radiation. Radionuclide ventriculography (RVG) [97] uses a radioactive tracer in the blood to determine cardiac function, but the tracer emits gamma radiation.

Figure 2.3: Ultrasound image showing most of the four heart chambers.

2.2.4 Cardiac Ultrasound

Medical ultrasound is a diagnostic imaging modality that functions by transmitting high-frequency ultrasound pulses from a transducer through the body, and then interpreting the echo received by the transducer[104]. Different organs, blood and other tissue give different echoes, which means that the returning echo can be transformed into an image showing the tissue. This is called echocardiography when applied to the heart. Figure 2.3 shows an example of an ultrasound image.

When acquiring data, the ultrasound beam is focused in a single direction, and the image is created by measuring the amount of echo reflected back to the transducer over time. This process is then repeated for every direction in the desired image region. This image formation process is limited by the speed of sound, as the transducer has to wait for the wave to propagate to the end of the imaging sector and back in order to receive the echoes. If another pulse were sent out before that, it would be difficult to tell which pulse a given echo came from. This means that there is a trade-off between the number of beams fired, i.e. the spatial resolution, and the number of images that can be generated per unit of time. 3D images typically suffer from this, as there are more directions to measure due to the extra dimension. However, 3D images can give more accurate information about ventricle volumes and other features. This is because 3D captures the entire ventricle, whereas one or several 2D slices leave gaps where the shape of the ventricle is unknown. According to the European Association of Cardiovascular Imaging, 3D images provide more accurate and reproducible data when estimating LV function [62]. The American Society of Echocardiography and the European Association of Cardiovascular Imaging recommend 3D images for volumetric measurements of the LV in adult patients when the image quality is good and such images are available [63].
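The speed-of-sound limit on image formation can be made concrete with a rough calculation. The Python sketch below assumes one transmitted pulse per beam direction, a sound speed of 1540 m/s (a commonly used value for soft tissue), and no parallel beamforming or multi-beat acquisition, so it gives an idealized upper bound rather than the behavior of any particular scanner. The function names, the imaging depth and the beam counts are hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 1540.0  # assumed typical value for soft tissue


def round_trip_time_s(depth_m: float, c: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Time for a pulse to reach the given depth and for its echo to return."""
    return 2.0 * depth_m / c


def frame_rate_hz(depth_m: float, n_beams: int,
                  c: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Upper bound on frames per second when each beam is fired one at a time."""
    if depth_m <= 0 or n_beams <= 0:
        raise ValueError("depth and beam count must be positive")
    return 1.0 / (n_beams * round_trip_time_s(depth_m, c))


depth = 0.15  # hypothetical imaging depth of 15 cm

fps_2d = frame_rate_hz(depth, n_beams=100)        # one plane of 100 beams
fps_3d = frame_rate_hz(depth, n_beams=100 * 100)  # 100 x 100 beams for a volume

print(f"2D: {fps_2d:.1f} frames/s, 3D: {fps_3d:.2f} volumes/s")
```

Under these assumptions a 2D sector of 100 beams can be refreshed about 51 times per second, while a volume of 100 x 100 beams drops to about half a volume per second. The factor of 100 between the two reflects the extra dimension mentioned in the text.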


Echocardiography can take advantage of the Doppler effect to gain more information. The Doppler effect is that sound waves, when they hit a moving object, will have their wavelength changed according to the velocity of the object.

By measuring the change in wavelength it is possible to determine that velocity and so gain information about, for instance, blood flow. Note that this method only measures the velocity component along the direction in which the sound wave is traveling; movement normal to that direction is not measured.

Transducer Types

The transducer is the tool that sends out the ultrasound and receives the echo.

The transducer is typically outside of the body and placed on the chest. The image resulting from this is called a Transthoracic Echocardiogram(TTE). From this position, there can be an issue with the ribs, lungs or sternum obscuring parts of the heart.

In some cases, a specialized transducer is instead sent down to the esophagus, where it takes a Transesophageal Echocardiogram(TEE). This allows for an image taken from a shorter distance, unobstructed by the ribs and other organs.

TEE images are typically clearer and can show structures that TTE images cannot, and they are used in cases where TTE images do not give satisfactory results. 3D TEE is recommended as a reliable and possibly preferred method for measurements related to the aortic annulus by the American Society of Echocardiography and the European Association of Cardiovascular Imaging[63]. As the probe is passed down the esophagus, the procedure is more invasive. It also takes longer to perform.

2.2.5 Comparison of Ultrasound to other Modalities

Medical ultrasound is often the first-line modality, due to its ease of use, low cost and safety. There is no radiation that could be a risk to the patient, and the transducer can be small enough to carry around, instead of requiring the patient to be placed in a large machine. In addition, ultrasound images can be constructed in real time.

Medical ultrasound does, however, have weaknesses in terms of image quality.

The resolution can be low for images acquired at large depths. This includes temporal resolution, which in general is worse for 3D than for 2D echocardiographic images.

Ultrasound images can also be blocked by bones or other tissue, meaning that some parts of the body can be difficult to depict. The right ventricle is an example of a region of the heart that is often partially blocked by the rib cage when using TTE images[89]. Noise and artifacts can also make the evaluation of an ultrasound image difficult.

A comprehensive comparison of how applicable different modalities are for different use cases can be found in an article by Doherty et al.[31].

Echocardiography scores the highest on all use cases related to an initial evaluation of asymptomatic patients and for many cases related to initial evaluations of patients with symptoms of heart disease. It also scores highly on several other uses, including guidance during a left atrial appendage occlusion operation, where TEE imaging is preferred. However, CT and CMR are generally rated higher for follow-up testing to clarify the initial tests.

2.3 Segmentation

Segmentation is an important method for analyzing an ultrasound image of the heart. Segmentation of an image in general means separating an image into distinct parts by labeling the pixels. An image captured by one of the imaging modalities described in Section 2.2 consists only of pixels (2D) or voxels (3D), each holding a number that represents the intensity of that element. Extracting meaning from the pixels or voxels can be difficult, and the focus is both on the accuracy of the extraction and on how long it takes, as running in real time is a necessity for several applications. In this thesis, we will focus on delineating ventricles, atria, valves or the aorta.

There are many methods for cardiac segmentation, including atlas-based segmentation [8], Hough transforms [25] and machine learning approaches like U-Nets[96]. A review of all the methods for cardiac segmentation is outside the scope of this thesis, but this section will detail some of the techniques for echocardiographic segmentation that are in use. This includes deformable models, which are the method that the research in this thesis focuses on. Noble et al.[83], Mazaheri et al. [72] and Chen et al.[23] provide reviews of various methods for echocardiographic segmentation.

2.3.1 Deformable Models

Deformable surfaces are a popular method for segmentation. While several approaches exist, they generally start from a mathematical model of the structure to be segmented. This consists of a general shape model and a deformation space that details how the model can be changed based on input.

The image that is to be segmented has to be analyzed in order to determine the best fit of the model to the image. This can consist of, for instance, checking for strong gradients, meaning seeing where there is a big change in the intensity.

Finally, the mathematical model is changed based on the image analysis using a fitting algorithm. Many algorithms can be used for fitting, including Kalman filters[57] and deep learning algorithms[23].

The advantage of deformable models is that they are able to incorporate strong priors. The general shape, which parts can change and which cannot, size, topology and other properties can all be built into the model. Known facts about the structure can in this way be exploited, which is an advantage if the image quality is poor. On the other hand, it does mean that the model can have problems in atypical cases if it is too limited and restricted.

There are many types of deformable models in use [76]. The simplest type of model would be point clouds, where the model is simply a series of points without a topology or connection between them. A deformable model of this type would be very limited, as you typically want a more explicit definition of the segmented shape. Simplex models provide the simplest realistic model: points connected by a simple topology. The model deformation would be done by moving the points. One version is the triangular mesh, where all faces of the model are triangular.

Another type, implicit surfaces, is defined as the solution to an equation, typically the zero set of a function. Level sets, as described in Section 2.3.5, are an example of this type of model.

For this thesis, parametric surface models, also called explicit surfaces, were used. Parametric surface models cover a large variety of different models, many of which have been used for ultrasound segmentation, like B-splines [1] or subdivision surfaces [87]. In general, a parametric surface model maps a point $u \in \mathbb{R}^2$ to a point $p \in \mathbb{R}^3$ by a function

$$p = f(u) = (f_x(u), f_y(u), f_z(u)) \tag{2.1}$$

A subset of parametric surface models are basis surfaces, where the surface is constructed from control vertices $q_i$ and basis functions $b_i$:

$$f(u) = \sum_i b_i(u)\, q_i \tag{2.2}$$

An advantage to parametric surface models is that a smooth surface can be generated with relatively few degrees of freedom, creating a robust model. By adding control vertices more degrees of freedom can be added where needed.

The basis functions usually have finite support, meaning that only a few of the nodes are used in the calculation of any given surface point. A change in a control vertex therefore only changes a local region instead of the entire model.
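As a minimal sketch of a basis surface, the example below evaluates a weighted sum of control vertices for a single uniform quadratic B-spline patch. The 3x3 control grid and the function names are hypothetical and chosen only for illustration; this is not the Doo-Sabin implementation used in the thesis.

```python
import numpy as np


def quadratic_bspline_basis(t: float) -> np.ndarray:
    """Three uniform quadratic B-spline basis values at t in [0, 1]."""
    return np.array([(1 - t) ** 2, -2 * t ** 2 + 2 * t + 1, t ** 2]) / 2.0


def evaluate_patch(u: float, v: float, control: np.ndarray) -> np.ndarray:
    """Evaluate f(u, v) = sum_i b_i(u, v) q_i for a 3x3 grid of control vertices."""
    # Tensor-product weights: one basis value b_i(u, v) per control vertex.
    weights = np.outer(quadratic_bspline_basis(u), quadratic_bspline_basis(v))
    return np.tensordot(weights, control, axes=([0, 1], [0, 1]))


# Hypothetical control vertices q_i: a flat 3x3 grid with the centre lifted.
xs, ys = np.meshgrid(np.arange(3.0), np.arange(3.0), indexing="ij")
control = np.stack([xs, ys, np.zeros((3, 3))], axis=-1)
control[1, 1, 2] = 1.0

point = evaluate_patch(0.5, 0.5, control)
print(point)
```

Only nine control vertices contribute to any point of this patch, which is the finite support property: moving one control vertex changes the surface only locally.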

B-splines and subdivision algorithms like Doo-Sabin [32] or Catmull-Clark [20] are examples of basis surfaces. For this thesis, Doo-Sabin surfaces were used. They have an advantage over B-splines since they can use a generic topology, while B-splines are limited to topologies made of squares. Compared to Catmull-Clark, the Doo-Sabin algorithm is slightly faster, while still producing high-quality meshes [110].

2.3.2 Region Growing Methods

Region growing is a form of segmentation starting from a seed point in the image, a single pixel that is an input into the method. The method then expands from the seed point, selecting all neighbouring pixels or voxels that fit certain criteria, typically a certain similarity with the seed, and then continues this process for all new neighbours until the region has grown all it can. For cardiac segmentation, this process could for instance segment the LV by selecting a point in the LV and then selecting nearby voxels of similar value. Region growing is part of a method used for LV segmentation in CMR images by Ghelich et al.[40]. Once the final region has been established, a surface can be fitted to the border, for instance a B-spline.
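A minimal sketch of the idea is given below for a 2D image. It assumes 4-connected neighbours and a fixed intensity tolerance relative to the seed; both are illustrative choices, and the toy image is synthetic.

```python
from collections import deque

import numpy as np


def region_grow(image: np.ndarray, seed: tuple, tolerance: float) -> np.ndarray:
    """Return a boolean mask of pixels connected to the seed with similar intensity."""
    mask = np.zeros(image.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    seed_value = float(image[seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connected neighbours
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
            if inside and not mask[nr, nc] and abs(float(image[nr, nc]) - seed_value) <= tolerance:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask


# Toy image: a dark 3x3 "cavity" (value 10) surrounded by a bright "wall" (value 200).
image = np.full((7, 7), 200.0)
image[2:5, 2:5] = 10.0
cavity = region_grow(image, seed=(3, 3), tolerance=20.0)
print(cavity.sum())
```

On this clean image the region stops exactly at the wall. With a gap in the wall or strong noise, the same loop would leak into the surroundings.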


The advantage of this method is its simplicity and its ability to adapt to unusual surfaces, as it makes no assumptions about the shape of the final region. This can also be a disadvantage, however. Another problem is the need for a seed point to start from, which must be given by another algorithm or the user. For modalities with a lot of noise and the possibility for shadows, like echocardiography, this segmentation method is probably not suitable. Noise could easily create regions that are marked wrong by the method, and the walls might not be clear enough for the region to stop at the proper place.

2.3.3 Atlas-Based Models

Atlas segmentation[53] is based on comparing the image that is to be segmented to a known image, called an atlas, which has already been segmented. A transformation between the image and the atlas is computed, and the segmentation of the atlas is then mapped onto the image through that transformation. This requires that the segmentation of the atlas is as good as possible, and constructing the transformation can be difficult, especially if the image and the atlas are somewhat dissimilar.

Today, multi-atlas segmentation is the preferred method[53] [59] and it has been used for segmentation of ultrasound images of the LV [109][101]. Multi-atlas segmentation is an expansion on the atlas method, where numerous atlases exist.

This means that the method can better capture variation in how an image should be segmented. It also means that the deformation between the atlas and the image should be smaller, making the translation easier. However, it can be difficult to get enough atlases to get satisfactory results in all cases, and atlas generation can be costly. In addition, the algorithm must have access to the atlases all the time.

In multi-atlas segmentation, the image is typically scored on how similar it is to each atlas, for instance by spatial distance between marked landmarks or cross-correlation. The most similar atlas is used to segment the image. It is also possible to let each pixel or voxel be segmented individually, by having it evaluated on all atlases and then doing majority voting for what class it belongs to.
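The per-pixel voting variant can be sketched as follows. The example assumes that the atlas label maps have already been registered to the image, so that only the fusion step remains; the three tiny label maps are made up for illustration.

```python
import numpy as np


def majority_vote(label_maps: np.ndarray, num_classes: int) -> np.ndarray:
    """Fuse registered atlas labels of shape (num_atlases, H, W) by per-pixel voting."""
    votes = np.stack([(label_maps == c).sum(axis=0) for c in range(num_classes)])
    return votes.argmax(axis=0)  # ties resolve to the lowest class index


# Three hypothetical atlases, each proposing a label (0 or 1) for every pixel.
atlas_labels = np.array([
    [[0, 1], [1, 1]],
    [[0, 1], [0, 1]],
    [[0, 0], [1, 1]],
])
fused = majority_vote(atlas_labels, num_classes=2)
print(fused)
```

Each pixel receives the label proposed by most atlases, so a single atlas that disagrees with the others is overruled.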

The translation, or registration, between atlas and image can be difficult, and typically makes use of a deformation model which specifies what sorts of deformation are available, an objective function that measures how good the deformation is at translating between atlas and image, and an optimizer which uses the objective function to optimize the deformation.

The main advantage of this method is that it can directly draw on expert knowledge by translating previous segmentations into new knowledge. However, a library of atlases large enough to cover all cases is needed, and creating this is a time-consuming and expensive process.


2.3.4 Active Contours

Active contours [48], also called snakes, are a method for segmentation where a contour, consisting of several nodes and a topology between them, is fitted to an image by minimizing an energy expression. The nodes of the contour are movable, and change in order to minimize the energy, which consists of two parts, internal and external energy. The internal energy comes from the shape of the snake and typically optimizes for smooth curves and other properties that are desirable. The external energy depends on the image and optimizes for a good fit to the desired feature. The balance between these two factors should in theory give a final snake that fits the image well and has nice properties.

An optimization algorithm changes the snake until it converges on a minimum energy.
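To make the two energy terms concrete, the sketch below evaluates a simple discrete snake energy. The particular choices here (squared segment lengths and second differences for the internal energy, negative edge strength for the external energy, and the weights alpha and beta) are common textbook forms assumed for illustration, not the formulation of any specific paper cited above. The optimizer itself is not shown.

```python
import numpy as np


def snake_energy(points: np.ndarray, edge_strength: np.ndarray,
                 alpha: float = 1.0, beta: float = 1.0) -> float:
    """Energy of a closed contour given as an (N, 2) array of integer (row, col) nodes."""
    nxt = np.roll(points, -1, axis=0)
    prv = np.roll(points, 1, axis=0)
    stretch = np.sum((nxt - points) ** 2)          # penalises long segments
    bend = np.sum((nxt - 2 * points + prv) ** 2)   # penalises sharp corners
    internal = alpha * stretch + beta * bend
    # The external energy is low where the nodes sit on strong image edges.
    external = -np.sum(edge_strength[points[:, 0], points[:, 1]])
    return float(internal + external)


# Hypothetical edge map with strong responses at the four corners of a square.
edge_strength = np.zeros((8, 8))
on_edge = np.array([[2, 2], [2, 4], [4, 4], [4, 2]])
edge_strength[on_edge[:, 0], on_edge[:, 1]] = 50.0

energy_on = snake_energy(on_edge, edge_strength)
energy_off = snake_energy(on_edge + 1, edge_strength)  # same shape, shifted off the edges
print(energy_on, energy_off)
```

Both contours have the same internal energy, so the difference between the two values comes entirely from the image term, which is what pulls the snake towards the edges.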

Much of the accuracy of active contours depends on how good the energy equations are at determining the proper shape and excluding improper ones. The method also needs an initial contour, and the optimization may get stuck in a local minimum instead of converging on the desired shape.

Active contours have been used for LV segmentation. Wen et al.[35] argue that 2D active contours can be used in conjunction with temporal data. 3D active contours might take a lot of computations and lack robustness, as a lot of nodes would be needed for accurate segmentation.

2.3.5 Level Sets

Level sets[21] define the model as the zero set of a function $f$, so the model $C$ is defined as:

$$C = \{x \in \mathbb{R}^N \mid f(x) = 0\}$$

Starting from an initial shape $C_0$, the model adapts to the image using a method similar to the energy minimization described in Section 2.3.4. A force $F$ is defined as $F = F_i + F_e$, where $F_i$ is the internal force, enforcing regularity and other properties of the model, and $F_e$ is the external force, calculated from the image in a way that should force the model to adapt to the image. Updating $f$ is typically done using a differential equation, which Charnoz et al.[21] expressed in their paper as:

$$\frac{\partial f}{\partial t} = F \lVert \nabla f \rVert$$

This method has the advantage over active contours in that topology can change during fitting, as the zero set has no set topology. However, determining a proper starting function and a force that gives the correct results can be challenging. In addition, the framework above works for continuous functions, but the pixelation of the image requires a workaround to make it work with the differential equation. Solving the differential equation requires some numerical methods.
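As a rough numerical illustration, the sketch below performs explicit Euler steps of an evolution equation of the common form df/dt = F |grad f| on a pixel grid, with a constant force. The constant force, the central-difference gradient, the step size and the signed-distance initialization are all simplifying assumptions. Practical implementations typically use upwind schemes and an image-dependent force.

```python
import numpy as np


def level_set_step(f: np.ndarray, force: float, dt: float, spacing: float = 1.0) -> np.ndarray:
    """One explicit Euler step of df/dt = F * |grad f| on a regular 2D grid."""
    grad_rows, grad_cols = np.gradient(f, spacing)
    grad_norm = np.sqrt(grad_rows ** 2 + grad_cols ** 2)
    return f + dt * force * grad_norm


# Initial shape C0: a circle of radius 10, encoded as a signed distance (negative inside).
rows, cols = np.mgrid[0:64, 0:64]
f0 = np.sqrt((rows - 32.0) ** 2 + (cols - 32.0) ** 2) - 10.0

f = f0
for _ in range(10):
    # With the negative-inside convention, a positive force shrinks the shape.
    f = level_set_step(f, force=1.0, dt=0.2)

area_before = int((f0 < 0).sum())
area_after = int((f < 0).sum())
print(area_before, area_after)
```

The contour is never represented explicitly: it is recovered as the boundary of the region where f is negative, which is why the topology is free to change during fitting.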

Level sets have been used for segmenting echocardiographic images. Qin et al.[93] segmented the RV using level sets, with a mean Hausdorff error of 6.86 mm.



2.3.6 Machine Learning

Machine learning is a broad field of study. In the most general case, a machine learning algorithm refers to an algorithm that learns a task by adapting to the data provided. This could, for instance, refer to a machine learning algorithm learning to classify the subjects of images by getting a number of images and an answer key for the correct class and changing the internal values that determine the algorithm until it is capable of properly classifying the provided data. The process of changing the algorithm to better suit the task at hand is called training the model. Hopefully, the algorithm would after training be capable of correctly classifying new images it has not seen before.

There is a large number of different machine learning algorithms, from decision trees and random forests to Bayesian networks. For image analysis, deep learning methods are very common, and this includes cardiac image segmentation[23].

Deep learning models consist of an input layer, an output layer, and several hidden layers that take the outputs from previous layers, manipulate them in some ways, and then send them to the next layer. These manipulations can to some degree be changed during the training of the model.

In general, the advantage of machine learning models is that they are not dependent on human knowledge: they learn from the data and can find correlations and connections that humans have not. The main disadvantages are that huge amounts of data might be needed, and that the black-box nature of many machine learning algorithms makes it difficult to understand how the algorithm is making its decisions. It is also not guaranteed that the algorithm will learn what it needs to in order to function properly. Low-quality training data, for instance data that does not represent the real data very well, would also lead to bad models.

Convolutional Neural Networks

Of the deep learning models, convolutional neural networks(CNN) are especially common for image analysis. In short, CNNs make use of convolution layers, where a convolution kernel is applied to the image, leading to a matrix of values where each value corresponds, in a way depending on the specific kernel, to one small part of the image.

For instance, a convolution kernel could be designed to find straight horizontal lines. The convolution kernel would be a 3x3 matrix, and the kernel would be convolved with each 3x3 pixel part of the input image. The result of each convolution would be a value suggesting whether that part of the image contained a straight line. This information would then be passed to the next layer, and there could be other convolution layers checking the image for other things. In an ML implementation, each of the convolution kernels would be determined as part of the training process.
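The horizontal line example can be sketched with a hand-made kernel. The kernel values below are an illustrative assumption; in a trained CNN they would be learned. As in most deep learning libraries, the kernel is applied without flipping, which makes no difference for this symmetric kernel.

```python
import numpy as np


def filter2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over every full window of the image (no padding)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


# Hand-made kernel that responds to a bright horizontal line on a dark background.
horizontal_line_kernel = np.array([
    [-1.0, -1.0, -1.0],
    [2.0, 2.0, 2.0],
    [-1.0, -1.0, -1.0],
])

image = np.zeros((5, 5))
image[2, :] = 1.0  # one horizontal line in the middle row
response = filter2d_valid(image, horizontal_line_kernel)
print(response)
```

The response is highest in the output row whose 3x3 windows are centred on the line and negative where the line only touches the top or bottom of the window.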

The convolutional layers mean that CNNs can identify what features are present in local parts of the image, determine what parts of an image are important, and capture correlations between different parts of the image. U-net[96] is a CNN that has found success in echocardiography[64]. It first downsamples the image through a series of convolution layers and then uses upsampling to bring the result back to the resolution of the input.

Recurrent Neural Networks

Recurrent neural networks (RNN) are deep learning models characterized by having a memory: previous states are remembered and used in future analysis. For the purpose of image analysis, this means the network will analyze one image, and then remember that analysis for the analysis of the next image. This is useful if there is a sequence of images that should be analyzed, and there is a strong correlation between images. This is the case for 2D+t or 3D+t echocardiographic images, where the analysis of the previous frame will help the accuracy of the next frame, as described by Chen[23]. Long short-term memory (LSTM) is one of the most popular architectures for RNN[50].

Active Shape Models

Active Shape Models (ASM)[24] are machine learning methods using a deformable model and are somewhat similar to the active contours described in Section 2.3.4. A model is created with movable points and is then fitted to the training data by moving the points to proper positions. From this, a mean shape and a covariance matrix for the points can be constructed, and the eigenvectors of the covariance matrix with the largest eigenvalues are selected. These vectors correspond to the most important variation in the model based on the training data. This gives a model

$$X = \bar{X} + V b$$

where $X$ is the model for a given case, $\bar{X}$ is the mean model, $V$ is a variation matrix whose columns are the selected eigenvectors, and $b$ is a vector that changes the model from case to case. The fitting algorithm for new data changes $b$ according to information gathered from the data by, for instance, edge detection. It has been used by Gheni et al.[41] for segmenting the aortic valve, but in general it can be difficult to use due to the amount of work needed to construct training data.

Generative Adversarial Networks

Generative Adversarial Networks (GAN)[42] are a technique where two ML models are pitted against each other. Originally designed for generating data that looks like the training data, the technique has one model, the generator, attempt to create fake versions of the data, while the other, the discriminator, attempts to separate the output of the first model from the training data. The two models are trained in turns, and the idea is that both will steadily improve until the fake data is indistinguishable from the real data. These models do sometimes have problems with converging and can fail in their training, but they have been used for segmentation by replacing the generator model with a segmentation network, where the discriminator compares the output to a ground truth segmentation[23]. The segmentation network is then trained to create more plausible segmentations that the discriminator cannot separate from the real segmentations.

Challenges with Deep Learning for 3D Echocardiography Segmentation

There are several challenges for using deep learning on 3D ultrasound images. The signal-to-noise ratio is low, bones and other organs cast shadows, and analysis of a 3D image requires a more complex neural network.

One way to reduce the dimension problem is to make several 2D slices from the 3D image and use them as input. The 2D segmentation outputs can then be refined into 3D[23]. This does, however, mean some loss of information.

Another possibility is to use shape priors or shape constraints. This can be done through the use of auto-encoders[60]. Auto-encoders are a type of deep learning algorithm that translates the data to a lower-dimensional space in order to reduce dimensionality, while trying to keep as much of the important information as possible. For this purpose, the auto-encoder would try to retain the information that is most important for segmentation, while reducing the complexity of the image. Then a deep learning algorithm would segment the simplified representation of the image. This approach was introduced by Oktay et al.[85] and tested on both CMR and echocardiographic images.

2.3.7 Hough Transforms

The Hough transform[77] is an algorithm for finding certain shapes, like lines or circles. This is done by a voting procedure in a parameter space. The image is searched for any location that could be part of the feature that is searched for. If, for instance, the algorithm is looking for a white circle in a black space, any light region could be part of the circle. In the parameter space, every combination of centre and radius for which the circle would pass through that point is given a vote. This process is repeated for the entire image, and the output is the centre and radius that received the largest number of votes.

This process can be changed to account for the circle not being perfect.
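The voting procedure for circles can be sketched as below. The discretization (a small set of candidate radii, 90 voting angles per point, rounding to integer centres) and the synthetic edge points are assumptions made for the sake of a compact example.

```python
import numpy as np


def hough_circle(edge_points, shape, radii, num_angles=90):
    """Vote for circle centres (row, col) and radii supported by the edge points."""
    accumulator = np.zeros((len(radii), shape[0], shape[1]), dtype=int)
    angles = np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False)
    for r_idx, radius in enumerate(radii):
        for row, col in edge_points:
            # Every centre at distance `radius` from this point could explain it.
            centre_rows = np.rint(row - radius * np.sin(angles)).astype(int)
            centre_cols = np.rint(col - radius * np.cos(angles)).astype(int)
            valid = ((centre_rows >= 0) & (centre_rows < shape[0]) &
                     (centre_cols >= 0) & (centre_cols < shape[1]))
            # Deduplicate so one point gives at most one vote per cell.
            cells = set(zip(centre_rows[valid].tolist(), centre_cols[valid].tolist()))
            for cr, cc in cells:
                accumulator[r_idx, cr, cc] += 1
    r_idx, cr, cc = np.unravel_index(accumulator.argmax(), accumulator.shape)
    return (int(cr), int(cc)), radii[r_idx]


# Synthetic edge points on a circle with centre (20, 25) and radius 8.
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
edge_points = [(20 + 8 * np.sin(t), 25 + 8 * np.cos(t)) for t in theta]
centre, radius = hough_circle(edge_points, shape=(40, 50), radii=[6, 8, 10])
print(centre, radius)
```

The three nested loops over radii, points and angles show why the method becomes expensive as the parameter space grows, which matters for real-time use.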

Hough transforms, and especially the generalized Hough transform[9], can be useful for finding structures in images where there are other objects and where there is noise and deformation of the structure. For echocardiography, Hough transforms were used for detecting the mitral valve[25]. However, Hough transforms are typically too slow for real-time use. This is especially true for 2D+t or 3D+t images, since the segmentation of one frame cannot be used in the segmentation of the next.

2.4 Kalman Filters

The Kalman filter [57] is an algorithm for estimating state variables in a system. This means estimating a value, such as position, size or speed, for a changing system like a boat moving at sea. The Kalman filter can be described as a method that uses both a theoretical understanding of the system and actual data acquired from the system to give an estimate of the state vector. It is useful when there is uncertainty associated both with the theoretical model and the data. It has applications in many fields, like navigation, control of manufacturing processes and prediction of the paths of celestial objects[43], and is also the fitting algorithm used in this thesis. The Kalman filter is a recursive estimator, meaning that only the estimated state from the previous time step and the current observations are used to compute the current state estimate.

The Kalman filter models the state vector at time $k$ as:

$$x_k = F_k x_{k-1} + B_k u_k + w_k, \tag{2.3}$$

where

$x_k$ is the real state vector at time $k$

$F_k$ models how the state changes between time steps

$u_k$ is the user input

$B_k$ is a conversion of user input to the state vector

$w_k$ is a noise vector, assumed to follow a multivariate normal distribution with mean 0 and covariance $Q_k$.

The observed values the user measures from the system at time $k$ are modeled as

$$z_k = H_k x_k + v_k, \tag{2.4}$$

where

$z_k$ is the vector of measurements at time $k$

$H_k$ is the observation model, which transforms the state into observations

$v_k$ is a noise vector, assumed to follow a multivariate normal distribution with mean 0 and covariance $R_k$.

Under the assumption that this accurately models the system, the Kalman filter estimates the state vector $x_k$ in a two-step process: prediction and updating.

The predicted state estimate is based on the state estimate at the previous time step, changed according to theoretical predictions of what should occur over a time step. The predicted state vector is written as $\hat{x}_{k|k-1}$. The update is based on the predicted state vector and uses the current measurements to create the final estimate $\hat{x}_{k|k}$. In addition, the filter outputs a covariance matrix $P_{k|k}$ giving an estimated uncertainty of $\hat{x}_{k|k}$. Figure 2.4 shows a diagram of the Kalman filter process.

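Figure 2.4: Diagram of the standard Kalman filter. Diagram by Petteri Aimonen, distributed under Creative Commons CC0 1.0 Universal Public Domain Dedication

The details of the prediction and update steps are as follows.

Prediction:

$$\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k$$

$$P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k$$

Update:

$$\tilde{y}_k = z_k - H_k \hat{x}_{k|k-1}$$

$$S_k = H_k P_{k|k-1} H_k^T + R_k$$

$$K_k = P_{k|k-1} H_k^T S_k^{-1}$$

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \tilde{y}_k$$

$$P_{k|k} = (I - K_k H_k) P_{k|k-1}$$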

If the process and measurement covariances are known, the Kalman filter is the best possible linear estimator in a least-squares sense [52]. For situations when the process cannot be assumed to be linear around each time step, a modified version known as the extended Kalman filter can be used.
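The prediction and update steps of the standard linear filter can be written compactly in code. The sketch below is a generic textbook implementation applied to a hypothetical constant-velocity system with made-up noise levels and measurements; it is not the segmentation filter used in this thesis.

```python
import numpy as np


def kalman_predict(x, P, F, Q, B=None, u=None):
    """Prediction step: propagate the state estimate and its covariance."""
    x_pred = F @ x if B is None else F @ x + B @ u
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


def kalman_update(x_pred, P_pred, z, H, R):
    """Update step: correct the prediction with the measurement z."""
    y = z - H @ x_pred                       # innovation
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new


# Hypothetical 1D constant-velocity system: state = [position, velocity].
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])        # only the position is measured
Q = 0.01 * np.eye(2)
R = np.array([[0.5]])

x, P = np.zeros(2), 10.0 * np.eye(2)
for z in [1.1, 1.9, 3.2, 3.9, 5.1]:  # noisy positions of an object moving about 1 unit per step
    x, P = kalman_predict(x, P, F, Q)
    x, P = kalman_update(x, P, np.array([z]), H, R)
print(x, np.trace(P))
```

Although only the position is measured, the filter also recovers a velocity estimate, and the covariance shrinks from its large initial value as measurements accumulate.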

2.4.1 Extended Kalman Filter

The Kalman filter assumes that the state transition and observation models are linear. In cases where this is not a reasonable approximation, an extended Kalman filter can be used [86] [11]. The extended filter assumes that the models are differentiable instead of linear. This means that $F_k$ and $H_k$ are Jacobian matrices evaluated around the current state vector and updated at each time step. Aside from this, the algorithm works as before.
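A minimal sketch of the linearization is shown below for the update step. The observation model, which measures the distance of a 2D point from the origin, is a hypothetical example picked because its Jacobian is easy to write down; the numbers are illustrative.

```python
import numpy as np


def h(x):
    """Hypothetical nonlinear observation model: distance of a 2D point from the origin."""
    return np.array([np.hypot(x[0], x[1])])


def h_jacobian(x):
    """Jacobian of h evaluated at x, playing the role of H_k."""
    r = np.hypot(x[0], x[1])
    return np.array([[x[0] / r, x[1] / r]])


def ekf_update(x_pred, P_pred, z, R):
    """Update step with the observation model linearised around the predicted state."""
    H = h_jacobian(x_pred)
    y = z - h(x_pred)                      # the innovation uses the nonlinear model itself
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new


x_pred = np.array([3.0, 4.0])              # the predicted distance is 5
P_pred = np.eye(2)
x_new, P_new = ekf_update(x_pred, P_pred, z=np.array([6.0]), R=np.array([[1.0]]))
print(x_new)
```

The only differences from the linear update are that the Jacobian is recomputed at the predicted state and that the innovation is formed with the nonlinear function rather than a matrix product.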

Due to the cyclical nature of cardiac movements, extended Kalman filters seem to be a better fit for the projects in this thesis, and are the ones used in all papers.


Figure 2.5: The first two iterations of the Doo-Sabin algorithm. Note that the blue vertices in the first figure are replaced by polygons with as many corners as the vertex was connected to and that the yellow edges are replaced by yellow four-sided faces. The red faces are replaced by red faces of the same shape, but smaller. Reproduced with permission, courtesy of Fredrik Orderud.

2.5 Segmentation Framework Overview

The segmentation used in this thesis combines parametric shape models with Kalman filters to fit the shape model to an ultrasound image. This is based on the works of Orderud[86] who created and applied this technique to the left ventricle. Further research was done by Snare [103] on applications for pocket-size transducers, by Dikici [28] on improving the edge detection used as measurements in the algorithm, and by Bersvendsen [11] on widening the uses to the aorta and RV, as well as biventricular models. This general framework is currently in use at GE Healthcare, and this section will give an overview of how it works.

2.5.1 Model

All articles in this thesis use a type of parametric shape model called a Doo-Sabin subdivision model[32]. The Doo-Sabin surface is a generalization of quadratic B-splines. It has similarities to the Catmull-Clark model[20], but has some advantages in terms of computation.

Let $V_0$ be a grid of vertices forming faces (simple 2D polygons in 3D space). A series of new, finer grids is created by constructing new vertices $V_n$ and faces based on the previous $V_{n-1}$. There are three steps to this process:

• Any face will be replaced by a smaller face with the same number of edges. This new face is called an F-face.

• Any edge will be replaced by a four-sided face, in a sense fattening the edge into a face. This new face is called an E-face.

• Any vertex will be replaced by a face with the same number of edges as the valence of the original vertex. For instance, in a cube, each vertex is connected to three others. Thus, each vertex is replaced by a triangle-shaped face. This new face is called a V-face.

An example of the first two iterations of a Doo-Sabin process is illustrated in Figure 2.5.


Each iteration of this process creates a finer grid of vertices, arriving at a limit surface when iterated an infinite number of times. A useful property of the Doo-Sabin model is that the only information needed to determine the surface is the coordinates of the vertices and the topology between them. The exact placement of each new vertex is then determined by a subdivision matrix, turning the placement of new vertices into a linear process. The subdivision matrix $S$ is determined locally depending on topology; multiplying it by the array of local vertices $V_{n-1}$ gives the local vertices $V_n$ of the new grid.
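As an illustration of this linearity, the sketch below builds a local subdivision matrix for a single face and applies it to the face corners to obtain the corresponding F-face. The weights are those of the classic Doo-Sabin scheme, which is an assumption here since they are not listed in this section, and the matrices in [87] may be organized differently.

```python
import numpy as np


def doo_sabin_face_matrix(n: int) -> np.ndarray:
    """Local subdivision matrix S mapping the n corners of a face to its F-face."""
    S = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                S[i, j] = (n + 5) / (4 * n)
            else:
                S[i, j] = (3 + 2 * np.cos(2 * np.pi * (i - j) / n)) / (4 * n)
    return S


# One face of a unit cube: a square lying in the plane z = 0.
face = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
S = doo_sabin_face_matrix(4)
f_face = S @ face  # V_n = S V_{n-1}: a linear map of the old vertex positions
print(f_face)
```

Each row of the matrix sums to one, so every new vertex is a weighted average of the old corners and the F-face is a smaller copy of the original face, as described in the first step above.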

However, in practical applications the limit surface can be approximated by doing one step of the process and then using an analytic method rather than going through repeated steps. This process, as well as more information on the subdivision matrix, has been detailed by Orderud and Rabben[87].

Briefly, for any point on the surface, the surface is subdivided until the point can be evaluated using the basis functions for B-splines in two dimensions. This is possible when the topology around a node consists of four faces, where at least three of the faces are four-sided. This is known as the regular case. As the subdivision of any edge results in a four-sided face, any local area will be in such a case after at most two subdivisions. If the point to be evaluated is in one of the four-sided faces, evaluation can be done directly. If not, further subdivision can be done until the point is in one, or else the point can be moved slightly so that it is in one without much of a change in value. The process of subdivision is accomplished by repeated multiplication by the subdivision matrix. Multiplying out the product of the subdivision matrices and the vertex array gives an expression of the same form as the parametric surface model in Equation 2.2.

The model is made to let the control vertices be deformable, allowing for changes in the shape. In addition, global parameters allowing for a change in scale, position and rotation are used. This allows for better fitting to the ultrasound image.

2.5.2 Edge Detection

Segmentation of echocardiographic images typically involves finding an edge in the image, like the boundary between the blood and the ventricle tissue.

Because of this, analysis of the image is done in terms of edge detection. As running a detection algorithm on all of the image would take a lot of computing time, detection is done locally around the model’s current position. This means that the edge detection, like the Kalman filter, assumes that the initial model placement is not completely wrong.

Typically the model has edge detectors placed around the model surface. At each time step, the edge detectors measure the image in the normal direction from the surface, for instance 2 cm in each direction, giving an array of intensities.

An algorithm then chooses which of the positions in the array is the most likely correct feature in the image. Once this has been done for all edge detectors, the measurements are used as input in the Kalman filter’s update stage.


The edge detection algorithm can change depending on application, but a typical case is the step edge detection described by Dikici and Orderud[29]. The algorithm assumes that there are two plateaus of intensity, meaning that one part will have low intensity and one part high. Under this assumption, the algorithm estimates which position $L$ of the $K$ possibilities in the array $I$ is the most likely border between the plateaus using the following equation:

$$L = \operatorname*{arg\,min}_{k} \left( \sum_{t=0}^{k} \left| \left( \frac{1}{k+1} \sum_{j=0}^{k} I_j \right) - I_t \right| + \sum_{t=k+1}^{K-1} \left| \left( \frac{1}{K-k-1} \sum_{j=k+1}^{K-1} I_j \right) - I_t \right| \right) \tag{2.5}$$
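A minimal sketch of such a step edge criterion is given below. It assumes that the criterion selects the split index that minimizes the summed absolute deviation of each plateau from its own mean, with both plateaus required to be non-empty; the intensity profile is a made-up example.

```python
import numpy as np


def step_edge_index(intensities) -> int:
    """Return the index k that best splits the profile into two flat plateaus."""
    I = np.asarray(intensities, dtype=float)
    K = len(I)
    best_k, best_cost = 0, np.inf
    for k in range(K - 1):                 # both plateaus must be non-empty
        first, second = I[:k + 1], I[k + 1:]
        cost = (np.abs(first.mean() - first).sum()
                + np.abs(second.mean() - second).sum())
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Hypothetical profile sampled along a surface normal: dark blood, then bright tissue.
profile = [10, 12, 11, 9, 80, 82, 79, 81]
edge = step_edge_index(profile)
print(edge)
```

The returned index is the last sample of the first plateau, so the edge lies between that sample and the next one along the normal.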

2.5.3 Segmentation

The Kalman filter is used to estimate model placement in the image. There are three steps to the process: prediction, measuring and updating. Prediction consists of slightly regressing the model towards the initial value, assumed to be a decent estimate.

The measuring is described in Section 2.5.2. For simplicity’s sake, these measurements are assumed to be independent. The measurements are then used to update the Kalman filter.

2.5.4 Initialization of Kalman Filter

The Kalman filter is a recursive method where each new output is based on the previous. This means that the initial value of the filter is important, as a bad initial guess could mean that the filter will never converge on the real values.

In situations where the images are relatively similar, meaning that for instance the left ventricle is at roughly the same position in every acquisition, a common initialization can be applied and the algorithm can be fully automatic. If this is not possible, user input may be used to get a good initial state vector, or an initialization algorithm where a good starting value is found can be applied, as has been done by Linderoth et al.[67] and Snare et al. [102].

2.6 Comparison of Segmentation Methods

The different segmentation methods described in Section 2.3 have different strengths and weaknesses. Machine learning models can be very effective for image segmentation if proper training data is available. Convolutional networks especially give the algorithm the ability to understand what is happening in small parts of the image if it is trained properly. However, training data is often needed in such quantity and of such quality that it can be difficult to build models for new applications, which would have been an issue for several of the papers in this thesis. Active shape models especially have this problem, needing samples where the shape used in the model has been fitted to the image,



