Treball Final de Grau (Final Degree Project)

GRAU D'ENGINYERIA ELECTRÒNICA INDUSTRIAL I AUTOMÀTICA (Degree in Industrial Electronics and Automation Engineering)

Dendritic spine segmentation using a convolutional neural network

TOMASZ MICHAL POLACZYK

Tutor

Dra. Yolanda González Cid

Escola Politècnica Superior

Universitat de les Illes Balears


Contents

Contents
List of Figures
List of Tables
Acronyms
Summary

1 Introduction
  1.1 Context
  1.2 Image Processing
  1.3 Machine Learning
  1.4 Objectives

2 Implementation
  2.1 Overview
    2.1.1 Training phase
    2.1.2 Testing phase
  2.2 Data
  2.3 Spine extraction
  2.4 Network
    2.4.1 The building blocks
    2.4.2 3D U-Net
  2.5 Validation
  2.6 Software and Hardware

3 Results
  3.1 Training
  3.2 Predictions
  3.3 Analysis
    3.3.1 Model 2 Validation
    3.3.2 Spines from Pred1
    3.3.3 Spines from GT

4 Conclusions and future work
  4.1 Future work

A Appendix - Source code

Bibliography
  Sources of external figures


List of Figures

1.1 Structure of a neuron
1.2 Example of image segmentation. Left: original image, right: segmented image.
1.3 Simple neural network
2.1 Training phase
2.2 Testing phase
2.3 Example image of a dendrite (left) and its ground truth (right)
2.4 Histograms of the dimensions of the spines
2.5 2D slice of spine and ground truth after resizing (left) and zero-padding (right)
2.6 Convolution with 3x3 filter size, 4x4 input size (blue) and 2x2 output size (cyan). Source: [1]
2.7 Convolution with 3x3 filter size, 5x5 input size (blue) and 5x5 output size (cyan) using zero-padding. Source: [1]
2.8 Deconvolution with filter size 3x3. The 3x3 input image (blue) is upsampled, and then a convolution is applied to it. The resulting 6x6 image (cyan) is shown above. Source: [1]
2.9 Structure of the 3D U-Net
2.10 ReLU function. The negative values are set to zero.
2.11 Sigmoid function. The output is capped to the [0, 1] interval.
2.12 Demonstration of how the loss function affects the model. From top to bottom: original image, prediction of binary crossentropy model, prediction of dice coefficient model, ground truth.
2.13 Confusion matrix
3.1 Training and validation loss over epochs
3.2 Example spine. From top to bottom: original image, prediction, ground truth.
3.3 Inferno colormap
3.4 Example of how the model is capable of distinguishing spines from dendrites.
3.5 Example spine. From top to bottom: original image, prediction, ground truth.
3.6 Example spine
3.7 Example spine
3.8 Example comparison of prediction vs ground truth. Green: ground truth, blue: prediction. In the image on the right, white is TP, green is FN, blue is FP and black is TN.
3.9 Barplots of the data in table 3.2
3.10 Barplot of the data in table 3.2 for model 2 validation
3.11 Pred2 vs GT (validation phase). The training data is shown as a reference (PPV_t and TPR_t).
3.12 Pred2 vs GT (validation phase). The histograms show the validation data.
3.13 Pred1 vs GT (spines from Pred1)
3.14 Pred1 vs GT (spines from Pred1)
3.15 Pred2 vs GT (spines from Pred1)
3.16 Pred2 vs GT (spines from Pred1)
3.17 Pred2 vs Pred1 (spines from Pred1)
3.18 Pred2 vs Pred1 (spines from Pred1)
3.19 Pred1 vs GT (spines from GT)
3.20 Pred1 vs GT (spines from GT)
3.21 Pred2 vs GT (spines from GT)
3.22 Pred2 vs GT (spines from GT)
3.23 Pred2 vs Pred1 (spines from GT)
3.24 Pred2 vs Pred1 (spines from GT)


List of Tables

2.1 Layers in an encode step
2.2 Layers in a decode step
2.3 Used python libraries and versions
2.4 Hardware specifications
3.1 Confusion matrix of validation spines (Model 2)
3.2 Comparison of sensitivity and precision of the various models


Acronyms

NN   Neural Network
CNN  Convolutional Neural Network
GT   Ground Truth
TP   True Positive
TN   True Negative
FP   False Positive
FN   False Negative
TPR  True Positive Rate
PPV  Positive Predicted Value


Summary

We design and train a convolutional neural network with the task of segmenting dendritic spines from grayscale 3D images. The prediction of this neural network is compared against another neural network. Using both neural networks in combination yields a better prediction.

Chapter 1

Introduction

1.1 Context

Dendrites are tree-like structures found in neurons; they are the part of the neuron that receives signals from other neurons.

Figure 1.1: Structure of a neuron

In some types of neurons, the dendrites contain spines which play an important role in the functionality of the neuron. The spines help to isolate signals, which increases the neuron's connectivity, and also serve as memory storage. The spine density is an important metric since a low spine density is correlated with a wide range of neuropsychiatric disorders, including autism spectrum disorders, schizophrenia and Alzheimer's disease [2].

Understanding the complex functionality of the brain is an area of active research, so there exist initiatives like the Human Brain Project, which aims to accelerate progress in the fields of neuroscience, computing, and brain-related medicine.


Automatic dendritic spine detection and analysis has been a topic of interest in the last decade.

There have been many attempts to automate the task of analyzing dendritic spines [3], [4]. Some of them are semi-automatic tools which still require a high amount of interaction from a human. The approaches vary widely, using 2D and 3D images, sophisticated algorithms and machine learning.

At present, the most common tool for segmenting dendritic spines is the commercial IMARIS software [5]. However, it lacks a precise detection of the real shape of the spines.

1.2 Image Processing

Image processing is the analysis and manipulation of images with the purpose of quality enhancement or information extraction.

Images can be represented as a matrix of intensity values. A 2D grayscale image can be represented with a 2D matrix, where each element represents the intensity of that pixel, usually 0 is black and 1 is white. Similarly, 3D images can be represented with a 3D matrix where each element is a voxel (a 3D pixel).

Image classification is the process of determining to which category an image belongs, and assigning the corresponding label to the image as a whole.

Image segmentation is an extension of image classification: it consists of assigning a label to each individual pixel of the image. The result is that the different types of objects present in the image are identified and separated.

Figure 1.2: Example of image segmentation. Left: original image, right: segmented image.

Figure 1.2 is an example of binary segmentation: there are only two labels: background (black) and cat (white).

Digital image processing is the use of computer algorithms to perform image processing. In the last five years there have been many breakthroughs in the fields of classification and segmentation thanks to the use of machine learning and neural networks.


1.3 Machine Learning

Machine learning is a subset of artificial intelligence which uses data examples to improve its performance on a specific task. The use of machine learning can help to solve problems for which there are no known algorithms, as the machine can learn an algorithm on its own.

When the data contains both the input and the expected output, it is labeled data, and the learning process is considered supervised learning. Otherwise, when the provided data does not contain the expected output, it is considered unsupervised learning. It is much harder to obtain labeled data (with both input and output) because most of the time the labeling process is done manually by humans.

One of the main methods for efficient machine learning is the use of artificial neural networks. These networks, inspired by biological brains, consist of artificial neurons which are highly interconnected, and each connection has an adjustable weight. The signal strength is represented as a real number. The data travels from the input layer through the network, and is transformed at each layer. The output of the neural network is also a set of real numbers. The learning (or training) process consists of finding the optimal weights which minimize the error of the output.

Figure 1.3: Simple neural network

Figure 1.3 shows an example of a neural network. It is a feed-forward neural network because each layer is only connected to the immediately previous and next layers.

When the network has multiple hidden layers, it can be called a deep neural network.

Deep learning is the use of deep neural networks for machine learning. In recent years it has proved superior to shallow neural networks.

Convolutional Neural Networks (CNNs) are deep, feed-forward neural networks which have been performing very well in the areas of image segmentation and classification [6], [7].

1.4 Objectives

The main objectives of this final project are:

• Study the newest breakthroughs in machine learning for image processing.

• Design a CNN capable of identifying dendritic spines from 3D images.

• Train this CNN to achieve a good performance.

• Validate it against another CNN, using the output of that network to find potential spines.

• Compare the results of the two models.

Chapter 2

Implementation

2.1 Overview

Two neural networks are used to obtain a precise shape of the segmented dendritic spines. Model 1 is an already trained Neural Network (NN) that is able to segment dendritic spines from a 3D grayscale image. Model 2 is the NN described and developed in this project.

The implementation of Model 2 can be divided into two phases:

2.1.1 Training phase

Figure 2.1: Training phase

The neural network needs labeled data for the training process. This data was manually labeled, meaning that the expected output was indicated by a human.

During the training phase the spines are extracted from the label, which is also called ground truth (GT).

The spines and their corresponding labels are fed into the network during the training process. The output of the network is a prediction of the probability of each voxel being a spine. There is one prediction for each spine, and the prediction has the same resolution as the spines.

2.1.2 Testing phase

Figure 2.2: Testing phase

During the testing phase, instead of using the labels to extract the spines, we use the prediction from another neural network (Model 1) to localize the spines. Then the parts of the image that are possibly spines are run through Model 2, which makes a new, independent prediction. Model 2 does not know anything about the ground truth or the prediction of Model 1.

2.2 Data

When using a deep learning approach it is very important to have enough data. Our dataset consists of 58 grayscale 3D images of dendrites from a mouse brain. 50 of them are used for training Model 2, and the remaining 8 are used for testing against Model 1. There are a total of 3820 spines in the first 50 images, and 959 spines in the 8 images used for testing.

The 3D images are stored in nii format. As a preprocessing step, they are normalized to a [0, 1] range.
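As an illustration, this preprocessing step can be sketched with nibabel and numpy (two of the libraries listed in table 2.3); the file paths are only examples following the directory layout of section 2.6:

# Sketch: load a .nii volume and normalize it to the [0, 1] range (illustrative paths).
import numpy as np
import nibabel as nib

def load_normalized(path):
    # Read the 3D volume as a numpy array.
    volume = nib.load(path).get_data().astype(np.float32)
    # Normalize the intensities to [0, 1].
    return (volume - volume.min()) / (volume.max() - volume.min())

image = load_normalized('data/data/image1/spine.nii.gz')       # grayscale dendrite image
truth = nib.load('data/data/image1/truth.nii.gz').get_data()   # labels: 255 = spine, 150 = dendrite
print(image.shape)                                              # (1024, 1024, 112)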

The data is structured in folders and each folder contains two files: the 3D grayscale image and the corresponding 3D ground truth. The ground truth shows where the spines are, which is the expected output of our network. The ground truth is essential to train the network: the network makes a prediction which is compared to the ground truth, and the weights of all the neurons are slowly adjusted to reduce the difference between the prediction and the ground truth.

After training, the model should work given only the grayscale image.

These images have a 1024x1024x112 resolution.

Since we are working with 3D images, it can be hard to visualize them. But we can just look at one slice, which will be a 2D image.

In figure 2.3 we can see one slice of an image of a dendrite, and the corresponding ground truth. Each of the white spots is one spine; a single dendrite can have several dozen spines.

Figure 2.3: Example image of a dendrite (left) and its ground truth (right)

The U-Net Neural Network, which is the architecture selected for Model 2 and will be described in section 2.4, needs a constant input size. The data images all have the same size, but they are too large: working with 1024x1024x112 images would require a very high amount of memory. One alternative is to split the images into patches, work with those smaller patches, make a separate prediction for each patch and then merge them all. That is the solution used by Model 1, so in Model 2 we use another approach: spine extraction. We do not train the network using the entire images, we only use the spines. Since the spines are relatively small, we can process entire spines at a time.

2.3 Spine extraction

The spines are extracted using the ground truth. The 3D bounding box is a rectangular cuboid whose faces are parallel to the axes, known as an axis-aligned bounding box. To allow for more context, we also add a margin m = 1. This means that a 10x10x10 spine will result in a 12x12x12 image, because of the margin.
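The extraction can be sketched with skimage's connected-component tools (the same ones used by the nidata.py script in the appendix); the helper name and the clipping of the margin at the image borders are illustrative choices:

# Sketch: extract each spine (and its label) as a bounding-box crop with a margin.
import numpy as np
from skimage.measure import label, regionprops

def extract_spines(image, mask, margin=1):
    # mask is assumed to be a boolean spine mask (one connected component per spine).
    spines = []
    for region in regionprops(label(mask)):
        lo = np.maximum(np.array(region.bbox[:3]) - margin, 0)           # grow the box by the margin...
        hi = np.minimum(np.array(region.bbox[3:]) + margin, mask.shape)  # ...clipped to the image borders
        sl = tuple(slice(a, b) for a, b in zip(lo, hi))
        spines.append((image[sl], mask[sl]))
    return spines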

But since the U-Net needs a constant input size, we must find an optimal input size, and all the spines must be transformed to that size.

To find the optimal input size, we analyze all the spines from the training data.

In figure 2.4 we can see the number of occurrences of the dimensions (x, y, z) of the bounding boxes of the spines. There are some outliers in the x and y dimensions, so they are capped to 100; this is why there is a small peak at x = 100 and y = 100. There is also a peak at x = 1, y = 1, z = 1, because some spines are very small. Most of the spines are around x = 10, y = 10, z = 5, but there is no clear cutoff, as spines keep appearing up to x = 70, y = 85, z = 23.

Given the nature of CNNs, the input size should be divisible by 2^n, with n being the depth of the network. In our case the depth is 5, so the input size should be divisible by 32. Therefore, to maximize the capacity of the model while maintaining a sane memory usage, we choose an input size of 64x64x32. The z dimension is smaller because the raw images have a 1024x1024x112 resolution, so most spines are smaller in the z dimension.

Figure 2.4: Histograms of the dimensions of the spines

Out of the 4019 spines found in the data, 11 of them were too large (with more than 100 voxels in any dimension), 188 were too small (with volume < 8, usually only 1x1x1), which leaves a total of 3820 spines.

But we need an input size of exactly 64x64x32, so there are two options:

• Resize the spines to the desired resolution.

• Zero padding: add black voxels around the spine.

Figure 2.5: 2D slice of spine and ground truth after resizing (left) and zero-padding (right)

Figure 2.5 shows a comparison of both methods. The original spine size is 16x14x4, but we add a margin of 1 to each side, making it 18x16x6. We only display one 2D slice of the 3D image, for simplicity.

When resizing the image, new voxels must be created using interpolation. Figure 2.5 resizes the image from 18x16 to 64x64 using cubic interpolation, but there are many other methods. The biggest problem with resizing is that we must also resize the ground truth, therefore losing information about the border of the spine. We obviously also lose the size information, and spines with non-symmetric sizes will be greatly distorted.

These problems don’t exist with zero-padding, where we only add black pixels around the image and avoid distorting the spine. One limitation of zero-padding is that we must crop the spines which are bigger than the input size, while they could have been resized.

Therefore, we choose zero-padding because it doesn't modify the image, and conserves a perfectly valid ground truth.
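A minimal sketch of this padding step follows; whether the spine is centered inside the 64x64x32 volume or anchored to a corner is not specified in the text, so centering here is an assumption:

# Sketch: crop oversized spines and zero-pad the rest to the fixed 64x64x32 input size.
import numpy as np

def pad_to_shape(volume, target=(64, 64, 32)):
    volume = volume[:target[0], :target[1], :target[2]]   # crop anything larger than the target
    pads = []
    for size, want in zip(volume.shape, target):
        before = (want - size) // 2                        # split the padding around the spine
        pads.append((before, want - size - before))
    return np.pad(volume, pads, mode='constant', constant_values=0)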

In deep learning it is quite common to use data augmentation, which is a series of techniques to create new data examples from the existing ones. For example, mirroring an image is a form of data augmentation. This can be helpful when the dataset is very small, but it is not necessary in our case. Therefore, no data augmentation was used.

2.4 Network

The selected network is based on the U-Net [8], a deep convolutional network designed for biomedical image segmentation.

As explained by the paper authors, “The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.”

On the contracting path, at each layer the number of features is doubled and the resolution is halved. This allows the network to capture features efficiently, but because of the loss in resolution, an expanding path is necessary to enable a precise segmentation. The U-Net was designed to work with 2D images, so it must be modified to work with 3D images.

2.4.1 The building blocks

The U-Net uses three types of operations:

• Convolution

• Max Pooling

• Deconvolution

A convolution is an operation that transforms an image into a new one according to a convolution kernel. The kernel (or filter) is a matrix, in this case 3x3, which indicates the weights of each pixel from the old image in the new image. The convolution is applied pixel-wise, and the center of the kernel is the weight of that pixel, while the other values are the weights of the neighbor pixels. See figures 2.6 and 2.7 for a visual explanation. For example, the following filter (known as the Sobel operator) will detect horizontal lines in an image.

K = | -1  -2  -1 |
    |  0   0   0 |
    |  1   2   1 |
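As an illustration, the effect of this kernel can be checked with scipy (one of the libraries in table 2.3); image_2d stands for any 2D grayscale array:

# Sketch: apply the horizontal-line kernel above to a 2D grayscale image.
import numpy as np
from scipy import ndimage

K = np.array([[-1, -2, -1],
              [ 0,  0,  0],
              [ 1,  2,  1]], dtype=np.float32)
edges = ndimage.convolve(image_2d, K, mode='constant', cval=0.0)  # zero-padded borders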

But the magic part of neural networks is that we don't need to put this kernel anywhere: the network will learn this and many more kernels on its own. The parameters of all the kernels present in a CNN are learnable; they change when trying to improve the accuracy of the network. This means that in theory the network will find the optimal kernels for its specific task.

The original U-Net uses a valid padding strategy, which means that each convolution loses one border pixel. For example, in figure 2.6 a 4x4 input is transformed into 2x2. Similarly, a 64x64 image would be converted into a 62x62 image.

This strategy results in many limitations, so in our model it was replaced with zero-padding, which conserves the input size. See figure 2.7 for a visual representation.


Figure 2.6: Convolution with 3x3 filter size, 4x4 input size (blue) and 2x2 output size (cyan). Source: [1]

Figure 2.7: Convolution with 3x3 filter size, 5x5 input size (blue) and 5x5 output size (cyan) using zero-padding. Source: [1]

The other operations, max pooling and deconvolution, are used to halve and double the image resolution, respectively. Max pooling finds the maximum value in each 2x2 non-overlapping slice of the image:

A = | 1 3 5 7 |        B = | 8 7 |
    | 8 6 4 2 |            | 5 9 |
    | 1 2 3 4 |
    | 5 5 7 9 |

The input image A, represented as a 4x4 matrix, is transformed into a 2x2 matrix B. The reverse operation, where each pixel is repeated in order to fill a 2x2 area, is known as upsampling:

C = | 8 7 |        D = | 8 8 7 7 |
    | 5 9 |            | 8 8 7 7 |
                       | 5 5 9 9 |
                       | 5 5 9 9 |
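The same numeric example can be reproduced in a few lines of numpy/skimage, which is a convenient sanity check of the two operations:

# Sketch: 2x2 max pooling of A and upsampling of the result.
import numpy as np
from skimage.measure import block_reduce

A = np.array([[1, 3, 5, 7],
              [8, 6, 4, 2],
              [1, 2, 3, 4],
              [5, 5, 7, 9]])
B = block_reduce(A, (2, 2), np.max)              # -> [[8, 7], [5, 9]]
D = np.kron(B, np.ones((2, 2), dtype=A.dtype))   # each pixel repeated over a 2x2 area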

Since upsampling alone isn't very effective, in the U-Net a deconvolution is used to increase the resolution. A deconvolution, also known as up-convolution or transposed convolution, is equivalent to an upsampling followed by a convolution. It can be used to obtain more detailed results than those obtained with a simple upsampling. And since it is a convolution after all, it will also be trained to produce the optimal result. See figure 2.8 for an example of a deconvolution which doubles the input size.

Figure 2.8: Deconvolution with filter size 3x3. The 3x3 input image (blue) is upsampled, and then a convolution is applied to it. The resulting 6x6 image (cyan) is shown above. Source: [1]

2.4.2 3D U-Net

The modified 3D U-Net works on 3D images, by replacing all the 2D operations with the 3D equivalents. It uses zero-padding, because the input images already use zero-padding. Other changes include the addition of batch normalization [9] and dropout [10] layers.

Batch normalization is the major improvement over the original U-Net. It is a layer which scales the outputs of the previous layer so that the mean is zero and the variance is one. This speeds up the training process by allowing a greater learning rate, reduces overfitting, provides a small regularization, and eliminates the vanishing gradient problem. It basically makes the layers more independent of each other.

Overfitting happens when the network memorizes the training data instead of learning the features. An overfitted model will perform very well on the training data, but very poorly on validation data (new data that it has not seen before).

Dropout prevents overfitting. When combined with batch normalization, the effect is less notable, but the result of using batch normalization and dropout together is better than using only batch normalization or only dropout.

The resulting architecture, when using an input size of 64x64x32, depth = 5 and num_base_features = 32, is shown in figure 2.9. This particular configuration has about 2 million trainable parameters.

The input, a grayscale 3D image, is shown on the top left. Then a series of operations are applied to the input image, represented by the arrows. Finally, on the top right, we have the output: a segmentation map, an image with the same resolution as the input, but instead of showing an image it shows the probability of each voxel being a spine.

Figure 2.9: Structure of the 3D U-Net

The resolution displayed on each layer is the resolution before the maxpool, because that is the resolution of the features (channels). This means that there are 32 channels with size 64x64x32, and 512 channels with size 4x4x2. On the decode path, the channels from below are first upsampled to double resolution and then combined with the channels from the image on the left, which has the same resolution as the current image. As an example, the 256-channel 8x8x4 image is created by combining the 512-channel 4x4x2 (upsampled to a 256-channel 8x8x4) with the 256-channel 8x8x4 from the left, creating a 512-channel 8x8x4 image. Then two convolutions are applied to this image, reducing the number of channels to 256. There are mainly 4 types of operations, explained below.

One encode step is the sequential combination of the layers listed in table 2.1:

Table 2.1: Layers in an encode step

3x3x3 Convolution
Batch Normalization
Dropout
ReLU Activation
3x3x3 Convolution
Batch Normalization
ReLU Activation
2x2x2 Max Pooling
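In Keras (the library used in this project, see section 2.6) one encode step could be sketched as below; the dropout rate is an assumption, since the text does not state its value:

# Sketch of one encode step (table 2.1); returns the downsampled tensor and the
# pre-pooling features that are later concatenated in the decode path.
from keras.layers import Conv3D, BatchNormalization, Dropout, Activation, MaxPooling3D

def encode_step(x, n_features, dropout_rate=0.3):
    x = Conv3D(n_features, (3, 3, 3), padding='same')(x)   # zero-padding keeps the resolution
    x = BatchNormalization()(x)
    x = Dropout(dropout_rate)(x)
    x = Activation('relu')(x)
    x = Conv3D(n_features, (3, 3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    skip = x
    x = MaxPooling3D(pool_size=(2, 2, 2))(x)                # halves the resolution
    return x, skip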

The activation function used is the Rectified Linear Unit (ReLU). It is defined as:

ReLU(x) = max(0, x)

Figure 2.10 shows the graph of this function.

The last operation, maxpooling, is the one that reduces the resolution in half.

Figure 2.10: ReLU function. The negative values are set to zero.

One decode step is the sequential combination of the layers listed in table 2.2:

Table 2.2: Layers in a decode step

3x3x3 Deconvolution
ReLU Activation
Concatenate
3x3x3 Convolution
Batch Normalization
Dropout
ReLU Activation
3x3x3 Convolution
Batch Normalization
ReLU Activation

The first operation, a deconvolution, also known as transposed convolution, doubles the resolution of the image. The following step, a concatenation, doubles the number of features (channels) by including the images from a previous layer. This concatenation is represented by the horizontal arrows in figure 2.9. The remaining operations are similar to the ones in the encode step.
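A matching sketch of one decode step, under the same assumptions as the encode step above:

# Sketch of one decode step (table 2.2).
from keras.layers import Conv3D, Conv3DTranspose, BatchNormalization, Dropout, Activation, concatenate

def decode_step(x, skip, n_features, dropout_rate=0.3):
    x = Conv3DTranspose(n_features, (3, 3, 3), strides=(2, 2, 2), padding='same')(x)  # doubles the resolution
    x = Activation('relu')(x)
    x = concatenate([x, skip])                              # doubles the number of channels
    x = Conv3D(n_features, (3, 3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = Dropout(dropout_rate)(x)
    x = Activation('relu')(x)
    x = Conv3D(n_features, (3, 3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x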

The last layer of the network is a 1x1x1 convolution + sigmoid.

A 1x1x1 convolution acts as a fully connected layer. It merges all the channels of the image (32 in our case) into n channels, with n being the number of labels. Here n = 1 because we only have one label, spine, since the background is implicit.

The sigmoid is another activation function, with the property that the output is always between 0 and 1 (ReLUs do not have that property). This is useful because we represent the intensity in the images as a real number between 0 and 1.

S(x) = 1 / (1 + e^-x) = e^x / (e^x + 1)
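In Keras this final layer could be expressed as follows (sketch):

# Sketch: 1x1x1 convolution to a single channel, followed by a sigmoid, giving the
# per-voxel probability of being a spine.
from keras.layers import Conv3D

output = Conv3D(1, (1, 1, 1), activation='sigmoid')(x)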

Besides the architecture, it is also important to mention the training process. We use the Adam optimizer with an initial learning rate of 10^-4. The learning rate scheduler reduces the learning rate when the validation loss has not improved for 10 epochs.

The batch size was chosen to be 2 images, mainly because of memory constraints.
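A sketch of this training configuration in Keras follows; model, x_train and y_train are assumed to exist, and the learning-rate reduction factor and the maximum number of epochs are assumptions (only the patience of 10 epochs, the learning rate, the batch size and the 80/20 split are stated in the text):

# Sketch: optimizer, learning rate scheduling and batch size as described above.
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, EarlyStopping

model.compile(optimizer=Adam(lr=1e-4), loss='binary_crossentropy')
callbacks = [
    ReduceLROnPlateau(monitor='val_loss', patience=10, factor=0.5),  # reduce the LR after 10 epochs without improvement
    EarlyStopping(monitor='val_loss', patience=10),                  # stop when the validation loss stops improving
]
model.fit(x_train, y_train, batch_size=2, epochs=100,
          validation_split=0.2, callbacks=callbacks)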

Figure 2.11: Sigmoid function. The output is capped to the [0, 1] interval.

The loss function is binary crossentropy, defined as:

H_y'(y) := -Σ_i [ y'_i log(y_i) + (1 - y'_i) log(1 - y_i) ]

where y_i is the predicted probability value for sample i and y'_i is the true probability for that sample. In the case of 3D images, a sample is a voxel. When the prediction equals the true value, the loss is 0. Otherwise, the loss grows towards positive infinity.

This loss function was chosen in favor of Dice coefficient loss, because even though the latter allows multi-class predictions (with multiple labels) and leads to more well-defined predictions, binary crossentropy was found to perform slightly better. See figure 2.12 for a visual comparison.

Figure 2.12: Demonstration of how the loss function affects the model. From top to bottom: original image, prediction of binary crossentropy model, prediction of dice coefficient model, ground truth.
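For reference, a common formulation of the Dice coefficient loss that was compared against binary crossentropy is sketched below; the thesis does not give its exact definition, so this is only one possible variant:

# Sketch: Dice coefficient and the corresponding loss, written with the Keras backend.
from keras import backend as K

def dice_coefficient(y_true, y_pred, smooth=1.0):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coefficient_loss(y_true, y_pred):
    return 1.0 - dice_coefficient(y_true, y_pred)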

The weights are initialized from a normal distribution with µ = 0 and σ = sqrt(2/N), which is the optimal choice for this kind of network [11]; the same initialization is used in the original U-Net.

2.5 Validation

The validation consists of two independent steps:

• First, the 3D U-Net is trained and validated using the ground truth data.

• Next, the trained 3D U-Net is validated using the prediction of Model 1 as the label. In this phase the spines are extracted from the prediction, which means that there are going to be some false positives. The goal of Model 2 is to identify these false positives, as well as to improve the accuracy of the prediction.

The metrics used in the validation are obtained from a confusion matrix. This matrix is created by comparing the prediction with the real values, and counting the number of matches for each case. It is a binary comparison: a voxel must be either 0 or 1, but the prediction is a probability, so it can be anywhere between 0 and 1. Therefore, the prediction must be binarized. The binarization method is a simple thresholding with t = 0.5. When comparing 3D images, we work on each voxel, so the confusion matrix shows the number of voxels.

                    Predicted value
                    0                 1
Actual value   0    True Negatives    False Positives
               1    False Negatives   True Positives

Figure 2.13: Confusion matrix

As shown in figure 2.13, the confusion matrix shows correct guesses: True Negative (TN) (expected 0, predicted 0) and True Positive (TP) (expected 1, predicted 1). And it also shows the errors: False Positive (FP) (expected 0, predicted 1) and False Negative (FN) (expected 1, predicted 0).

The following metrics are used to evaluate the model:

• Sensitivity, or True Positive Rate (TPR):

TPR = TP / (TP + FN)

• Precision, or Positive Predicted Value (PPV):

PPV = TP / (TP + FP)

Both metrics should ideally be close to 1. A high sensitivity indicates that there are few false negatives, while a high precision indicates that there are few false positives.

In other words, a sensitive model will detect the spine and its surroundings without missing any part of the spine, but it may misidentify some dendrites as spines. A precise model will only detect spines as spines, but it may miss some parts of them.
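The whole evaluation step can be sketched in a few lines of numpy (the array names are illustrative):

# Sketch: binarize a prediction at t = 0.5 and compute sensitivity and precision.
import numpy as np

pred_bin = prediction > 0.5
truth_bin = ground_truth.astype(bool)

tp = np.count_nonzero(pred_bin & truth_bin)
fp = np.count_nonzero(pred_bin & ~truth_bin)
fn = np.count_nonzero(~pred_bin & truth_bin)
tn = np.count_nonzero(~pred_bin & ~truth_bin)

tpr = tp / (tp + fn)   # sensitivity
ppv = tp / (tp + fp)   # precision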


2.6 Software and Hardware

The entire project was implemented in the Python programming language, using the Keras library [12] with the TensorFlow backend.

The directory structure is as follows:

# Training spines:
data/data/image1/spine.nii.gz
data/data/image1/truth.nii.gz

# Testing spines:
data/new/data/prediction/image1/spine.nii.gz
data/new/data/prediction/image1/truth.nii.gz
data/new/data/prediction/image1/prediction.nii.gz

# Trained model 2:
results/unet_t.hdf5

# Predictions:
results/predict/

# Scripts:
draw_figures.py
draw_hist_from_csv.py
evaluate_spines.py
example_dendrite.py
nidata.py
unet.py

The source code is available in the appendix.

A complete list of libraries and versions is available in table 2.3.

Table 2.3: Used python libraries and versions

Name          Version
python        3.6.5
keras         2.2.0
numpy         1.14.5
scikit-image  0.13.1
scipy         1.1.0
matplotlib    2.2.2
nibabel       2.3.0

The hardware used for executing most of the code was a PC with the following specifications:

Table 2.4: Hardware specifications

CPU  i7-4790k
RAM  8 GB
GPU  GTX 970

The neural network training process was performed using the GPU.

Chapter 3

Results

3.1 Training

Once the model was fully designed and the parameters were optimally chosen, the training process began.

The training was stopped after 21 epochs, when the validation loss had not improved over the last 10 epochs. 21 epochs are equivalent to a total runtime of 3 hours and 21 minutes, an average of 9.5 minutes per epoch. One epoch is defined as one pass over the entire dataset.

Figure 3.1: Training and validation loss over epochs

Figure 3.1 shows a graph of the loss over time. The loss function is binary crossentropy, and lower is better. The graph shows both the training loss and the validation loss.

We can see how during the first epochs both the training loss and the validation loss improve drastically, but after as few as 8 epochs the process slows down. The training loss almost always improves, so if the training process had not been stopped, the training loss would be even smaller, but the validation loss would probably be worse, because of overfitting. The validation loss fluctuates around some value, which is a clear sign that the model will not improve much more at this point.

3.2 Predictions

Before doing a mathematical analysis of the performance of the newly trained model, it is helpful to visualize the predictions and rate them qualitatively. Visualization of 3D images can be complicated, but as the spines have a small z dimension (z = 16), they can be sliced along the z axis forming a series of 2D images (16 32x32 images, to be exact). Figure 3.2 is an example. The slices are distributed from left to right, and each vertical column of 3 images shows the same location, but the top is the original image, the middle is the prediction, and the bottom is the ground truth. In this case there are only 8 images and not 16, because the other layers are completely black due to the zero-padding. The spine image is the same size as the ground truth but with a margin of 1 on each side; this can be seen in figure 3.2 because the ground truth is empty in the first and last layer.

Figure 3.2: Example spine. From top to bottom: original image, prediction, ground truth.

Since looking at grayscale images isn't very efficient, we represent the images using the inferno colormap, which uses the palette displayed in figure 3.3. As usual, black represents the background and yellowish white represents the spine. This colormap is really helpful, because the predictions need to be binarized in order to evaluate the model. So any value between 0 and 0.5 means background and anything between 0.5 and 1 means spine. We can visually distinguish these two cases because the inferno colormap is purple for low values and orange-yellow for high values.

Figure 3.3: Inferno colormap

As seen in figure 3.2, the prediction is very close to the ground truth. Figure 3.4 is another example, but this time it can be clearly seen how the model can distinguish spines from dendrites. In the right-most slice there is a hot spot in the image. This is a dendrite, and the model correctly assumes that it is not a spine, as shown in the prediction, and only highlights a very dim spine, which is not even in the ground truth. Going left, along the z dimension, this dim spine gets brighter and coincides almost exactly with the label. But the dendrite is still dark, in this and all the other images.

Figure 3.4: Example of how the model is capable of distinguishing spines from dendrites.

All these spine examples are from the validation set; the model has not been trained using these spines. See figures 3.5-3.7 for more examples.

Figure 3.5: Example spine. From top to bottom: original image, prediction, ground truth.


Figure 3.6: Example spine

Figure 3.7: Example spine

3.3 Analysis

Pred1 is the prediction of Model 1, and Pred2 is the prediction of Model 2.

There are 3 groups of validations:

• Model 2 validation: this evaluates Model 2 versus the GT. There are 50 images, with a total of 3820 spines. Out of them, 80% were used for training, and the remaining 20% for validation. This leaves 3056 spines for training and 764 spines for validation.

• Spines from Pred1: Model 1 is used to identify possible spines, and those spines are then evaluated by Model 2. This is the testing phase, as described in section 2.1.2. There are 8 images, with a total of 1524 spines according to Model 1. There are 3 comparisons: Pred1 vs GT, Pred2 vs GT and Pred2 vs Pred1.

• Spines from GT: the spines will be extracted from the 8 testing images using the GT. This is an additional validation to directly compare the two models. There are 959 spines. There are 3 comparisons: Pred1 vs GT, Pred2 vs GT and Pred2 vs Pred1.

The results can be summarized in table 3.2, and the equivalent figures 3.9 and 3.10. The metrics used (sensitivity and precision) are explained in section 2.5. Figure 3.8 shows an example prediction compared to the ground truth. A model with a high sensitivity has zero false negatives: the spine is enclosed inside the prediction, but it may have false positives. The image on the right in figure 3.8 would have no green pixels. On the other hand, precision is the ratio between true positives and prediction, so a model with high precision would have no false positives: the pixels in the prediction must be part of a spine. It can however have false negatives, i.e. it may miss parts of the spine. In this case the image on the right in figure 3.8 would have no blue pixels.

Figure 3.8: Example comparison of prediction vs ground truth. Green: ground truth, blue: prediction. In the image on the right, white is TP, green is FN, blue is FP and black is TN.

For each case, a confusion matrix is built with the combined data of all the spines.

For example, table 3.1 shows the confusion matrix of the validation spines (Model validation, Validation). Because of the zero-padding, the number of true negatives is greatly increased.

Table 3.1: Confusion matrix of validation spines (Model 2)

            Predicted 0   Predicted 1
Actual 0    99576566      139068
Actual 1    38223         385151

Figures 3.11-3.24 show some statistics about the performance on individual spines.

Each validation is explained in more detail in the next sections.


Table 3.2: Comparison of sensitivity and precision of the various models

                     Sensitivity   Precision
Model validation
  Training           0.9050        0.7596
  Validation         0.9097        0.7347
Spines from Pred1
  Pred1_GT           0.7580        0.6011
  Pred2_GT           0.5444        0.6123
  Pred2_Pred1        0.5922        0.8399
Spines from GT
  Pred1_GT           0.6943        0.7989
  Pred2_GT           0.5866        0.7668
  Pred2_Pred1        0.6693        0.7604

Figure 3.9: Barplots of the data in table 3.2


3.3.1 Model 2 Validation

Figure 3.10: Barplot of the data in table 3.2 for model 2 validation

In figures 3.11 and 3.12 we can see the results of the training process of Model 2.

The boxplot shows the distribution of the two main metrics (sensitivity and precision) when applied individually to spines. The horizontal orange line is the median, and the green triangle is the mean. We can see how the training results are very similar to the validation results. Usually training data performs better than validation data.

When this difference is very large we may be overfitting the model. But in this case the difference is very small, probably thanks to the batch normalization and dropout layers, which greatly reduce overfitting.

There were 8 undetected spines in the training data, but 0 in the validation data: all the validation spines were detected as spines.

The two histograms below in figure 3.12 show a more detailed distribution of all the validation spines.

• Training spines: 3056

• Detected training spines: 3048

• Validation spines: 764

• Detected validation spines: 764


Figure 3.11: Pred2 vs GT (validation phase). The training data is shown as a reference (PPV_t and TPR_t).


Figure 3.12: Pred2 vs GT (validation phase). The histograms show the validation data.


3.3.2 Spines from Pred1

See figures 3.13-3.18.

Figure 3.13: Pred1 vs GT (spines from Pred1)

Now we want to evaluate Model 2 against Model 1. We use Model 1 to extract the spines from the image. Because of this, not all the spines detected by Model 1 are real spines; some of them are false positives. We distinguish spine candidates (spines according to Model 1) and true spines (spines according to GT).

Here, detected means that at least one voxel was considered a spine. It does not necessarily mean that the spine was correctly segmented; that is the function of Model 2.

• Spine candidates detected by Model 1: 1524

• True spines detected by Model 1: 865

• Spine candidates detected by Model 1 and Model 2: 1373

• True spines detected by Model 1 and Model 2: 841

All the detected spines are individually evaluated, and the results are shown in figures 3.13-3.14 for Model 1, and figures 3.15-3.16 for Model 2.


Figure 3.14: Pred1 vs GT (spines from Pred1)

In figure 3.13, the boxplot indicates the limits of each quartile: 25% of the spines are above the box, 25% of the spines are inside the box but above the median line, 25% are inside the box but below the median line, and the last 25% are below the box.

So in Model 1, after removing the false candidate spines, 75% of the spines have a sensitivity greater than 69% (the first quartile), and 75% of the spines have a precision greater than 68%.

Figure 3.15 is the equivalent for Model 2: the TPR at the first quartile is lower, at 61%, but the PPV remains the same at 68%.

Figures 3.17 and 3.18 compare Model 2 with Model 1 directly. If the two models were similar, the precision and sensitivity of this comparison would be close to 1. The precision is indeed close to 1, the median being at 88%, but the sensitivity median is 72%. A low sensitivity implies that many spines detected by Model 1 will not be detected by Model 2. This refers mainly to the false positive spines: most of the false positives will not be detected by Model 2.

Figure 3.15: Pred2 vs GT (spines from Pred1)


Figure 3.16: Pred2 vs GT (spines from Pred1)


Figure 3.17: Pred2 vs Pred1 (spines from Pred1)


Figure 3.18: Pred2 vs Pred1 (spines from Pred1)


3.3.3 Spines from GT

See figures 3.19-3.24.

Figure 3.19: Pred1 vs GT (spines from GT)

Now we compare the two models, but extracting the spines directly from the ground truth. This does not show the real-world performance of the models, because it only takes into account how the models work on spines; only Model 2 is designed to work directly on spines, while Model 1 is supposed to work on the full image.

Here is a summary of the spine detection of both models:

• Total spines: 959

• Detected by Model 1: 855

• Detected by Model 2: 928

• Detected by Model 1 and Model 2: 843

There are more spines detected by Model 2 than spines detected by Model 1. This should be backwards: Model 1 should detect all the spines plus some false positives, so that Model 2 can improve that prediction by eliminating the false positives. If Model 1 misses many spines from the image, they won't be processed by Model 2, so there is no gain.

Figure 3.20: Pred1 vs GT (spines from GT)

As we can see in table 3.2, now Model 1 has a slightly higher precision and a higher sensitivity than Model 2. In theory the sensitivity and precision of Model 2 should be the same here and in the training data from the Model 2 validation, which is true for precision: the mean is very close to 73% (76%), but sensitivity falls from 90% to 58%, so there must be some difference in the data. Low sensitivity means that some spines remain undetected. If we repeat the measures without considering the missed spines, the sensitivity of Model 2 goes up to 71% while precision goes down to 71% (see figure 3.21). Using the same metrics, Model 1 has a sensitivity of 69% and a precision of 78%.

To summarize: Model 2 is capable of improving the prediction of Model 1 by eliminating 18% of the false positives detected by Model 1. Model 2 has a marginally higher precision, but only when taking into account the false positives predicted by Model 1. In terms of sensitivity Model 1 is better, because it makes more predictions, even when they are false positives.

Figure 3.21: Pred2 vs GT (spines from GT)


Figure 3.22: Pred2 vs GT (spines from GT)


Figure 3.23: Pred2 vs Pred1 (spines from GT)


Figure 3.24: Pred2 vs Pred1 (spines from GT)

Chapter 4

Conclusions and future work

We show how a correctly trained CNN can achieve a good performance thanks to current machine learning technology. Combining a model which finds spines in images of dendrites with another model which works on images of spines reduces the number of false positives. Therefore, the main objectives of the project were achieved.

4.1 Future work

Both models can be improved. Since deep learning is a young and active field, there are new breakthroughs every year.

Model 2 can be easily modified to not only detect the spines, but also classify them.

There are 4 types of spines according to their morphology. The biggest challenge would be to obtain enough data.

Appendix A

Source code

# draw_figures.py
# Draws figures from the predictions.
import numpy as np
import skimage, os
from glob import glob
import matplotlib.pyplot as plt
from skimage.util.montage import montage2d

def npy_to_figure(i):
    path_img = 'results/predict/img_%d.npy' % i
    path_pred = 'results/predict/pred_%d.npy' % i
    path_truth = 'results/predict/truth_%d.npy' % i
    path_out = 'results/figures/%d.png' % i
    spine_image = np.load(path_img)
    spine_image = np.squeeze(spine_image, axis=3)
    spine_pred = np.load(path_pred)
    spine_pred = np.squeeze(spine_pred, axis=3)
    spine_mask = np.load(path_truth)
    spine_mask = np.squeeze(spine_mask, axis=3)
    stacked = np.dstack((spine_image, spine_pred, spine_mask))
    stacked = np.transpose(stacked, (2, 0, 1))
    plt.imsave(path_out, montage2d(stacked, grid_shape=(3, 32)), cmap='inferno')

if __name__ == "__main__":
    imgs_path = 'results/predict/'
    all_images = glob(os.path.join(imgs_path, 'img_*.npy'))
    for i in range(len(all_images)):
        npy_to_figure(i)

# draw_hist_from_csv.py
# Draws histograms from the spine metrics csv.
import pandas as pd
import matplotlib.pyplot as plt

def split_at_i(x, j):
    a = [x[i] for i in range(j)]
    b = [x[i] for i in range(j, len(x))]
    return a, b

def evaluate_train(filter_not_detected=False):
    validation_split_i = 3056
    # save spine results on csv
    df = pd.read_csv("./spine_scores_train.csv", index_col=0)
    TPR = df['TPR']
    PPV = df['PPV']
    ACC = df['ACC']
    TN = df['TN']
    FP = df['FP']
    FN = df['FN']
    TP = df['TP']
    if filter_not_detected:
        filter_list = set([i for i in range(len(TP)) if TP[i] == 0])
        print('Filtering ' + str(len(filter_list)) + ' not detected spines')
        TPR = [x for (i, x) in enumerate(TPR) if i not in filter_list]
        PPV = [x for (i, x) in enumerate(PPV) if i not in filter_list]
        ACC = [x for (i, x) in enumerate(ACC) if i not in filter_list]
        TN = [x for (i, x) in enumerate(TN) if i not in filter_list]
        FP = [x for (i, x) in enumerate(FP) if i not in filter_list]
        FN = [x for (i, x) in enumerate(FN) if i not in filter_list]
        TP = [x for (i, x) in enumerate(TP) if i not in filter_list]
    TPR_t, TPR_v = split_at_i(TPR, validation_split_i)
    PPV_t, PPV_v = split_at_i(PPV, validation_split_i)
    TN_t, TN_v = split_at_i(TN, validation_split_i)
    FN_t, FN_v = split_at_i(FN, validation_split_i)
    FP_t, FP_v = split_at_i(FP, validation_split_i)
    TP_t, TP_v = split_at_i(TP, validation_split_i)
    # draw histogram
    plt.figure(figsize=(6, 6))
    # plt.subplot(3, 1, 1)
    plt.boxplot([TPR_t, TPR_v, PPV_t, PPV_v], sym='+', whis=1.5, showmeans=True)
    plt.xticks([1, 2, 3, 4], ['TPR_t', 'TPR_v', 'PPV_t', 'PPV_v'])
    plt.gca().yaxis.grid(True)
    plt.savefig('results/hists/train_boxplot.png')
    # plt.subplot(2, 1, 1)
    plt.figure(figsize=(8, 4))
    plt.hist(TPR_v, 100, [0, 1])  # hist(data, bins, range)
    plt.minorticks_on()
    plt.xlabel('TPR')
    plt.ylabel('number of spines')
    plt.title('Sensitivity')
    plt.savefig('results/hists/train_tpr.png')
    # plt.subplot(2, 1, 2)
    plt.figure(figsize=(8, 4))
    plt.hist(PPV_v, 100, [0, 1])  # hist(data, bins, range)
    plt.minorticks_on()
    plt.xlabel('PPV')
    plt.ylabel('number of spines')
    plt.title('Precision')
    plt.savefig('results/hists/train_ppv.png')
    print('Training')
    cm = [sum(TN_t), sum(FP_t), sum(FN_t), sum(TP_t)]
    sensitivity = 1.0 if cm[3] + cm[2] == 0 else cm[3] / (cm[3] + cm[2])
    precision = 1.0 if cm[3] + cm[1] == 0 else cm[3] / (cm[3] + cm[1])
    accuracy = (cm[3] + cm[0]) / (cm[0] + cm[1] + cm[2] + cm[3])
    print("Sensitivity: " + str(sensitivity))
    print("Precision: " + str(precision))
    print('Validation')
    cm = [sum(TN_v), sum(FP_v), sum(FN_v), sum(TP_v)]
    sensitivity = 1.0 if cm[3] + cm[2] == 0 else cm[3] / (cm[3] + cm[2])
    precision = 1.0 if cm[3] + cm[1] == 0 else cm[3] / (cm[3] + cm[1])
    accuracy = (cm[3] + cm[0]) / (cm[0] + cm[1] + cm[2] + cm[3])
    print("Sensitivity: " + str(sensitivity))
    print("Precision: " + str(precision))

def evaluate_test(i, use_gt_as_gt=True, filter_not_detected=False):
    if i == 0:
        test_name = "p1_gt"
    elif i == 1:
        test_name = "p2_gt"
    elif i == 2:
        test_name = "p2_p1"
    else:
        raise ValueError('test must be 0, 1, 2')
    if use_gt_as_gt:
        test_name = "gt_" + test_name
    # save spine results on csv
    df = pd.read_csv("./spine_scores_test_" + test_name + ".csv", index_col=0)
    TPR = df['TPR']
    PPV = df['PPV']
    ACC = df['ACC']
    TN = df['TN']
    FP = df['FP']
    FN = df['FN']
    TP = df['TP']
    if filter_not_detected:
        filter_list = set([i for i in range(len(TPR)) if TP[i] == 0])
        print('Filtering ' + str(len(filter_list)) + ' not detected spines')
        TPR = [x for (i, x) in enumerate(TPR) if i not in filter_list]
        PPV = [x for (i, x) in enumerate(PPV) if i not in filter_list]
        ACC = [x for (i, x) in enumerate(ACC) if i not in filter_list]
        TN = [x for (i, x) in enumerate(TN) if i not in filter_list]
        FP = [x for (i, x) in enumerate(FP) if i not in filter_list]
        FN = [x for (i, x) in enumerate(FN) if i not in filter_list]
        TP = [x for (i, x) in enumerate(TP) if i not in filter_list]
    # draw histogram
    plt.figure(figsize=(6, 6))
    # plt.subplot(3, 1, 1)
    plt.boxplot([TPR, PPV], sym='+', whis=1.5, showmeans=True)
    plt.xticks([1, 2], ['TPR', 'PPV'])
    plt.gca().yaxis.grid(True)
    plt.savefig('results/hists/test_' + test_name + '_boxplot.png')
    # plt.subplot(2, 1, 1)
    plt.figure(figsize=(8, 4))
    plt.hist(TPR, 100, [0, 1])  # hist(data, bins, range)
    plt.minorticks_on()
    plt.xlabel('TPR')
    plt.ylabel('number of spines')
    plt.title('Sensitivity')
    plt.savefig('results/hists/test_' + test_name + '_tpr.png')
    # plt.subplot(2, 1, 2)
    plt.figure(figsize=(8, 4))
    plt.hist(PPV, 100, [0, 1])  # hist(data, bins, range)
    plt.minorticks_on()
    plt.xlabel('PPV')
    plt.ylabel('number of spines')
    plt.title('Precision')
    plt.savefig('results/hists/test_' + test_name + '_ppv.png')
    print(test_name)
    cm = [sum(TN), sum(FP), sum(FN), sum(TP)]
    sensitivity = 1.0 if cm[3] + cm[2] == 0 else cm[3] / (cm[3] + cm[2])
    precision = 1.0 if cm[3] + cm[1] == 0 else cm[3] / (cm[3] + cm[1])
    accuracy = (cm[3] + cm[0]) / (cm[0] + cm[1] + cm[2] + cm[3])
    print("Sensitivity: " + str(sensitivity))
    print("Precision: " + str(precision))

if __name__ == "__main__":
    evaluate_train()
    evaluate_test(0)
    evaluate_test(1)
    evaluate_test(2)

# evaluate_spines.py
# Evaluate prediction: save spine metrics to csv
import numpy as np
import os
from glob import glob
import pandas as pd

class ValidationMetrics():
    def __init__(self):
        self.header = ['FP', 'FN', 'TP', 'TPR', 'PPV', 'ACC', 'TN']
        self.TN = list()
        self.FP = list()
        self.FN = list()
        self.TP = list()
        self.SENS = list()
        self.PREC = list()
        self.ACC = list()
        self.validation_cases = list()
        self.val_case = 0

    def add_cm(self, cm):
        sensitivity = 1.0 if cm[3] + cm[2] == 0 else cm[3] / (cm[3] + cm[2])
        precision = 1.0 if cm[3] + cm[1] == 0 else cm[3] / (cm[3] + cm[1])
        accuracy = (cm[3] + cm[0]) / (cm[0] + cm[1] + cm[2] + cm[3])
        self.TN.append(cm[0])
        self.FP.append(cm[1])
        self.FN.append(cm[2])
        self.TP.append(cm[3])
        self.SENS.append(sensitivity)
        self.PREC.append(precision)
        self.ACC.append(accuracy)
        self.validation_cases.append(self.val_case)
        self.val_case += 1

    def save(self, out_file):
        # save spine results to csv
        spine_csv = ({self.header[0]: self.FP, self.header[1]: self.FN,
                      self.header[2]: self.TP, self.header[3]: self.SENS,
                      self.header[4]: self.PREC, self.header[5]: self.ACC,
                      self.header[6]: self.TN})
        df = pd.DataFrame.from_records(spine_csv, index=self.validation_cases)
        df.to_csv(out_file)

def pixel_confusion_matrix(y_true, y_pred):
    # truth_0 = np.count_nonzero(~y_true)
    # truth_1 = np.count_nonzero(y_true)
    pred_0 = np.count_nonzero(~y_pred)
    pred_1 = np.count_nonzero(y_pred)
    # assert truth_0 + truth_1 == pred_0 + pred_1 == 64*64*32
    background = (~y_true) * (~y_pred)
    intersection = y_true * y_pred
    common_0 = np.count_nonzero(background)
    common_1 = np.count_nonzero(intersection)
    tn = common_0
    fp = pred_1 - common_1
    fn = pred_0 - common_0
    tp = common_1
    return (tn, fp, fn, tp)

def print_confusion_matrix(cm):
    print('Confusion matrix -- [tn, fp; fn, tp]')
    print(cm[0], cm[1])
    print(cm[2], cm[3])

def normalize_confusion_matrix(cm):
    all_bg = cm[0] + cm[2]
    all_sp = cm[1] + cm[3]
    try:
        tn = cm[0] / all_bg
    except ZeroDivisionError:
        pass
    try:
        fp = cm[1] / all_sp
    except ZeroDivisionError:
        pass
    try:
        fn = cm[2] / all_bg
    except ZeroDivisionError:
        pass
    try:
        tp = cm[3] / all_sp
    except ZeroDivisionError:
        pass
    if all_bg == 0:
        tn = 1.0
        fn = 0.0
    if all_sp == 0:
        fp = 0.0
        tp = 1.0
    return (tn, fp, fn, tp)

def add_to_cm(cm, new_values):
    return [cm[i] + new_values[i] for i in range(4)]

def load_image_pred_mask(i):
    path_img = 'results/predict/img_%d.npy' % i
    path_pred = 'results/predict/pred_%d.npy' % i
    path_truth = 'results/predict/truth_%d.npy' % i
    spine_image = np.load(path_img)
    spine_pred = np.load(path_pred)
    spine_pred = (spine_pred > 0.5) * 1.0
    spine_pred = spine_pred.astype('bool')
    spine_mask = np.load(path_truth)
    spine_mask = spine_mask.astype('bool')
    return spine_image, spine_pred, spine_mask

def load_image_pred12_mask(i):
    path_img = 'results/predict12/img_%d.npy' % i
    path_pred1 = 'results/predict12/pred1_%d.npy' % i
    path_pred2 = 'results/predict12/pred2_%d.npy' % i
    path_truth = 'results/predict12/truth_%d.npy' % i
    spine_image = np.load(path_img)
    spine_pred1 = np.load(path_pred1)
    spine_pred1 = spine_pred1.astype('bool')
    spine_pred2 = np.load(path_pred2)
    spine_pred2 = (spine_pred2 > 0.5) * 1.0
    spine_pred2 = spine_pred2.astype('bool')
    spine_mask = np.load(path_truth)
    spine_mask = spine_mask.astype('bool')
    return spine_image, spine_pred1, spine_pred2, spine_mask

def evaluate_train():
    imgs_path = 'results/predict/'
    all_images = glob(os.path.join(imgs_path, 'img_*.npy'))
    absolute_cm_all_spines = [0, 0, 0, 0]
    training_cm = [0, 0, 0, 0]
    validation_split_i = 3056
    vm = ValidationMetrics()
    for i in range(0, len(all_images)):
        _, prediction, truth = load_image_pred_mask(i)
        cm = pixel_confusion_matrix(truth, prediction)
        # print_confusion_matrix(normalize_confusion_matrix(cm))
        vm.add_cm(cm)
        absolute_cm_all_spines = add_to_cm(absolute_cm_all_spines, cm)
        if i == validation_split_i - 1:
            training_cm = [x for x in absolute_cm_all_spines]
            print('Confusion matrix with only training spines:')
            print_confusion_matrix((absolute_cm_all_spines))
    print('Confusion matrix with all spines at once:')
    print_confusion_matrix((absolute_cm_all_spines))
    print('Validation:')
    validation_cm = [absolute_cm_all_spines[i] - training_cm[i] for i in range(4)]
    print_confusion_matrix((validation_cm))
    vm.save("./spine_scores_train.csv")

def evaluate_test():
    imgs_path = 'results/predict12/'
    all_images = glob(os.path.join(imgs_path, 'img_*.npy'))
    cm_p1_gt = [0, 0, 0, 0]
    cm_p2_gt = [0, 0, 0, 0]
    cm_p2_p1 = [0, 0, 0, 0]
    vma = ValidationMetrics()
    vmb = ValidationMetrics()
    vmc = ValidationMetrics()
    for i in range(0, len(all_images)):
        _, pred1, pred2, truth = load_image_pred12_mask(i)
        cma = pixel_confusion_matrix(truth, pred1)
        cmb = pixel_confusion_matrix(truth, pred2)
        cmc = pixel_confusion_matrix(pred1, pred2)
        # print_confusion_matrix(normalize_confusion_matrix(cm))
        vma.add_cm(cma)
        vmb.add_cm(cmb)
        vmc.add_cm(cmc)
        cm_p1_gt = add_to_cm(cm_p1_gt, cma)
        cm_p2_gt = add_to_cm(cm_p2_gt, cmb)
        cm_p2_p1 = add_to_cm(cm_p2_p1, cmc)
    print('Pred1 compared to GT')
    print_confusion_matrix(cm_p1_gt)
    print('Pred2 compared to GT')
    print_confusion_matrix(cm_p2_gt)
    print('Pred2 compared to Pred1')
    print_confusion_matrix(cm_p2_p1)
    vma.save("./spine_scores_test_p1_gt.csv")
    vmb.save("./spine_scores_test_p2_gt.csv")
    vmc.save("./spine_scores_test_p2_p1.csv")

if __name__ == "__main__":
    evaluate_train()
    evaluate_test()

# example_dendrite.py
# Shows a sample dendrite
import numpy as np
from skimage import img_as_float
import matplotlib.pyplot as plt
try:
    import nibabel as nib
except:
    raise ImportError('Install NIBABEL')

# Spines: 255
# Dendrites: 150
def mask_150_255(img):
    image = ((img == 255) * 255).astype('uint8')
    # print('Mask type:', image.dtype, 'Range:', np.amin(image), '...', np.amax(image), 'Shape:', image.shape)
    return image

def read_image_to_array(img_path):
    image = nib.load(img_path).get_data()
    print('Image type:', image.dtype, 'Range:', np.amin(image), '...', np.amax(image), 'Shape:', image.shape)
    '''
    stacked = np.transpose(image, (2, 0, 1))
    stacked = mask_150_255(stacked)
    plt.imsave(img_path + '.png', montage2d(stacked), cmap='inferno')
    '''
    '''
    # Convert from int16 or uint8 to float32
    try:
        # img_as_float returns float64
        image = img_as_float(image).astype(np.float32)
    except ValueError:
        # if the image is already a float, make it float32
        image = image.astype(np.float32)
    # Normalize Data to [0, 1]
    x = image
    image = (x - np.amin(x)) / (np.amax(x) - np.amin(x))
    '''
    return image

if __name__ == "__main__":
    z = 35
    img = read_image_to_array('data/data/if6.1.1-1 enero/spine.nii.gz')
    mask = read_image_to_array('data/data/if6.1.1-1 enero/truth.nii.gz')
    mask = mask_150_255(mask).astype('bool')
    plt.figure(figsize=(12, 18))
    stacked_mask = np.transpose(mask, (2, 0, 1))
    plt.imsave('example_mask.png', stacked_mask[z], cmap='gray')
    plt.figure(figsize=(12, 18))
    stacked_img = np.transpose(img, (2, 0, 1))
    plt.imsave('example_img.png', stacked_img[z], cmap='gray')
    # to rgb
    stacked_img = np.stack((stacked_img,) * 3, -1)
    img_layer = stacked_img[z]
    img_layer[stacked_mask[z], 0] = img_layer[stacked_mask[z], 0] * 0.5 + 127
    img_layer[stacked_mask[z], 1] = img_layer[stacked_mask[z], 1] * 0.7
    img_layer[stacked_mask[z], 2] = img_layer[stacked_mask[z], 2] * 0.7
    plt.imsave('example_mask_img.png', img_layer)

# nidata.py
# Converts data from nii format to npy format
# ready to be used by the unet
import numpy as np
import skimage, os
from skimage.measure import label, regionprops
from skimage.util.montage import montage2d
from scipy import ndimage as ndi
from skimage import img_as_float
import matplotlib.pyplot as plt
from glob import glob
try:
    import nibabel as nib
except:
    raise ImportError('Install NIBABEL')

# Binarize the prediction
# Spines: 255
# Dendrites: 150
def mask_150_255(img):
    image = ((img == 255) * 255).astype('uint8')
    # print('Mask type:', image.dtype, 'Range:', np.amin(image), '...', np.amax(image), 'Shape:', image.shape)
    return image

def save_montage_img_mask(path_out, image, mask):
    plt.figure(figsize=(12, 18))
    stacked = np.dstack((image, mask))
    stacked = np.transpose(stacked, (2, 0, 1))
    plt.imsave(path_out, montage2d(stacked), cmap='inferno')

# Set filter_spines = False when drawing histogram
def get_spines_from_image(image, mask, filter_spines=True):
    # label image regions
    label_image = label(mask)
    props = regionprops(label_image)
    spine_images = []
    spine_masks = []
    margin = 1
    for r in props:
        # print(r.bbox)
        bb = slice_bb(r.bbox, margin, mask.shape)
        spine_mask = mask[bb]
