
RAM: 8 GB, GPU: GTX 970

3.3.1 Model 2 Validation

Figure 3.10: Barplot of the data in table 3.2 for Model 2 validation

In figure 3.11 we can see the results of the training process of Model 2.

The boxplot shows the distribution of the two main metrics (sensitivity and precision) computed individually for each spine. The horizontal orange line is the median, and the green triangle is the mean. We can see that the training results are very similar to the validation results. Usually, training data performs better than validation data; when this difference is very large, we may be overfitting the model. In this case, however, the difference is very small, probably thanks to the batch normalization and dropout layers, which greatly reduce overfitting.
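For reference, both per-spine metrics can be obtained from the voxel-wise confusion counts of each spine. The following is only an illustrative sketch in Python/NumPy; the function and variable names (spine_metrics, gt_mask, pred_mask) are assumptions, not the actual code used in this work.

import numpy as np

def spine_metrics(gt_mask, pred_mask):
    # gt_mask, pred_mask: boolean 3D arrays cropped around a single spine.
    tp = np.logical_and(gt_mask, pred_mask).sum()    # spine voxels correctly predicted
    fp = np.logical_and(~gt_mask, pred_mask).sum()   # predicted spine voxels outside the GT spine
    fn = np.logical_and(gt_mask, ~pred_mask).sum()   # GT spine voxels missed by the model
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0   # sensitivity (TPR)
    ppv = tp / (tp + fp) if (tp + fp) > 0 else 0.0   # precision (PPV)
    return tpr, ppv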

There were 8 undetected spines in the training data, but 0 in the validation data: all the validation spines were detected as spines.

The two histograms below, in figure 3.12, show a more detailed distribution of all the validation spines.

• Training spines: 3056

• Detected training spines: 3048

• Validation spines: 764

• Detected validation spines: 764
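From these counts, the detection rates follow directly. A small check, using only the numbers listed above:

# Detection rate = detected spines / total spines, with the counts listed above.
train_rate = 3048 / 3056   # ~99.7%: 8 training spines were not detected
val_rate = 764 / 764       # 100%: every validation spine was detected
print(f"training: {train_rate:.1%}, validation: {val_rate:.1%}")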

Figure 3.11: Pred2 vs GT (validation phase). The training data is shown as a reference (PPV_t and TPR_t).


Figure 3.12: Pred2 vs GT (validation phase). The histograms show the validation data.

3.3.2 Spines from Pred1

See figures 3.13-3.18.

Figure 3.13: Pred1 vs GT (spines from Pred1)

Now we want to evaluate Model 2 against Model 1. We use Model 1 to extract the spines from the image. Because of this, not all the spines detected by Model 1 are real spines; some of them are false positives. We distinguish spine candidates (spines according to Model 1) from true spines (spines according to GT).

Here, detected means that at least one voxel was considered a spine. It does not necessarily mean that the spine was correctly segmented; that is the function of Model 2.
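A minimal sketch of this detection criterion, assuming each spine is available as a boolean mask of its GT voxels and the model output as a boolean volume of the same shape (the names is_detected, spine_gt_mask and pred_volume are illustrative, not the actual code of this work):

import numpy as np

def is_detected(spine_gt_mask, pred_volume):
    # A spine counts as detected if at least one of its voxels is predicted
    # as spine; how well it is segmented is evaluated separately.
    return bool(np.logical_and(spine_gt_mask, pred_volume).any())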

• Spine candidates detected by Model 1: 1524

• True spines detected by Model 1: 865

• Spine candidates detected by Model 1 and Model 2: 1373

• True spines detected by Model 1 and Model 2: 841

All the detected spines are individually evaluated, and the results are shown in figures 3.13-3.14 for Model 1 and figures 3.15-3.16 for Model 2.


Figure 3.14: Pred1 vs GT (spines from Pred1)

In figure 3.13, the boxplot indicates the limits of each quartile: 25% of the spines are above the box, 25% are inside the box but above the median line, 25% are inside the box but below the median line, and the last 25% are below the box.

So in Model 1, after removing the false candidate spines, 75% of the spines have a sensitivity greater than 69% (the first quartile), and 75% have a precision greater than 68%.
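The quartile values reported here can be read off directly from the per-spine metric values. A minimal sketch with NumPy, using placeholder data (in practice tpr_values and ppv_values would hold one sensitivity and one precision value per true spine):

import numpy as np

# Placeholder per-spine metric values, only to make the sketch runnable;
# in practice these come from evaluating every true spine individually.
rng = np.random.default_rng(0)
tpr_values = rng.uniform(0.4, 1.0, size=865)
ppv_values = rng.uniform(0.4, 1.0, size=865)

# First quartile: 75% of the spines score above this value, which is how the
# 69% sensitivity / 68% precision figures are read off the boxplot.
q1_tpr = np.percentile(tpr_values, 25)
q1_ppv = np.percentile(ppv_values, 25)
print(f"Q1 sensitivity: {q1_tpr:.2f}, Q1 precision: {q1_ppv:.2f}")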

Figure 3.15 is the equivalent for Model 2: the TPR at the first quartile is lower, at 61%, but the PPV remains the same at 68%.

Figures 3.17 and 3.18 compare Model 2 with Model 1 directly. If the two models were similar, the precision and sensitivity of this comparison would be close to 1. The precision is indeed close to 1, the median being at 88%, but the sensitivity median is 72%. A low sensitivity implies that many spines detected by Model 1 will not be detected by Model 2. This refers mainly to the false positive spines: most of the false positives

Figure 3.15: Pred2 vs GT (spines from Pred1)

will not be detected by Model 2.


Figure 3.16: Pred2 vs GT (spines from Pred1)

Figure 3.17: Pred2 vs Pred1 (spines from Pred1)


Figure 3.18: Pred2 vs Pred1 (spines from Pred1)

3.3.3 Spines from GT

See figures 3.19-3.24.

Figure 3.19: Pred1 vs GT (spines from GT)

Now we compare the two models again, but extracting the spines directly from the ground truth. This does not show the real-world performance of the models, because it only takes into account how the models work on spines: only Model 2 is designed to work directly on spines, while Model 1 is supposed to work on the full image.

Here is a summary of the spine detection of both models:

• Total spines: 959

• Detected by Model 1: 855

• Detected by Model 2: 928

• Detected by Model 1 and Model 2: 843

There are more spines detected by Model 2 than by Model 1. This should be the other way around: Model 1 should detect all the spines plus some false positives, so that Model 2 can improve that prediction by eliminating the false positives. If Model 1 does


Figure 3.20: Pred1 vs GT (spines from GT)

miss many spines in the image, they won't be processed by Model 2, so there is no gain.

As we can see in table 3.2, Model 1 now has a slightly higher precision and a higher sensitivity than Model 2. In theory, the sensitivity and precision of Model 2 should be the same here as in the training data from the Model 2 validation. This is true for precision: the mean is very close to 73% (76%). But sensitivity falls from 90% to 58%, so there must be some difference in the data. Low sensitivity means that some spines remain undetected. If we repeat the measurements without considering the missed spines, the sensitivity of Model 2 goes up to 71% while precision goes down to 71% (see figure 3.21). Using the same metrics, Model 2 has a sensitivity of 69% and a precision of 78%.

To summarize: Model 2 is capable of improving the prediction of Model 1 by eliminating 18% of the false positives detected by Model 1. Model 2 has a marginally

Figure 3.21: Pred2 vs GT (spines from GT)

higher precision, but only when taking into account the false positives predicted by Model 1. In terms of sensitivity, Model 1 is better, because it makes more predictions, even when some of them are false positives.


Figure 3.22: Pred2 vs GT (spines from GT)

Figure 3.23: Pred2 vs Pred1 (spines from GT)


Figure 3.24: Pred2 vs Pred1 (spines from GT)
