Concluding remarks

12.3 Deep learning based CADs

In this thesis, we have developed two deep learning-based CADs to be used for lung and breast cancer diagnostics. Results from the lung cancer subproject produced close to state-of-the-art performance in both detection and segmentation on the benchmark LIDC data set. For the breast cancer subproject, no performance on BC grading covering all three grades had previously been reported. Performance was especially good on the initial data set, but failed to generalize to new data. Nonetheless, this study illustrates the power and benefits of using deep learning in the field of CADs. Deep learning-based methods are easily adapted between modalities, cancer types, problems and data types.

12.3.1 Segmentation

In this thesis, we designed a state-of-the-art autoencoder, inspired by the UNet architecture, to produce robust segmentation performance in lung, lung nodule and breast tumor segmentation. The same networks could easily be adapted to different problems, data sets and data types. Even on small data sets, the methods produced strong results for both lung and breast tumor segmentation, and seemed to generalize well, most clearly for breast tumor segmentation.
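To make the design concrete, the block below gives a minimal sketch of a UNet-style encoder-decoder for binary segmentation, written with Keras. The depth, filter counts and input shape are illustrative assumptions and do not reproduce the exact architecture developed in this thesis.

```python
# Minimal sketch of a UNet-style encoder-decoder for binary segmentation.
# Depth, filter counts and input shape are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with ReLU, as in the original UNet design
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 1)):
    inputs = layers.Input(shape=input_shape)

    # Encoder: downsample while doubling the number of filters
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D()(c2)

    # Bottleneck
    b = conv_block(p2, 128)

    # Decoder: upsample and concatenate the matching encoder feature maps
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(b)
    c3 = conv_block(layers.Concatenate()([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.Concatenate()([u1, c1]), 32)

    # Sigmoid output gives a per-pixel foreground probability
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```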

12.3.2 Classification

For classification, we used a VGG16-inspired architecture for malignancy prediction of lung nodules and histopathological grade prediction of breast tumors.

Both models performed extremely well on the initial data sets, but it is uncertain how well they will perform on new data. Smart data augmentation, i.e. color augmentation, seemed to provide better generalization, but additional studies on new data sets are required to evaluate this further.
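As an illustration, the following sketch builds a VGG16-style classifier in Keras. Using the stock VGG16 backbone with this particular classification head is an assumption made for illustration; the models in this thesis are "VGG16-inspired" and not necessarily this exact network.

```python
# Minimal sketch of a VGG16-style classifier for patch classification.
# Backbone, head and input shape are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def build_classifier(num_classes=3, input_shape=(224, 224, 3)):
    # num_classes=3 could e.g. correspond to histological grades 1-3;
    # two classes would correspond to benign/malignant nodules
    backbone = VGG16(include_top=False, weights=None, input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(backbone.output)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(backbone.input, outputs)

model = build_classifier()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```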

12.3.3 Local vs open data sets

Using open data sets, one does not always get the ideal data for training a network. An example of this was lung segmentation using the LCTSC data set.

Even though we trained a network that outperformed the traditional intensity-based methods on the current data set, it did not generalize as well to data sets with different FOVs, due to a lack of proper training data (thoracic data which includes nodules in the lung mask). Hence, in these cases it might be beneficial to use traditional image processing/machine vision approaches, as we observed for lung segmentation.

12.3.4 Machine learning vs traditional methods

Throughout this thesis, we have used machine learning methods for classification and segmentation, but in many cases traditional methods are more suitable: they can be more robust, require no training, and be memory-efficient and fast. An example is lung segmentation. Even though we could use a machine learning method for this problem, a simple intensity-based method provided more robust performance on data sets with different FOVs. It is typically easier to handle new scenarios when we define the algorithm ourselves; it is harder to train a network to be invariant to changes it has not yet seen. Thus, with insufficient data, small data sets or a lack of variance, traditional methods might be a better solution than machine learning methods. Traditional methods also remain useful for pre- and post-processing, as seen in tissue and lung segmentation and in the post-processing of lung nodules.
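For reference, a minimal sketch of such a traditional intensity-based lung segmentation is given below, assuming a single CT slice in Hounsfield units. The threshold value and the morphological steps are illustrative assumptions, not the exact algorithm used in this thesis.

```python
# Rough sketch of traditional intensity-based lung segmentation on a CT slice.
# The -320 HU threshold and morphology steps are illustrative assumptions.
import numpy as np
from scipy import ndimage
from skimage import measure, morphology

def segment_lungs(ct_slice_hu, threshold=-320):
    # Air and lung parenchyma are darker than soft tissue in HU
    binary = ct_slice_hu < threshold

    # Remove the air surrounding the patient: clear border-connected regions
    labels = measure.label(binary)
    border_labels = np.unique(np.concatenate(
        [labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]]))
    binary[np.isin(labels, border_labels)] = False

    # Keep the two largest remaining components (left and right lung)
    labels = measure.label(binary)
    if labels.max() > 1:
        sizes = np.bincount(labels.ravel())[1:]
        keep = np.argsort(sizes)[-2:] + 1
        binary = np.isin(labels, keep)

    # Close small gaps (vessels, juxtapleural nodules) and fill holes
    binary = morphology.binary_closing(binary, morphology.disk(5))
    return ndimage.binary_fill_holes(binary)
```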

12.3.5 Speed

Using a single GPU during prediction, we were able to build pipelines that process the raw data into an output in a matter of seconds. The slowest process was the sliding-window prediction in BC grading, but even here the slowest run was around 2 minutes. For smaller tumors, processing time was as low as 2-4 seconds. Total processing time, from the raw cellsens vsi format to predictions with heatmaps, was about 4 seconds to 2 minutes, mostly depending on tumor size. Hence, processing time is feasible for pathologists, but further advances in classification performance are required for the system to be useful in practice.

Lung segmentation using the 2D-UNet approach took less than a second on the GPU, while the traditional methods took about 6-7 seconds on the CPU. For lung nodule segmentation, sliding-window prediction without overlap took approximately two minutes on the CPU and approximately 10 seconds on the GPU. The overall pipeline took approximately 20-25 seconds to process one full CT scan, starting from the raw DICOM format. Hence, processing time is feasible for radiologists.
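A minimal sketch of non-overlapping sliding-window prediction over a CT volume is shown below, assuming a trained Keras segmentation model with a fixed 3D patch size; the patch size and the zero-padding strategy are illustrative assumptions, not the exact pipeline used in the thesis.

```python
# Sketch of non-overlapping sliding-window prediction over a 3D CT volume.
# Assumes a Keras model taking (batch, z, y, x, 1) patches of a fixed size.
import numpy as np

def sliding_window_predict(volume, model, patch=(64, 64, 64)):
    # Pad the volume so every axis is a multiple of the patch size
    pads = [(0, (-volume.shape[i]) % patch[i]) for i in range(3)]
    padded = np.pad(volume, pads, mode="constant")
    out = np.zeros_like(padded, dtype=np.float32)

    # Predict one patch at a time; no overlap means each voxel is predicted once
    for z in range(0, padded.shape[0], patch[0]):
        for y in range(0, padded.shape[1], patch[1]):
            for x in range(0, padded.shape[2], patch[2]):
                block = padded[z:z+patch[0], y:y+patch[1], x:x+patch[2]]
                pred = model.predict(block[np.newaxis, ..., np.newaxis], verbose=0)
                out[z:z+patch[0], y:y+patch[1], x:x+patch[2]] = pred[0, ..., 0]

    # Crop back to the original volume shape
    return out[:volume.shape[0], :volume.shape[1], :volume.shape[2]]
```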


12.3.6 Data augmentation

For small data sets, data augmentation proved to be extremely beneficial for producing more robust and generalizable networks. This was especially clear in patch-wise classification of histological grade. However, too strong augmentation might degrade overall performance, as seen in malignancy classification with heavy 3D translations, as well as with heavy/heavier HSV augmentation in patch-wise classification. When generating artificial data, it is important that the new data represent the true population, in the sense that they look natural.

A result of using heavier HSV color augmentation can be seen in Figure 12.6, producing unnatural staining.

Figure 12.6: Too heavy HSV color augmentation produces unnatural stains
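A minimal sketch of such HSV color augmentation is given below, assuming RGB patches with values in [0, 1]. The jitter ranges are illustrative assumptions; widening them too much produces the kind of unnatural staining shown in Figure 12.6.

```python
# Sketch of HSV color augmentation for histology patches (RGB in [0, 1]).
# The jitter ranges are illustrative assumptions, not the thesis settings.
import numpy as np
from skimage.color import rgb2hsv, hsv2rgb

def hsv_augment(patch_rgb, max_hue=0.02, max_sat=0.1, max_val=0.1, rng=None):
    rng = rng or np.random.default_rng()
    hsv = rgb2hsv(patch_rgb)
    # Hue is circular, so it wraps around instead of being clipped
    hsv[..., 0] = (hsv[..., 0] + rng.uniform(-max_hue, max_hue)) % 1.0
    hsv[..., 1] = np.clip(hsv[..., 1] * (1 + rng.uniform(-max_sat, max_sat)), 0, 1)
    hsv[..., 2] = np.clip(hsv[..., 2] * (1 + rng.uniform(-max_val, max_val)), 0, 1)
    return hsv2rgb(hsv)
```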

12.3.7 Bootstrapping

Even though bootstrapping is a fast and easy way to generate CIs, there are many cases where bootstrapping might not be the best idea. Even if one may assume that the measurements are drawn i.i.d. from the same distribution, it becomes problematic when the data set is small, there is little variance in the data, or there are outliers.

Figure 12.7 (a) shows the distribution of estimated DSC for breast tumor segmentation for the patients in the test set. Here, most DSCs are around 0.9, but there is a clear outlier to the left, close to DSC = 0.5. The distribution is clearly not symmetric (it is left-skewed). Therefore, to estimate the BCa interval of avDSC, we used bootstrapping, which resulted in Figure 12.7 (b). Because of this serious outlier, bootstrapping might produce replicates of this value, which overall results in a bootstrapped avDSC that can be considerably lower than the average DSC of the original sample. Perhaps there is not enough variance in the data set to say whether or not this is an actual outlier. However, after bootstrapping, calculating the BCa interval was the right choice, as the sample distribution of avDSC was quite asymmetric. Using simple percentile intervals here might produce an interval that even goes beyond the maximum DSC of 1.

Figure 12.7: Comparing the distributions of DSC (a) and avDSC (b) found using bootstrapping
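For illustration, the sketch below computes a BCa bootstrap interval for the average DSC with SciPy, assuming one DSC value per test patient. The listed DSC values are made-up placeholders, not results from this thesis.

```python
# Sketch of a BCa bootstrap confidence interval for the average DSC.
# The DSC values below are made-up placeholders with one low outlier.
import numpy as np
from scipy import stats

dsc_per_patient = np.array([0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.52])

res = stats.bootstrap(
    (dsc_per_patient,),        # data is passed as a sequence of samples
    np.mean,                   # statistic of interest: average DSC
    confidence_level=0.95,
    n_resamples=9999,
    method="BCa",              # bias-corrected and accelerated interval
    random_state=0,
)
print(res.confidence_interval)  # asymmetric interval around the average DSC
```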


12.4 Conclusion

Based on the work presented in this thesis, we draw the following conclusions:

• We were successful in developing a multipotent deep learning-based CAD design which adapted well across two widely different imaging modalities and data types

• We have created two platforms for further development (breast) and testing (lung). These platforms may be used for further research by MSc and PhD students

• We were successful in developing a pipeline for processing WSI from the cellsens vsi format. The pipeline can be used in further research within the field of digital pathology

• For both subprojects, we were able to design CAD systems using deep learning, one of which, to the best of our knowledge, produced state-of-the-art performance in overall lung nodule detection and segmentation trained and evaluated on the LIDC data set

• Processing time from the raw DICOM format is a few seconds. Processing time from the raw vsi format is heavily dependent on tumor size. The slowest reported processing time for vsi was two minutes

• For lung nodules, we showed that the results can be easily visualized in CustusX, along with the predicted lung mask. For histological grade predictions, we produced heatmaps to help pathologists interpret the results