[Figure: ground truth and predicted tumour segmentations]

6.3 Evaluation of the SciNets library

In this thesis, we introduced the SciNets library, which allowed us to systematically run a large parameter sweep for image segmentation using a U-Net architecture.

The straightforward API allowed us to shift focus away from the implementation details of the neural networks when running experiments.

Another benefit of the SciNets library is that the same parameter files can be reused with new datasets. For this purpose, HDF5 files for segmentation of organs at risk from PET/CT scans, as well as HDF5 files for segmentation of rectal cancers from MRI images of approximately 200 patients, were prepared. However, neither of these experiments was run due to time limitations.

Several weaknesses of the SciNets library became apparent during and after the training. Firstly, the DataReader class will always shuffle the dataset, which made it difficult to assess patient-to-patient performance. It was, however, still possible, as the store_outputs method stores the input-output pairs of the model as well as the index of each such pair.

The ideal way to fix the above problem is to implement a context manager that prevents dataset shuffling. The TensorFlow session should then be generated within that context. Below is a demonstration of what the API of such a solution could look like.

dataset = scinets.data.HDFDataset(
    data_path="/datasets/val_split_2d.h5",
    batch_size=[train_batch_size, val_batch_size, test_batch_size],
    train_group="train",
    val_group="val",
    test_group="test",
    preprocessor=preprocessor,
    is_training=is_training,
    is_testing=is_testing,
)

with dataset.no_shuffle():  # This context is not implemented in SciNets
    with tf.Session() as sess:
        pass  # Do something

Furthermore, the way the parameter logging was performed made it cumbersome to compare models using more than one performance metric (in our case, the mean Dice per slice). There is, unfortunately, no easy method to integrate multiple final performance metrics in SacredBoard.

There are, however, two solutions to this problem. One solution is to develop a tool similar to Sacred and SacredBoard; however, this is no small feat. Therefore, we recommend using Comet.ml for experiment tracking instead. Comet.ml is a commercial service that provides a tool similar to SacredBoard, but without having to rent a server and set up a database. Furthermore, it supports several evaluation metrics instead of just one. Additionally, Comet.ml offers an easier API for experiment logging.
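As an illustration, the snippet below sketches how several final metrics could be logged with Comet.ml's Python client. The Experiment, log_parameters and log_metrics calls are part of the Comet.ml API, but the project name, parameters and metric values are placeholders, and the snippet is not part of SciNets.

from comet_ml import Experiment

# Hypothetical integration: one Comet.ml experiment per SciNets experiment.
# The API key is typically read from an environment variable or a config file.
experiment = Experiment(project_name="scinets-segmentation")

# Log the experiment parameters so runs can be filtered and compared.
experiment.log_parameters({"model": "unet", "learning_rate": 1e-4})

# Several final performance metrics can be logged side by side,
# unlike the single result value supported by SacredBoard.
experiment.log_metrics({
    "val_dice_per_slice": 0.72,  # placeholder values
    "val_sensitivity": 0.68,
    "val_specificity": 0.99,
})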

A drawback of Comet.ml is that it is the product of a company. Thus, if the company goes bankrupt, or the pricing model changes, then the database of experiments might be lost. Luckily, the paid plans are currently free for academic use.

Comet.ml was not implemented as part of SciNets for three reasons. Firstly, it was launched in April 2018, four months after the development of SciNets had started, and it was, as of July 2018, not completely stable. Secondly, as a new, commercial product with unknown lifetime, it was deemed a risky option.

Finally, the downsides of Sacred and SacredBoard were not known before the final analysis of the results was conducted.

The generation of the final evaluation plots and tables was unnecessarily cumbersome, and automating this would clearly be beneficial. SciNets should, therefore, be extended to automatically generate a wide range of tables and plots chosen by the user. These plots and tables could then be generated for each model. If a Comet.ml logger is implemented, these plots and tables could be uploaded to the Comet.ml project corresponding to the experiments.

There is one caveat to keep in mind when automating the generation of final evaluation tables and figures, namely how to deal with the test set. Performances on the test set should not be generated automatically for all models, as the comparison of models should not be performed on the basis of the test set. Thus, the best way to implement this is to have a class that generates the evaluation tables and results for a given dataset, and to call it with the validation set automatically after each model is trained.
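A minimal sketch of such a class is shown below. The names (ModelEvaluator, the dice helper) and the assumption that predictions and targets are available as binary NumPy masks are ours; the sketch is not part of the current SciNets API.

import numpy as np

def dice(prediction, target, eps=1e-8):
    # Dice coefficient for binary masks, used here as an example metric.
    intersection = np.sum(prediction * target)
    return (2 * intersection + eps) / (np.sum(prediction) + np.sum(target) + eps)

class ModelEvaluator:
    """Hypothetical helper that computes evaluation tables for a given dataset."""

    def __init__(self, metrics):
        # metrics: mapping from metric name to a callable(prediction, target).
        self.metrics = metrics

    def evaluate(self, predictions, targets):
        # Returns one row per metric with its mean and standard deviation.
        table = {}
        for name, metric in self.metrics.items():
            scores = [metric(p, t) for p, t in zip(predictions, targets)]
            table[name] = {"mean": float(np.mean(scores)), "std": float(np.std(scores))}
        return table

# Called automatically with the validation set after training;
# the test set is reserved for a single, final evaluation.
# evaluator = ModelEvaluator({"dice_per_slice": dice})
# validation_table = evaluator.evaluate(val_predictions, val_targets)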

The analysis of the experiments also revealed the need for good visualisation tools.

Adding an interactive toolkit is outside the scope of SciNets. However, extending the final evaluation pipeline to generate guided backpropagation outputs is within the scope. Thus, for a subset of the patients, guided backpropagation visualisations can be created separately for all connected components in the proposed segmentation map.
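A common way to obtain guided backpropagation maps in TensorFlow 1.x is to override the ReLU gradient, as sketched below, while connected components can be extracted with scipy.ndimage. The sketch is not part of SciNets, and the commented names (logits, images, predicted_mask, build_network) are placeholders.

import tensorflow as tf
from scipy import ndimage

@tf.RegisterGradient("GuidedRelu")
def _guided_relu_grad(op, grad):
    # Guided backpropagation: pass the gradient only where both the
    # incoming gradient and the ReLU activation are positive.
    gate_grad = tf.cast(grad > 0, grad.dtype)
    gate_act = tf.cast(op.outputs[0] > 0, grad.dtype)
    return gate_grad * gate_act * grad

graph = tf.get_default_graph()
with graph.gradient_override_map({"Relu": "GuidedRelu"}):
    # The network must be built inside this context so every ReLU op
    # uses the guided gradient, e.g. logits = build_network(images).
    pass

# One saliency map per connected component of the predicted mask:
# components, n_components = ndimage.label(predicted_mask)
# for label in range(1, n_components + 1):
#     component = (components == label).astype("float32")
#     saliency = tf.gradients(tf.reduce_sum(logits * component), images)[0]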

Another shortcoming of the library was that creating experiment parameter files was time-consuming. This was mainly an effect of how the TensorboardLogger, and particularly the Tensorboard image logger, was implemented. Problems with the image logger arose when different windowing and channel settings were used, thereby changing the number of input channels and requiring different loggers. This meant that the dataset parameters and the logger parameters were coupled, so simply creating logger files with all possible parameter combinations was not possible.

One way to combat the problem with the image loggers is to create a new log type.

The new logger would not take a parameter that represents which input channel to visualise, but rather make one Tensorboard image log per input channel. By implementing such a logger, it would be trivial to generate all combinations of parameter configurations automatically.
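A sketch of such a logger is shown below. Only the tf.summary.image call reflects the actual TensorFlow API; the class itself, its name, and the assumption that the input tensor has shape [batch, height, width, channels] are ours, not part of SciNets.

import tensorflow as tf

class PerChannelImageLogger:
    """Hypothetical logger: one TensorBoard image summary per input channel."""

    def __init__(self, name="input", max_outputs=3):
        self.name = name
        self.max_outputs = max_outputs

    def create_summaries(self, images):
        # `images` is assumed to have shape [batch, height, width, channels].
        num_channels = images.get_shape().as_list()[-1]
        summaries = []
        for channel in range(num_channels):
            channel_image = images[..., channel:channel + 1]
            summaries.append(
                tf.summary.image(
                    "{}/channel_{}".format(self.name, channel),
                    channel_image,
                    max_outputs=self.max_outputs,
                )
            )
        return summaries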

The experiment phase revealed certain problems that caused poor GPU utilisation (20%) and had to be corrected. The first problem was that bicubic interpolation was used after upconvolutions to ensure that the output of the upconvolutional layer and the input to the corresponding maxpooling layer were of the same size.

This is necessary because an upconvolution following a max pooling layer will produce an output whose spatial dimensions differ by one from the max pooling layer's input whenever that input has odd size. After profiling the code, it became apparent that TensorFlow's bicubic interpolation layer was unable to run on a GPU. By using bilinear interpolation layers instead, the GPU utilisation increased markedly.
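The fix can be sketched as follows in a TensorFlow 1.x graph: the output of the transposed convolution is resized with bilinear interpolation to the spatial size of the corresponding encoder feature map before the skip connection is concatenated. The function and tensor names are placeholders and do not reproduce the SciNets implementation.

import tensorflow as tf

def upconv_and_concat(decoder_input, encoder_features, num_filters):
    # Transposed convolution that approximately doubles the spatial size.
    upsampled = tf.layers.conv2d_transpose(
        decoder_input, num_filters, kernel_size=2, strides=2, padding="same"
    )
    # If the input to the corresponding max pooling layer had odd size, the
    # upsampled tensor is one pixel too small; bilinear resizing (which,
    # unlike bicubic at the time, ran on the GPU) makes the shapes match.
    target_size = tf.shape(encoder_features)[1:3]
    upsampled = tf.image.resize_bilinear(upsampled, target_size)
    return tf.concat([upsampled, encoder_features], axis=-1)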

The other performance bottleneck came from how the datasets were loaded from disk. Two separate methods were tested to reduce the loading time: keeping the entire dataset in RAM, and fetching the next batch while the gradient of the current batch was being computed. There were no noticeable performance gains from keeping the whole dataset in RAM. Prefetching is therefore recommended.
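Prefetching of this kind is directly supported by the tf.data API, as sketched below; the generator, batch shapes and types are placeholders and do not correspond to the actual SciNets data pipeline.

import numpy as np
import tensorflow as tf

def batch_generator():
    # Placeholder generator; the real pipeline reads HDF5 batches from disk.
    while True:
        yield (np.zeros((8, 256, 256, 1), np.float32),
               np.zeros((8, 256, 256, 1), np.float32))

dataset = tf.data.Dataset.from_generator(
    batch_generator, output_types=(tf.float32, tf.float32)
)
# Prepare the next batch on the CPU while the gradient of the current
# batch is being computed on the GPU.
dataset = dataset.prefetch(buffer_size=1)
images, masks = dataset.make_one_shot_iterator().get_next()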