
4.2 Preliminary Experiments

4.2.2 Federated Stochastic Gradient Descent

4.2.3.2 FedAvg with CNN

During the second part of the experiment, we trained the convolutional neural network described in Section 3.2.4.3 using the federated learning algorithm FedAvg. In addition to training this model using the non-IID data distribution that was used when experimenting with the artificial neural network, we also tested the convolutional neural network on the two other data distributions described in Section 3.2.3.
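The FedAvg aggregation step used in these experiments can be sketched as a weighted average of the client models' weights. The function below is an illustrative sketch, not the exact implementation from Section 3; the names `client_weights` and `client_sizes` are assumptions for illustration:

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: average each weight array across clients,
    weighted by the size of each client's local dataset.

    client_weights: list (one entry per client) of lists of numpy arrays
    client_sizes:   list of local sample counts, same order as above
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    averaged = []
    for layer in range(num_layers):
        layer_sum = sum(
            w[layer] * (n / total)
            for w, n in zip(client_weights, client_sizes)
        )
        averaged.append(layer_sum)
    return averaged
```

In each round the server would broadcast the averaged weights back to the clients selected for the next round.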

Non-IID Distribution

The following results illustrate the performance of the CNN model applied to the non-IID data distribution of the dataset. The distribution is described in Section 3.2.3.

Metrics

Test Accuracy: 96.2%

Training Accuracy: 99.3%

Test Loss: 0.20

Training Loss: 0.02

Training Time: 687 s

Table 16: Accuracy, loss and training time for the FedAvg experiment with the CNN model trained on Non-IID data. The metrics presented in this table indicate a well-performing model with a relatively high training time. Compared to the metrics for the centralized learning experiment described in Table 6, the CNN model trained with FedAvg performed slightly worse and had a higher training time.

Classification Report

Class Precision Recall F1-Score Support

Normal 0.99 0.97 0.98 18118

Supra Ventricular 0.61 0.84 0.71 556

Ventricular 0.87 0.94 0.90 1448

Fusion 0.49 0.90 0.64 162

Unknown 0.97 0.99 0.98 1608

Table 17: Classification report for the FedAvg experiment with the CNN model trained on Non-IID data.

The table describes the precision, recall, F1-score and support values for the experiment. The F1-scores illustrated in this table show a model that performed very well on every class, except the Fusion class where it performed slightly worse. Compared to the F1-scores for the centralized learning experiment described in Table 7, the CNN model trained with FedAvg performed similarly on every class.
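The per-class F1-score is the harmonic mean of precision and recall, which is why the Fusion class scores low despite its high recall: the harmonic mean is dominated by the smaller of the two values. A minimal illustration:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# High recall cannot compensate for low precision:
# f1_score(0.49, 0.90) is roughly 0.63, close to the Fusion row in
# Table 17 (small differences come from rounding of the inputs).
```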

True \ Predicted    N     S     V     F     U
N                 0.97  0.02  0.01  0.01  0.00
S                 0.12  0.84  0.03  0.00  0.01
V                 0.02  0.01  0.94  0.02  0.01
F                 0.04  0.00  0.06  0.90  0.01
U                 0.01  0.00  0.01  0.00  0.99

Figure 43: Confusion matrix for the FedAvg experiment with the CNN model trained on Non-IID data. Rows are true labels, columns are predicted labels, and values are row-normalized.

The confusion matrix shows a clear diagonal, indicating a high rate of correct classifications for each class. The diagonal is slightly more evident for the CNN model with FedAvg compared to the CNN model with centralized learning, whose confusion matrix is illustrated in Figure 31.
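The confusion matrices in this section appear to be row-normalized, since each row of true labels sums to roughly 1. That normalization can be sketched from a raw count matrix as follows; `counts` is an assumed name for a matrix with true labels as rows and predicted labels as columns:

```python
import numpy as np

def row_normalize(counts: np.ndarray) -> np.ndarray:
    """Divide each row (one true class) by its total support, so entry
    (i, j) becomes the fraction of class i that was predicted as j."""
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)  # guard against empty rows
```

Row normalization makes per-class recall directly readable from the diagonal, regardless of the heavy class imbalance in the support column of Table 17.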

[Figure 44 shows Training Accuracy and Validation Accuracy plotted against Epoch (0-15), with accuracy ranging from about 0.6 to 1.]

Figure 44: Graph illustrating the training and validation accuracy of the FedAvg experiment with the CNN model trained on Non-IID data. From the graph one can observe that the training and validation accuracy converge, indicating that the model did not overfit on the training data.

[Figure 45 shows Training Loss and Validation Loss plotted against Epoch (0-15), with loss ranging from 0 to about 1.6.]

Figure 45: The graph illustrates the training and validation loss of the FedAvg experiment with the CNN model trained on Non-IID data, showing that the validation and training loss converge.

Uniform Distribution

The following results illustrate the performance of the CNN model applied to a uniform data distribution of the dataset. The distribution is described in Section 3.2.3.
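Under the assumption that the uniform distribution of Section 3.2.3 assigns each client a random, equally sized share of the training data, the partitioning can be sketched by shuffling the sample indices and splitting them across clients. The function name and parameters below are illustrative, not taken from the thesis code:

```python
import numpy as np

def uniform_partition(num_samples: int, num_clients: int, seed: int = 0):
    """Shuffle sample indices and split them into equal-sized client
    shards, so every client sees roughly the full class mix (IID)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_samples)
    return np.array_split(indices, num_clients)
```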

Metrics

Test Accuracy: 95.2%

Training Accuracy: 98.5%

Test Loss: 0.20

Training Loss: 0.05

Training Time: 598 s

Table 18: Accuracy, loss and training time for the FedAvg experiment with the CNN model trained on the uniform data distribution. The metrics shown in this table describe a well-performing model with a relatively high training time. Compared to the CNN model trained with FedAvg on the Non-IID data distribution described in Table 16, the model performed slightly worse. However, the training time was lower for the uniform data distribution.

Classification Report

Class Precision Recall F1-Score Support

Normal 0.99 0.95 0.97 18118

Supra Ventricular 0.52 0.84 0.64 556

Ventricular 0.86 0.94 0.90 1448

Fusion 0.41 0.88 0.56 162

Unknown 0.97 0.98 0.96 1608

Table 19: Classification report for the FedAvg experiment with the CNN model trained on the uniform data distribution. The table describes the precision, recall, F1-score and support values for the experiment.

From the table one can observe that the F1-score is high for every class, except the Fusion class where it is slightly lower. Compared to the CNN model trained with FedAvg and the Non-IID data distribution, the model trained on uniform data performed worse on every class except Ventricular, where the F1-score was unchanged.

True \ Predicted    N     S     V     F     U
N                 0.95  0.02  0.01  0.01  0.00
S                 0.12  0.84  0.03  0.01  0.00
V                 0.02  0.01  0.94  0.02  0.00
F                 0.04  0.01  0.07  0.88  0.00
U                 0.01  0.00  0.00  0.00  0.98

Figure 46: Confusion matrix for the FedAvg experiment with the CNN model trained on the uniform data distribution. Rows are true labels, columns are predicted labels. The confusion matrix illustrates a clear diagonal, indicating that the model had few false positives and few false negatives.

Class Distribution

The following results illustrate the performance of the CNN model applied to the class distribution of the dataset. This distribution is described in Section 3.2.3. The class distribution only allows for 5 clients in total due to there only being 5 classes. This also means that there will only be 5 participating clients per round. This is the only experiment that used a different number of clients from what is described in Table 13.
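Since the class distribution assigns each client all samples of exactly one class, the number of clients is necessarily fixed at 5. A hedged sketch of this extreme non-IID partitioning, assuming integer class labels; the function name is illustrative:

```python
import numpy as np

def class_partition(labels: np.ndarray):
    """One client per class: client i receives exactly the indices
    whose label equals the i-th class. Extreme non-IID split in which
    no client ever observes more than one class."""
    classes = np.unique(labels)
    return [np.flatnonzero(labels == c) for c in classes]
```

Each local model therefore only ever sees a single label, which helps explain the results below: every client can reach near-perfect training accuracy without learning to discriminate between classes.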

Metrics

Test Accuracy: 9.4%

Training Accuracy: 99.9%

Test Loss: 3.95

Training Loss: 0.000314

Training Time: 564 s

Table 20: Accuracy, loss and training time for the FedAvg experiment with the CNN model trained on class distributed data. The metrics shown in this table describe a model that performed extremely poorly on the test set despite a near-perfect training accuracy. The training time remains relatively high. Compared to both the Non-IID data distribution (Table 16) and the uniform data distribution (Table 18), this model performed much worse.

Classification Report

Class Precision Recall F1-Score Support

Normal 0.00 0.00 0.00 18118

Supra Ventricular 0.00 0.00 0.00 556

Ventricular 0.06 0.81 0.11 1448

Fusion 0.06 0.46 0.11 162

Unknown 0.66 0.50 0.57 1608

Table 21: Classification report for the FedAvg experiment with the CNN model trained on class distributed data. The table describes the precision, recall, F1-score and support values for the experiment.

From the table one can observe that the F1-scores are extremely low for every class, except the Unknown class where it is decent. This indicates that the model had low precision and low recall for most classes.

True \ Predicted    N     S     V     F     U
N                 0.00  0.00  0.93  0.05  0.01
S                 0.00  0.00  0.97  0.02  0.01
V                 0.00  0.00  0.81  0.08  0.11
F                 0.00  0.00  0.54  0.46  0.01
U                 0.00  0.00  0.49  0.01  0.50

Figure 47: Confusion matrix for the FedAvg experiment with the CNN model trained on class distributed data. Rows are true labels, columns are predicted labels. Nearly all of the mass falls in the Ventricular column, illustrating that the model classified nearly every ECG recording as Ventricular beats. This indicates a high false positive and false negative rate.