3.2 Papers
3.2.7 Paper VII
Two-Stage Transfer Learning for Heterogeneous Robot Detection and 3D Joint Position Estimation in a 2D Camera Image using CNN
Using the transfer learning approach to re-train a multi-objective CNN for a new robot type proved efficient. However, one problem remained: the network adapted to the new robot and forgot the previously-trained robot types.
In this work, we develop a method to include new robot types in the multi-objective CNN without forgetting previously-trained information, making it capable of recognising all the robots at once.
A new robot was added: the Franka Emika Panda. It has 7 DoF and a plain white-and-black appearance, which is even harder to distinguish against white or grey walls, making it a more difficult challenge. The training dataset consisted of a collection of frames containing all the robot types: Universal Robots, Kuka and Franka Panda. This allowed the network to learn the new robot type while still being able to recognise the previously-known robots.
The transfer learning approach was modified into a two-stage model, shown in Figure 3.16. The Stage 1 adjustment covered the outer layers of the CNN, and training continued until no further improvement in the loss was detected. Once the training settled, Stage 2 started: additional layers were unlocked, and the neural network was trained further to improve the detection accuracy. This approach keeps the training process short while adjusting more parameters in the CNN, adapting it to the more distinct visual appearances of the robots.

[Figure 3.16: architecture diagram. A robot-mask input passes through dilated convolutional layers (32F–128F, filter sizes 2×2 and 3×3, dilations 2×2–5×5) with 2×2 max pooling, followed by fully-connected layers (FC1024, FC512) that output the robot type, the robot base position and the joint 3D coordinates; the Stage 1 and Stage 2 transfer-learning adjustments are marked on the diagram.]

Figure 3.16. The multi-objective CNN with two-stage transfer learning. The network is trained in two stages using the transfer learning approach. In Stage 1, the parameters of all layers except the final ones, marked in blue, are frozen, and the system is trained until there is no further improvement. Afterwards, in Stage 2, the parameters of the CNN layers marked in red, together with all the Stage 1 layers, are adjusted during training.
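The freeze/unfreeze schedule behind the two stages can be sketched in PyTorch as follows. This is a minimal sketch: the model, layer sizes and module names here are illustrative assumptions, not the exact architecture of Figure 3.16.

```python
import torch
import torch.nn as nn

class MultiObjectiveCNN(nn.Module):
    """Simplified stand-in for a multi-objective CNN with three task heads."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(            # kept frozen in Stage 1
            nn.Conv2d(1, 32, 3, dilation=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, dilation=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.mid = nn.Sequential(                 # unlocked only in Stage 2
            nn.Conv2d(64, 128, 2, dilation=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Shared fully-connected trunk and the three task heads.
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(128 * 3 * 3, 512), nn.ReLU())
        self.type_head = nn.Linear(512, 3)        # robot type (UR / Kuka / Panda)
        self.base_head = nn.Linear(512, 3)        # robot base position
        self.joint_head = nn.Linear(512, 7 * 3)   # 3D coordinates of 7 joints

    def forward(self, x):                         # x: (B, 1, 64, 64) robot mask
        z = self.fc(self.mid(self.backbone(x)))
        return self.type_head(z), self.base_head(z), self.joint_head(z)

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

model = MultiObjectiveCNN()

# Stage 1: freeze everything, then unlock only the outer (output) layers;
# train until the loss stops improving.
set_trainable(model, False)
for part in (model.fc, model.type_head, model.base_head, model.joint_head):
    set_trainable(part, True)

# Stage 2: once training settles, additionally unlock deeper conv layers
# and continue training to adapt to the new robot's appearance.
set_trainable(model.mid, True)
```

Because the optimizer only updates parameters with `requires_grad=True`, each stage trains a progressively larger subset of the network while the earliest feature extractors stay fixed.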
A new training dataset collection method was developed to avoid overfitting. By attaching motion capture markers to the camera, we were able to track the vision sensor precisely and move it freely around the workspace, capturing the robot against a large variety of backgrounds and under changing lighting conditions. This more diverse data ensures that the CNN learns the right features of the robot instead of learning to identify the background, which would hinder recognition in real-world scenarios.
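With the camera pose tracked in the same world frame as the robot, known world-frame joint positions (e.g. from forward kinematics) can be expressed in the camera frame with a single rigid transform. The sketch below illustrates that transform under this assumption; the function and variable names are hypothetical.

```python
import numpy as np

def joints_in_camera_frame(T_world_cam, joints_world):
    """Express robot joint positions in the camera frame.

    T_world_cam : (4, 4) homogeneous pose of the camera in the motion-capture
                  (world) frame, as reported by the tracking system.
    joints_world: (N, 3) joint positions in the same world frame,
                  e.g. from the robot's forward kinematics.
    """
    T_cam_world = np.linalg.inv(T_world_cam)       # world -> camera transform
    homogeneous = np.hstack([joints_world, np.ones((len(joints_world), 1))])
    return (T_cam_world @ homogeneous.T).T[:, :3]

# Camera 2 m above the world origin, axes aligned with the world frame:
T = np.eye(4)
T[2, 3] = 2.0
# A joint at the world origin then sits at z = -2 in the camera frame.
print(joints_in_camera_frame(T, np.zeros((1, 3))))   # [[ 0.  0. -2.]]
```

Evaluating this for every captured frame yields per-frame 3D joint labels without any manual annotation, which is what makes free-moving camera capture practical.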
Compared to the work previously presented in Paper VI, the current two-stage transfer learning approach achieved similar detection accuracy, with a joint position error of 2.46 cm vs 3.12 cm, and slightly worse accuracy for the robot mask estimation: 97% vs 98% in the previous work. Full training of the multi-objective CNN for UR robots took 60 hours, versus 10 hours in the current work to include the detection of the Kuka and Franka Panda robots based on two-stage transfer learning.
(a) Loss function against the number of training iterations. Stage 1 and Stage 2 of our two-stage transfer learning approach are marked by background colours on the graph.
(b) Errors for the 3D position estimation of robot joints depending on the camera distance from the robot. Results are grouped by robot type (UR, Kuka, Franka Panda).
Figure 3.17. Evaluation of the two-stage transfer learning method using the test dataset.