Robust Image Alignment - Detection and Tracking of Point Features

6. Detection and Tracking of Point Features

6.6. Improvements

6.6.3. Robust Image Alignment

If intensity discrepancies are summed over a whole patch area as in equation (6.13), it is assumed that the patch is completely planar, that there are no occlusions, and that there is a global linear change of illumination. None of these assumptions always holds in real-life scenarios. Patches are not always entirely part of a planar surface, areas of the scene can be occluded by interacting persons or by other objects, and spotlights or reflections cannot be modeled by a global linear lighting model.

Therefore the detection of outliers during the template tracking is worth considering.

Hager and Belhumeur [35] use a robust estimator function for the detection of occlusions.

They modify the error function by solving a robust optimization problem of the form

$$\min_{p} \sum_{x} \rho\bigl(I(g(x;p)) - T(x)\bigr), \tag{6.22}$$

where ρ is one of a wide variety of robust estimator functions [122]. Instead of the estimator function ρ, the same problem can also be expressed with a weighting matrix M(x). The authors also used morphological operators to remove outliers from the weighting matrix and showed that this robust optimization approach makes tracking more stable under partial occlusion of the template. An evaluation of different estimator functions has been carried out by Theobald et al. [107].
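The correspondence between an estimator function ρ and a per-pixel weight can be illustrated with a small sketch. It assumes the Huber function as the robust estimator, which is only one possible choice and not necessarily the one used in [35]; the threshold `k` and the function names are hypothetical.

```python
def huber_rho(r, k=1.0):
    """Huber estimator: quadratic for small residuals, linear for large ones."""
    a = abs(r)
    return 0.5 * r * r if a <= k else k * (a - 0.5 * k)


def huber_weight(r, k=1.0):
    """Equivalent least-squares weight w(r) = rho'(r) / r."""
    a = abs(r)
    return 1.0 if a <= k else k / a


# Residuals I(g(x;p)) - T(x) for three pixels; the last one is an outlier.
residuals = [0.1, -0.5, 4.0]
weights = [huber_weight(r) for r in residuals]
```

Pixels with small residuals keep full weight, while the weight of the outlier is reduced in proportion to its residual, which is the role the entries of M(x) play in the weighted formulation.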

Baker and Matthews [4] also used a weighting matrix for a more stable and more efficient minimization. Stability comes from only taking pixels with a high confidence into account, and efficiency results from the fact that computation costs can be reduced if only the most reliable pixels are selected.

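They also present an iteratively re-weighted least squares algorithm for the inverse compositional approach. With the weighting matrix M(x) the problem can be expressed by

$$\sum_{x} M(x)\,\bigl[T(g(x;\Delta p)) - I(g(x;p))\bigr]^2, \tag{6.23}$$

and the parameter increment is given by

$$\Delta p = H^{-1} \sum_{x} M(x) \left[\nabla T \frac{\partial g}{\partial p}\right]^{\top} \bigl[I(g(x;p)) - T(x)\bigr], \tag{6.24}$$

where the matrix H is computed by

$$H = \sum_{x} M(x) \left[\nabla T \frac{\partial g}{\partial p}\right]^{\top} \left[\nabla T \frac{\partial g}{\partial p}\right]. \tag{6.25}$$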

After every iteration the weighting matrix M(x) is estimated according to the current residual.

The drawback of this iteratively re-weighted approach is the loss of efficiency. If the weights are re-estimated in every iteration, the matrix H⁻¹ cannot be precomputed any more, but must also be calculated in every iteration.
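The cost of re-estimating the weights can be seen in a one-parameter sketch. It is not the tracker itself: the model y_i ≈ a_i · p, the Huber weights and all names are assumptions chosen to keep the example small. The point is that the scalar `H` depends on the weights and therefore has to be rebuilt inside the loop.

```python
def irls_scale(a, y, k=1.0, iterations=10):
    """Estimate p in y_i ~ a_i * p by iteratively re-weighted least squares."""
    p = 0.0
    for _ in range(iterations):
        residuals = [yi - ai * p for ai, yi in zip(a, y)]
        # Re-estimate the weights from the current residuals (Huber weights).
        w = [1.0 if abs(r) <= k else k / abs(r) for r in residuals]
        # H depends on the weights, so it cannot be precomputed outside the loop.
        H = sum(wi * ai * ai for wi, ai in zip(w, a))
        b = sum(wi * ai * ri for wi, ai, ri in zip(w, a, residuals))
        p += b / H
    return p


a = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 40.0]  # true p = 2, the last sample is an outlier

p_robust = irls_scale(a, y)
p_plain = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)
```

With fixed weights the division by `H` could be replaced by a multiplication with a precomputed inverse; with re-weighting, the sum that forms `H` is repeated in every iteration.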

Ishikawa et al. [48] avoid the re-computation of the whole Hessian by subdividing the image into a grid of blocks and pre-computing the Hessian for every grid element. Blocks of outliers are determined, and the sum is only calculated over the valid blocks. This approach is more efficient, since the sum over the pixels of every block can be pre-computed.
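The block-wise pre-computation can be sketched as follows. This is a simplified illustration of the idea, not the implementation of [48]: it assumes a two-parameter warp, so every pixel contributes a 2-vector of steepest-descent values, and the helper names and test data are hypothetical.

```python
def outer(s):
    """Outer product s s^T of a 2-vector."""
    return [[s[0] * s[0], s[0] * s[1]], [s[1] * s[0], s[1] * s[1]]]


def add(A, B):
    """Element-wise sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


def precompute_block_hessians(sd, block):
    """sd[y][x] is the 2-vector of pixel (x, y); returns {(by, bx): 2x2 sum}."""
    blocks = {}
    for y, row in enumerate(sd):
        for x, s in enumerate(row):
            key = (y // block, x // block)
            zero = [[0.0, 0.0], [0.0, 0.0]]
            blocks[key] = add(blocks.get(key, zero), outer(s))
    return blocks


def hessian_from_valid_blocks(blocks, outlier_blocks):
    """Sum the pre-computed block Hessians, skipping blocks marked as outliers."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    for key, Hb in blocks.items():
        if key not in outlier_blocks:
            H = add(H, Hb)
    return H


# 4x4 patch with made-up steepest-descent values, split into 2x2 blocks.
sd = [[(float(x), float(y)) for x in range(4)] for y in range(4)]
blocks = precompute_block_hessians(sd, block=2)
H = hessian_from_valid_blocks(blocks, outlier_blocks={(1, 1)})
```

Per frame, only one small matrix per valid block has to be added, instead of one outer product per pixel.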

The iteratively re-weighted least squares approach is only beneficial if really large templates are tracked, e.g. the whole face of a person as in [35]. For the purpose of tracking feature points for camera pose estimation, it is more advantageous to use smaller templates, because many industrial scenarios seldom contain large planar surfaces, and if patches are smaller, more features can be tracked at the same computational cost.

Therefore, in our tracking system we use a larger set of small templates and reject a feature patch completely if the tracking has failed. For efficiency reasons we recompute the weights only after a successful tracking step, so that the inverse of the H-matrix only needs to be calculated once per frame and not in every iteration of the feature tracking step.

To achieve lighting invariance we integrate a weighting matrix into the illumination invariant method of the inverse compositional approach. The term to minimize for the robust alignment can then be written as

$$\sum_{x} M(x)\,\bigl[\lambda\,T(g(x;\Delta p)) + \delta - I(g(x;p))\bigr]^2. \tag{6.26}$$
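The effect of the weights on the illumination parameters in (6.26) can be illustrated in isolation. The sketch below is a deliberate simplification: it holds the warp fixed (Δp = 0), so only the gain λ and the offset δ are estimated from the weighted normal equations. The function name and the sample intensities are hypothetical.

```python
def fit_gain_offset(T, I, M):
    """Minimise sum_x M(x) * (lam * T(x) + delta - I(x))**2 over lam, delta."""
    s_tt = sum(m * t * t for m, t in zip(M, T))
    s_t = sum(m * t for m, t in zip(M, T))
    s_1 = sum(M)
    s_ti = sum(m * t * i for m, t, i in zip(M, T, I))
    s_i = sum(m * i for m, i in zip(M, I))
    det = s_tt * s_1 - s_t * s_t
    if det == 0:
        raise ValueError("too few valid pixels or constant template")
    lam = (s_ti * s_1 - s_t * s_i) / det
    delta = (s_tt * s_i - s_t * s_ti) / det
    return lam, delta


# Flattened template and image intensities; I = 2 * T + 5 except for the
# last pixel, which is saturated by a reflection.
T = [10.0, 20.0, 30.0, 40.0, 50.0]
I = [25.0, 45.0, 65.0, 85.0, 255.0]

lam_masked, delta_masked = fit_gain_offset(T, I, M=[1, 1, 1, 1, 0])
lam_all, delta_all = fit_gain_offset(T, I, M=[1, 1, 1, 1, 1])
```

With the saturated pixel weighted by zero, the fit recovers the global linear model from the remaining pixels; with uniform weights the single reflection distorts both parameters.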

The parameter update can be similarly computed as in Section 6.5.1. With the vector h(x) of equation (6.17) the new parameter vector q can be computed by

$$q = \Bigl[\sum_{x} M(x)\,h(x)\,h(x)^{\top}\Bigr]^{-1} \sum_{x} M(x)\,h(x)\,I(g(x;p)). \tag{6.27}$$

Our goal for robust feature tracking is not to track a template under partial occlusion and extreme lighting variation, but to acquire a valid area of the patch, which is a stable representation of the patch, and to use only this area for the alignment. If some areas of a template are, for example, not part of the planar surface, these pixels should always be regarded as outliers and should not contribute to the template alignment.

With a given weighting matrix M(x), which assigns every pixel an influence value for the minimization result, the computation of the parameter vector increment is only slightly more expensive.


Figure 6.5: Illustration of the mask generation. (a) shows the scenario, (b) the currently extracted patch feature, (c) the incremental mean, (d) the variance and (e) the mask of the patch.

Figure 6.6: Another example of the mask generation in an industrial scenario. Again the current patch (b), the incremental mean (c), the variance (d) and the mask of the patch (e) can be seen. The mask clearly separates pixels on the planar surface from the background.
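The quantities shown in the figures (incremental mean, variance and mask) can be maintained per pixel with a running update. The following sketch uses Welford's update for mean and variance; the rule that turns the variance into a mask (a fixed threshold `max_variance`) is an assumption made for illustration, as are the class name and the sample values.

```python
class PatchStatistics:
    """Running per-pixel mean and variance over successive patch observations."""

    def __init__(self, num_pixels):
        self.n = 0
        self.mean = [0.0] * num_pixels
        self.m2 = [0.0] * num_pixels  # sum of squared deviations from the mean

    def update(self, patch):
        """Add one flattened patch observation (Welford's update)."""
        self.n += 1
        for i, v in enumerate(patch):
            d = v - self.mean[i]
            self.mean[i] += d / self.n
            self.m2[i] += d * (v - self.mean[i])

    def variance(self):
        if self.n < 2:
            return [0.0] * len(self.m2)
        return [m2 / (self.n - 1) for m2 in self.m2]

    def mask(self, max_variance):
        """Binary mask: 1 for pixels that stayed stable, 0 otherwise."""
        return [1 if v <= max_variance else 0 for v in self.variance()]


stats = PatchStatistics(num_pixels=4)
for patch in ([100, 50, 200, 10], [101, 50, 199, 90], [99, 50, 201, 170]):
    stats.update(patch)

mask = stats.mask(max_variance=25.0)
```

In this toy sequence the first three pixels vary by at most one grey level and are kept, while the fourth pixel, which changes strongly between observations (as a background pixel behind a non-planar edge would), is masked out.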

To simplify the computation we do not use a weighting matrix, but a binary mask that selects the pixels taken into account for the feature tracking. Pixels where the value of the binary mask is 0 are not regarded at all. The overall computational cost therefore decreases if a binary mask is used and many pixels are masked out.

A binary mask can also be used to integrate only over those areas of a patch which are located inside the image. When a patch is extracted from an image, it can happen, especially at higher pyramid levels, that the template is not completely located inside the current image, so that some parts do not contain any valid intensity information. By setting the mask values M(x) of these pixels to 0, only valid pixels are used for the template tracking.
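A minimal sketch of such a validity mask, assuming a square patch centred at an integer pixel position; the function name and its parameters are hypothetical.

```python
def in_image_mask(cx, cy, half, width, height):
    """Mask of a (2*half+1) x (2*half+1) patch centred at pixel (cx, cy).

    An entry is 1 if the corresponding pixel lies inside the image, else 0.
    """
    mask = []
    for dy in range(-half, half + 1):
        row = []
        for dx in range(-half, half + 1):
            x, y = cx + dx, cy + dy
            row.append(1 if 0 <= x < width and 0 <= y < height else 0)
        mask.append(row)
    return mask


# A 3x3 patch centred on the left image border of an 8x8 pyramid level.
mask = in_image_mask(cx=0, cy=1, half=1, width=8, height=8)
```

Here the left column of the patch falls outside the image and is masked out, so only the remaining six pixels would enter the sums of the alignment.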

6.7. Camera Tracking Applications with Point Features

In document Efficient Line and Patch Feature Characterization and Management for Real-time Camera Tracking (pages 84-87)