
Chapter 3 Face Detection using an a contrario approach

3.3 Improving the performance of a single classifier

In the preceding Section we showed empirically that the distribution of detection values for a single strong classifier tends to a Gaussian law when the number of features used by the classifier is large. Moreover, the parameters of this law (its mean and standard deviation) differ from image to image. This empirical observation is at odds with the way the detection threshold is chosen in the classical Viola–Jones detection scheme, where the same threshold, fixed in the learning stage and computed with (3.3), is used for all images. This fixed threshold is optimal globally, but a local adjustment could improve the detector’s performance. We therefore propose to adapt the threshold to the particular distribution of detection values associated with each image.

Before detailing the method to adaptively select the detection threshold, let us remark that the true positives of the detection process (i.e., the subimages containing the actual faces to be detected) have, in general, a very high detection value. This is to be expected provided that the classifier is discriminant enough (i.e., it is formed by a large number of weak classifiers).

Figures 3.4 and 3.6 display the histograms of detection values for two images containing faces and for classifiers with 200 features and 80 features, respectively. The red dots indicate the detection values for the faces in the image. Observe that they are located at the far right end of the distribution. Moreover, Figure 3.6 also displays the position of the default detection threshold T computed with formula (3.3). It is clear from this figure that using the default detection threshold would produce a large number of false positives. In the following paragraphs we describe a method that reduces the number of false positives of a single classifier by computing a detection threshold adapted to the distribution of detection values.

Figure 3.5. Evolution of the values of r_K (3.7) (average over all the images in the FDDB dataset) for increasing values of K (5, 10, 20, 40, 80, 200).

Following the a contrario detection principle we test the presence of a face in a subwindow against a noise or a contrario model where the face is not present. This is equivalent to performing the following hypothesis test:

H0 (null hypothesis): the subimage does not contain a face

H1 (alternative hypothesis): the subimage contains a face

The acceptance or rejection of H0 depends on a rejection threshold θ, and the level of significance α of the test is defined as

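\[
\begin{aligned}
\alpha &= P(\text{rejecting } H_0 \mid H_0 \text{ is true}) = P(\vartheta_{det} > \theta \mid H_0 \text{ is true}) \\
&= P(\text{accepting subimage as face} \mid \text{the subimage does not contain a face}) = P(\text{False positive})
\end{aligned}
\]

where ϑ_det is the detection value associated to the subimage, computed from (3.2).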

Figure 3.6. Input image and its histogram of detection values for an 80-feature classifier. The red dots indicate the detection values for the subwindows actually containing a face. T is the default detection threshold of the classifier.

By applying the observations from the previous Section, we may assume a Gaussian distribution of the detection values under the null hypothesis (i.e., the distribution of detection values for the non-face subwindows is Gaussian). This allows us to compute the level of significance in closed form. The mean µ and standard deviation σ of this Gaussian can be estimated from the empirical values of the histogram. We are assuming here that only a small fraction of the subwindows in any image, if any, correspond to actual faces. Therefore, the actual distribution of detection values for the whole image corresponds, roughly, to the distribution of values under the null hypothesis.
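As a minimal sketch of this estimation step, assume the detection values of all tested subwindows of one image are available as a plain list of floats. The function name `estimate_null_parameters` and the sample values are hypothetical and used for illustration only; they do not come from the original work.

```python
from statistics import fmean, pstdev


def estimate_null_parameters(detection_values):
    """Estimate (mu, sigma) of the null-hypothesis Gaussian for one image.

    Assumption (from the text): faces are rare, so the statistics of all
    subwindows approximate those of the non-face subwindows.
    """
    if len(detection_values) < 2:
        raise ValueError("need at least two detection values")
    mu = fmean(detection_values)
    # Population standard deviation around the estimated mean.
    sigma = pstdev(detection_values, mu)
    return mu, sigma


# Illustrative values only, not real detection values.
values = [-1.0, 0.0, 1.0, 0.5, -0.5]
mu, sigma = estimate_null_parameters(values)
print(mu, sigma)
```

Because the few face subwindows are included in the estimate, µ and σ are slightly biased upwards; the text argues this effect is small when faces occupy a small fraction of the tested subwindows.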

We first write the rejection threshold θ as a function of µ and σ: θ = θ_s = µ + sσ, where the parameter s measures the distance of the threshold from the mean in units of σ. Then α can be expressed in terms of s:

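\[
\alpha = P(\text{False positive}) = P\left(\vartheta_{det} > \theta_s \mid H_0 \sim N(\mu, \sigma^2)\right) = \frac{1}{2}\,\mathrm{erfc}\left(\frac{\theta_s - \mu}{\sqrt{2}\,\sigma}\right) = \frac{1}{2}\,\mathrm{erfc}\left(\frac{s}{\sqrt{2}}\right) \tag{3.8}
\]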

where N denotes the Gaussian probability density function and erfc is the complementary error function. Note that θ_s is an adaptive threshold, since it depends on the detection statistics (µ and σ) of the input image.
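Equation (3.8) can be evaluated directly with the complementary error function of a standard math library. The following is a minimal sketch using Python's `math.erfc`; the function names are hypothetical and chosen only for this illustration.

```python
import math


def adaptive_threshold(mu, sigma, s):
    """Adaptive threshold theta_s = mu + s * sigma for one image."""
    return mu + s * sigma


def false_positive_probability(s):
    """Level of significance alpha from (3.8), under the Gaussian null model."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))


for s in (3.0, 4.0, 5.0):
    print(f"s = {s}: alpha = {false_positive_probability(s):.3e}")
```

Note that α depends only on s, not on µ or σ: the image statistics enter through the threshold θ_s, which moves with the distribution of each image.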


3.3.1 Setting the detection threshold of the detector.

Equation (3.8) permits us to control the probability of false positives of a strong classifier. Table 3.1 displays this probability for the histogram in Figure 3.2, bottom right, and for different values of the parameter s.

Table 3.1. Probability of false positives computed with (3.8) for the histogram in Figure 3.2, bottom right, and for different values of the parameter s. The probability of a false positive for the default detection threshold is also displayed.

A natural question is which value of the parameter s yields a false positive probability low enough that no false positives are observed in the image. To answer it, we first need to establish the relation between the probability of false positives and the actual number of observed false positives. This relation is straightforward: if the number of tested subwindows in the image is N, then the expected number of false positives, NFP, can be computed as

NFP = N × P (False positive). (3.9)

Table 3.2 displays the expected number of false positives for different values of the detection threshold for the image in Figure 3.2, top right, and a 200-feature classifier. For this example N = 5170933 and the values of P(False positive) are the ones in Table 3.1. The observed number of false positives is also shown. Note the similarity between the expected and observed values, which supports the validity of the Gaussian model in this case.
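The estimated entries of Table 3.2 for the adaptive thresholds can be re-derived from (3.8) and (3.9) alone. The sketch below does this with N = 5170933 as stated in the text; the function name `expected_false_positives` is hypothetical.

```python
import math


def expected_false_positives(n_subwindows, s):
    """Expected number of false positives, NFP = N * P(False positive), from (3.8) and (3.9)."""
    alpha = 0.5 * math.erfc(s / math.sqrt(2.0))
    return n_subwindows * alpha


N = 5170933  # number of tested subwindows quoted in the text
for s in (4.0, 5.0):
    print(f"theta = mu + {s} sigma: expected NFP = {expected_false_positives(N, s):.2f}")
```

The printed values should land close to the estimated entries of Table 3.2 for θ_4.0 and θ_5.0 (163.76 and 1.48); small differences in the last digit can come from rounding.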

A criterion for the selection of the detection threshold is to compute the value of θs that guarantees a value of NFP below some predefined upper bound NFPmax. Combining (3.9) and (3.8) we obtain the value of the detection threshold as

\[
\theta = \mu + \sqrt{2}\,\mathrm{erfc}^{-1}\left(\frac{2}{N}\,NFP_{max}\right)\sigma \tag{3.10}
\]
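Equation (3.10) follows from setting NFP = NFPmax in (3.9), which gives P(False positive) = NFPmax / N, and solving (3.8) for s. The sketch below computes the resulting threshold. As an implementation choice that is not part of the original text, it uses the identity √2 erfc⁻¹(2p) = −Φ⁻¹(p), where Φ⁻¹ is the inverse standard normal CDF, because the Python standard library provides `statistics.NormalDist.inv_cdf` but no inverse erfc. The function name and the values of µ and σ are hypothetical.

```python
import math
from statistics import NormalDist


def threshold_for_nfp(mu, sigma, n_subwindows, nfp_max):
    """Detection threshold from (3.10) for a target expected number of false positives."""
    p = nfp_max / n_subwindows  # target false positive probability
    if not 0.0 < p < 1.0:
        raise ValueError("nfp_max must satisfy 0 < nfp_max < n_subwindows")
    # sqrt(2) * erfcinv(2 p) equals -Phi^{-1}(p) for the standard normal CDF Phi.
    s = -NormalDist().inv_cdf(p)
    return mu + s * sigma


N = 5170933            # number of subwindows quoted in the text
mu, sigma = 0.0, 1.0   # illustrative statistics, not from the original work
for nfp_max in (1, 5, 10):
    theta = threshold_for_nfp(mu, sigma, N, nfp_max)
    # Round trip through (3.8) and (3.9) as a sanity check.
    nfp = N * 0.5 * math.erfc((theta - mu) / (math.sqrt(2.0) * sigma))
    print(f"NFPmax = {nfp_max}: theta = {theta:.3f}, expected NFP = {nfp:.3f}")
```

Larger values of NFPmax give lower thresholds, so more subwindows are accepted, which matches the trade-off between missed faces and false positives discussed in the text.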

Figure 3.7 shows the result of applying this criterion to the image in Figure 3.6, left, using a detector of 200 features, for increasing values of NFPmax (1, 5, 10). Note that in the figure no postprocessing was used to display the results, and all the subwindows above the estimated detection threshold are displayed. In the next Section we shall discuss how to group together similar detections and display a single detection rectangle per face. As expected, when NFPmax is set to 1 no false positives are detected, but some faces are missed by the detector. As NFPmax increases, more false positives appear but more faces are also detected.

Detection threshold           NFP (estimated)   NFP (observed)
T (default, equation (3.3))   12416.35          12101
θ_4.0 = µ + 4σ                163.76            183
θ_5.0 = µ + 5σ                1.48              2

Table 3.2. Estimated and observed number of false positives for a 200-feature classifier applied to the image in Figure 3.2, top right.

Figure 3.7. From left to right and from top to bottom: detections with NFPmax = 1, 5, and 10, using a 200-feature detector.

In document Facial detection and expression recognition applied to social robots (pages 40-44)