
Paper 11: Estimating the depth uncertainty in three-dimensional

5.11.1 Synopsis

Visual interaction in three-dimensional virtual space can be achieved by estimating an object's depth from the fixations of the left and right eyes. Current depth estimation methods, however, do not account for the presence of noise in the data. To address this problem, we note that any measured fixation point is a member of a statistical distribution defined by the level of noise in the measurement. We therefore propose a new numerical method that provides a range of depth values based on the uncertainty in the measured data. The main contribution of this paper is a new method to estimate the depth uncertainty in a virtual environment. This is explicitly linked to research contribution C5.
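To make the idea concrete, a minimal Monte Carlo sketch of this kind of uncertainty propagation is given below. It assumes a simple two-eye triangulation model with Gaussian fixation noise; the inter-pupillary distance, viewing distance, noise level, and triangulation geometry are illustrative assumptions, not the method or parameters of the paper.

```python
import numpy as np

# Minimal Monte Carlo sketch of depth uncertainty estimation (illustrative only).
# The fixated point is triangulated from the horizontal positions of the two
# on-screen fixation points; ipd, screen_dist, and sigma are assumed values.

def triangulate_depth(x_left, x_right, ipd=0.065, screen_dist=0.7):
    """Depth (m) of the fixated point from the two on-screen fixation positions."""
    disparity = x_right - x_left          # on-screen disparity between the eyes
    denom = ipd - disparity
    if abs(denom) < 1e-9:                 # gaze rays (nearly) parallel
        return np.inf
    return screen_dist * ipd / denom

rng = np.random.default_rng(0)
sigma = 0.003                             # assumed fixation noise on the screen (m)
x_l, x_r = -0.01, 0.01                    # measured fixation points of left/right eye

# Sample fixation points from the noise distribution and collect the depths.
depths = np.array([
    triangulate_depth(x_l + rng.normal(0.0, sigma), x_r + rng.normal(0.0, sigma))
    for _ in range(10_000)
])
lo, hi = np.percentile(depths, [2.5, 97.5])
print(f"median depth {np.median(depths):.2f} m, 95% range [{lo:.2f}, {hi:.2f}] m")
```

The printed interval is the kind of depth range, rather than a single depth value, that motivates the method.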

Chapter 6

Discussion

This chapter concludes the dissertation with an overview of the results obtained from the research papers, and the main research direction for future work.

6.1 Validating the visual saliency model

(a) Probability histogram (x-axis: saliency value; y-axis: probability; curves: fixated, non-fixated)

(b) Relative probabilities (x-axis: saliency value; y-axis: relative probability; curves: fixated, non-fixated)

Figure 6.1: Probability histograms and relative probabilities for the fixated and non-fixated regions for an average observer. The x-axis shows the saliency values obtained using the visual saliency algorithm (Itti, Koch, & Niebur, 1998).

In papers 4 and 6, we performed an experiment using linear discriminant analysis to try to separate the saliency values obtained from the model by (Itti, Koch, & Niebur, 1998) for locations that received fixations from those that received no fixations. The data were based on a subset of the images and corresponding fixations collected by (Judd, Ehinger, Durand, & Torralba, 2009), where we used 200 landscape images and all fifteen observers. In the experiment, we defined a fixated area as a square region of 100 by 100 pixels centred at the fixation point. Non-fixated areas were chosen randomly from parts of the image that contained a 100 by 100 pixel region without any fixations.

By collecting the values returned by the saliency algorithm local to those regions into two matrices, we were able to use discriminant analysis to determine whether the data of the two matrices are separable. In figure 6.1(a), we show the probability histograms of the fixated and non-fixated regions for all the observers. Here, each histogram is normalized such that the area under the curve is one. We note that the separation between the two sets is not ideal; rather, we find a considerable overlap between the two histograms, specifically in the middle range. We further note that there is a clear separation between the two sets for regions of the images that received no fixations, indicating that the method is good at predicting non-salient regions of the images. At a saliency value of about 0.3, the classification of the two sets is random.
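A minimal sketch of how such data could be assembled is shown below. The inputs `saliency_map` (a 2-D array of values in [0, 1]) and `fixations` (a list of (row, col) points) are hypothetical, and the code is illustrative rather than the implementation used in the papers; only the 100 by 100 window and the area-one normalization follow the description above.

```python
import numpy as np

# Illustrative sketch: gather saliency values in 100x100-pixel windows around
# fixated and non-fixated locations and build probability histograms that are
# normalized so the area under each curve is one.

def window_values(saliency_map, center, half=50):
    """Saliency values inside the 100x100 window centred at `center`."""
    r, c = center
    h, w = saliency_map.shape
    return saliency_map[max(r - half, 0):min(r + half, h),
                        max(c - half, 0):min(c + half, w)].ravel()

def fixation_histograms(saliency_map, fixations, bins=20, seed=0):
    rng = np.random.default_rng(seed)
    fix_arr = np.asarray(fixations)

    fixated = np.concatenate([window_values(saliency_map, p) for p in fixations])

    # Draw random window centres and keep those whose window holds no fixation.
    non_fixated = []
    h, w = saliency_map.shape
    while len(non_fixated) < len(fixations):
        r = int(rng.integers(50, h - 50))
        c = int(rng.integers(50, w - 50))
        if np.all(np.max(np.abs(fix_arr - [r, c]), axis=1) > 50):
            non_fixated.append(window_values(saliency_map, (r, c)))
    non_fixated = np.concatenate(non_fixated)

    edges = np.linspace(0.0, 1.0, bins + 1)
    p_fix, _ = np.histogram(fixated, bins=edges, density=True)   # area under curve = 1
    p_non, _ = np.histogram(non_fixated, bins=edges, density=True)
    return p_fix, p_non, edges
```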

To gain better insight into the ability of the algorithm to separate the image regions into fixated and non-fixated, we plotted the relative probabilities of the histograms. For the non-fixated histogram, the relative probabilities were obtained by dividing the area under the non-fixated probability histogram curve in a specific bin i by the area under the fixated histogram curve for the same bin. For the relative probability of the fixated histogram, the reciprocal value was calculated. These curves are plotted in figure 6.1(b), where we observe that for low saliency values the separation of non-fixated regions is ideal, and that the extent of the separation declines towards chance level as the saliency value increases.
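Stated compactly, this is a restatement of the verbal description above: with A_f(i) and A_nf(i) denoting the areas under the fixated and non-fixated probability histogram curves in bin i, the relative probabilities plotted in figure 6.1(b) are

```latex
R_{\mathrm{nf}}(i) \;=\; \frac{A_{\mathrm{nf}}(i)}{A_{\mathrm{f}}(i)},
\qquad
R_{\mathrm{f}}(i) \;=\; \frac{A_{\mathrm{f}}(i)}{A_{\mathrm{nf}}(i)}.
```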

We also note that the separation of the highly salient regions is nearly ideal. Based on this, we can conclude that the saliency algorithm by (Itti, Koch, & Niebur, 1998) is good at predicting non-salient and highly salient regions, but its performance drops in the middle range.

6.2 Proposed group based asymmetry algorithm

In papers 5 and 6, we set about unifying the mathematical description of saliency in a single metric. Based on the knowledge gained from research in image processing, where it has been shown that the dihedral group D4 can be used to encode edges and contrast, which are the main current descriptions of saliency, we chose to devise an algorithm that represents the level of saliency in an image region by virtue of the transformations of D4. In our experiment, we used a receiver operating characteristic (ROC) curve to compare the performance of the proposed method with that of (Itti, Koch, & Niebur, 1998). For the analysis, we used fixation data from 200 images and fifteen observers. We found that the proposed group based asymmetry (GBA) algorithm results in an AUC value of 0.81, which is better than that achieved with the visual saliency algorithm by (Itti, Koch, & Niebur, 1998), which gives an AUC of 0.77. Based on the results, we conclude that the transformations pertaining to the dihedral group D4 are a good metric for estimating salient image regions. In figure 6.2, we offer a visual comparison between the two algorithms: we show the fixations map and the saliency maps obtained from the proposed GBA algorithm and the visual saliency algorithm by (Itti, Koch, & Niebur, 1998) for an example image. We can see that the maps from both algorithms are quite similar.
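As an illustration of the underlying idea only, and not the published GBA implementation, an asymmetry score based on the eight transformations of D4 (the identity, three rotations, and four reflections) could be sketched as follows; the patch size and normalization are assumptions made for the example.

```python
import numpy as np

# Illustrative sketch of a group-based asymmetry style score: apply the eight
# transformations of the dihedral group D4 to a square patch and measure how
# far the patch is from being invariant under each of them.

def d4_transforms(patch):
    """Yield the eight D4 transformations of a square 2-D array."""
    for k in range(4):
        rotated = np.rot90(patch, k)
        yield rotated                 # rotation by k * 90 degrees
        yield np.fliplr(rotated)      # rotation followed by a reflection

def gba_score(patch):
    """Asymmetry of a square patch: mean deviation from its D4 transforms."""
    patch = patch.astype(float)
    deviations = [np.abs(patch - t).mean() for t in d4_transforms(patch)]
    return np.mean(deviations)

def gba_map(image, patch_size=16):
    """Slide a window over a grey-scale image and build a saliency-like map."""
    h, w = image.shape
    out = np.zeros((h // patch_size, w // patch_size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = image[i * patch_size:(i + 1) * patch_size,
                          j * patch_size:(j + 1) * patch_size]
            out[i, j] = gba_score(block)
    return out / out.max() if out.max() > 0 else out
```

Regions that change strongly under rotation and reflection, such as edges and high-contrast structures, receive high scores, which is the intuition behind using D4 as a saliency metric.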

In fact, both of them return the region containing the boat at the center as salient, which is also in agreement with the fixations map. The performance of the proposed GBA algorithm is compared with other state-of-the-art saliency models in the next section.
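For completeness, the ROC/AUC evaluation referred to above can be outlined as follows, with saliency values at fixated locations treated as positives and values at non-fixated locations as negatives; the data in the usage example are synthetic stand-ins, not results from the papers.

```python
import numpy as np

# Illustrative sketch of an ROC/AUC evaluation of a saliency map: sweep a
# threshold over the saliency values and integrate the true-positive rate
# over the false-positive rate.

def auc_from_samples(fixated_vals, non_fixated_vals, n_thresholds=101):
    """Area under the ROC curve obtained by sweeping a saliency threshold."""
    thresholds = np.linspace(1.0, 0.0, n_thresholds)   # strict to lax
    tpr = [(fixated_vals >= t).mean() for t in thresholds]
    fpr = [(non_fixated_vals >= t).mean() for t in thresholds]
    return np.trapz(tpr, fpr)                          # integrate TPR over FPR

# Toy usage with synthetic data standing in for real saliency values:
rng = np.random.default_rng(1)
fix = np.clip(rng.normal(0.6, 0.2, 5000), 0, 1)        # hypothetical fixated values
non = np.clip(rng.normal(0.4, 0.2, 5000), 0, 1)        # hypothetical non-fixated values
print(f"AUC = {auc_from_samples(fix, non):.2f}")       # roughly 0.76 for this toy data
```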

(a) Image from the database by (Judd, Ehinger, Durand, & Torralba, 2009)

(b) Fixations map

(c) Saliency map (Itti, Koch, & Niebur, 1998)

(d) Group based asymmetry map (GBA)

Figure 6.2: Comparison of visual saliency algorithms. Both algorithms return the region containing the boat at the center as salient, which is also in agreement with the fixations map obtained from the eye fixation data.

6.3 Proposed robust metric for the evaluation