Research questions and contributions - Towards three-dimensional visual saliency

Based on the discussion in the previous section, the main objectives of this thesis can be summarized as five research questions:

R1 Is the classic visual saliency algorithm by (Itti, Koch, & Niebur, 1998) a good classifier for salient and non-salient image regions?

R2 Can salient image regions be calculated in a novel way? How can we calculate saliency for a three-dimensional scene?

R3 How can we perform a meaningful statistical analysis of the fixations data from different images and observers? Can we use the statistical information obtained from the analysis to create a robust metric for judging the performance of the saliency models?

R4 How can we use the salient image locations to design an algorithm that compresses an image such that the compressed image is nearly identical to the original?

R5 How can we estimate depth from the fixations in a three-dimensional virtual scene? How is the depth information useful in the context of three-dimensional visual saliency?

The relations between the research questions and research papers are shown in table 3.1.

Table 3.1: Relations between research papers and research questions.

R1 R2 R3 R4 R5

Paper 1 •

Paper 2 •

Paper 3 •

Paper 4 •

Paper 5 •

Paper 6 • •

Paper 7 •

Paper 8 •

Paper 9 •

Paper 10 •

Paper 11 •

The major contributions of this research effort are:

C1 A novel method to inspect the performance of the classic visual saliency algorithm by (Itti, Koch, & Niebur, 1998) in separating the salient and non-salient image regions.

C2 A visual saliency model that calculates salient image regions in a novel way, i.e., by using the transformations pertaining to the dihedral group D₄. The proposed model performs better than the saliency model by (Itti, Koch, & Niebur, 1998), and it is one of the four best models in the literature. In addition, the proposed model can be extended to calculate saliency in three-dimensional virtual scenes.

C3 A new method for the statistical analysis of the eye fixations data from different images and different observers. Based on the analysis, a new robust metric is proposed that can be used for the evaluation of the visual saliency algorithms.

C4 A novel algorithm that compresses an image based on the salient locations predicted by the visual saliency algorithm. The compressed images do not exhibit visual artifacts and they appear to be very similar to the originals.

C5 A new method for estimating the depth in a three-dimensional virtual scene by using the fixations from both eyes. As a part of future work, we intend to use the depth information obtained by showing the observer a virtual scene to create a three-dimensional fixations map, which can be used as the ground truth for the evaluation of three-dimensional saliency algorithms.

The research contributions C1 to C5 and the research questions R1 to R5 have a one-to-one correspondence.
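As an illustration of the group referred to in C2, the dihedral group D₄ comprises the eight symmetries of a square: four rotations and four reflections. The following minimal sketch only enumerates these eight transformations for a small square patch; it is not the saliency model proposed in the thesis. The function names and the nested-list representation of a patch are assumptions made for illustration.

```python
def rotate90(patch):
    """Rotate a square patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def flip_horizontal(patch):
    """Mirror a patch left to right."""
    return [row[::-1] for row in patch]


def d4_transforms(patch):
    """Return the eight D4 images of a square patch.

    The first four entries are the rotations of the patch by 0, 90, 180
    and 270 degrees; the last four are the same rotations applied to its
    mirror image, which gives the four reflections.
    """
    results = []
    for start in (patch, flip_horizontal(patch)):
        current = [list(row) for row in start]
        for _ in range(4):
            results.append(current)
            current = rotate90(current)
    return results


# Hypothetical 2x2 patch with distinct values, so all eight images differ.
patch = [[1, 2],
         [3, 4]]
transforms = d4_transforms(patch)
```

For a patch whose entries are all distinct, the eight results are pairwise different; a symmetric patch yields repeated entries.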

Chapter 4 Research methodology

For the research work in this thesis, we employed the methodology of design science research. According to (Iivari, 2007), design science research has been practiced for decades in disciplines such as Computer Science, Software Engineering, and Information Technology without being explicitly named. Studies by (Iivari, 2007; Hevner & Chatterjee, 2010) suggest that, by using the methodology of design science research, computer scientists have developed new architectures for computers, new programming languages, new compilers, new algorithms, new data and file structures, new data models, new database management systems, and more.

Design science research, as discussed by (Hevner, March, Park, & Ram, 2004), consists of creating a novel artifact, i.e., something new that does not exist in nature, and using it to understand a natural or man-made phenomenon (Vaishnavi & Kuechler, 2004). In this way, it is quite useful for vision studies, where new algorithms or statistical methods are frequently used to analyze different aspects of human vision. In the general methodology of design science research, shown in figure 4.1, the process begins with the Awareness of Problem and terminates with the Conclusion. Below, we discuss the steps of the design science research methodology and how they were used in this research.

Awareness of problem

The first step in this process is the awareness of an interesting problem in the given field. This can come from developing an understanding of the relevant field by using sources such as scientific literature or new industrial developments.

The output of this stage is a proposal for a new research project; in our case, this PhD project.

Suggestion

In this step, the problem is analyzed and possible solutions are proposed, either by creating new methods or by employing methods from the existing literature in a new way. Based on the employed methods, a tentative design is suggested. In this thesis, this step consisted of formulating the research questions associated with the papers and proposing the methods to investigate them.

[Figure 4.1 diagram. Process steps and their outputs: Awareness of Problem (Proposal), Suggestion (Tentative Design), Development (Artifact), Evaluation (Performance Measures), Conclusion (Results). Knowledge flows: Circumscription.]

Figure 4.1: The general research methodology of design science research, from (Vaishnavi & Kuechler, 2004).

Development

During the development step, the tentative design is evolved to completion. This is achieved by using techniques relevant to the construction of an artifact. In our case, the technique used was algorithm development and a number of algorithms were developed using Matlab/C++ to answer the research questions.

Evaluation

The artifact created in the previous step is expected to behave in a certain way. In this step, the deviation from the expected behavior is measured using quantitative or qualitative methods, and the results are analyzed to confirm or contradict the hypothesis. If the initial hypothesis is too broad, the knowledge gained here is fed back to the first step, as depicted by the circumscription arrow, so that the hypothesis is modified based on an improved understanding of the problem. In this thesis, the algorithms developed were evaluated using well-known methods from the literature, for example, linear discriminant analysis, singular value decomposition, and the receiver operating characteristic curve. As shown in table 4.1, Paper 1 to Paper 8 were evaluated using a publicly available dataset by (Judd et al., 2009). Paper 9 to Paper 11 were evaluated using data recorded in eye tracking experiments performed at Sør-Trøndelag University College (HiST). Five observers took part in the experiments.

Research papers        Eye tracking data
Paper 1 to Paper 8     Publicly available dataset by (Judd et al., 2009)
Paper 9 to Paper 11    Eye tracking experiments performed at HiST

Table 4.1: Research papers and eye tracking datasets.
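To make the receiver operating characteristic evaluation mentioned above more concrete, the following minimal sketch computes the area under the ROC curve for a saliency map treated as a classifier of fixated versus non-fixated pixels. It is an illustrative assumption about how such a score can be computed, not the evaluation code used for the papers. The function name `roc_auc` and the toy values are hypothetical.

```python
def roc_auc(saliency, fixated):
    """Area under the ROC curve for saliency scores against fixation labels.

    Uses the pairwise formulation: the AUC equals the probability that a
    randomly chosen fixated pixel has a higher saliency value than a
    randomly chosen non-fixated pixel, with ties counted as one half.
    """
    positives = [s for s, f in zip(saliency, fixated) if f]
    negatives = [s for s, f in zip(saliency, fixated) if not f]
    if not positives or not negatives:
        raise ValueError("need at least one fixated and one non-fixated pixel")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (assumed): saliency value per pixel and whether it was fixated.
saliency_values = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
fixation_labels = [1, 1, 0, 0, 0, 1]
auc = roc_auc(saliency_values, fixation_labels)
```

A value of 1.0 means that every fixated pixel is ranked above every non-fixated pixel, and 0.5 corresponds to chance. The quadratic pairwise loop is chosen for readability; for full-size images a rank-based computation would be preferable.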

Conclusion

This is the final step of the research effort. Even though the results obtained might still deviate from the expected behavior, they are considered good enough to improve the understanding of the problem. The knowledge gained here is expected to contribute towards future research projects. In our case, this is highlighted by the contributions from the research papers and this thesis.

Chapter 5 Summaries of research papers

5.1 Paper 1: Analysis of eye fixations data

5.1.1 Synopsis

In this paper, we analyzed eye fixations data obtained from 15 observers and 1003 images. When studying the correlation matrix constructed from the fixations data of one observer viewing all images, we observed that 23 percent of the data can be accounted for by one eigenvector. This finding implies a repeated viewing pattern that is independent of image content. Examination of this pattern revealed that it was highly correlated with the center region of the image. Next, we analyzed the correlation matrix based on the fixations data across different observers viewing the same image. We found a higher agreement across different observers than across different images with a single observer. The agreement between different observers suggested that part of the viewing mechanism is indeed image dependent. We then looked at the images for which the correspondence between observers was large and could be attributed to image features. From the results, we observed that images with clear top-down features such as faces, people, and text ranked higher in correspondence between observers, whereas more complex images ranked lower. This analysis suggested that there was a stronger agreement on images with so-called top-down features and a weaker agreement on complex images such as landscapes, buildings, and street views. The main contribution of this paper is a new method for the statistical analysis of the fixations data. This is strongly linked to the first part of the research contribution C3.
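The eigenvector analysis summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the paper: it assumes that the fixations for each image are binned into a coarse grid and flattened into one row of counts, that the correlation matrix is computed between these rows, and that no row is constant. The share attributed to one eigenvector is taken as the largest eigenvalue divided by the trace of the matrix. All names and the toy counts are hypothetical.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation between every pair of rows (one row per image)."""
    normalized = []
    for row in rows:
        mean = sum(row) / len(row)
        deviations = [value - mean for value in row]
        norm = math.sqrt(sum(d * d for d in deviations))  # assumes a non-constant row
        normalized.append([d / norm for d in deviations])
    return [[sum(a * b for a, b in zip(u, v)) for v in normalized]
            for u in normalized]


def leading_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semi-definite matrix.

    Plain power iteration from a uniform start vector; this assumes the
    start vector is not orthogonal to the leading eigenvector.
    """
    size = len(matrix)
    vector = [1.0 / math.sqrt(size)] * size
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        eigenvalue = math.sqrt(sum(p * p for p in product))
        vector = [p / eigenvalue for p in product]
    return eigenvalue


# Toy fixation counts (assumed): four images, each binned on a 3x3 grid.
fixation_counts = [
    [0, 1, 0, 1, 5, 1, 0, 1, 0],
    [1, 0, 0, 1, 4, 2, 0, 0, 1],
    [0, 0, 1, 2, 6, 0, 0, 1, 0],
    [3, 0, 0, 0, 2, 0, 0, 0, 4],
]
corr = correlation_matrix(fixation_counts)
# The trace of a correlation matrix equals its size, so the share of the
# leading eigenvector is its eigenvalue divided by the number of rows.
leading_share = leading_eigenvalue(corr) / len(corr)
```

In this toy example most rows are dominated by the central grid cell, so the leading eigenvector captures a shared, center-weighted pattern, which mirrors the kind of content-independent viewing pattern described in the synopsis.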

5.2 Paper 2: A robust metric for the evaluation
