
What the eye did not see–a fusion approach to image coding

Authors: Ali Alsam, Hans Jakob Rivertz, and Puneet Sharma.

Full title: What the eye did not see–a fusion approach to image coding.

Published in: ISVC 2012, Advances in Visual Computing, Lecture Notes in Computer Science (LNCS), Springer-Verlag Berlin Heidelberg.

What the eye did not see–a fusion approach to image coding

Ali Alsam, Hans Jakob Rivertz, and Puneet Sharma
Department of Informatics & e-Learning (AITeL)

Sør-Trøndelag University College (HiST), Trondheim, Norway

email: er.puneetsharma@gmail.com

Abstract. The concentration of the cones and ganglion cells is much higher in the fovea than in the rest of the retina. This non-uniform sampling results in a retinal image that is sharp at the fixation point, where a person is looking, and blurred away from it. This difference between the sampling rates at the different spatial locations presents us with the question of whether we can employ this biological characteristic to achieve better image compression. This can be achieved by compressing an image less at the fixation point and more away from it. It is, however, known that the vision system employs more than one fixation to look at a single scene, which presents us with the problem of combining images pertaining to the same scene but exhibiting different spatial contrasts.

This article presents an algorithm to combine such a series of images by using image fusion in the gradient domain. The advantage of the algorithm is that, unlike other algorithms that compress the image in the spatial domain, our algorithm results in no artifacts. The algorithm is based on two steps: in the first, we modify the gradients of an image based on a limited number of fixations, and in the second, we integrate the modified gradient. Results based on measured and predicted fixations verify our approach.

1 Introduction

From the very beginning of photography, cameras were designed and iteratively improved with the aim of mimicking the human visual system. From this perspective, a camera is thought of as a machined eye, a device that is sensitive to illumination. Equally, we normally think of algorithms such as white-balancing [1], adaptation [2] and tone mapping [3] as being similar to the biological processes of the vision system.

A camera is of course not a human visual system. The two are different in a number of ways, some of which are relevant to the work presented in this article. Primarily, while digital camera manufacturers are striving to produce devices with progressively higher resolution, the human brain has evolved to be efficient, i.e. to use less information to reach greater conclusions. Thus, while the camera sensor has a uniform number of pixels per unit area, the human eye has a much higher resolution in the fovea, which is the central part of the retina [4]. It is well known that the fovea is responsible for our central, sharpest vision, while the cone distribution in the rest of the retina results in blurred vision [4].

In the process of exploring a scene, the brain directs the eyes to different spatial locations. At those locations, known as fixations, the eyes pause and gather the visual information [5]. Due to the concentration of photo-receptors at the fovea, we can think of each pause as the time taken to capture an image that is sharp at the fixation point and blurred away from it. Given that the average distribution per unit area and spatial location of the cones in the retina is known, it is possible to model the spatial contrast of the retinal image at each fixation.

For a given scene, the number of fixations and their locations vary. The question of whether fixations are guided by image features has been addressed extensively in vision research, and some conclusions are widely accepted. Specifically, experiments have shown that for a given image, people tend to look at the same regions [6, 7], they tend to look at the central part [8, 7], and that certain image attributes such as luminance and colour contrasts tend to attract fixations [9, 10]. Furthermore, fixations can be measured using eye trackers, and the experimental data shows conclusively that for a general image the human visual system employs more than one fixation [6].

Based on a given digital image and a number of measured or predicted fixations, we can model the foveation effect, i.e. a sharp region at the fixation point and blurring away from it. The result of such a model would be a number of images with different spatial contrasts. As an example, see figure 1, where we have modeled the foveation effect based on three different fixations. Given such an image series, we might wonder how the vision system integrates the different foveation results into a seamless visual experience, and subsequently how we can design signal processing algorithms that offer such functionality.
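To make the foveation effect concrete, the following toy sketch blends progressively blurred copies of an image so that the blur grows with distance from a single fixation point. This is only an illustration of the idea, with names of our own choosing; the foveation actually used in this work is the contrast-threshold model of Geisler and Perry described in Section 2.

```python
# Toy illustration of foveation: sharp at the fixation, increasingly blurred away
# from it. Not the Geisler-Perry model used in the paper; just a visual aid.
import numpy as np
from scipy.ndimage import gaussian_filter

def toy_foveate(image, fixation, max_sigma=8.0):
    """image: 2-D float array; fixation: (row, col); returns a foveated copy."""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - fixation[0], xx - fixation[1])
    r = r / r.max()                                   # normalised eccentricity in [0, 1]
    sigmas = np.linspace(0.0, max_sigma, 6)           # blur levels, sharp to very blurred
    stack = np.stack([image if s == 0 else gaussian_filter(image, s) for s in sigmas])
    level = r * (len(sigmas) - 1)                     # fractional blur level per pixel
    lo = np.floor(level).astype(int)
    hi = np.minimum(lo + 1, len(sigmas) - 1)
    frac = level - lo
    return (1 - frac) * stack[lo, yy, xx] + frac * stack[hi, yy, xx]
```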

In this article, we present an algorithm which integrates a number of differently foveated images in the gradient domain. The algorithm starts by calculating the gradients of the input image. Having done that, a number of fixation locations are used to calculate the corresponding foveated gradients. Here we use the foveation function described by Geisler and Perry [11]. As a second step, the gradients are combined using the fast colour to gray algorithm by Alsam and Drew [12]. The Alsam and Drew algorithm [12] combines the gradients from n channels into a single gradient by arguing that the maximum horizontal and vertical differences over all the channels result in the maximum contrast. Thus the gradient fusion step is guaranteed to result in a gradient where the maximum differences pertaining to the fixation locations are maintained. As a final step, the resultant gradient is integrated using the modified Frankot-Chellappa algorithm [13] proposed by Alsam and Rivertz [14].
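The fusion rule itself is simple to state: at every pixel, keep the horizontal and the vertical difference with the largest magnitude over all the foveated gradient fields. The sketch below is a minimal rendering of that rule with illustrative names (fuse_gradients, px, py are not from the paper); the paper relies on the fast colour-to-grey method of Alsam and Drew [12] for this step.

```python
# Minimal sketch of gradient fusion: per pixel, pick the x- and y-difference with the
# largest magnitude across the n foveated gradient fields.
import numpy as np

def fuse_gradients(px, py):
    """px, py: arrays of shape (n, H, W) with the x- and y-gradients of the n
    foveated images. Returns a single fused gradient field (p, q)."""
    idx_x = np.argmax(np.abs(px), axis=0)       # channel with the largest |dx| per pixel
    idx_y = np.argmax(np.abs(py), axis=0)       # channel with the largest |dy| per pixel
    rows, cols = np.indices(px.shape[1:])
    return px[idx_x, rows, cols], py[idx_y, rows, cols]
```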

The need for a fast algorithm to combine foveated images is best motivated in the image compression domain, where improvements in statistically based image compression, i.e. methods that are based on data analysis, have long slowed down. The use of human vision steered compression is seen by researchers as the most promising path toward further improvements. In this regard, the algorithm presented in this article can be used as part of an image compression pipeline with very promising results. From our initial tests, we have noticed that the algorithm results in reduced storage requirements without the added artifacts associated with frequency-based compression in the wavelet domain.

Like other foveation driven algorithms, our method is dependent on accurate estimation of the fixation points. Thus in our experimental section, we present results based on measured fixation data as well as predictions based on the visual saliency algorithm by Itti et al. [15].

(a) Foveated image 1 (b) Foveated image 2 (c) Foveated image 3

Fig. 1. The foveated images for three fixations; the fixation points are represented as red dots.

2 The filter and the integration

Experiments for measuring the contrast sensitivity of the human eye have been carried out [16, 17]. Based on these experiments, the contrast threshold has been modeled through the function

\[
CT(f, \theta) = CT_0 \exp\!\left( \alpha f \, \frac{\theta + \theta_2}{\theta_2} \right).
\]

Here, $f$ is the spatial frequency measured in cycles per degree and $\theta$ is the retinal eccentricity. $CT_0$ is the minimal contrast threshold, $\theta_2$ is the half-resolution eccentricity constant, and $\alpha$ is the spatial frequency decay constant. The values used in [18] are $\alpha = 0.106$, $\theta_2 = 2.3$, and $CT_0 = 1/64$.
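A direct transcription of this threshold model, with the constants above, is given as a small sketch below (the function and constant names are ours):

```python
# Contrast threshold CT(f, theta) = CT0 * exp(alpha * f * (theta + theta2) / theta2),
# with the constants from [18]: alpha = 0.106, theta2 = 2.3, CT0 = 1/64.
import numpy as np

ALPHA, THETA2, CT0 = 0.106, 2.3, 1.0 / 64.0

def contrast_threshold(f, theta):
    """f: spatial frequency (cycles per degree); theta: eccentricity (degrees)."""
    return CT0 * np.exp(ALPHA * f * (theta + THETA2) / THETA2)
```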

Given a normalized gray scale image $z_0 : \Omega \to [0,1]$, denote its width by $w$, measured in pixels. An observer views the image from a distance $d$, also measured in pixels. The maximal spatial frequency of the image is then

\[
f_d = \frac{w}{4 \arctan\!\frac{w}{2d}}.
\]

If $r$ is the distance, measured in pixels, from a fixation point, then the corresponding eccentricity is $\theta(r) = \arctan\frac{r}{d}$. The gradient $\nabla z_0$ is modified by setting its magnitude to zero if it is less than $CT(f_d, \theta)$ for all of the fixation points.
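The viewing geometry can be sketched in the same way (again with illustrative names, and assuming angles are expressed in degrees so that $f_d$ comes out in cycles per degree):

```python
# Maximal spatial frequency f_d for an image of width w pixels viewed from d pixels,
# and eccentricity theta(r) of a point r pixels away from a fixation.
import numpy as np

def max_frequency(w, d):
    """f_d = w / (4 * arctan(w / (2 d))), with arctan in degrees."""
    return w / (4.0 * np.degrees(np.arctan(w / (2.0 * d))))

def eccentricity(r, d):
    """theta(r) = arctan(r / d), in degrees."""
    return np.degrees(np.arctan(r / d))
```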

We make a new contrast threshold function based on $f = f_d$ and the fixation points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

\[
CT(x, y) = \min\big(CT_1(x, y), CT_2(x, y), \ldots, CT_n(x, y)\big),
\]

where $CT_k(x, y) = CT\!\left(f_d,\; \theta\!\left(\sqrt{(x - x_k)^2 + (y - y_k)^2}\right)\right)$, $k = 1, 2, \ldots, n$. This step is equivalent to the Alsam and Drew method [12].
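Putting the pieces together, a self-contained sketch of the per-pixel threshold map and of the gradient modification that follows could look like this (names are ours; the constants repeat those given above):

```python
# CT(x, y) = min_k CT(f_d, theta(r_k)) over the fixation points, followed by zeroing
# the gradient wherever its magnitude does not exceed this threshold.
import numpy as np

ALPHA, THETA2, CT0 = 0.106, 2.3, 1.0 / 64.0

def threshold_map(shape, fixations, d):
    """shape: (H, W); fixations: list of (x_k, y_k) pixel coordinates; d: viewing distance."""
    H, W = shape
    yy, xx = np.mgrid[0:H, 0:W]
    fd = W / (4.0 * np.degrees(np.arctan(W / (2.0 * d))))             # maximal frequency f_d
    ct = np.full(shape, np.inf)
    for xk, yk in fixations:
        theta = np.degrees(np.arctan(np.hypot(xx - xk, yy - yk) / d))  # eccentricity per pixel
        ct = np.minimum(ct, CT0 * np.exp(ALPHA * fd * (theta + THETA2) / THETA2))
    return ct

def modify_gradient(zx, zy, ct):
    """Keep the gradient only where its magnitude exceeds CT(x, y)."""
    keep = np.hypot(zx, zy) > ct
    return zx * keep, zy * keep
```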

The direction of both the original and the modified gradient is $\hat{u} = \nabla z_0 / |\nabla z_0|$. The length of the new gradient is $|\nabla z| = |\nabla z_0|$ if $CT(x, y) < |\nabla z_0|$, and otherwise $|\nabla z| = 0$. We now reconstruct the contrast by using the integration method of Alsam and Rivertz [14], where we minimize the functional:

\[
W(z) = \lambda \int_\Omega |z - z_0|^2 \, dx \, dy + \int_\Omega |z_x - p|^2 + |z_y - q|^2 \, dx \, dy.
\]

This minimization results in an image whose gradients are as close as possible to (p, q), under the constraint that the luminance is close to the original image.
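For completeness, here is a brief sketch of how the closed form below arises (our own derivation, using the convention that differentiation corresponds to multiplication by $iu$ and $iv$ in the Fourier domain). The Euler-Lagrange equation of $W(z)$ is

\[
\lambda (z - z_0) - \frac{\partial}{\partial x}(z_x - p) - \frac{\partial}{\partial y}(z_y - q) = 0,
\qquad\text{i.e.}\qquad
\lambda z - \Delta z = \lambda z_0 - (p_x + q_y).
\]

Taking Fourier transforms, with $z_x \mapsto iuZ$, $z_y \mapsto ivZ$, $p_x \mapsto iuP$ and $q_y \mapsto ivQ$, gives

\[
(\lambda + u^2 + v^2)\, Z = \lambda Z_0 - i(uP + vQ),
\]

which, solved for $Z$, yields the expression stated next.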

The image $z$ in the Fourier domain can be taken as

\[
Z(u, v) = \frac{\lambda Z_0 - i (u P + v Q)}{\lambda + u^2 + v^2},
\]

where $P$ and $Q$ correspond to the Fourier transforms of $p$ and $q$.
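A minimal numpy sketch of this reconstruction, assuming numpy's default FFT conventions and treating p and q as the fused gradient from the previous step (the function name and the value of $\lambda$ below are illustrative, not the authors' implementation):

```python
# Solve for z in the Fourier domain: Z = (lam*Z0 - i*(u*P + v*Q)) / (lam + u^2 + v^2).
import numpy as np

def reconstruct(z0, p, q, lam=0.1):
    H, W = z0.shape
    u = 2 * np.pi * np.fft.fftfreq(W)[None, :]      # horizontal angular frequencies
    v = 2 * np.pi * np.fft.fftfreq(H)[:, None]      # vertical angular frequencies
    Z0 = np.fft.fft2(z0)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    Z = (lam * Z0 - 1j * (u * P + v * Q)) / (lam + u**2 + v**2)
    return np.real(np.fft.ifft2(Z))
```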

3 Results

To test the proposed method, we used images and the corresponding fixation data from the study by Judd et al. [6]. The results for two images and the associated fixations are shown in figures 2 and 3. In the left column the foveated images for three fixations are shown. Here, the fixation points are represented as red dots. In agreement with the predicted results for the application of the contrast function by Wang and Bovik [18], we notice that the regions around the fixation points are sharper than the rest. The images in the right column show the original image, the result obtained by combining the foveated images using the proposed method, and the difference between the result and the original image. We notice that the result image is sharp in the regions corresponding to the three fixation points; we further notice that the image represents a good approximation of the original, with greater differences in the parts that the observer deemed to be less salient. Here we remark that the difference between the original and the result can be optimized by controlling the $\lambda$ parameter defined in the previous section.

In figure 4, the left column contains the foveated images obtained by using the first three salient points from the visual saliency algorithm by Itti et al. [15], and the right column contains the original image, the result obtained by using the proposed method, and the difference between the result and the original image. For this experiment, we notice that the results are very similar to those obtained for the first test image. We underline, however, that the choice of fixation locations and the number of salient regions is clearly related to the results that we obtain: the higher the number of fixations and the more spread they are in the image plane, the closer the result is going to resemble the original.

Finally, in figures 5(a) to 5(f), we show the bitrates obtained by saving the original image and the corresponding result image in JPEG format with different quality values, ranging from 10 to 100, based on six different images. Here we notice that for the same compression quality the new images require less storage space. Given that the foveation function reduces the high frequency elements of the original image, we can argue that this result is not surprising. The advantages of this approach are, however, more subtle than a simple removal of high frequency elements: we have removed high frequencies locally, in regions where the foveation function predicts that the observer could not see with the sharp part of their vision.
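The bitrate comparison itself is easy to reproduce in principle: save both images as JPEG at a range of quality values and record the file sizes. The sketch below is illustrative only (Pillow-based, with hypothetical file names), not the exact evaluation pipeline used for figure 5:

```python
# Save a PIL image as JPEG at several quality settings and return the file sizes.
import io
from PIL import Image

def jpeg_sizes(image, qualities=range(10, 101, 10)):
    """Return {quality: size in bytes} for `image` encoded as JPEG."""
    sizes = {}
    for q in qualities:
        buf = io.BytesIO()
        image.save(buf, format="JPEG", quality=q)
        sizes[q] = buf.tell()
    return sizes

# Example with hypothetical file names:
# original = Image.open("original.png").convert("L")
# fused = Image.open("fused_result.png").convert("L")
# print(jpeg_sizes(original), jpeg_sizes(fused))
```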

4 Conclusion

This article presents an algorithm to combine a series of differently foveated images pertaining to an identical scene. This is achieved by using image fusion in the gradient domain. The advantage of the algorithm is that, unlike other algorithms that compress the image in the spatial domain, our algorithm results in no artifacts. The algorithm is based on two steps: in the first, we modify the gradients of an image based on a limited number of fixations, and in the second, we integrate the modified gradient. Results based on measured and predicted fixations verify our approach. The need for a fast algorithm to combine foveated images is best motivated in the image compression domain, where improvements in statistically based image compression, i.e. methods that are based on data analysis, have long slowed down. The use of human vision steered compression is seen by researchers as the most promising path toward further improvements. In this regard, the algorithm presented in this article can be used as part of an image compression pipeline with very promising results. From our initial tests, we have noticed that the algorithm results in reduced storage requirements without the added artifacts associated with frequency-based compression in the wavelet domain.

References

1. Chikane, V., Fuh, C.S.: Automatic white balance for digital still cameras. Journal of Information Science and Engineering 22 (2006) 497–509

2. Hurley, J.B.: Shedding light on adaptation. Journal of General Physiology 119 (2002) 125–128

3. Qiu, G., Guan, J., Duan, J., Chen, M.: Tone mapping for HDR image using optimization: a new closed form solution. In: ICPR 2006, 18th International Conference on Pattern Recognition. Volume 1. (2006) 996–999

4. Cormack, L.K.: Computational models of early human vision. In: Handbook of Image and Video Processing. Elsevier Academic Press (2005) 325–345

5. Rajashekar, U., van der Linde, I., Bovik, A.C., Cormack, L.K.: GAFFE: A gaze-attentive fixation finding engine. IEEE Transactions on Image Processing 17 (2008) 564–573

6. Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: International Conference on Computer Vision (ICCV). (2009)

7. Alsam, A., Sharma, P.: Analysis of eye fixations data. In: Proceedings of the IASTED International Conference, Signal and Image Processing (SIP 2011). (2011) 342–349

8. Tatler, B.W.: The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision 7 (2007) 1–17

9. Itti, L., Koch, C.: Computational modelling of visual attention. Nature Reviews Neuroscience 2 (2001) 194–203

10. Meur, O.L., Callet, P.L., Barba, D., Thoreau, D.: A coherent computational approach to model bottom-up visual attention. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (2006) 802–817

11. Geisler, W.S., Perry, J.S.: A real-time foveated multiresolution system for low-bandwidth video communication. In: SPIE Proceedings. Volume 3299. (1998) 1–13

12. Alsam, A., Drew, M.S.: Fast colour2grey. In: 16th Color Imaging Conference: Color, Science, Systems and Applications, Society for Imaging Science & Technology (IS&T)/Society for Information Display (SID) joint conference. (2008) 342–346

13. Frankot, R.T., Chellappa, R.: A method for enforcing integrability in shape from shading algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (1988) 439–451

14. Alsam, A., Rivertz, H.J.: Constrained gradient integration for improved image contrast. In: Proceedings of the IASTED International Conference, Signal and Image Processing (SIP 2011). (2011) 13–18

15. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 1254–1259

16. Banks, M., Sekuler, A., Anderson, S.: Peripheral spatial vision: limits imposed by optics, photoreceptors, and receptor pooling. J. Opt. Soc. Am. A 8 (1991) 1775–1787

17. Arnow, T.L., Geisler, W.S.: Visual detection following retinal damage: Predictions of an inhomogeneous retino-cortical model. In: Human Vision and Electronic Imaging, Proceedings of SPIE. Volume 2674. (1996)

18. Wang, Z., Bovik, A.C.: Embedded foveation image coding. IEEE Transactions on Image Processing 10 (2001) 1397–1410

(a) Foveated image 1 (b) Foveated image 2 (c) Foveated image 3 (d) Original image (e) Result (f) Difference

Fig. 2. In the left column the foveated images for three fixations are shown. Here, the fixation points are represented as red dots. The images in the right column show the original image, the result obtained by combining the foveated images using the proposed method, and the difference between the result and the original image. We notice that the result image is sharp in the regions corresponding to the three fixation points; we further notice that the image represents a good approximation of the original, with greater differences in the parts that the observer deemed to be less salient. In the difference image, the dark regions indicate the locations where the differences are higher.

(a) Foveated image 1 (b) Foveated image 2 (c) Foveated image 3 (d) Original image (e) Result (f) Difference

Fig. 3. In the left column the foveated images for three fixations are shown. Here, the fixation points are represented as red dots. The images in the right column show the original image, the result obtained by combining the foveated images using the proposed method, and the difference between the result and the original image. We notice that the result image is sharp in the regions corresponding to the three fixation points; we further notice that the image represents a good approximation of the original, with greater differences in the parts that the observer deemed to be less salient. In the difference image, the dark regions indicate the locations where the differences are higher.

(a) Foveated image 1 (b) Foveated image 2 (c) Foveated image 3 (d) Original image (e) Result (f) Difference

Fig. 4. In the left column the foveated images obtained by using the first three salient points from the visual saliency algorithm by Itti et al. [15] are shown. Here, the fixation points are represented as red dots. The images in the right column show the original image, the result obtained by combining the foveated images using the proposed method, and the difference between the result and the original image. We notice that the result image is sharp in the regions corresponding to the three fixation points; we further notice that the image represents a good approximation of the original, with greater differences in the parts that the observer deemed to be less salient. In the difference image, the dark regions indicate the locations where the differences are higher.

(a) image 1 (b) image 2 (c) image 3 (d) image 4 (e) image 5 (f) image 6

Fig. 5. The bitrates for saving the original image and the corresponding result image in JPEG format with different quality values, ranging from 10 to 100, based on six different images. We notice that for the same compression quality the new images require less storage space.

A.8 What the eye did not see–a fusion approach to image coding (extended)

Authors: Ali Alsam, Hans Jakob Rivertz, and Puneet Sharma.

Full title: What the eye did not see–a fusion approach to image coding.

Published in: International Journal on Artificial Intelligence Tools.

International Journal on Artificial Intelligence Tools Vol. 22, No. 6 (2013) 1360014 (13 pages)

© World Scientific Publishing Company
DOI: 10.1142/S0218213013600142

WHAT THE EYE DID NOT SEE – A FUSION APPROACH TO IMAGE CODING

ALI ALSAM, HANS JAKOB RIVERTZ and PUNEET SHARMA
Department of Informatics & e-Learning (AITeL)

Sør-Trøndelag University College (HiST), Trondheim, Norway

er.puneetsharma@gmail.com

Received 15 January 2013; Accepted 14 July 2013; Published 20 December 2013

The concentration of the cones and ganglion cells is much higher in the fovea than in the rest of the retina. This non-uniform sampling results in a retinal image that is sharp at the fixation point, where a person is looking, and blurred away from it. This difference between the sampling rates at the different spatial locations presents us with the question of whether we can employ this biological characteristic to achieve better image compression. This can be achieved by compressing an image less at the fixation point and more away from it. It is, however, known that the vision system employs more than one fixation to look at a single scene, which presents us with the problem of combining images pertaining to the same scene but exhibiting different spatial contrasts.

This article presents an algorithm to combine such a series of images by using image fusion in the gradient domain. The advantage of the algorithm is that, unlike other algorithms that compress the image in the spatial domain, our algorithm results in no artifacts. The algorithm is based on two steps: in the first, we modify the gradients of an image based on a limited number of fixations, and in the second, we integrate the modified gradient.
