
Bayesian patch-based methods

Bayesian patch-based methods give an optimal formulation under the assumption that the patches similar to a given image patch follow a stochastic model. Given a noiseless patch $P$ of the ideal noiseless image $u$, and assuming a Gaussian noise model with independence of pixel values, the probability $P(\tilde P|P)$ (the probability of observing a noisy patch $\tilde P$ given that the noiseless version of the noisy patch is $P$) can be obtained directly.

Thanks to Bayes' formula, it is possible to obtain $P(P|\tilde P)$, the probability of the noiseless patch $P$ knowing that the noisy patch $\tilde P$ has been observed. The goal is to deduce $P$ from $\tilde P$ by finding the $P$ which maximizes $P(P|\tilde P)$. This chapter concludes with a closed formula that gives a restored patch $\hat P_1$ from an observed noisy patch $\tilde P$, using the covariance matrix $C_{\tilde P}$ of $\tilde P$.

1. Obtaining a restored patch $\hat P_1$ from an observed noisy patch $\tilde P$

In this chapter, we will review the Bayesian patch-based method applied to denoising, under the additive white Gaussian noise model. The Bayesian principle is coupled with a Gaussian (or a mixture of Gaussians) model for noiseless patches. Let $u$ be the noiseless ideal image and $\tilde u$ the noisy image corrupted with Gaussian noise of standard deviation $\sigma$, so that

(5) $\tilde u = u + n$,

the conditional distribution $P(\tilde u|u)$ is

(6) $P(\tilde u|u) = \dfrac{1}{(2\pi\sigma^2)^{M/2}} \exp\left(-\dfrac{\|u-\tilde u\|^2}{2\sigma^2}\right)$,

where $M$ is the total number of pixels in the image.
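For concreteness, here is a minimal numpy sketch (with a synthetic stand-in for the ideal image $u$ and an illustrative value of $\sigma$; none of these values come from the text) that simulates the degradation model (5) and evaluates the log of the likelihood (6):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 10.0                       # noise standard deviation (illustrative)
u = rng.uniform(0, 255, (64, 64))  # stand-in for the ideal noiseless image
n = rng.normal(0, sigma, u.shape)  # white Gaussian noise
u_tilde = u + n                    # degraded image, equation (5)

M = u.size
# log P(u_tilde | u), equation (6):
# -(M/2) log(2 pi sigma^2) - ||u - u_tilde||^2 / (2 sigma^2)
log_lik = -0.5 * M * np.log(2 * np.pi * sigma**2) \
          - np.sum((u - u_tilde) ** 2) / (2 * sigma**2)
print(log_lik)
```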

In order to compute the probability of the original image given the degraded one, $P(u|\tilde u)$, we need to introduce a prior on $u$. In the first models [30], this prior was a parametric image model describing the stochastic behavior of a patch around each pixel by a Markov random field, specified by its Gibbs distribution. A Gibbs distribution for an image $u$ takes the form

$$P(u) = \frac{1}{Z}\exp(-E(u)/T),$$

where $Z$ and $T$ are constants and $E$ is called the energy function, which writes

$$E(u) = \sum_{C\in\mathcal{C}} V_C(u),$$


where $\mathcal{C}$ denotes the set of cliques associated with the image and $V_C$ is a potential function. By Bayes' formula, the maximization of the a posteriori distribution writes

$$\arg\max_u P(u|\tilde u) = \arg\max_u P(\tilde u|u)P(u),$$

which is equivalent to the minimization of $-\log P(u|\tilde u)$,

$$\arg\min_u \|u-\tilde u\|^2 + \frac{2\sigma^2}{T}E(u).$$

This energy is written as a sum of local derivatives of pixels in the image, making the minimization equivalent to a classic Tikhonov regularization [30, 88].
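To make the link with Tikhonov regularization concrete, the following sketch minimizes $\|u-\tilde u\|^2 + \frac{2\sigma^2}{T}E(u)$ by gradient descent, assuming a quadratic energy built on 2-pixel cliques (squared differences of horizontal and vertical neighbors); this energy choice and the step size are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

def energy_grad(u):
    """Gradient of E(u) = sum over 2-pixel cliques of squared differences
    between horizontally and vertically adjacent pixels."""
    g = np.zeros_like(u)
    g[:, :-1] += 2 * (u[:, :-1] - u[:, 1:])   # horizontal cliques
    g[:, 1:]  += 2 * (u[:, 1:]  - u[:, :-1])
    g[:-1, :] += 2 * (u[:-1, :] - u[1:, :])   # vertical cliques
    g[1:, :]  += 2 * (u[1:, :]  - u[:-1, :])
    return g

def map_denoise(u_tilde, sigma, T, n_iter=200, step=0.1):
    """Minimize ||u - u_tilde||^2 + (2 sigma^2 / T) E(u) by gradient descent."""
    lam = 2 * sigma**2 / T
    u = u_tilde.copy()
    for _ in range(n_iter):
        grad = 2 * (u - u_tilde) + lam * energy_grad(u)
        u -= step * grad
    return u

# Example usage (illustrative values):
# u_hat = map_denoise(u_tilde, sigma=10.0, T=1000.0)
```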

Recent Bayesian methods have abandoned as too simplistic the global patch models formulated by an a priori Gibbs energy. Instead, these methods build local nonparametric patch models learnt from the image itself, usually as a local Gaussian model around each given patch, or as a Gaussian mixture. The term patch model is now preferred to the terms neighborhood or clique previously used for the Markov field methods. In the nonparametric models, the patches are larger, usually $8\times 8$, while the cliques were often confined to $3\times 3$ neighborhoods. Given a noiseless patch $P$ of $u$ with dimension $\kappa\times\kappa$, and $\tilde P$ an observed noisy version of $P$, the same model gives, by the independence of noise pixel values,

(7) $P(\tilde P|P) = c\cdot\exp\left(-\dfrac{\|\tilde P-P\|^2}{2\sigma^2}\right)$,

where $P$ and $\tilde P$ are considered as vectors with $\kappa^2$ components, $\|P\|$ denotes the Euclidean norm of $P$, $\sigma^2$ the variance of the Gaussian noise, and $c$ is the normalizing constant. Knowing $\tilde P$, our goal is to deduce $P$ by maximizing $P(P|\tilde P)$. Using Bayes' rule, we can compute this last conditional probability as

(8) $P(P|\tilde P) = \dfrac{P(\tilde P|P)\,P(P)}{P(\tilde P)}$.

$\tilde P$ being observed, this formula can in principle be used to deduce the patch $P$ maximizing the right-hand term, viewed as a function of $P$. This is only possible if we have a probability model for $P$, and these models will generally be learnt from the image itself, or from a set of images. For example, [89] applies a clustering method to the set of patches of a given image, and [90] applies it to a huge set of patches extracted from many images. Each cluster of patches is thereafter treated as a set of Gaussian samples. This makes it possible to associate to each observed patch its likeliest cluster, and then to denoise it by a Bayesian estimation in this cluster. Another, still more direct, way to build a model for a given patch $\tilde P$ is to group the patches similar to $\tilde P$ in the image. Assuming that these similar patches are samples of a Gaussian vector yields a standard Bayesian restoration [47, 91]. We shall now discuss this particular case, where all observed patches are noisy.
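As an illustration of this grouping step, here is a minimal sketch that collects the patches similar to a given noisy patch and fits an empirical Gaussian (mean and covariance) to the group; the $8\times 8$ patch size matches the text, while the threshold `tau` is a hypothetical parameter:

```python
import numpy as np

def extract_patches(img, k=8):
    """All k x k patches of img, flattened to vectors with k^2 components."""
    H, W = img.shape
    return np.stack([img[i:i+k, j:j+k].ravel()
                     for i in range(H - k + 1)
                     for j in range(W - k + 1)])

def similar_patches(patches, ref, tau):
    """Patches whose squared Euclidean distance to the reference is below tau."""
    d2 = np.sum((patches - ref) ** 2, axis=1)
    return patches[d2 <= tau]

def empirical_gaussian(group):
    """Empirical mean and covariance of a group of similar patches."""
    mean = group.mean(axis=0)
    cov = np.cov(group, rowvar=False)
    return mean, cov
```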

Why Gaussian? As usual when we have several observations but no particular guess on the form of the probability density, a Gaussian model is adopted. In the case of the patches $Q$ similar to a given patch $P$, the Gaussian model has some pertinence, as it is assumed that many contingent random factors explain the difference between $Q$ and $P$: other details, texture, slight lighting changes, shadows, etc. The Gaussian model in the presence of a combination of many such random and independent factors is heuristically justified by the central limit theorem. Thus, for good or bad, assume that the patches $Q$ similar to $P$ follow a Gaussian model with (observable, empirical) covariance matrix $C_P$ and (observable, empirical) mean $\overline{P}$. This means that

(9) $P(Q) = c\cdot\exp\left(-\dfrac{(Q-\overline{P})^t C_P^{-1}(Q-\overline{P})}{2}\right)$.

From (7), (8) and (9) we obtain for each observed $\tilde P$ the following equivalence of problems:

$$\max_P P(P|\tilde P) \;\Leftrightarrow\; \max_P P(\tilde P|P)\,P(P) \;\Leftrightarrow\; \min_P \frac{\|\tilde P-P\|^2}{\sigma^2} + (P-\overline{P})^t C_P^{-1}(P-\overline{P}).$$

This expression does not yield an algorithm. Indeed, the noiseless patch $P$ and the patches similar to $P$ are not observable. Nevertheless, we can observe the noisy version $\tilde P$ and compute the patches $\tilde Q$ similar to $\tilde P$. An empirical covariance matrix can therefore be obtained for the patches $\tilde Q$ similar to $\tilde P$. Furthermore, using (5) and the fact that $P$ and the noise $n$ are independent,

(10) $C_{\tilde P} = C_P + \sigma^2 I; \qquad \mathbb{E}\tilde Q = \overline{P}.$
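The first relation in (10) can be checked numerically. The sketch below is entirely synthetic (Gaussian patch samples with an arbitrary covariance $C_P$ and an illustrative $\sigma$); the empirical covariance of the noisy patches approaches $C_P + \sigma^2 I$ as the number of samples grows:

```python
import numpy as np

rng = np.random.default_rng(1)
k2, sigma, n_samples = 16, 5.0, 100_000    # 4x4 patches, illustrative values

A = rng.normal(size=(k2, k2))
C_P = A @ A.T                              # a synthetic covariance matrix
Q = rng.multivariate_normal(np.zeros(k2), C_P, size=n_samples)
Q_tilde = Q + rng.normal(0, sigma, Q.shape)  # add white Gaussian noise

C_P_tilde = np.cov(Q_tilde, rowvar=False)
# C_P_tilde should approach C_P + sigma^2 I, first relation of (10)
print(np.max(np.abs(C_P_tilde - (C_P + sigma**2 * np.eye(k2)))))
```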

Notice that these relations assume that we searched for patches similar to $\tilde P$ at a large enough distance to include all patches similar to $P$, but not too large either, because otherwise the group can contain outliers. Thus the safe strategy is to search for similar patches at a distance slightly larger than the expected distance caused by noise. If the above estimates are correct, our MAP (maximum a posteriori estimation) problem finally boils down by (10) to the following feasible minimization problem:

$$\min_P \|P-\tilde P\|^2 + \sigma^2 (P-\overline{\tilde P})^t \left(C_{\tilde P}-\sigma^2 I\right)^{-1}(P-\overline{\tilde P}).$$

Differentiating this quadratic function with respect to $P$ and equating to zero yields

$$P - \tilde P + \sigma^2 \left(C_{\tilde P}-\sigma^2 I\right)^{-1}(P-\overline{\tilde P}) = 0.$$

Thus we have proved that a restored patch $\hat P_1$ can be obtained from the observed patch $\tilde P$ by the one-step estimation

(11) $\hat P_1 = \overline{\tilde P} + \left[C_{\tilde P} - \sigma^2 I\right] C_{\tilde P}^{-1}\,(\tilde P - \overline{\tilde P})$,

which resembles a local Wiener filter.
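Here is a minimal numpy sketch of the one-step estimation (11), assuming the group of noisy patches similar to $\tilde P$ has already been gathered (for instance with the grouping sketch above); regularization of a possibly ill-conditioned $C_{\tilde P}$ is deliberately omitted:

```python
import numpy as np

def bayes_restore(P_tilde, group, sigma):
    """One-step Bayesian restoration, equation (11):
    P_hat = mean + (C - sigma^2 I) C^{-1} (P_tilde - mean),
    where mean and C are the empirical mean and covariance of the
    group of noisy patches similar to P_tilde."""
    mean = group.mean(axis=0)
    C = np.cov(group, rowvar=False)
    k2 = C.shape[0]
    W = (C - sigma**2 * np.eye(k2)) @ np.linalg.inv(C)  # Wiener-like filter
    return mean + W @ (P_tilde - mean)
```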

Remark 1. It is easily deduced that the expected estimation error is

$$\mathbb{E}\|P-\hat P_1\|^2 = \mathrm{Tr}\left[\left(C_P^{-1} + \frac{I}{\sigma^2}\right)^{-1}\right].$$
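The formula of Remark 1 can be checked by Monte Carlo simulation. The sketch below uses synthetic Gaussian patches with an arbitrary covariance, and applies (11) with the oracle values $C_{\tilde P} = C_P + \sigma^2 I$ and zero mean; the empirical error should approach the predicted trace:

```python
import numpy as np

rng = np.random.default_rng(2)
k2, sigma, n_trials = 16, 5.0, 50_000      # illustrative dimensions and noise level

A = rng.normal(size=(k2, k2))
C_P = A @ A.T                              # synthetic prior covariance, zero mean
C_Pt = C_P + sigma**2 * np.eye(k2)         # covariance of the noisy patches, (10)

P = rng.multivariate_normal(np.zeros(k2), C_P, size=n_trials)
P_tilde = P + rng.normal(0, sigma, P.shape)

# One-step estimate (11) with the oracle covariance and zero mean
W = C_P @ np.linalg.inv(C_Pt)
P_hat = P_tilde @ W.T

empirical = np.mean(np.sum((P - P_hat) ** 2, axis=1))
predicted = np.trace(np.linalg.inv(np.linalg.inv(C_P) + np.eye(k2) / sigma**2))
print(empirical, predicted)                # the two values should be close
```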

Chapter 7 will examine two patch-based denoising algorithms derived from variants of (11).

The first question when looking at this formula is obviously how the matrix $C_{\tilde P}$ can be learnt from the image itself. Each method proposes a different way to learn the patch model.

Of course, other, non-Gaussian, Bayesian models are possible, depending on the patch density assumption. For example, [11] assumes a local exponential density model for the noisy data, and gives a convergence proof to the optimal (Bayes) least squares estimator as the amount of data increases.