A comparison of the local Gaussian
correlation and the local dependence function
Christopher James Brokstad June 2022
Master’s thesis in Actuarial Science, Department of Mathematics,
University of Bergen
Abstract
Correlation measures the relation between two or more variables. In this thesis, one method of measuring correlation and one method of measuring dependence are used: the local Gaussian correlation and the local dependence function. The goal was to build a bridge between these two measures. The hypothesis is that, for a bivariate normal density, both methods will locally approximate the density’s correlation coefficient. The local dependence function is a measure of dependence, and it has to be transformed to give a function of the correlation. However, there was no clear connection between the two methods. Instead of the local dependence function, the precision matrix was utilised. The precision matrix provides an opportunity to find the correlation coefficient locally for the bivariate normal density. Thus, a bridge between the local Gaussian correlation and the correlation estimate from the precision matrix can be built.
To obtain the correlation estimate from the precision matrix, it has to be transformed to its inverse, the covariance matrix. However, while a connection was made, some details remain unclear. The local Gaussian correlation is defined over the whole range of the density, while the estimated correlation from the precision matrix has areas that are undefined for certain densities. The function of the estimated correlation from the precision matrix is discussed, to explain why some areas are undefined and why the estimate takes the form it does for different densities. The two methods also produced differing correlation estimates for given areas. In this thesis, the same set of test case densities is used as in the introductory paper for the local dependence function (Jones, 1996) and its preliminary paper (Doksum et al., 1994). As the local Gaussian correlation is an empirical method of measuring correlation, it needs data. For this thesis there is no observed data, so simulated data are used in the analyses instead. The correlation estimate from the precision matrix, on the other hand, requires the densities to be known. To further explore the precision matrix’s correlation estimate, two novel methods are explored, the Box-Cox transformation and the Gaussian kernel estimates, but they require further work. In conclusion, while a bridge is constructed between the local Gaussian correlation and the precision matrix’s correlation estimate, more work is needed to establish a clearer connection.
Acknowledgments
I would like to thank my supervisor, Professor Hans J. Skaug, for his patience, for interesting discussions and for guiding me throughout this Master’s thesis.
I would like to thank senior consultant Kristine Lysnes for answering my questions, both related and unrelated to the Master’s project. I would also like to thank my friends and family for their support during the writing of this thesis.
Table of Contents
1 Introduction
2 Correlation
3 Local Gaussian Correlation
4 Local Dependence Function
5 Experiments
6 Variance
7 Results and Discussion
7.1 Pear
7.2 Twisted Pear
7.3 Cauchy
7.4 Transformed Normal Density
8 Box-Cox Transformation
9 Kernel Smoother
10 Kernel Estimates
10.1 Contours
10.2 Double Derivatives
11 Conclusions and Future Perspective
Appendices
Appendix A Example of the Precision Matrix’s Correlation Function
Appendix B Examples of C++ Code
Appendix C R-code
List of Figures
1 Simulated datapoints from the function $Y = \sin(3x) + \varepsilon$
2 The Pear density, simulated observations, precision matrix’s correlation estimate and the local Gaussian correlation map
3 The twisted Pear density, simulated observations, precision matrix’s correlation estimate and the local Gaussian correlation map
4 The Cauchy density, simulated observations, precision matrix’s correlation estimate and the local Gaussian correlation map
5 The transformed normal density, simulated observations, precision matrix’s correlation estimate and the local Gaussian correlation map
6 Variance estimates for the Pear
7 Variance estimates for the twisted Pear
8 Variance estimates for the Cauchy density
9 Variance estimates for the transformed normal density
10 Outlier variance estimates for the Pear density
11 The Pear density’s negative $\hat{\sigma}$ values
12 The correlation estimates from the local dependence function for the Cauchy distribution
13 Correlation estimates using the precision matrix for the Box-Cox transformed Cauchy density for 2 $\lambda$ values
14 Box-Cox transformed Cauchy density for 5 different $\lambda$ values
15 Correlation estimates using the precision matrix for the Box-Cox transformed Cauchy density for 5 different $\lambda$ values
16 Four examples of kernel functions
17 Bivariate Gaussian kernel estimates for the Pear, twisted Pear, Cauchy and transformed normal density
18 Bivariate Gaussian kernel estimates of the transformed normal density for 3 different $h$ values
19 The correlation estimates from the precision matrix for the bivariate Gaussian kernel estimates of the densities
20 Histograms of $\hat{\rho}(x, y)$ estimates for 5 different $h$ levels for bivariate Gaussian kernel estimates of the Cauchy density
21 Contours of $\hat{\rho}(x, y)$ for the bivariate Gaussian kernel estimates of the Cauchy density at 5 different $h$ values
22 Undefined areas for $\hat{\rho}(x, y)$ for the bivariate Gaussian kernel estimates of the Cauchy density for 2 different $h$ values
23 Local Gaussian correlation maps of a bivariate normal density for 4 different sample sizes
Table of Abbreviations
AISB - Asymptotic Integrated Square Bias
AIV - Asymptotic Integrated Variance
AMISE - Asymptotic Mean Integrated Square Error
ARMA - Auto-regressive Moving Average
i.i.d - Independent and Identically Distributed
IQR - Interquartile Range
MISE - Mean Integrated Square Error
MSE - Mean Square Error
1 Introduction
Correlation is a measure of the relation between two or more variables and is a commonly studied subject, as it can be important to understand how variables in a sample relate to each other. The local Gaussian correlation is a measure of correlation that measures the correlation locally for areas of the data. It does this by approximating areas with Gaussian densities, and the correlation estimates come from these approximations. My introduction to the local Gaussian correlation was through my bachelor thesis (Brokstad, 2020). That thesis provided a general overview of the local Gaussian correlation and used it to investigate the relation between COVID-19 data and its impact on the stock index for Oslo Børs. The local Gaussian correlation’s primary use has been for investigating the stock market. One example is the paper by Støve et al. (2014), which uses the local Gaussian correlation to investigate different financial crises. Another example is Nguyen et al. (2020), which looks at correlation before and after economic crises. Some work has also been conducted in other areas of statistics, such as the paper by Jordanger and Tjøstheim (2021), which uses the local Gaussian correlation to inspect the upper and lower tails of spectral densities. Similarly, Berentsen et al. (2014) used the local Gaussian correlation to examine copula models and their characteristics.
The other method used is the local dependence function, which was introduced in Jones (1996). The local dependence function is a measure of dependence. To obtain an estimate of the correlation from it, it has to be transformed. The areas of use for the local dependence function are less specific than those for the local Gaussian correlation. Gupta et al. (2010), for example, proved that when the local dependence function is always greater than or equal to 0, the density is totally positive of the second order. Jones and Koch (2003) introduce dependence maps, with the goal of making the local dependence function easier to interpret.
The goal of this thesis is to try to build a bridge between the local Gaussian correlation and the local dependence function. This bridge can be found if both can locally approximate the correlation coefficient of a bivariate normal density. However, no clear connection was found between the local Gaussian correlation and the local dependence function.
Instead, the precision matrix was used in place of the local dependence function. The precision matrix is the inverse of the covariance matrix. In order to get the correlation coefficient, the precision matrix has to be inverted to obtain the covariance matrix, from which the correlation coefficient can be extracted. As the estimated correlation obtained from the precision matrix locally equals the correlation coefficient for the bivariate normal density, the bridge between it and the local Gaussian correlation can be built. However, some details remain unclear. One problem is that the local Gaussian correlation is defined over the entire area of the density, while the estimated correlation from the precision matrix has areas where it is undefined. In addition, there are certain areas of the different densities where the local Gaussian correlation and the estimated correlation from the precision matrix give differing estimates. The reasons why they may differ and why there are undefined areas are discussed further in this thesis. For the implementation of the local Gaussian correlation, the package lg (Otneim, 2019) for the programming language R is used. For finding the precision matrix, the package TMB (Kristensen et al., 2016) is used. The Box-Cox transformation and the bivariate Gaussian kernel estimate are also implemented in relation to the precision matrix. However, they are only used briefly, so they are not studied as extensively as the regular precision matrix. Finally, the overall results are discussed, along with the problems that remain with the precision matrix’s estimated correlation.
2 Correlation
Correlation describes the association between two variables. The most commonly used form of correlation is Pearson’s $\rho$, introduced by Pearson (1896), and its formal definition is
$$
\rho = \frac{E(XY) - E(X)E(Y)}{\sigma_X \sigma_Y}, \tag{1}
$$
where $E(XY)$ is the expected value of $X$ times $Y$, $E(X)$ is the expected value of $X$ and $E(Y)$ is the expected value of $Y$. $\sigma_X$ and $\sigma_Y$ are the standard deviations of the $X$ and $Y$ variables, respectively. The value of $\rho$ lies within the range $[-1, 1]$. Positive $\rho$ values indicate a positive correlation between the two variables: as $X$ increases in value, $Y$ tends to increase in value as well. For a negative correlation, as one of the variables increases in value the other tends to decrease. The strength of this association is described by how close $\rho$ is to $1$ or $-1$. A $\rho$ value of 0.9 indicates a stronger correlation than a $\rho$ value of 0.2, and $\rho = 0$ indicates that there is no linear correlation between the variables.
Although $\rho$ has been given in its population form, it can also be approximated empirically by replacing the expected value and variance with their empirical forms, the sample mean and sample variance. For a given dataset with the observations $(X_1, Y_1), \dots, (X_n, Y_n)$, the means of the observations of $X$ and $Y$ are given by $\bar{X}$ and $\bar{Y}$, and the sample standard deviations are given by $s_X$ and $s_Y$. The empirical version of Pearson’s $\rho$ is
$$
r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}. \tag{2}
$$
For both the population variant and the empirical variant, a single value of the correlation is found for all of the data.
Figure 1: 1 000 simulated datapoints for the function $Y = \sin(3x) + \varepsilon$, where $\varepsilon$ is independent and identically distributed noise.
While Pearson’s $\rho$ gives a good overview of the relation between the $x$ and $y$ variables, it has weak points. For certain datasets we are more interested in the correlation in specific areas than in the correlation coefficient of the whole data. There are at least two big disadvantages of Pearson’s $\rho$. The first is that $\rho$ is sensitive to outliers. For example, for a hypothetical dataset of $(1, 3), (4, 6), (5, 2), (13, 9)$, the correlation is $r = 0.80$. If $(13, 9)$ is removed from the dataset, then $r = 0.038$ instead. Obviously the proposed hypothetical dataset is small, so $\rho$ is more susceptible to changes for this dataset than for a dataset of size $n = 1000$, for example. But even for that larger hypothetical dataset, $\rho$ would still be sensitive to outliers. The reason is that both the empirical formula (2) and the population variant (1) involve the mean, and outliers have a bigger impact on the mean than data within the normal range.
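The two values of $r$ quoted for the small dataset can be reproduced directly from equation (2). The following is a minimal Python sketch; the helper name `pearson_r` is chosen here for illustration and is not taken from the thesis code.

```python
from math import sqrt

def pearson_r(points):
    """Empirical Pearson correlation, equation (2), for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    return s_xy / sqrt(s_xx * s_yy)

data = [(1, 3), (4, 6), (5, 2), (13, 9)]
r_all = pearson_r(data)                   # all four observations
r_without_outlier = pearson_r(data[:-1])  # (13, 9) removed
print(round(r_all, 3), round(r_without_outlier, 3))
```

Removing a single observation moves the estimate from a strong positive correlation to almost none, which is the sensitivity described above.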
The second problem is that Pearson’s $\rho$ is set up to measure linear correlation between the two variables. Thus for nonlinear correlation, it will not describe the association between the two variables well. An example of this issue discussed in Tjøstheim et al. (2022) is $Y = X^2 + \varepsilon$. Pearson’s $\rho$ breaks down for this example because it is not able to capture the nonlinear association between the two variables: $Y$ and $X$ are negatively correlated for negative values of $X$, whereas for positive values of $X$ there is a positive correlation between the two variables.
Similarly, another example of nonlinearity on which $\rho$ underperforms is $Y = \sin(3x) + \varepsilon$, as seen in figure 1, where $\varepsilon$ is independent and identically distributed (i.i.d) noise.
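A noise-free version of the quadratic example illustrates why a single global value hides the structure. The sketch below is an illustration under simplifying assumptions: a symmetric grid of $x$ values and $Y = X^2$ exactly, with no $\varepsilon$ term. The helper `pearson_r` is a hypothetical name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Empirical Pearson correlation, equation (2), for two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

xs = [i / 10 for i in range(-20, 21)]  # symmetric grid on [-2, 2]
ys = [x ** 2 for x in xs]              # Y = X^2 without noise (assumption)

r_full = pearson_r(xs, ys)             # whole range: essentially zero
r_left = pearson_r(xs[:20], ys[:20])   # x < 0: strongly negative
r_right = pearson_r(xs[21:], ys[21:])  # x > 0: strongly positive
print(r_full, r_left, r_right)
```

The global coefficient is zero up to rounding even though $Y$ is a deterministic function of $X$, while the two halves show strong correlations of opposite sign. This is the kind of local structure a local correlation measure is meant to pick up.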
There are methods to reduce the susceptibility of $\rho$ to outliers. An easy solution is to remove outliers from the dataset. Other methods transform or alter the formula for how $\rho$ is calculated to make it less susceptible to this problem. Building on the previous example, a quick and dirty solution would be to use only data points within a given percentile range. There are also more elaborate alternatives such as Spearman’s or Kendall’s methods, which are also outlined in Tjøstheim et al. (2022). Spearman’s rank correlation looks at the ranks of the observations, while Kendall’s correlation coefficient $\tau$ looks at the difference between the numbers of concordant and discordant pairs. A pair of observations with $i < j$ is concordant if $X_i > X_j$ and $Y_i > Y_j$, or $X_i < X_j$ and $Y_i < Y_j$. It is discordant if the orderings disagree, that is if $X_i > X_j$ and $Y_i < Y_j$, or $X_i < X_j$ and $Y_i > Y_j$. However, both of these variations still have problems describing nonlinear relations such as the two examples above.
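Counting concordant and discordant pairs can be sketched as follows. This is the simplest form of Kendall’s $\tau$, the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, and it assumes there are no ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau(points):
    """Return (concordant, discordant, tau) for a list of (x, y) pairs without ties."""
    concordant = discordant = 0
    for (x_i, y_i), (x_j, y_j) in combinations(points, 2):
        sign = (x_i - x_j) * (y_i - y_j)
        if sign > 0:      # orderings in x and y agree
            concordant += 1
        elif sign < 0:    # orderings in x and y disagree
            discordant += 1
    n_pairs = len(points) * (len(points) - 1) // 2
    return concordant, discordant, (concordant - discordant) / n_pairs

n_conc, n_disc, tau = kendall_tau([(1, 3), (4, 6), (5, 2), (13, 9)])
print(n_conc, n_disc, tau)
```

Because only the ordering of the observations enters, moving the point $(13, 9)$ further out changes nothing as long as it stays the largest in both coordinates.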
So far, the thesis has mainly presented potential weaknesses of $\rho$. Yet it is still the most commonly used method of measuring correlation. As previously stated, $\rho$ gives an overview of the relation in the whole dataset, unlike the local correlations discussed later. In addition, $\rho$ is very simple to compute, as its components, such as the standard deviation and the mean, are simple to find. There are also other benefits of $\rho$, such as being able to describe the relation between $X_t$ and $X_{t+h}$ for linear time series models such as autoregressive moving average (ARMA) models (Tjøstheim et al., 2022).
Another group of models for which $\rho$ is important is the multidimensional normal density and similar densities from the same family, as one of the parameters used to describe the density is Pearson’s $\rho$, which appears in the covariance matrix. $\rho$ is also useful in linear regression models. For the model $Y = \alpha + \beta X + \varepsilon$, where $\varepsilon$ is zero-mean i.i.d noise, $\beta$ can be given as (Tjøstheim et al., 2022)
$$
\beta = \rho \frac{\sigma_Y}{\sigma_X}. \tag{3}
$$
For the linear regression model, if $\sigma_X$, $\sigma_Y$ and $\rho$ are unknown, we can instead use their empirical counterparts $\{s_X, s_Y, r\}$ to get a similar result.
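The empirical version of equation (3) can be checked on a small dataset: the least-squares slope and $r\,s_Y/s_X$ are the same number. A minimal sketch, with an illustrative function name:

```python
from math import sqrt

def slope_two_ways(points):
    """Return the least-squares slope and r * s_Y / s_X for (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    least_squares = s_xy / s_xx                       # ordinary least-squares slope
    r = s_xy / sqrt(s_xx * s_yy)                      # equation (2)
    s_x, s_y = sqrt(s_xx / (n - 1)), sqrt(s_yy / (n - 1))
    return least_squares, r * s_y / s_x               # empirical equation (3)

beta_ls, beta_from_r = slope_two_ways([(1, 3), (4, 6), (5, 2), (13, 9)])
print(beta_ls, beta_from_r)
```

The agreement is an algebraic identity, since the factors $n - 1$ cancel in the ratio $s_Y / s_X$.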
3 Local Gaussian Correlation
The local Gaussian correlation is a method of measuring correlation by locally fitting a Gaussian density to a dataset. For a given data point $z = (x, y)$, the approximating density has the running variables $v = (v_1, v_2)$. It has $\mu_1(z)$ and $\mu_2(z)$ as the local means and $\sigma_1^2(z)$ and $\sigma_2^2(z)$ as the local variances, and $\rho(z)$ is the local correlation of the density. Then for the data point $(x, y)$, the approximating density is
$$
\psi(v, \mu_1(z), \mu_2(z), \sigma_1^2(z), \sigma_2^2(z), \rho(z))
= \frac{1}{2\pi\sigma_1(z)\sigma_2(z)\sqrt{1-\rho^2(z)}}
\exp\left\{ -\frac{1}{2}\,\frac{1}{1-\rho^2(z)} \left[ \frac{(v_1-\mu_1(z))^2}{\sigma_1^2(z)} - 2\rho(z)\frac{(v_1-\mu_1(z))(v_2-\mu_2(z))}{\sigma_1(z)\sigma_2(z)} + \frac{(v_2-\mu_2(z))^2}{\sigma_2^2(z)} \right] \right\}. \tag{4}
$$
From the approximation, the most important parameter is $\rho(z)$; it is this parameter that gives the local Gaussian correlation its name. The dataset itself does not have to have a Gaussian density, as the local Gaussian correlation approximates a local area of the dataset with a Gaussian density. If the data are in fact Gaussian distributed with density $f$, then the local Gaussian density coincides with $f$ at every point.
However, as the local Gaussian correlation approximates the data points in a given area, there can be multiple Gaussian densities that it could approximate to (Tjøstheim et al., 2013). Thus, to find the best fit for the approximation, a penalty function is used. The penalty function used for the local Gaussian correlation is a locally weighted Kullback-Leibler distance between $f$ and $\psi$. As the local Gaussian correlation builds on the work of Hjort and Jones (1996), many of the functions and equations for its penalty function are derived from that paper. Hjort and Jones (1996) show how to find a local dependence measure using local likelihood. However, unlike the local Gaussian correlation, they do not specify which family the approximating density $\hat{f}$ belongs to, while for the local Gaussian correlation $\hat{f}$ takes the form of $\psi$.
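The penalty function is given as
$$
q = \int K_h(v - z)\left[\psi(v, \theta(z)) - \log\psi\{v, \theta(z)\} f(v)\right] dv. \tag{5}
$$
In the penalty function, $K_h$ is a product kernel given by
$$
K_h(v - z) = (h_1 h_2)^{-1} K(h_1^{-1}(v_1 - x))\, K(h_2^{-1}(v_2 - y)). \tag{6}
$$
For $K_h$, the bandwidth is $h = (h_1, h_2)$ (Tjøstheim et al., 2013). Also
$$
\theta(z) = (\mu_1(z), \mu_2(z), \sigma_1^2(z), \sigma_2^2(z), \rho(z)). \tag{7}
$$
Minimizing the penalty function with respect to $\theta(z)$ then gives
$$
\int K_h(v - z) \frac{\partial}{\partial\theta_j} \log(\psi(v, \theta(z)))\left[f(v) - \psi(v, \theta(z))\right] dv = 0, \quad j = 1, \dots, 5. \tag{8}
$$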
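Let $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_n)$ be i.i.d observations from the density $f$, and let $Z_i = (X_i, Y_i)$. To get an estimate of $\theta(z)$ for a fixed $z$, we maximize the local log likelihood (Tjøstheim et al., 2013)
$$
L(Z_1, \dots, Z_n, \theta_h(z)) = n^{-1}\sum_i K_h(Z_i - z)\log\psi(Z_i, \theta_h(z)) - \int K_h(v - z)\,\psi(v, \theta_h(z))\, dv, \tag{9}
$$
where $K_h$ is the kernel function described previously for the penalty function (6). From here, one can obtain the derivative $\partial L/\partial\theta_j$ (Tjøstheim et al., 2013),
$$
\begin{aligned}
\frac{\partial L}{\partial\theta_j} &= n^{-1}\sum_i K_h(Z_i - z)\frac{\partial}{\partial\theta_j}\log\{\psi(Z_i, \theta_h(z))\} \\
&\quad - \int K_h(v - z)\frac{\partial}{\partial\theta_j}\log\{\psi(v, \theta_h(z))\}\,\psi(v, \theta_h(z))\, dv \\
&\to \int K_h(v - z)\frac{\partial}{\partial\theta_j}\log\{\psi(v, \theta_h(z))\}\left[f(v) - \psi(v, \theta_h(z))\right] dv.
\end{aligned} \tag{10}
$$
The limit of $\partial L/\partial\theta_j$ as $n \to \infty$ is found by using the law of large numbers and the assumption that
$$
E\left[K_h(Z_i - z)\log\psi(Z_i, \theta_h(z))\right] < \infty. \tag{11}
$$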
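To make the local log likelihood concrete, the sketch below evaluates it for a fixed parameter vector $\theta$ at a point $z = (x, y)$. It is an illustration under stated assumptions, not the implementation used by the lg package: the kernel $K$ is taken to be the standard normal density (the text does not fix a kernel), in which case the integral term has a closed form, because the convolution of two Gaussian densities is again Gaussian with the squared bandwidths added to the variances. All function names and the toy data are hypothetical.

```python
from math import exp, log, pi, sqrt

def psi(v1, v2, mu1, mu2, var1, var2, rho):
    """Bivariate Gaussian density of equation (4); var1 and var2 are variances."""
    u1 = (v1 - mu1) / sqrt(var1)
    u2 = (v2 - mu2) / sqrt(var2)
    q = (u1 * u1 - 2 * rho * u1 * u2 + u2 * u2) / (1 - rho * rho)
    return exp(-0.5 * q) / (2 * pi * sqrt(var1 * var2 * (1 - rho * rho)))

def std_normal(t):
    """Standard normal density, used as the kernel K (an assumption)."""
    return exp(-0.5 * t * t) / sqrt(2 * pi)

def product_kernel(v1, v2, x, y, h1, h2):
    """Product kernel K_h(v - z) of equation (6) at z = (x, y)."""
    return std_normal((v1 - x) / h1) * std_normal((v2 - y) / h2) / (h1 * h2)

def kernel_integral(x, y, h1, h2, theta):
    """Integral term of the local log likelihood, in closed form for a Gaussian kernel."""
    mu1, mu2, var1, var2, rho = theta
    cov = rho * sqrt(var1 * var2)
    t1, t2 = var1 + h1 * h1, var2 + h2 * h2  # variances after convolution
    return psi(x, y, mu1, mu2, t1, t2, cov / sqrt(t1 * t2))

def local_log_likelihood(data, x, y, h1, h2, theta):
    """Kernel-weighted log density of the data minus the integral term."""
    data_term = sum(
        product_kernel(a, b, x, y, h1, h2) * log(psi(a, b, *theta))
        for a, b in data
    ) / len(data)
    return data_term - kernel_integral(x, y, h1, h2, theta)

# Hypothetical toy data and parameters, for illustration only.
sample = [(0.2, 0.1), (1.0, 0.7), (-0.5, -0.9), (0.4, 0.6)]
theta = (0.1, -0.3, 1.2, 0.8, 0.5)
print(local_log_likelihood(sample, 0.5, -0.2, 0.7, 0.9, theta))
```

An estimate of $\theta(z)$ would be obtained by maximizing this function over $\theta$ with a numerical optimizer, once for every point $z$ of interest.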
Then, by setting $\partial L/\partial\theta_j = 0$, we can find the maximum likelihood estimates $\hat{\theta}_h$ (Tjøstheim et al., 2013). From minimizing the penalty function with respect to $\theta(z)$ we can get
$$
\int K_h(v - z)\,\alpha(v)\, dv = \alpha(z) + \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{2}\sigma^2_{K_{i,j}}\frac{\partial^2\alpha(z_i, z_j)}{\partial z_i\,\partial z_j}\, h_i h_j + o(h^T h). \tag{12}
$$
In the equation, $\sigma^2_{K_{i,j}} = \int c_1^i c_2^j K(c_1, c_2)\, dc_1\, dc_2$, where $c = A_i^{-1}(z - a_i)$, $A_i = \Sigma_i^{1/2}$ with $\Sigma$ the covariance matrix, and $a_i = \mu_i$; for further explanation see Tjøstheim et al. (2013). The important part is
$$
\alpha(v) = \frac{\partial}{\partial\theta_j}\log\{\psi(v, \theta_h(z))\}\left[f(v) - \psi(v, \theta_h(z))\right], \tag{13}
$$
for $\theta^T = [\theta_1, \dots, \theta_5]$, which means that as $h \to 0$, $f(z) - \psi(z, \theta_h(z)) \to 0$. If $f_1(z) = f_2(z)$ within a neighbourhood, then for a certain bandwidth $h_0$ it is possible to get $\theta(f_1, z) = \theta(f_2, z)$ (Tjøstheim et al., 2013). The subsequent figures are created with the R package lg (Otneim, 2019), which finds the local Gaussian correlation estimates for datasets. In addition, as there is no real data, the data are simulated using the Markov chain Monte Carlo method.
4 Local Dependence Function
Another method for measuring local dependence is described in the paper "The local dependence function" by Jones (1996). The local dependence function Jones describes takes the form
$$
\gamma(x, y) = \frac{\partial^2}{\partial x\,\partial y}\log(f(x, y)). \tag{14}
$$
As the local dependence function is the mixed second derivative of the log density, it is not restricted to values within the range $-1$ to $1$, unlike Pearson’s $\rho$ or the local Gaussian correlation. Jones (1996) also presents several properties of the function $\gamma(x, y)$. One of those properties is that $\gamma(x, y)$ is finite everywhere. A different property is that $\gamma = 0$ if and only if $X$ and $Y$ are independent. Another important property is that $\gamma$ is constant for a bivariate normal density (Jones, 1996), where it should have the form
$$
\gamma(x, y) = \frac{\rho}{1 - \rho^2}. \tag{15}
$$
Another property is that, as the correlation between $x$ and $y$ becomes stronger, $\gamma(x, y)$ grows without bound towards $\pm\infty$. This can be seen in equation (15): as $\rho$ for the bivariate normal density goes towards $\pm 1$, the denominator goes towards $0$.
5 Experiments
As the local dependence function is a function of $\rho$ for bivariate normal densities, the equation can be solved so that $\rho$ is instead described as a function of $\gamma$:
$$
\begin{aligned}
\gamma(x, y) &= \frac{\rho}{1 - \rho^2} \\
\gamma(x, y)(1 - \rho^2) &= \rho \\
\rho^2\gamma(x, y) + \rho - \gamma(x, y) &= 0.
\end{aligned} \tag{16}
$$
From the inverse transformation we get two possible values for $\rho$,
$$
\rho_1 = \frac{-1 + \sqrt{1 + 4\gamma(x, y)^2}}{2\gamma(x, y)}, \qquad
\rho_2 = \frac{-1 - \sqrt{1 + 4\gamma(x, y)^2}}{2\gamma(x, y)}. \tag{17}
$$
Experimentation with real $\rho$ values within the range $[-1, 1]$ and the $\gamma(x, y)$ function shows that $\rho_1$ returns the real $\rho$ values. This is also discussed in the introductory paper for the local dependence function (Jones, 1996). $\rho_2$, on the other hand, returns values outside the range $[-1, 1]$ on which $\rho$ is defined. However, while Pearson’s $\rho$ can be equal to 0, $\rho_1 \neq 0$. This can be seen from equations (17) and (15). If $\rho = 0$, then the $\gamma$ function from equation (15) is also equal to 0. The problem occurs in equation (17), as $\rho_1$ has $\gamma$ in the denominator and we cannot divide by 0. Although $\rho_1 \neq 0$, it does approach 0 as the $\gamma$ function goes towards 0 from both the positive and negative sides. This is shown by using L’Hôpital’s rule with
$$
f(\gamma) = -1 + \sqrt{1 + 4\gamma^2}, \quad g(\gamma) = 2\gamma, \quad f(0) = 0, \quad g(0) = 0, \quad f'(\gamma) = \frac{4\gamma}{\sqrt{1 + 4\gamma^2}}, \quad g'(\gamma) = 2.
$$
Then the resulting limit is
$$
\lim_{\gamma \to 0} \frac{1}{2}\,\frac{4\gamma}{\sqrt{1 + 4\gamma^2}} = 0. \tag{18}
$$
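The behaviour of the two roots in equation (17) can be checked numerically. In the sketch below the function names are illustrative, and returning 0 at $\gamma = 0$ is a convention based on the limit in equation (18), not part of the formula itself.

```python
from math import sqrt

def gamma_from_rho(rho):
    """Equation (15): gamma for a bivariate normal density with unit variances."""
    return rho / (1 - rho ** 2)

def rho1_from_gamma(gamma):
    """First root in equation (17), extended by its limit at gamma = 0."""
    if gamma == 0:
        return 0.0  # limiting value from equation (18)
    return (-1 + sqrt(1 + 4 * gamma ** 2)) / (2 * gamma)

def rho2_from_gamma(gamma):
    """Second root in equation (17); undefined at gamma = 0."""
    return (-1 - sqrt(1 + 4 * gamma ** 2)) / (2 * gamma)

for rho in (-0.9, -0.3, 0.2, 0.75):
    g = gamma_from_rho(rho)
    print(rho, rho1_from_gamma(g), rho2_from_gamma(g))
```

For every $\rho$ in the loop, the first root recovers $\rho$ while the second root lies outside $[-1, 1]$. The product of the two roots is always $-1$, so whenever one root is inside the interval the other cannot be.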
Jones (1996) states that "it is constant if f is the bivariate normal density, and then takes the value $\rho/(1-\rho^2)$ where $\rho$ is the Pearson correlation coefficient;". This statement does not hold for a general bivariate normal density. By deriving the mixed partial derivative of the logarithm of a bivariate normal density $f$, the $\gamma$ function takes the form
$$
\gamma(x, y) = \frac{\partial^2}{\partial x\,\partial y}\log(f(x, y)) = \frac{\rho}{1 - \rho^2} \times \frac{1}{\sigma_x\sigma_y}. \tag{19}
$$
The reason that the $\gamma$ function takes the form of equation (19) instead of the previously given form in equation (15) is that the bivariate normal density is defined as
$$
f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2}\,\frac{1}{1 - \rho^2}\left[ \frac{(x - \mu_x)^2}{\sigma_x^2} - 2\rho\frac{(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y} + \frac{(y - \mu_y)^2}{\sigma_y^2} \right] \right\}. \tag{20}
$$
In the definition (20), the only term inside the brackets that contains both an $x$ and a $y$ variable is
$$
-2\rho\frac{(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y}. \tag{21}
$$
Thus, when the local dependence function is found for this density, the denominator $\sigma_x\sigma_y$ does not vanish under differentiation. To illustrate this further, equation (19) and equation (17) can be used to check whether $\rho_1 = \rho$. For a bivariate normal density with $\sigma_x = \sigma_y = 1$, $\gamma$ takes the form given in equation (15) and all possible $\rho_1$ values are equal to $\rho$. If instead the product $\sigma_x\sigma_y$ differs from 1, then $\rho_1 \neq \rho$. This conclusion is further supported by Jones (1998), where it is for a standard bivariate normal density that $\gamma$ takes the form given in equation (15). Using the given definition of the local dependence function for the Pear, which is a transformed normal density, the $\gamma$ function is
$$
\gamma(x, y) = \frac{\rho}{1 - \rho^2} \times \frac{3x^2}{\sigma_x\sigma_y}. \tag{22}
$$
The Pear density was included in Doksum et al. (1994) and can be seen in figure 2a. For the transformed normal density seen in figure 5a, $\gamma$ takes the form
$$
\gamma(x, y) = \frac{\rho}{1 - \rho^2} \times \frac{y}{\sigma_x\sigma_y}. \tag{23}
$$
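The role of $\sigma_x\sigma_y$ in equation (19) can be checked numerically by approximating the mixed partial derivative in equation (14) with central differences. This is a sketch under stated assumptions: the evaluation point, the step size and the function names are chosen for illustration, and finite differences stand in for an analytic derivative.

```python
from math import log, pi, sqrt

def log_binormal(x, y, mu_x, mu_y, s_x, s_y, rho):
    """Log of the bivariate normal density in equation (20)."""
    u, v = (x - mu_x) / s_x, (y - mu_y) / s_y
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho * rho)
    return -0.5 * q - log(2 * pi * s_x * s_y * sqrt(1 - rho * rho))

def local_dependence(log_f, x, y, eps=1e-3):
    """Central-difference approximation of the mixed partial in equation (14)."""
    return (log_f(x + eps, y + eps) - log_f(x + eps, y - eps)
            - log_f(x - eps, y + eps) + log_f(x - eps, y - eps)) / (4 * eps * eps)

def rho1_from_gamma(gamma):
    """First root in equation (17)."""
    return (-1 + sqrt(1 + 4 * gamma * gamma)) / (2 * gamma)

def gamma_and_rho1(s_x, s_y, rho=0.75, point=(0.3, -0.4)):
    """Numerical gamma at `point` and the value rho_1 recovered from it."""
    def log_f(x, y):
        return log_binormal(x, y, 0.0, 0.0, s_x, s_y, rho)
    gamma = local_dependence(log_f, *point)
    return gamma, rho1_from_gamma(gamma)

gamma_unit, rho1_unit = gamma_and_rho1(1.0, 1.0)      # standard deviations equal to 1
gamma_scaled, rho1_scaled = gamma_and_rho1(2.0, 3.0)  # sigma_x * sigma_y = 6
print(gamma_unit, rho1_unit)
print(gamma_scaled, rho1_scaled)
```

With unit standard deviations the recovered $\rho_1$ matches $\rho = 0.75$, while with $\sigma_x = 2$ and $\sigma_y = 3$ the value of $\gamma$ is divided by 6 and $\rho_1$ no longer equals $\rho$.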
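As shown, there are some problems with trying to build a connection between the local dependence function and the local Gaussian correlation. Therefore my supervisor, Professor Skaug, proposed the use of the precision matrix instead. The precision matrix for the bivariate normal density can be defined as
$$
\begin{bmatrix}
\dfrac{1}{(1 - \rho^2)\sigma_x^2} & \dfrac{-\rho}{(1 - \rho^2)\sigma_x\sigma_y} \\[2ex]
\dfrac{-\rho}{(1 - \rho^2)\sigma_x\sigma_y} & \dfrac{1}{(1 - \rho^2)\sigma_y^2}
\end{bmatrix}
=
\begin{bmatrix}
-\dfrac{\partial^2}{\partial x^2}\log f(x, y) & -\dfrac{\partial^2}{\partial x\,\partial y}\log f(x, y) \\[2ex]
-\dfrac{\partial^2}{\partial x\,\partial y}\log f(x, y) & -\dfrac{\partial^2}{\partial y^2}\log f(x, y)
\end{bmatrix}. \tag{24}
$$
The precision matrix’s inverse is the covariance matrix, which for a bivariate normal density is given as
$$
\begin{bmatrix}
\sigma_x^2 & \rho\sigma_x\sigma_y \\
\rho\sigma_x\sigma_y & \sigma_y^2
\end{bmatrix}. \tag{25}
$$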
As the covariance matrix contains $\rho$, it can be solved to extract the $\rho$ value. This is done by taking either element $(1,2)$ or element $(2,1)$ of matrix (25), which both contain $\rho\sigma_x\sigma_y$, and dividing it by the square roots of elements $(1,1)$ and $(2,2)$, which contain $\sigma_x^2$ and $\sigma_y^2$. The only quantity left is then $\rho$. This approach is generalized to other densities so that an estimate of $\rho$ can be obtained. The estimate obtained by going through the steps outlined above is referred to as
$$
\hat{\rho}(x, y). \tag{26}
$$
For the bivariate normal density, this estimate is always $\hat{\rho}(x, y) = \rho$. Thus a connection between it and the local Gaussian correlation can be found, as they both locally approximate the correlation coefficient for the bivariate normal density. As with the local Gaussian correlation, $\hat{\rho}(x, y)$ was found in R, this time by using the TMB package (Kristensen et al., 2016), which allows one to compile C++ files in R. The TMB package is useful because it can return the values of the second derivatives of functions. It can also return the values of the first derivatives if needed. One thing to note is that the package does not give the explicit form of the derivatives or second derivatives, so finding their explicit form has to be done by hand.
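The steps from the precision matrix to $\hat{\rho}(x, y)$ can be sketched in a few lines. The sketch below is an illustration and not the TMB-based implementation used for the figures: central finite differences are swapped in for the automatic differentiation that TMB provides, and all function names, the step size and the evaluation points are assumptions made for the example.

```python
from math import log, pi, sqrt

def log_binormal(x, y, mu_x, mu_y, s_x, s_y, rho):
    """Log of the bivariate normal density in equation (20)."""
    u, v = (x - mu_x) / s_x, (y - mu_y) / s_y
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho * rho)
    return -0.5 * q - log(2 * pi * s_x * s_y * sqrt(1 - rho * rho))

def precision_matrix(log_f, x, y, eps=1e-3):
    """Entries (xx, xy, yy) of minus the Hessian of log f, as in equation (24)."""
    f0 = log_f(x, y)
    d_xx = (log_f(x + eps, y) - 2 * f0 + log_f(x - eps, y)) / eps ** 2
    d_yy = (log_f(x, y + eps) - 2 * f0 + log_f(x, y - eps)) / eps ** 2
    d_xy = (log_f(x + eps, y + eps) - log_f(x + eps, y - eps)
            - log_f(x - eps, y + eps) + log_f(x - eps, y - eps)) / (4 * eps ** 2)
    return -d_xx, -d_xy, -d_yy

def rho_hat(log_f, x, y):
    """Correlation estimate from the inverted precision matrix, or None if undefined."""
    p_xx, p_xy, p_yy = precision_matrix(log_f, x, y)
    det = p_xx * p_yy - p_xy ** 2
    if det == 0:
        return None                      # precision matrix is not invertible
    var_x, var_y, cov_xy = p_yy / det, p_xx / det, -p_xy / det
    if var_x <= 0 or var_y <= 0:
        return None                      # a square root would be imaginary
    return cov_xy / (sqrt(var_x) * sqrt(var_y))

def normal_example(x, y):
    """Bivariate normal log density with illustrative parameters."""
    return log_binormal(x, y, 4.0, 2.0, 5.0, 2.0, -0.27)

estimates = [rho_hat(normal_example, x, y) for x, y in [(4, 2), (0, 0), (10, -3)]]
print(estimates)
```

For a bivariate normal density the estimate equals $\rho$ at every point, up to the finite-difference error. For other densities the same function can return `None`, which corresponds to the points where the estimate is undefined.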
Figure 2: The density of $N(x, y, 10, 1.55, 10^2, 0.775^2, 0.75)$, a normal density transformed from $(U, V)$ by $x = U^{1/3}$, $y = V$, and simulated observations from it. (a) Contours of the density. (b) 10 000 simulated observations from the density. (c) $\hat{\rho}(x, y)$ estimates from the precision matrix. (d) Local Gaussian correlation map for the simulated data.
Figure 3: The twisted Pear density $f(x, y) = f(x)f(y|x)$, where $f(x) = N(1.2, (1/3)^2)$, $f(y|x) = N(\mu(x), \sigma^2(x))$, $\mu(x) = (x/10)\exp(5 - (x/2))$ and $\sigma^2(x) = [(1 + 0.5x)/3]^2$, and simulated observations from it. (a) Contours of the density. (b) 10 000 simulated observations from the density. (c) $\hat{\rho}(x, y)$ estimates from the precision matrix. (d) Local Gaussian correlation map for the simulated data.
5 Experiments
(a) Density.
(b)100 000Simulated observations from the density.
19
5 Experiments
(c) ρ(x, y)ˆ estimates from the precision matrix.
(d) Local Gaussian correlation map for the simulated data.
Figure 4: The Cauchy density and simulated observations from it. a) contour, b) simulated observations from the density, c) estimated
correlation from the precision matrix and d) local Gaussian correlation map.
20
5 Experiments
(a) Density.
(b)100 000simulated observations for the density.
21
5 Experiments
(c) ρ(x, y)ˆ values from the precision matrix.
(d) Local Gaussian correlation map for the simulated data.
Figure 5: The density of transformed bivariate normal density N(x, y,4,2,52,22,−0.27) wherex=U + 1, Y =√
V −2and simulated observations of it. a) contour, b) simulated observations from the density, c) estimated correlation from the precision matrix and d) local Gaussian correlation map.
22
6 Variance
The $\hat{\rho}(x, y)$ estimates from equation (26) for the different densities are displayed in figures 2c to 5c. With the exception of $\hat{\rho}(x, y)$ for the transformed bivariate normal density, the figures 2c to 4c have areas that are white. These white areas occur because $\hat{\rho}(x, y)$ is imaginary there. The estimate can be imaginary because it uses the variance estimates $\hat{\sigma}_x^2$ and $\hat{\sigma}_y^2$ and takes their square roots. Thus, where these estimates are negative, the resulting $\hat{\rho}(x, y)$ is imaginary. So while estimating the variance is not the primary focus of the thesis, there is some value in analysing the estimates. They are shown for the different densities in figures 6, 7, 8 and 9. The estimates are evaluated on sequences of increasing $x$ and $y$ values rather than at the sample points. As the sequences depend on the areas they cover, the estimated variances likewise depend on those areas. Thus these estimates have to be taken with some skepticism. Only the estimates for the transformed normal density, in figure 9, have all of their values within the range of the histogram. This is reflected in figure 5c, which is the only one that has no white areas. The other figures 6, 7 and 8 exclude 5% or less of the variance estimates. The only exception is figure 6a, which is missing 68 309 observations out of 1 000 000. There is some information to glean from the negative estimates. For example, subfigures 8a and 8b have a fair number of negative estimates, which at a surface level seems to correspond to the large white area in figure 4c. Another point to note is that, for the different figures, most of the estimates fit within the range $[-1, 1]$. While most of the excluded values are just outside this range, there are also extreme estimated values, at least in the thousands. This is expanded upon in the next section.
Figure 6: Variance estimates for the Pear density. (a) $\hat{\sigma}_x^2$, the variance estimate of $X$. (b) $\hat{\sigma}_y^2$, the variance estimate of $Y$.
Figure 7: Variance estimates for the twisted Pear density. (a) $\hat{\sigma}_x^2$, the variance estimate of $X$. (b) $\hat{\sigma}_y^2$, the variance estimate of $Y$.
Figure 8: Variance estimates for the Cauchy density. (a) $\hat{\sigma}_x^2$, the variance estimate of $X$. (b) $\hat{\sigma}_y^2$, the variance estimate of $Y$.
Figure 9: Variance estimates for the transformed bivariate normal density. (a) $\hat{\sigma}_x^2$, the variance estimate of $X$. (b) $\hat{\sigma}_y^2$, the variance estimate of $Y$.
7 Results and Discussion
One of the main results from figures 2, 3, 4 and 5 is that both the local Gaussian correlation and the $\hat{\rho}(x, y)$ estimate from equation (26) indicate whether the correlation is positive or negative for the given areas, with figures 3c and 3d seeming to be the most similar. The other main result is that $\hat{\rho}(x, y)$ is within the range $[-1, 1]$. Compared to the local Gaussian correlation, the estimate $\hat{\rho}(x, y)$ is a function, so there is a smoother transition from one area to another. For the local Gaussian correlation these areas are less interconnected, so an area of negative correlation can occur within a wider area that is positively correlated. The actual function for $\hat{\rho}(x, y)$ can be complex because it uses second derivatives, and some of the chosen densities, particularly the twisted Pear, do not simplify when differentiated. With the local Gaussian correlation, it is harder to grasp why certain areas return the correlation estimates they do, as the package lg does the estimation for you. Another thing to note is that the transformed normal density and the Cauchy density have more simulated observations than the twisted Pear and the Pear density. This was done because, as can be seen in figures 4b and 5b, the simulations extend outside the range shown for the density. As the correlation estimates look at the same range as the densities, extra observations were simulated to make sure that approximately the same number of observations fall in the restricted ranges.
7.1 Pear
The first density, the Pear, is from Doksum et al. (1994). The Pear density is a transformed normal density, shown in figure 2a, where $(x, y)$ are transformed from $(U, V)$. The transformation is $x = U^{1/3}$, $y = V$, and the density is given as
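$$
f(x, y) = \frac{3x^2}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}
\exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x^3-\mu_x)^2}{\sigma_x^2}
- \frac{2\rho(x^3-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}
+ \frac{(y-\mu_y)^2}{\sigma_y^2}\right]\right).
\tag{27}
$$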
For this density, $\mu_x = \sigma_x = 10$, $\mu_y = 1.55$, $\sigma_y = 0.775$ and $\rho = 0.75$. Generally, the correlation estimates in figure 2d seem to approximate the real $\rho$ well, while the estimates in figure 2c are less accurate. For $\hat{\rho}(x, y)$, there seems to be no correlation in the area between the two peaks of the density seen in figure 2a. In addition, although it is not observable in the figure, $\hat{\rho}(x, y)$ is undefined along the straight line $x = 0$, because the double derivative with respect to $x$ includes the term $2/x^2$. There is also a clearly visible undefined area in figure 2c, approximately for $x$ in $(1, 3)$ and $y$ in $(2, 4)$. The strongest correlation estimates and the most extreme variance estimates occur along the edge of this undefined region. These variance estimates, displayed in figure 10, reach into the positive and negative thousands. The negative values of $\hat{\sigma}_x^2$ and $\hat{\sigma}_y^2$ arise because they are given by
$$
\hat{\sigma}_x^2 = \frac{1}{0.775^2(1-0.75^2)} \Big/ k,
\qquad
\hat{\sigma}_y^2 = \left[\frac{1}{2(1-0.75^2)}\left(\frac{3x^4-12x}{10} - \frac{9x(y-1.55)}{7.75}\right) + \frac{2}{x^2}\right] \Big/ k,
$$
where $k$ is given as
$$
k = \left[\frac{1}{2(1-0.75^2)}\left(\frac{3x^4-12x}{10} - \frac{9x(y-1.55)}{7.75}\right) + \frac{2}{x^2}\right]
\frac{1}{0.775^2(1-0.75^2)}
- \left(\frac{4.5x^2}{15.5(1-0.75^2)}\right)^2.
$$
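Since $k$ is the denominator of $\hat{\sigma}_x^2$ and the numerator of $\hat{\sigma}_x^2$ is a positive constant, the variance estimate is negative or undefined wherever $k \le 0$. Writing out this inequality, the end result is
$$
39.75x^6 - 232.258x^3y + 120x^3 + 350 \le 0, \tag{28}
$$
which explains the undefined area in figure 2c.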
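The steps above can be sketched numerically. The following is a minimal illustration, assuming the precision matrix is minus the matrix of second derivatives of $\log f$ for equation (27) and that the covariance matrix is its inverse; the function names `pear_precision` and `pear_estimates` are hypothetical and not taken from the thesis code.

```python
import math

# Parameter values quoted in the text for the Pear density.
MU_X, SIGMA_X = 10.0, 10.0
MU_Y, SIGMA_Y = 1.55, 0.775
RHO = 0.75


def pear_precision(x, y):
    """Return (p_xx, p_yy, p_xy), minus the second derivatives of log f (x != 0)."""
    a = 1.0 / (1.0 - RHO ** 2)
    p_xx = (0.5 * a * ((30.0 * x ** 4 - 12.0 * MU_X * x) / SIGMA_X ** 2
                       - 12.0 * RHO * x * (y - MU_Y) / (SIGMA_X * SIGMA_Y))
            + 2.0 / x ** 2)
    p_yy = a / SIGMA_Y ** 2
    p_xy = -3.0 * a * RHO * x ** 2 / (SIGMA_X * SIGMA_Y)
    return p_xx, p_yy, p_xy


def pear_estimates(x, y):
    """Return (k, var_x, var_y, rho_hat); rho_hat is NaN if a variance is not positive."""
    p_xx, p_yy, p_xy = pear_precision(x, y)
    k = p_xx * p_yy - p_xy ** 2  # determinant of the precision matrix
    if k == 0.0:
        return k, math.nan, math.nan, math.nan
    # Inverse of a symmetric 2x2 matrix.
    var_x, var_y, cov = p_yy / k, p_xx / k, -p_xy / k
    if var_x <= 0.0 or var_y <= 0.0:
        return k, var_x, var_y, math.nan
    return k, var_x, var_y, cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # One point near the centre of the density and one inside the undefined area.
    for point in [(10.0 ** (1.0 / 3.0), 1.55), (2.0, 3.0)]:
        print(point, pear_estimates(*point))
```

Evaluating the function on a grid and marking the points where `k` is not positive would give a picture of the undefined region under these assumptions.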
Figure 10: Outlier variance estimates for the Pear density. Red lines are negative values, while black lines are positive values. a) $\hat{\sigma}_x^2$, estimates for X, and b) $\hat{\sigma}_y^2$, estimates for Y.
Figure 11: Negative values corresponding to the inequality for equation (28).
7.2 Twisted Pear
The twisted Pear density is from the introductory paper for the local dependence function (Jones, 1996), and was originally used in Doksum et al. (1994).
The twisted Pear density is given as:
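$$
\begin{aligned}
f(x) &= \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2\right), \\
f(y \mid x) &= \frac{1}{\sqrt{2\pi}\,\sigma(x)} \exp\left(-\frac{1}{2}\left(\frac{y-\mu(x)}{\sigma(x)}\right)^2\right), \\
f(x, y) &= f(y \mid x) \times f(x),
\end{aligned}
\tag{29}
$$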
where the functions $\mu(x)$ and $\sigma(x)$ are
$$
\mu(x) = \frac{x}{10}\exp\left(5 - \frac{x}{2}\right),
\qquad
\sigma(x) = \frac{1 + 0.5x}{3},
\tag{30}
$$
and $\sigma_x = 1/3$ and $\mu_x = 1.2$. No correlation coefficient $\rho$ is given for this density. In terms of the estimates, this is the density for which $\hat{\rho}(x, y)$ and the local Gaussian correlation agree most closely. In both figures 3c and 3d there is a very strong positive correlation along the left tail. Towards the rightmost end of the figures the correlation decreases, and in the case of $\hat{\rho}(x, y)$ it becomes negative. So there seems to be a nonlinear dependence between the variables x and y.
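Equations (29) and (30) suggest a two-step way of simulating from the twisted Pear: draw x from its normal marginal, then draw y from the conditional normal. The sketch below is only an illustration of that idea; the function names and the seed are assumptions, and it does not reproduce the simulation settings used for the figures.

```python
import math
import random

MU_X, SIGMA_X = 1.2, 1.0 / 3.0  # parameters quoted in the text


def mu(x):
    """Conditional mean of y given x, equation (30)."""
    return x / 10.0 * math.exp(5.0 - x / 2.0)


def sigma(x):
    """Conditional standard deviation of y given x, equation (30).

    Positive for x > -2, which covers essentially all of the mass of X.
    """
    return (1.0 + 0.5 * x) / 3.0


def twisted_pear_density(x, y):
    """Joint density f(x, y) = f(y | x) * f(x), equation (29)."""
    f_x = math.exp(-0.5 * ((x - MU_X) / SIGMA_X) ** 2) / (SIGMA_X * math.sqrt(2.0 * math.pi))
    s = sigma(x)
    f_y_given_x = math.exp(-0.5 * ((y - mu(x)) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    return f_y_given_x * f_x


def simulate_twisted_pear(n, seed=1):
    """Draw n pairs (x, y): x from N(mu_x, sigma_x^2), then y from N(mu(x), sigma(x)^2)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.gauss(MU_X, SIGMA_X)
        y = rng.gauss(mu(x), abs(sigma(x)))
        sample.append((x, y))
    return sample


if __name__ == "__main__":
    draws = simulate_twisted_pear(5)
    for x, y in draws:
        print(round(x, 3), round(y, 3), round(twisted_pear_density(x, y), 4))
```

Because $\mu(x)$ increases for $x < 2$ and decreases afterwards, samples drawn this way show the change in the direction of dependence described above.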
7.3 Cauchy
The Cauchy density used is the bivariate form,
$$
f(x, y) = \frac{1}{\pi(1 + x^2 + y^2)^{3/2}}. \tag{31}
$$
This is the same density as the one in the introductory paper for the local dependence function (Jones, 1996). The most prominent feature of Cauchy densities is that they do not have a defined variance. Both $\sigma_x^2$ and $\sigma_y^2$ are therefore undefined, which means that observations can be spread over the entire range $(-\infty, \infty)$. Among the given densities, the Cauchy density is the only one for which $\partial^2/\partial x^2$ and $\partial^2/\partial y^2$ mirror each other. This can be seen by taking the double derivatives
$$
\begin{aligned}
\frac{\partial^2}{\partial x^2}\log f(x, y) &= \frac{3(x^2 - y^2 - 1)}{(x^2 + y^2 + 1)^2}, \\
\frac{\partial^2}{\partial y^2}\log f(x, y) &= \frac{3(y^2 - x^2 - 1)}{(x^2 + y^2 + 1)^2}, \\
\frac{\partial^2}{\partial x \partial y}\log f(x, y) &= \frac{6xy}{(x^2 + y^2 + 1)^2}.
\end{aligned}
\tag{32}
$$
The only difference is whether the numerator contains $3(x^2 - y^2 - 1)$ or $3(y^2 - x^2 - 1)$, which results in figures 8a and 8b mirroring each other. We can show this by finding the covariance matrix using the derivatives. The covariance matrix then takes the form of
$$
\frac{x^2 + y^2 + 1}{3(x^2 + y^2 - 1)} \times
\begin{bmatrix}
y^2 - x^2 - 1 & -2xy \\
-2xy & x^2 - y^2 - 1
\end{bmatrix}.
\tag{33}
$$
As we can see, the entries (1,1) and (2,2) of equation (33) also mirror each other, the difference being whether the minus sign is in front of $x^2$ or in front of $y^2$. We can take this further and see why the area where $\hat{\rho}(x, y)$ is defined is circular for the Cauchy density. $\hat{\rho}(x, y)$ is then given as
$$
\hat{\rho}(x, y) =
\frac{\dfrac{-2xy(x^2 + y^2 + 1)}{3(x^2 + y^2 - 1)}}
{\sqrt{\dfrac{(y^2 - x^2 - 1)(x^2 + y^2 + 1)}{3(x^2 + y^2 - 1)}} \times \sqrt{\dfrac{(x^2 - y^2 - 1)(x^2 + y^2 + 1)}{3(x^2 + y^2 - 1)}}}.
\tag{34}
$$
Figure 12: The correlation estimates from the local dependence function for the Cauchy distribution.
From equation (34), the restrictions end up being
$$
\begin{aligned}
x^2 + y^2 &\neq 1, \\
x^4 &\le (y^2 - 1)^2, \\
x^4 + 1 &\ge 2x^2 + y^4.
\end{aligned}
\tag{35}
$$
This results in the circular form. The other thing to note about $\hat{\rho}(x, y)$ is that the defined areas in figure 4c are positive when both variables have the same sign, while there is negative estimated correlation when the signs differ. This seems to line up with figure 4d, although it does have some areas that break this trend. Interestingly, figure 4d more closely resembles the local dependence function shown in figure 12, as the estimated correlations for the areas are more similar. It should be noted that figure 12 shows the $\rho$ estimates calculated from the $\gamma$ function, using the method described in equation (17) in the experiments chapter.
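The circular region and the sign pattern can be checked with a few lines of code. The following is a minimal sketch that evaluates equations (33) and (34) directly; the function names are hypothetical, and returning NaN where a variance estimate is not positive is an assumption about how undefined values are represented.

```python
import math


def cauchy_covariance(x, y):
    """Entries (var_x, var_y, cov) of the matrix in equation (33), or None on the unit circle."""
    s = x * x + y * y
    if s == 1.0:
        return None
    scale = (s + 1.0) / (3.0 * (s - 1.0))
    var_x = scale * (y * y - x * x - 1.0)
    var_y = scale * (x * x - y * y - 1.0)
    cov = scale * (-2.0 * x * y)
    return var_x, var_y, cov


def cauchy_rho_hat(x, y):
    """Correlation estimate of equation (34); NaN where it is undefined."""
    entries = cauchy_covariance(x, y)
    if entries is None:
        return math.nan
    var_x, var_y, cov = entries
    if var_x <= 0.0 or var_y <= 0.0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    for point in [(0.5, 0.5), (0.5, -0.5), (2.0, 1.0)]:
        print(point, cauchy_rho_hat(*point))
```

Inside the unit circle the scale factor is negative and both bracketed terms are negative, so both variance estimates are positive; outside the circle at least one of them is negative, which is why only the disc is defined.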
7.4 Transformed Normal Density
For the final density, both the local Gaussian correlation and $\hat{\rho}(x, y)$ indicate that there is a negative correlation on the positive side of the y-axis and a positive correlation along the negative side of y. This is shown more clearly in figure 5c than in figure 5d, as the transition is clearer for the estimate $\hat{\rho}(x, y)$. The local Gaussian correlation, on the other hand, has a few areas that contradict the overall trend. In addition, only the top and bottom areas show weak correlations, while the middle area has correlation estimates close to 0. The areas that contradict the overall trend could be due to undersampling or, alternatively, to the weak negative correlation not being captured. The overall trend seems to be that the further away from the y-axis, the more strongly correlated the variables are, in a way pushing the observations towards the axis. This may seem a bit counterintuitive when looking at the density displayed in figure 5a, but the highest probability areas have a maximum probability of only 0.03.
8 Box-Cox Transformation
The $\hat{\rho}(x, y)$ from equation (26) ends up with a small defined area. A possible way to mitigate this is to transform the density using the Box-Cox transformation. The Box-Cox transformation is a power transformation that makes the data look more normally distributed. It was introduced by G. Box and D. Cox in their 1964 paper (Box and Cox, 1964). The transformation is defined as
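$$
f(x, y)^{(\lambda)} =
\begin{cases}
\dfrac{f(x, y)^{\lambda} - 1}{\lambda} & (\lambda \neq 0), \\[2ex]
\log(f(x, y)) & (\lambda = 0),
\end{cases}
\tag{36}
$$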
and for the two parameter transformation (Box and Cox, 1964)
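$$
f(x, y)^{(\lambda)} =
\begin{cases}
\dfrac{(f(x, y) + \lambda_2)^{\lambda_1} - 1}{\lambda_1} & (\lambda_1 \neq 0), \\[2ex]
\log(f(x, y) + \lambda_2) & (\lambda_1 = 0).
\end{cases}
\tag{37}
$$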
Both transformations come with restrictions. For equation (36) the restriction is that $f(x, y) > 0$, otherwise $f(x, y)^{(\lambda)}$ is not real-valued when $\lambda = 0$. For equation (37) the restriction is similar, namely $f(x, y) > -\lambda_2$. Of these two transformations, the one-parameter version is used. Thus, using the Box-Cox transformation, the precision matrix takes the form of
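$$
\begin{bmatrix}
-\dfrac{\partial^2}{\partial x^2} f(x, y)^{(\lambda)} & -\dfrac{\partial^2}{\partial x \partial y} f(x, y)^{(\lambda)} \\[2ex]
-\dfrac{\partial^2}{\partial x \partial y} f(x, y)^{(\lambda)} & -\dfrac{\partial^2}{\partial y^2} f(x, y)^{(\lambda)}
\end{bmatrix}.
\tag{38}
$$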
In addition, as $\lambda \to 0$, $f(x, y)^{(\lambda)}$ tends towards $\log(f(x, y))$. This is shown in figure 15a, as it is almost identical to figure 4c. Another thing to note is that the $\hat{\rho}(x, y)$ estimates for the Box-Cox transformed density are no longer bound to the range $[-1, 1]$. For the density, as $\lambda$ increases the probability decreases and becomes more homogeneous, as seen in figure 14. This is similar for the estimate $\hat{\rho}(x, y)$, whose values decrease as $\lambda$ increases, as figure 13 shows. The exception to this is $\lambda = 0.001$. As one can see in figure 15, the strongest correlation is estimated outside of the circle, in the diagonal areas. It seems that as $\lambda$ decreases, the diagonal areas thin down and increase in value, while as $\lambda$ increases, the circle decreases and the diagonals increase. Additionally, although it is hard to see, the edges of the diagonal areas are where the strongest estimated correlations occur. This is further exemplified in figure 15b.
Figure 13: $\hat{\rho}(x, y)$ values for the Box-Cox transformed Cauchy density for two $\lambda$ values. a) $\lambda = 0.1$ and b) $\lambda = 10$.
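The limiting behaviour for small $\lambda$ can be illustrated with a short numerical sketch. It assumes the one-parameter transformation of equation (36), the Cauchy density as written in equation (31), and central finite differences in place of analytical derivatives; the function names and the step size are illustrative choices rather than the thesis implementation.

```python
import math


def box_cox(value, lam):
    """One-parameter Box-Cox transformation of a positive value, equation (36)."""
    if value <= 0.0:
        raise ValueError("the Box-Cox transformation needs a positive value")
    if lam == 0.0:
        return math.log(value)
    return (value ** lam - 1.0) / lam


def cauchy_density(x, y):
    """Bivariate Cauchy density as written in equation (31)."""
    return 1.0 / (math.pi * (1.0 + x * x + y * y) ** 1.5)


def rho_hat_box_cox(x, y, lam, step=1e-3):
    """Correlation estimate from the precision matrix in equation (38).

    Second derivatives are approximated by central differences (an assumption);
    NaN is returned where a variance estimate is not positive.
    """
    def g(u, v):
        return box_cox(cauchy_density(u, v), lam)

    centre = g(x, y)
    g_xx = (g(x + step, y) - 2.0 * centre + g(x - step, y)) / step ** 2
    g_yy = (g(x, y + step) - 2.0 * centre + g(x, y - step)) / step ** 2
    g_xy = (g(x + step, y + step) - g(x + step, y - step)
            - g(x - step, y + step) + g(x - step, y - step)) / (4.0 * step ** 2)
    p_xx, p_yy, p_xy = -g_xx, -g_yy, -g_xy
    det = p_xx * p_yy - p_xy ** 2
    if det == 0.0:
        return math.nan
    var_x, var_y, cov = p_yy / det, p_xx / det, -p_xy / det
    if var_x <= 0.0 or var_y <= 0.0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    for lam in [0.0, 0.001, 0.1]:
        print(lam, rho_hat_box_cox(0.5, 0.5, lam))
```

With $\lambda = 0$ the function reduces to the log-density version, so values for very small $\lambda$ should sit close to the untransformed estimate at the same point.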
Figure 14: Box-Cox transformed Cauchy density for five different $\lambda$ values. a) $\lambda = 0.001$, b) $\lambda = 0.1$, c) $\lambda = 1$, d) $\lambda = 5$ and e) $\lambda = 10$.
Figure 15: Correlation estimates using the precision matrix for the Box-Cox transformed Cauchy density for five different $\lambda$ values. a) $\lambda = 0.001$, b) $\lambda = 0.1$, c) $\lambda = 1$, d) $\lambda = 5$ and e) $\lambda = 10$.
9 Kernel Smoother
Kernels are a statistical tool with multiple uses. The application this paper will focus on is kernel smoothers. As the name implies, kernel smoothers are used to transform or smooth the data within specified areas using kernels. There is a multitude of different kernel functions that determine how the data are transformed. The chosen examples are the uniform kernel, the triangle kernel, the Gaussian kernel and the Epanechnikov kernel, all of which are shown in figure 16. The uniform kernel function is (Ivanka, 2012)
$$
K(x) = \frac{1}{2} I_{[-1,1]}(x), \tag{39}
$$
where $I$ is an indicator function taking on the form of
$$
I_{[-1,1]}(x) =
\begin{cases}
1 & \text{if } x \in [-1, 1], \\
0 & \text{otherwise.}
\end{cases}
\tag{40}
$$
The triangle kernel function has the form of
$$
K(x) = (1 - |x|) I_{[-1,1]}(x). \tag{41}
$$
The Gaussian kernel function is given as
$$
K(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}x^2\right), \tag{42}
$$
and finally the Epanechnikov kernel takes the form of
$$
K(x) = \frac{3}{4}(1 - x^2) I_{[-1,1]}(x). \tag{43}
$$
As previously mentioned these examples are just a few of many more kernel functions. They all serve the purpose of transforming data within given areas.
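The four kernel functions in equations (39) to (43) are simple to write down in code. The sketch below is only an illustration; the function names and the midpoint-rule helper `integrate` are assumptions introduced here to check that each kernel integrates to one.

```python
import math


def indicator(x):
    """Indicator function of the interval [-1, 1], equation (40)."""
    return 1.0 if -1.0 <= x <= 1.0 else 0.0


def uniform_kernel(x):
    return 0.5 * indicator(x)                      # equation (39)


def triangle_kernel(x):
    return (1.0 - abs(x)) * indicator(x)           # equation (41)


def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)   # equation (42)


def epanechnikov_kernel(x):
    return 0.75 * (1.0 - x * x) * indicator(x)     # equation (43)


def integrate(func, lower=-8.0, upper=8.0, steps=16000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


if __name__ == "__main__":
    for kernel in (uniform_kernel, triangle_kernel, gaussian_kernel, epanechnikov_kernel):
        print(kernel.__name__, round(integrate(kernel), 4))
```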
For the kernel smoothers, the bandwidth $h$ is included. $h$ is a parameter that allows one to control how smooth the transformed data are. So, for example, the Gaussian kernel smoother with the addition of the parameter $h$ is
$$
K\left(\frac{x - X_i}{h}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x - X_i)^2}{2h^2}\right), \tag{44}
$$
where $X_i$ is the $i$th observation in the data. The evaluation point $x$ and the bandwidth $h$ are values that we can set to smooth out our observations. The optimal kernel smoother is chosen using the Mean Integrated Square Error (MISE) and the Asymptotic Mean Integrated Square Error (AMISE). MISE and AMISE are extensions of the Mean Square Error (MSE) and are accuracy measurements which, instead of summing over the area, integrate over the area of the data. The MSE is given as (Ivanka, 2012)
$$
\mathrm{MSE}[\hat{f}(x, h)] = \sum_{i=1}^{n} \left(\hat{f}(x, h) - f(x)\right)^2, \tag{45}
$$
where $f$ is the given density for the observed data and $\hat{f}$ is the sum of the kernel estimates for the data. The kernel density estimate was introduced by Parzen and Rosenblatt (Murray, 1956) and takes the form of
$$
\hat{f}(x, h) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \tag{46}
$$
where $X_i$ is the $i$th observation in the dataset $(X_1, \ldots, X_n)$, and the terms $K\left(\frac{x - X_i}{h}\right)$ are the kernels centred at the individual observations. Finally, MISE is given as:
$$
\mathrm{MISE}[\hat{f}(x, h)] = \int \mathrm{MSE}[\hat{f}(x, h)]\,dx, \tag{47}
$$
and AMISE is given as:
$$
\mathrm{AMISE} = \frac{1}{nh} \int K(x)^2\,dx + \frac{1}{4}\left(\int x^2 K(x)\,dx\right)^2 h^4 \int (f''(x))^2\,dx. \tag{48}
$$
In AMISE, the first part is the Asymptotic Integrated Variance (AIV) and the second part is the Asymptotic Integrated Square Bias (AISB), so that $\mathrm{AMISE} = \mathrm{AIV} + \mathrm{AISB}$ (Ivanka, 2012). MISE can likewise be defined as
$$
\mathrm{MISE}(\hat{f}(x, h)) = \mathrm{AMISE}(\hat{f}(x, h)) + o\left\{\frac{1}{nh} + h^4\right\}. \tag{49}
$$
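To make the kernel density estimate of equation (46) concrete, the sketch below evaluates it with the Gaussian kernel and compares it with a known density through the integrated squared error of a single simulated sample. Averaging that quantity over many samples would approximate MISE. The standard normal target, the sample size, the seed and all function names are assumptions chosen for illustration only.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density_estimate(x, data, h):
    """Kernel density estimate at x with bandwidth h, equation (46)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def standard_normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(func, lower, upper, steps=800):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


def integrated_squared_error(data, h, lower=-8.0, upper=8.0):
    """Integral of the squared difference between the estimate and the target density."""
    def squared_difference(x):
        return (kernel_density_estimate(x, data, h) - standard_normal_density(x)) ** 2
    return integrate(squared_difference, lower, upper)


def simulate_normal_sample(n, seed=0):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    sample = simulate_normal_sample(200)
    for h in [0.1, 0.4, 1.5]:
        print(h, integrated_squared_error(sample, h))
```

Trying a very small and a very large bandwidth in the loop shows the trade-off between the variance part and the bias part of the error.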
To minimize MISE and AMISE, one tries to find the optimal bandwidth. The optimal bandwidth can be found by setting the derivative of AMISE with respect to $h$ equal to zero:
$$
\frac{\partial}{\partial h}\mathrm{AMISE}(\hat{f}(x, h)) = -\frac{1}{nh^2} \int K(x)^2\,dx + \left(\int x^2 K(x)\,dx\right)^2 h^3 \int (f''(x))^2\,dx = 0. \tag{50}
$$
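Furthermore, the optimal bandwidth $h$ takes the form of
$$
h_{\mathrm{AMISE}} = \frac{\left[\int K(x)^2\,dx\right]^{1/5}}{\left[\int x^2 K(x)\,dx\right]^{2/5} \left[\int f''(x)^2\,dx\right]^{1/5}}\, n^{-1/5}. \tag{51}
$$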
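As a worked example of equation (51), the sketch below evaluates the three integrals numerically for the Gaussian kernel, assuming that the true density $f$ is normal with standard deviation $\sigma$. Under that assumption the result should be close to $(4/3)^{1/5}\sigma n^{-1/5}$. The function names and the midpoint-rule integration are illustrative choices.

```python
import math


def integrate(func, lower, upper, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_second_derivative(x, sigma=1.0):
    """Second derivative of the N(0, sigma^2) density (the assumed true density f)."""
    z = x / sigma
    return (z * z - 1.0) * math.exp(-0.5 * z * z) / (sigma ** 3 * math.sqrt(2.0 * math.pi))


def amise_bandwidth(n, sigma=1.0):
    """Optimal bandwidth of equation (51) for the Gaussian kernel and a normal f."""
    kernel_roughness = integrate(lambda x: gaussian_kernel(x) ** 2, -10.0, 10.0)
    kernel_second_moment = integrate(lambda x: x * x * gaussian_kernel(x), -10.0, 10.0)
    density_roughness = integrate(lambda x: normal_second_derivative(x, sigma) ** 2,
                                  -10.0 * sigma, 10.0 * sigma)
    return (kernel_roughness ** 0.2
            / (kernel_second_moment ** 0.4 * density_roughness ** 0.2)
            * n ** -0.2)


if __name__ == "__main__":
    for n in [100, 1000, 10000]:
        print(n, amise_bandwidth(n), (4.0 / 3.0) ** 0.2 * n ** -0.2)
```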
Luckily, there are simpler ways to approximate the optimal $h$ value, specifically for the Gaussian kernel. For the Gaussian kernel there are rule-of-thumb estimates such as Scott's rule of thumb (Scott, 2015) and Silverman's rule of thumb (Silverman, 1986). The two rules are similarly defined, the main difference being the constant they use. Silverman's rule-of-thumb bandwidth is
$$
h = 0.9 \min\{\hat{\sigma}, \mathrm{IQR}/1.34\}\, n^{-1/5}, \tag{52}
$$
and Scott's rule of thumb is
$$
h = \hat{\sigma}\, n^{-1/(d+4)}. \tag{53}
$$
For both of these estimates, $\hat{\sigma}$ is the empirical standard deviation, $n$ is the number of observations in the dataset, IQR is the interquartile range and $d$ is the number of dimensions of the dataset. These two rules of thumb can also be scaled up to work for multivariate kernels. In R both rules are built in: Silverman's is bw.nrd0 and Scott's is bw.nrd.
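Equations (52) and (53) can be written out in a few lines. The sketch below follows the two formulas exactly as stated above and is not a port of the R functions. The quartiles are computed with the "inclusive" method of the Python standard library, which is an assumption about how the IQR is defined; the function names are hypothetical.

```python
import statistics


def silverman_bandwidth(data):
    """Silverman's rule of thumb, equation (52)."""
    n = len(data)
    sigma_hat = statistics.stdev(data)  # empirical (sample) standard deviation
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 0.9 * min(sigma_hat, (q3 - q1) / 1.34) * n ** (-1.0 / 5.0)


def scott_bandwidth(data, d=1):
    """Scott's rule of thumb as written in equation (53), for d dimensions."""
    return statistics.stdev(data) * len(data) ** (-1.0 / (d + 4))


if __name__ == "__main__":
    observations = [float(value) for value in range(1, 11)]
    print("Silverman:", silverman_bandwidth(observations))
    print("Scott:    ", scott_bandwidth(observations))
```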
So far the kernel functions have been presented for univariate data. However, they can also be extended to the multivariate case. In this paper the only multivariate kernel function used is the bivariate Gaussian kernel.
Figure 16: Four examples of kernel functions. a) Uniform kernel function, b) Triangle kernel function, c) Gaussian kernel function, d) Epanechnikov kernel function.
The bivariate Gaussian kernel is given as
$$
\begin{aligned}
K\left(\frac{x}{h}, \frac{y}{h}\right) &= \frac{1}{\sqrt{2\pi}\,h} \exp\left(-\frac{1}{2}\frac{x^2}{h^2}\right) \times \frac{1}{\sqrt{2\pi}\,h} \exp\left(-\frac{1}{2}\frac{y^2}{h^2}\right), \\
\hat{f}(x, y, h) &= \frac{1}{n} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}, \frac{y - Y_i}{h}\right),
\end{aligned}
\tag{54}
$$
where $x$ and $y$ are chosen values, and $X_i$ and $Y_i$ are the observations from the dataset with $i = 1, \ldots, n$.
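A bivariate Gaussian kernel estimate can be sketched as below. The sketch assumes a product of two Gaussian kernels with a common bandwidth $h$ acting as the standard deviation of each factor, and the three data points are made up purely for illustration.

```python
import math


def scaled_gaussian(difference, h):
    """Density of N(0, h^2) evaluated at the difference between point and observation."""
    return math.exp(-0.5 * (difference / h) ** 2) / (math.sqrt(2.0 * math.pi) * h)


def bivariate_kde(x, y, data, h):
    """Average over the observations of the product of the two scaled kernels."""
    total = sum(scaled_gaussian(x - xi, h) * scaled_gaussian(y - yi, h) for xi, yi in data)
    return total / len(data)


def grid_mass(data, h, lower=-6.0, upper=6.0, steps=120):
    """Midpoint-rule approximation of the double integral of the estimate."""
    width = (upper - lower) / steps
    points = [lower + (i + 0.5) * width for i in range(steps)]
    return sum(bivariate_kde(gx, gy, data, h) for gx in points for gy in points) * width ** 2


if __name__ == "__main__":
    toy_data = [(0.0, 0.0), (1.0, 1.0), (-1.0, 0.5)]  # hypothetical observations
    print(bivariate_kde(0.0, 0.0, toy_data, 0.5))
    print(grid_mass(toy_data, 0.5))
```

A smaller `h` makes each observation contribute a taller and narrower bump, so the estimated surface becomes less smooth.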
10 Kernel Estimates
Similar to what was done for the Box-Cox transformation, one can use the bivariate Gaussian kernel estimated density $\hat{f}(x, y, h)$ for the precision matrix instead of $f(x, y)$. The precision matrix then takes the form of
$$
\begin{bmatrix}
-\dfrac{\partial^2}{\partial x^2} \hat{f}(x, y, h) & -\dfrac{\partial^2}{\partial x \partial y} \hat{f}(x, y, h) \\[2ex]
-\dfrac{\partial^2}{\partial x \partial y} \hat{f}(x, y, h) & -\dfrac{\partial^2}{\partial y^2} \hat{f}(x, y, h)
\end{bmatrix}.
\tag{55}
$$
The same steps as before are then taken, and the resulting correlation estimate is denoted
$$
\hat{\rho}_K(x, y). \tag{56}
$$
The practical reason for doing this is that for real datasets the densities that produce them may be unknown, so the densities need to be estimated.
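A possible way to compute such an estimate from data is sketched below. It rests on several assumptions that are not confirmed by the text: the second derivatives are taken of the logarithm of the kernel estimate (as for the log densities in the earlier sections), they are approximated by central finite differences, and the data are a deterministic grid of normal quantiles with correlation 0.75, used as a reproducible stand-in for simulated observations. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def bivariate_kde(x, y, data, h):
    """Bivariate Gaussian kernel estimate with common bandwidth h."""
    norm = 2.0 * math.pi * h * h * len(data)
    return sum(math.exp(-0.5 * ((x - xi) ** 2 + (y - yi) ** 2) / (h * h))
               for xi, yi in data) / norm


def rho_hat_kernel(x, y, data, h, step=1e-3):
    """Correlation estimate from minus the Hessian of log f-hat; NaN where undefined."""
    def g(u, v):
        return math.log(bivariate_kde(u, v, data, h))

    try:
        centre = g(x, y)
        g_xx = (g(x + step, y) - 2.0 * centre + g(x - step, y)) / step ** 2
        g_yy = (g(x, y + step) - 2.0 * centre + g(x, y - step)) / step ** 2
        g_xy = (g(x + step, y + step) - g(x + step, y - step)
                - g(x - step, y + step) + g(x - step, y - step)) / (4.0 * step ** 2)
    except ValueError:  # log of zero when the estimate underflows far from the data
        return math.nan
    p_xx, p_yy, p_xy = -g_xx, -g_yy, -g_xy
    det = p_xx * p_yy - p_xy ** 2
    if det == 0.0:
        return math.nan
    var_x, var_y, cov = p_yy / det, p_xx / det, -p_xy / det
    if var_x <= 0.0 or var_y <= 0.0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def quantile_grid_sample(rho, m=15):
    """Deterministic stand-in for a bivariate normal sample with correlation rho."""
    quantiles = [NormalDist().inv_cdf((i + 0.5) / m) for i in range(m)]
    c = math.sqrt(1.0 - rho * rho)
    return [(u, rho * u + c * v) for u in quantiles for v in quantiles]


if __name__ == "__main__":
    data = quantile_grid_sample(0.75)
    for h in [0.3, 0.5, 1.0]:
        print(h, rho_hat_kernel(0.0, 0.0, data, h))
```

In this setting, smoothing with a Gaussian kernel adds roughly $h^2$ to each variance but nothing to the covariance, so the estimate at the centre is expected to shrink towards zero as $h$ grows.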
10.1 Contours
To give an overview of the different fits, subfigures 17a and 17d have approximated the form of the real densities closely. Specifically, subfigure 17a is almost identical to subfigure 2a. The probabilities of the Gaussian kernel estimate of the Cauchy density are too low compared to the probabilities of the real Cauchy density. This also occurs for the fit of the twisted Pear, as the probability in the center of the density is too low. To exemplify how a change in bandwidth changes the bivariate Gaussian kernel estimated density $\hat{f}$, the transformed normal density is used.
Figure 18 shows that as the bandwidth decreases towards 0 the bands become less smooth, while for higher bandwidths the smoothness increases.
In addition, the probabilities increase for lower bandwidths and decrease for higher bandwidths.
Figure 17: Bivariate Gaussian kernel estimates using the rule of thumb bandwidths. a) Pear density for $h = 0.119$, b) twisted Pear for $h = 0.08$, c) Cauchy for $h = 0.093$ and d) transformed normal density for $h = 0.28$.
Figure 18: Bivariate Gaussian kernel estimate of the transformed normal density for three different $h$ values. a) $h = 0.28$, b) $h = 0.5$ and c) $h = 1$.
10.2 Double Derivatives
While the $\hat{f}$ approximations we get might look similar to the real densities, the estimated correlation maps tell a different story. Even at a cursory glance, the subfigures of figure 19 are vastly different from the $\hat{\rho}(x, y)$ estimates in figures 2c, 3c, 4c and 5c. Of these, only figure 19a resembles the actual $\hat{\rho}(x, y)$ from figure 2c. Similarly to the Box-Cox transformation, the $\hat{\rho}_K(x, y)$ values from equation (56) are not limited to the range $[-1, 1]$. But, just as with the contours, as the bandwidth increases the range of values that $\hat{\rho}_K(x, y)$ takes shrinks back within $[-1, 1]$, as seen in figure 20. Based on figure 20, $h$ values of 1 and above result in $\hat{\rho}_K(x, y)$ lying within the $[-1, 1]$ range. Generally, in figure 19 the stronger correlation estimates seem to occur within localised areas. For figure 19a this occurs in the small fang-like area at $(1, 4)$. For figures 19c and 19d there are specks of high correlation.
To observe the impact of the smoothing parameter $h$, the bivariate Gaussian kernel estimated Cauchy density was chosen. The results are shown in figures 20 and 21. As $h$ increases, the range of $\hat{\rho}_K(x, y)$ decreases.
As seen in figure 21e, the $\hat{\rho}_K(x, y)$ values are so minuscule that the estimates are essentially equal to 0. On the other hand, for $h = 0.001$ in figure 21a there are only specks of correlation within a larger area of undefined values. As $h$ increases the extent of the undefined areas decreases, as the difference between figures 21a and 21d makes clear. Both figures 21c and 21d seem to show similar trends to figure 4c: when $x$ and $y$ are positive, the estimated correlation is also positive, while when $x$ and $y$ have differing signs, the estimated correlation is negative. What figure 21 shows is that $\hat{\rho}_K(x, y)$ has to be implemented with care for the bivariate Gaussian kernel estimates, as low $h$ values return a picture that can be too chaotic to read, while high values of $h$ become too homogeneous to discern. In addition, $h$ values that are suboptimal for the fit may give better correlation estimates.
Understanding why there are undefined areas for the estimated $\hat{\rho}_K(x, y)$ is harder than for $\hat{\rho}(x, y)$ from equation (26), as the function for $\hat{\rho}_K(x, y)$ is more opaque. The log double derivatives of the bivariate Gaussian