# Introduction to Information


(1)

### Introduction to Information Theory

Mateu Sbert

University of Girona, Spain, and Tianjin University, China

(2)

### Information measures

• entropy, conditional entropy

• mutual information

Mateu Sbert 2

(3)


(4)


(5)

### "information" from the communication point of view

• information is uncertainty

• information source is modeled as a random variable or a random process

• probability is employed to develop the information theory

• information to be transmitted is digital

• Shannon's work contains the first published use of "bit"


(6)

### Information Measures (1)

• Random variable X taking values in an alphabet X

• Shannon entropy H(X), H(p): uncertainty, information, homogeneity, uniformity

• information associated with x: -log p(x); base of logarithm: 2; convention: 0 log 0 = 0; unit: bit (the uncertainty of the toss of an ordinary coin)

H(X) = -∑_{x∈X} p(x) log p(x) = -∑_{i=1}^{n} p(x_i) log p(x_i)

where p(x) = Pr{X = x}, x ∈ X, over the alphabet X = {x1, x2, …, xn}
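The definition above can be sketched in a few lines of Python (the `entropy` helper name is illustrative, not from the slides):

```python
from math import log2

def entropy(p):
    """Shannon entropy H(p) in bits; the 'if' implements the 0*log(0) = 0 convention."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

# One toss of an ordinary (fair) coin carries exactly one bit of uncertainty:
print(entropy([0.5, 0.5]))  # -> 1.0
```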


(7)

### Information Measures (2)

• Properties of Shannon entropy

• binary entropy: H(p) = -p log p - (1-p) log (1-p)

• 0 ≤ H(X) ≤ log n


(8)

### Information Measures (3)

H(0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125) = 3.000

H(0.200, 0.050, 0.010, 0.080, 0.400, 0.010, 0.050, 0.200) = 2.314

H(0.010, 0.020, 0.030, 0.800, 0.080, 0.030, 0.020, 0.010) = 1.211

H(0.001, 0.002, 0.003, 0.980, 0.008, 0.003, 0.002, 0.001) = 0.190

(Plot: the four distributions over bins 1-8 and their probabilities p.)
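These values can be checked numerically; a quick sketch in Python (the `entropy` helper name is illustrative):

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits, with the 0*log(0) = 0 convention."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

print(round(entropy([0.125] * 8), 3))                                               # -> 3.0
print(round(entropy([0.200, 0.050, 0.010, 0.080, 0.400, 0.010, 0.050, 0.200]), 3))  # -> 2.314
print(round(entropy([0.010, 0.020, 0.030, 0.800, 0.080, 0.030, 0.020, 0.010]), 3))  # -> 1.211
print(round(entropy([0.001, 0.002, 0.003, 0.980, 0.008, 0.003, 0.002, 0.001]), 3))  # -> 0.19
```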


(9)

### Information Measures (4)

• Discrete random variable Y in an alphabet Y

• Joint entropy H(X,Y)

• Conditional entropy H(Y|X)

H(Y|X) = ∑_{x∈X} p(x) H(Y|x)
       = -∑_{x∈X} p(x) ∑_{y∈Y} p(y|x) log p(y|x)
       = -∑_{x∈X} ∑_{y∈Y} p(x,y) log p(y|x)

H(X,Y) = -∑_{x∈X} ∑_{y∈Y} p(x,y) log p(x,y)

where p(y) = Pr{Y = y}, y ∈ Y, over the alphabet Y = {y1, y2, …, yn}
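A minimal numeric check of these definitions and of the chain rule H(X,Y) = H(X) + H(Y|X), on a small made-up joint distribution:

```python
from math import log2

# Hypothetical joint distribution p(x, y) on a 2x2 alphabet.
pxy = {("x1", "y1"): 0.5, ("x1", "y2"): 0.25,
       ("x2", "y1"): 0.125, ("x2", "y2"): 0.125}

# Joint entropy H(X,Y) = -sum p(x,y) log p(x,y)
H_XY = -sum(p * log2(p) for p in pxy.values())

# Marginal p(x) by summing the joint over y
px = {}
for (x, _), p in pxy.items():
    px[x] = px.get(x, 0.0) + p

# Conditional entropy H(Y|X) = -sum_x sum_y p(x,y) log p(y|x)
H_Y_given_X = -sum(p * log2(p / px[x]) for (x, _), p in pxy.items())
H_X = -sum(p * log2(p) for p in px.values())

# Chain rule: H(X,Y) = H(X) + H(Y|X)
print(round(H_XY, 6), round(H_X + H_Y_given_X, 6))  # -> 1.75 1.75
```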


(10)

### Information Channel

• Communication or information channel X → Y

p(x,y) = p(x) p(y|x) = p(y) p(x|y)   (Bayes' rule)

• input distribution p(X) = (p(x1), p(x2), …, p(xn))

• transition matrix p(Y|X), with row x given by p(Y|x) = (p(y1|x), p(y2|x), …, p(ym|x)) and ∑_{y∈Y} p(y|x) = 1

• output distribution p(Y) = (p(y1), p(y2), …, p(ym)), with p(y) = ∑_{x∈X} p(x) p(y|x)
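The matrix relation p(Y) = p(X) · p(Y|X) in a small sketch (all channel values are made up for illustration):

```python
# Hypothetical 2-input, 3-output channel.
px = [0.6, 0.4]                      # input distribution p(X)
pY_given_X = [[0.7, 0.2, 0.1],       # row i holds p(Y | x_i); each row sums to 1
              [0.1, 0.3, 0.6]]

assert all(abs(sum(row) - 1.0) < 1e-12 for row in pY_given_X)

# Output distribution: p(y_j) = sum_i p(x_i) p(y_j | x_i)
py = [sum(px[i] * pY_given_X[i][j] for i in range(len(px)))
      for j in range(len(pY_given_X[0]))]
print([round(v, 2) for v in py])  # -> [0.46, 0.24, 0.3]
```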


(11)

### Information Measures (5)

• Mutual information I(X;Y): shared information, correlation, dependence, information transfer

I(X;Y) = H(Y) - H(Y|X)
       = ∑_{x∈X} ∑_{y∈Y} p(x,y) log ( p(x,y) / (p(x) p(y)) )
       = ∑_{x∈X} p(x) ∑_{y∈Y} p(y|x) log ( p(y|x) / p(y) )
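The two forms can be verified to agree on a toy joint distribution (values chosen only for illustration):

```python
from math import log2

# Hypothetical joint distribution over X x Y, with uniform marginals.
pxy = {("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
       ("x2", "y1"): 0.1, ("x2", "y2"): 0.4}
px = {"x1": 0.5, "x2": 0.5}
py = {"y1": 0.5, "y2": 0.5}

# Double-sum form: I(X;Y) = sum p(x,y) log p(x,y)/(p(x)p(y))
I_sum = sum(p * log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

# H(Y) - H(Y|X) form
H_Y = -sum(p * log2(p) for p in py.values())
H_Y_given_X = -sum(p * log2(p / px[x]) for (x, y), p in pxy.items())
print(round(I_sum, 6), round(H_Y - H_Y_given_X, 6))  # the two values coincide
```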


(12)

### Information Measures (6)

• Relationship between information measures

(Information diagram: H(X) and H(Y) overlap in I(X;Y); the remaining parts are H(X|Y) and H(Y|X); the union is H(X,Y).)

H(X|Y) ≤ H(X)

H(X,Y) = H(X) + H(Y|X) ≤ H(X) + H(Y)

I(X;Y) ≤ H(X)

Yeung's book: Chapter 3 establishes a one-to-one correspondence between Shannon's information measures and set theory. A number of examples are given to show how the use of information diagrams can simplify the proofs of many results in information theory.


(13)


(14)

### Information Measures (8)

• Normalized mutual information: different forms

• Information distance

I(X;Y)/H(X,Y),  I(X;Y)/max{H(X),H(Y)},  I(X;Y)/min{H(X),H(Y)},  I(X;Y)/(H(X)+H(Y))

information distance: H(X|Y) + H(Y|X)

(Information diagram as before: I(X;Y), H(X|Y), H(Y|X) within H(X,Y).)


(15)

### Relative Entropy

• Relative entropy, informational divergence, or Kullback-Leibler distance DKL(p,q): how much p differs from q (on a common alphabet X)

• conventions: 0 log (0/q) = 0 and p log (p/0) = ∞

• DKL(p,q) ≥ 0

• it is not a true metric or "distance" (non-symmetric; the triangle inequality is not fulfilled)

DKL(p,q) = ∑_{x∈X} p(x) log ( p(x) / q(x) )

I(X;Y) = DKL(p(X,Y), p(X)p(Y))
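A short sketch of DKL and its asymmetry (the two distributions are arbitrary examples; `kl` is an illustrative helper name):

```python
from math import log2

def kl(p, q):
    """D_KL(p||q) in bits; the 'if' implements 0*log(0/q) = 0. Assumes q > 0 where p > 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
# Non-negative, zero iff p == q, and not symmetric:
print(round(kl(p, q), 3), round(kl(q, p), 3))  # the two directions give different values
```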


(16)

### Mutual Information

I(X;Y) = H(Y) - H(Y|X)
       = ∑_{x∈X} ∑_{y∈Y} p(x,y) log ( p(x,y) / (p(x) p(y)) )
       = ∑_{x∈X} p(x) ∑_{y∈Y} p(y|x) log ( p(y|x) / p(y) )

DKL(p,q) = ∑_{x∈X} p(x) log ( p(x) / q(x) )

I(X;Y) = DKL(p(X,Y), p(X)p(Y))


(17)

### Mutual Information Decomposition

• Information associated with x

I(X;Y) = ∑_{x∈X} p(x) ∑_{y∈Y} p(y|x) log ( p(y|x) / p(y) ) = ∑_{x∈X} p(x) (H(Y) - H(Y|x))

I1(x;Y) = ∑_{y∈Y} p(y|x) log ( p(y|x) / p(y) )

I2(x;Y) = H(Y) - H(Y|x)

I3(x;Y) = ∑_{y∈Y} p(y|x) I2(X;y)

I(X;Y) = ∑_{x∈X} p(x) Ik(x;Y),  k = 1, 2, 3

[DeWeese] [Butts]
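The decompositions I1 and I2 can be checked to average back to I(X;Y) on a toy channel (all values made up; I3 needs the reverse conditional and is omitted for brevity):

```python
from math import log2

# Hypothetical channel: p(x) and the conditional rows p(Y|x).
px = {"a": 0.5, "b": 0.5}
pyx = {"a": [0.9, 0.1], "b": [0.2, 0.8]}
py = [sum(px[x] * pyx[x][j] for x in px) for j in range(2)]

def H(p):
    return -sum(q * log2(q) for q in p if q > 0)

# Reference value of I(X;Y) from the double sum
I = sum(px[x] * pyx[x][j] * log2(pyx[x][j] / py[j])
        for x in px for j in range(2) if pyx[x][j] > 0)

# I1: KL "surprise" of observing x;  I2: entropy change due to x (may be negative)
I1 = {x: sum(pyx[x][j] * log2(pyx[x][j] / py[j]) for j in range(2) if pyx[x][j] > 0)
      for x in px}
I2 = {x: H(py) - H(pyx[x]) for x in px}

# Both decompositions average back to I(X;Y):
print(round(sum(px[x] * I1[x] for x in px), 6) == round(I, 6))  # -> True
print(round(sum(px[x] * I2[x] for x in px), 6) == round(I, 6))  # -> True
```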


(18)

### Mutual Information Decomposition

I(X;Y) = H(Y) - H(Y|X) = H(Y) - ∑_{x∈X} p(x) H(Y|x)
       = ∑_{x∈X} p(x) (H(Y) - H(Y|x))
       = ∑_{x∈X} ∑_{y∈Y} p(x,y) log ( p(x,y) / (p(x) p(y)) )
       = ∑_{x∈X} p(x) ∑_{y∈Y} p(y|x) log ( p(y|x) / p(y) )

I1(x;Y) = ∑_{y∈Y} p(y|x) log ( p(y|x) / p(y) )

I2(x;Y) = H(Y) - H(Y|x)


(19)

### Inequalities

• Data processing inequality: if X → Y → Z is a Markov chain, then I(X;Y) ≥ I(X;Z)

No processing of Y can increase the information that Y contains about X; further processing of Y can only lose information about X

• Jensen's inequality: a function f(x) is said to be convex over an interval (a,b) if for every x1, x2 in (a,b) and 0 ≤ λ ≤ 1,

f(λx1 + (1-λ)x2) ≤ λf(x1) + (1-λ)f(x2)


(20)

### Jensen-Shannon Divergence

• From the concavity of entropy, Jensen-Shannon divergence

JS(p(x1), …, p(xn); p(Y|x1), …, p(Y|xn)) = I(X;Y)

[Burbea]
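This identity, the JS divergence of the rows p(Y|x) weighted by p(x) equalling I(X;Y), is easy to verify numerically (channel values made up; `js` is an illustrative helper name):

```python
from math import log2

def H(p):
    return -sum(q * log2(q) for q in p if q > 0)

def js(weights, dists):
    """Jensen-Shannon divergence: entropy of the mixture minus the mixed entropies."""
    mix = [sum(w * d[j] for w, d in zip(weights, dists)) for j in range(len(dists[0]))]
    return H(mix) - sum(w * H(d) for w, d in zip(weights, dists))

# With weights p(x) and distributions p(Y|x), JS equals I(X;Y):
px = [0.5, 0.5]
rows = [[0.9, 0.1], [0.2, 0.8]]
py = [0.55, 0.45]
I = sum(px[i] * rows[i][j] * log2(rows[i][j] / py[j])
        for i in range(2) for j in range(2))
print(abs(js(px, rows) - I) < 1e-12)  # -> True
```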


(21)

### Information Channel, MI and JS

• Communication or information channel X → Y

JS(p(x1), …, p(xn); p(Y|x1), …, p(Y|xn)) = I(X;Y)

### X → Y

• input distribution p(X) = (p(x1), p(x2), …, p(xn))

• transition matrix p(Y|X), with row x given by p(Y|x)

• output distribution p(Y) = (p(y1), p(y2), …, p(ym))

(22)

### Information Bottleneck Method (1)

• Tishby, Pereira and Bialek, 1999

• To look for a compressed representation of X which maintains the (mutual) information about the relevant variable Y as high as possible



(23)

### Information Bottleneck Method (2)

• Agglomerative information bottleneck method:

clustering/merging is guided by the minimization of the loss of mutual information

• Loss of mutual information

• The quality of each cluster is measured by the Jensen-Shannon divergence between the individual distributions in the cluster

I(X;Y) - I(X̂;Y) = p(x̂) JS(p(x1)/p(x̂), …, p(xm)/p(x̂); p(Y|x1), …, p(Y|xm))

where p(x̂) = ∑_{k=1}^{m} p(xk), and I(X;Y) ≥ I(X̂;Y)

[Slonim]
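A sketch of one agglomerative merge, checking that the loss of mutual information equals p(x̂) times the JS divergence within the merged cluster (channel values made up):

```python
from math import log2

def H(p):
    return -sum(q * log2(q) for q in p if q > 0)

def mi(px, rows):
    """I(X;Y) = H(Y) - H(Y|X) for input px and conditional rows p(Y|x)."""
    py = [sum(px[i] * rows[i][j] for i in range(len(px))) for j in range(len(rows[0]))]
    return H(py) - sum(px[i] * H(rows[i]) for i in range(len(px)))

# Hypothetical 3-row channel; merge x1 and x2 into one cluster x_hat.
px = [0.2, 0.3, 0.5]
rows = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]

p_hat = px[0] + px[1]
merged_row = [(px[0] * rows[0][j] + px[1] * rows[1][j]) / p_hat for j in range(2)]
px2 = [p_hat, px[2]]
rows2 = [merged_row, rows[2]]

# JS of the two merged rows with weights p(x_i)/p(x_hat)
js = H(merged_row) - (px[0] / p_hat) * H(rows[0]) - (px[1] / p_hat) * H(rows[1])

# Loss of mutual information caused by the merge
loss = mi(px, rows) - mi(px2, rows2)
print(abs(loss - p_hat * js) < 1e-12)  # -> True
```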


(24)

### Information Channel and IB

• Communication or information channel X → Y

I(X;Y) - I(X̂;Y) = p(x̂) JS(p(x1)/p(x̂), p(x2)/p(x̂); p(Y|x1), p(Y|x2))

p(x̂) = p(x1) + p(x2)

### X → Y

• input distribution p(X) = (p(x1), p(x2), …, p(xn))

• transition matrix p(Y|X), with row x given by p(Y|x)

• output distribution p(Y) = (p(y1), p(y2), …, p(ym))

(25)

### Example: Entropy of an Image

• The information content of an image is expressed by the Shannon entropy of the (normalized) intensity histogram

• The entropy disregards the spatial contribution of pixels


(26)

### Example: Image Partitioning (1)

• Information channel X → Y defined between the intensity histogram and the image regions

(Channel diagram: X = intensity bins, Y = image regions; p(X), p(Y|X), p(Y).)

bi = number of pixels of bin i; rj = number of pixels of region j; N = total number of pixels


(27)

### Example: Image Partitioning (2)

• information gain and the information bottleneck method

H(X) = I(X;Y) + H(X|Y)

at each step, the increase of I(X;Y) equals the decrease of H(X|Y)

I(X;Y) - I(X̂;Y) = p(x̂) JS(p(x1)/p(x̂), p(x2)/p(x̂); p(Y|x1), p(Y|x2))


(28)

### Example: Image Partitioning (3)

| MIR = I(X̂;Y)/I(X;Y) | number of regions | % of regions |
|---|---|---|
| 1.0 | 234238 | 89.35 |
| 0.9 | 129136 | 49.26 |
| 0.8 | 67291 | 25.67 |
| 0.7 | 34011 | 12.97 |
| 0.6 | 15316 | 5.84 |
| 0.5 | 5597 | 2.14 |
| 0.4 | 1553 | 0.59 |
| 0.3 | 330 | 0.13 |
| 0.2 | 64 | 0.02 |
| 0.1 | 13 | 0.00 |


(29)

### Entropy Rate

• Shannon entropy

• Joint entropy

• Entropy rate or information density: h(X) = lim_{L→∞} H(X1, X2, …, XL) / L


(30)

### Continuous Channel

• Continuous entropy

• Continuous mutual information

Ic(X,Y) is the least upper bound for I(X,Y)

• refinement can never decrease I(X,Y)

Hc(X) = -∫_S p(x) log p(x) dx

Ic(X,Y) = ∫_S ∫_S p(x,y) log ( p(x,y) / (p(x) p(y)) ) dx dy


(31)

### Viewpoint metrics and applications

Mateu Sbert

University of Girona, Spain, and Tianjin University, China

(32)

### Viewpoint selection

• Automatic selection of the most informative viewpoints is a very useful focusing mechanism in visualization

• It can guide the viewer to the most interesting information of the scene or data set

• A selection of most informative viewpoints can be used for a virtual walkthrough or a compact representation of the information the data contains

• Best view selection algorithms have been applied to computer graphics domains such as scene understanding and virtual exploration, N best views selection, image-based modeling and rendering, mesh simplification, molecular visualization, and camera placement

• Information theory measures have been used as viewpoint metrics since the work of Vazquez et al.; see also [Sbert et al. 2009]


(33)

### The visualization pipeline

DATA ACQUISITION → DATA PROCESSING → DATA RENDERING

• acquisition: simulation, modeling, scanning

• processing: filtering, registration, segmentation; reconstruction and classification yield the voxel model

• rendering: direct volume rendering


(34)

### Direct volume rendering (DVR)

classification maps primitives to graphical attributes

usually absorption + emission optical model

compositing integrates samples with optical properties along viewing rays

Transfer function definition

Local or global illumination

Both realistic and illustrative rendering


(35)

### Viewpoint selection

• Evaluation of viewpoint quality based on the visibility of extracted isosurfaces or interval volumes.

• Viewpoint metric: the average of the viewpoint entropies of the extracted isosurfaces.


(36)

### Viewpoint selection

Best and worst views of interval volumes extracted from a data set containing simulated electron density distribution in a hydrogen atom


(37)

### Viewpoint selection

• Best view selection: use entropy of the projected visibilities distribution

• Representative views: cluster views according to Jensen-Shannon similarity measure


(38)

### Viewpoint selection

Best (two left) and worst (two right) views of tooth data set

Four representative views


(39)


### Viewpoint selection

• Quality of viewpoint v, u(v), is a combination of three values


(40)

### Viewpoint selection

• Semantics-driven view selection. Entropy, among other factors, is used to select best views.

• Guided navigation through features assists studying the correspondence between focus objects.


(41)

• How a viewpoint sees the voxels

• Mutual information

I(V;Z) = ∑_{v∈V} p(v) ∑_{z∈Z} p(z|v) log ( p(z|v) / p(z) ) = ∑_{v∈V} p(v) I(v;Z)

• Viewpoint mutual information (VMI)

I(v;Z) = ∑_{z∈Z} p(z|v) log ( p(z|v) / p(z) )

### Visibility channel

viewpoints V → voxels Z: input p(V), transition matrix with rows p(Z|v), output p(Z)

p(v) = vis(v) / ∑_{i∈V} vis(i),  p(z|v) = vis(z|v) / vis(v),  p(z) = ∑_{v∈V} p(v) p(z|v)

• Viola et al. 2006, Ruiz et al. 2010
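With made-up visibility counts, the channel probabilities and VMI look like this (a sketch of the definitions above, not code from the cited papers):

```python
from math import log2

# Hypothetical visibility counts vis(z|v): samples of voxel z visible from viewpoint v.
vis = [[30, 10, 0],   # viewpoint v1
       [10, 10, 20]]  # viewpoint v2

tot = sum(map(sum, vis))
pv = [sum(row) / tot for row in vis]                   # p(v): share of total visibility
pzv = [[c / sum(row) for c in row] for row in vis]     # p(z|v): per-viewpoint histogram
pz = [sum(pv[i] * pzv[i][j] for i in range(2)) for j in range(3)]  # p(z)

def vmi(i):
    """Viewpoint mutual information I(v;Z) for viewpoint i."""
    return sum(pzv[i][j] * log2(pzv[i][j] / pz[j])
               for j in range(3) if pzv[i][j] > 0)

print([round(vmi(i), 3) for i in range(2)])  # -> [0.439, 0.25]
```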


(42)

• How a voxel “sees” the viewpoints

• Mutual information

I(Z;V) = ∑_{z∈Z} p(z) ∑_{v∈V} p(v|z) log ( p(v|z) / p(v) ) = ∑_{z∈Z} p(z) I(z;V)

• Voxel mutual information (VOMI)

I(z;V) = ∑_{v∈V} p(v|z) log ( p(v|z) / p(v) )

### Reversed visibility channel

voxels Z → viewpoints V: input p(Z), transition matrix with rows p(V|z), output p(V)

p(v|z) = p(v) p(z|v) / p(z)   (Bayes' rule)

• Ruiz et al. 2010
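Reversing a made-up visibility channel with Bayes' rule and checking that I(Z;V), the p(z)-weighted average of VOMI, equals I(V;Z) on the forward channel:

```python
from math import log2

# Hypothetical forward channel (viewpoints -> voxels).
pv = [0.5, 0.5]
pzv = [[0.75, 0.25, 0.0], [0.25, 0.25, 0.5]]
pz = [sum(pv[i] * pzv[i][j] for i in range(2)) for j in range(3)]

# Bayes' rule: p(v|z) = p(v) p(z|v) / p(z)
pvz = [[pv[i] * pzv[i][j] / pz[j] for i in range(2)] for j in range(3)]

def vomi(j):
    """Voxel mutual information I(z;V) for voxel j."""
    return sum(pvz[j][i] * log2(pvz[j][i] / pv[i])
               for i in range(2) if pvz[j][i] > 0)

# I(Z;V) = sum_z p(z) I(z;V) equals I(V;Z) computed on the forward channel.
I_zv = sum(pz[j] * vomi(j) for j in range(3))
I_vz = sum(pv[i] * pzv[i][j] * log2(pzv[i][j] / pz[j])
           for i in range(2) for j in range(3) if pzv[i][j] > 0)
print(abs(I_zv - I_vz) < 1e-12)  # -> True
```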


(43)

### VOMI map computation

Volume dataset → classified data (transfer function) → ray casting → visibility histogram for each viewpoint → probabilities computation → VOMI map


(44)


(45)

### • Interpret VOMI as ambient occlusion

AO(z) = 1 - I(z;V)

Simulate global illumination

Realistic and illustrative rendering

Color ambient occlusion

CAO_α(z;V) = ∑_{v∈V} p(v|z) log ( p(v|z) / p(v) ) (1 - C_α(v))

### • Interpret VOMI as importance

Modulate opacity to obtain focus+context effects emphasizing important parts

### • “Project” VOMI to viewpoints to obtain informativeness of each viewpoint

INF(v) = ∑_{z∈Z} p(v|z) I(z;V)

Viewpoint selection


(46)

### VOMI as ambient occlusion map

Original · Ambient Occlusion (Stewart 2003) · Obscurances (Iones et al. 98) · VOMI


(47)

### VOMI applied as ambient occlusion

Stewart 2003 VOMI


(48)

### Color ambient occlusion

CAO map · CAO map with contours · CAO maps with contours and color quantization


(49)

### Opacity modulation

Original · modulated to emphasize skeleton; Original · modulated to emphasize ribs


(50)


Min VMI · Max VMI · Min INF · Max INF (two data sets)

(51)

### References

• T.M. Cover and J.A. Thomas. Elements of Information Theory. Wiley, 1991, 2006

• R.W. Yeung. Information Theory and Network Coding. Springer, 2008

• M.R. DeWeese and M. Meister. How to measure the information gained from one symbol. Network: Computation in Neural Systems, 10, 4, 325-340, 1999

• D.A. Butts. How much information is associated with a particular stimulus?. Network: Computation in Neural Systems, 14, 177-187, 2003

• J. Burbea and C.R. Rao. On the convexity of some divergence measures based on entropy functions. IEEE Transactions on Information Theory, 28, 3, 489-495, 1982

• Noam Slonim and Naftali Tishby. Agglomerative Information Bottleneck. NIPS, 617-623, 1999


(52)

### References

• Imre Csiszár and Paul C. Shields. Information Theory and Statistics: A Tutorial. Foundations and Trends in Communications and Information Theory, 1, 4, 2004

• Pere P. Vazquez, Miquel Feixas, Mateu Sbert, and Wolfgang Heidrich. Viewpoint selection using viewpoint entropy. In Proceedings of Vision, Modeling, and Visualization 2001, pages 273-280, Stuttgart, Germany, November 2001

• M. Sbert, M. Feixas, J. Rigau, M. Chover, I. Viola. Information Theory Tools for Computer Graphics. Morgan and Claypool Publishers, 2009

• Bordoloi, U.D. and Shen, H.-W. (2005). View selection for volume rendering. In IEEE Visualization 2005, pages 487-494

• Ji, G. and Shen, H.-W. (2006). Dynamic view selection for time-varying volumes. IEEE Transactions on Visualization and Computer Graphics, 12(5):1109-1116


(53)

### References

• Mühler, K., Neugebauer, M., Tietjen, C. and Preim, B. (2007). Viewpoint selection for intervention planning. In Proceedings of Eurographics/IEEE-VGTC Symposium on Visualization, 267-274

• Ruiz, M., Boada, I., Feixas, M., Sbert, M. (2010). Viewpoint information channel for illustrative volume rendering. Computers & Graphics, 34(4):351-360

• Takahashi, S., Fujishiro, I., Takeshima, Y., Nishita, T. (2005). A feature-driven approach to locating optimal viewpoints for volume visualization. In IEEE Visualization 2005, 495-502

• Viola, I, Feixas, M., Sbert, M. and Gröller, M.E. (2006). Importance- driven focus of attention. IEEE Transactions on Visualization and Computer Graphics , 12(5):933-940


(54)
