Mathematical properties of complex networks

Acomplex network is a graph with structural features that are not present in simple networks such as lattices or random graphs. These structural features are neither purely regular nor purely random and include a vertex degree dis-tribution following something that looks like a power law, a high clustering coefficient, community structure and hierarchical structure. These features and others are described in the following. We start with two properties of complex networks that have received much attention lately: the scale-free and power law property and the small-world property.

Scale-free networks

A network is said to bescale-free if its degree distribution follows a power law, at least asymptotically as the number of vertices grows. The degree of a vertex is the number of edges adjacent to it. The degree distributionP(k)of a network is the fraction of vertices in the network with degreek, i.e. P(k) = ⁿ_n^k, wheren is the total number of vertices in the network andnk is the number of vertices

with degree k. If P(k) follows a power law, then it is approximately equal to k^−γ for large values of k, where the parameter γ typically is in the range (2,3). The World Wide Web and email networks are examples of real world networks with degree distributions following power laws. A degree distribution resembling a power law has a large number of vertices of low degree and a small number of vertices with high degree, sometimes referred to ashubs. The corresponding function plotting the degree distribution looks something like the one in Figure 2.1.

P(k)

Figure 2.1: This is how a graph following a power law looks like, more or less.

There exists several models for the evolu-tion of a scale-free network, the two most im-portant being theBarabási-Albert model(also called the preferential attachment model), and the configuration model (also called the Bollobás construction and the Molloy-Reed construction).

The Barabási-Albert model is an algo-rithm that generates random scale-free net-works with the preferential attachment prop-erty. This means that the more neighbors a vertex has, the higher probability it has of be-ing linked to new vertices introduced to the network. The network is built as follows. We start with a network on n₀ vertices and re-peatedly add new vertices to the network. Ev-ery time we add a new vertex to the network we connect it ton≤n0existing vertices. The probability pi that the vertex is connected to vertexiis proportional to the number of neighbors thatialready has. Formallypiis defined as

pi= ki

jkj

(2.1) where the sum is taken over all the verticesj that exist at the time we add the new vertex.

Small-world networks

A network is said to be a small-world network if it exhibits the small-world property, i.e. that the mean shortest path length`, taken over all vertex pairs, is small relative to the total number of vertices, n, in the network. ` must not grow faster than logarithmically as the number of vertices tends to infinity, that is, ` → O(logn) as n → ∞. Note that this definition does not allow one to determine whether an individual network is a small-world network since a collection of graphs are needed in order to rigorously define the property.

However, given a specific network, it is possible to compute all the usual graph properties. For instance it is possible make several randomized versions of the

network by randomly rewiring its edges, compute the geodesic distance for each version, and finally compute the mean geodesic distance [27]. The World Wide Web and the metabolic network are examples of small-world networks.

Community structure

Social networks (among other complex networks) usually display community structure (which is sometimes referred to as clustering). A network is said to display such structure if the vertices of the network can be partitioned into either overlapping or disjoint sets of vertices such that the number of internal edges exceeds the number of external edges by some reasonable amount. An internal edge is an edge connecting two vertices belonging to the same commu-nity whereas an external edge is an edge connecting two vertices of different communities.

Networks displaying a community structure may often display ahierarchical community structure as well. This means that the network may be divided into a small amount of large communities and each of these may be divided into several smaller communities. In section 5.2 (in the preliminaries) we will look closer at algorithms that reveal the hierarchical community structure of graphs, and in the main part, in chapter 7 and chapter 8, we will implement and test two such algorithms.

Clustering coefficients

Aclustering coefficient is a measure of how much the vertices of a network tend to cluster together. Especially in social networks vertices tend to form densely connected groups (communities). There are two different and well known clus-tering coefficients, a global and a local.

Theglobal clustering coefficientCis due to Luce and Perry [20] and is based on the concept of a triple of vertices. A connected triple is an ordered triple (in the set theoretic sense of the word)(a, b, c)of three verticesa,b, andc such thatais connected tobandb toc. The coefficient is defined as

C= 3×number of triangles

number of connected triples (2.2) The factor of 3 is due to the fact that every triangle gives rise to three connected triples, see Figure 2.2 [27].

The local clustering coefficient was introduced by Duncan J. Watts and Steven Strogatz in 1998 as a measure used to determine if a network is a small-world network. Each vertex v has its own clustering coefficient Cv, which is defined as the number of edges between the vertices in its neighborhood over the total number of possible edges between these vertices. For an undirected graph the coefficient is then

Cv= 2· |{{u, w}:u, w∈N(v),{u, w} ∈E}|

k_v(k_v−1) (2.3)

a b

a b c

b c a

c a b

Figure 2.2: A triangle graph consisting of three verticesa,b, andcshown on the top. Below it we have arranged the vertices of the graph in three different ways in order to reveal the three corresponding triples(a, b, c),(b, c, a), and(c, a, b).

wherek_v=|N(v)|.

In chapter 8 we will introduce a clustering coefficient called the edge clus-tering coefficient, which is the central concept in the main algorithm of this thesis.

In document Community Detection in Social Networks (sider 12-15)