• No results found

Efficient triangle counting

8.2 Implementing the algorithm

8.2.3 Efficient triangle counting

In order to make the algorithm described in the previous subsection as efficient as possible we need to count the number of triangles in the input graph as efficient as possible. To be even more precise, we need to count the number of triangles every edge is part of. In this subsection we first investigate efficient triangle counting algorithms for undirected graphs and then we pick the one that best suits our needs.

Schank and Wagner [29] have described and experimentally tested several triangle counting algorithms, of which we will mention only three. They have measured both the execution times of the algorithms and counted the number of triangle operations done by each algorithm on several both real and artificial networks. The rest of this subsection is based on their article.

We consider an undirected input graph G = (V, E), where n = |V|. The

RadicchiAlgorithm 7:

input : A graphH = (V, E)

output: A partition into communities LetG, a copy ofH, be the workbench graph;

N T is a HashMap mapping edges to the number of triangles they are part of;

N T ←NumTriangles(A, H);

LetQbe an indexed minimum priority queue storing tuples(e, Ce), whereeis an edge andCeis the edge clustering coefficient ofe, ordered by the valuesCe;

for(e={i, j})∈H do

c←(N T.get(e) + 1)/(min{ki−1, kj−1});

Q.put((e, c));

whileQ is not empty do

((e={i, j}), Cij)←Q.deleteM in();

Removeefrom G;

if areInDifferentComponents(G, i, j)then

LetGi andGj be the connected components ofiandj respectively;

if not (isCommunity(Gi) and isCommunity(Gj)) then Addeback to G

if ewas really removed then

UpdateDataStructures(Q,NT,e);

Compute and output all connected components ofG UpdateDataStructures 8:

input : An indexed minimum priority queueQ, a HashMapN T mapping edges to integers representing the number of triangles the edges are part of, and an edgee={u, v} that has been removed from the workbench graph.

output: No output but changes may have been made to bothN T andQ.

NT.put(e,0);

forevery vertex wthat together with uandv form a triangle inGdo NT.put({u, w}, NT.get({u, w})-1);

The simplest triangle counting algorithm is a n3

=O(n3)time algorithm that for every subset of three vertices checks if there is an edge between every pair of them.

The node-iterator algorithm iterates over all vertices in the graph and for every pair of neighbors checks if there is an edge between them. Thus the running time of this algorithm is

X The last algorithm is the edge-iterator. This algorithm iterates over every edge{u, v}of the graph and compares the neighbors ofuandv. {u, v, w}induce a triangle if and only if w is a neighbor both of uand v. If the neighbor list ofuis sorted, then the comparison can be done in timed(u) +d(v). Sorting of all neighbor lists can be done in P

v∈V d(v)logd(v)time. If we disregard the sorting part, the running time is given by

X

{u,v}∈E

d(u) +d(v) =O(mdmax) (8.4) where O(mdmax) is not a very accurate bound. A more accurate bound is given by the following amortized analysis. For every edge {u, v} we split the costd(u) +d(v)into two parts,d(u)andd(v), and assignd(u)touandd(v)to v. Thus in the outer loop a vertex vis passed d(v)times and the running time can equivalently be expressed as and so we can see that the running time of the edge iterator is the same as for the node-iterator.

A result from the experimental analysis of Schank and Wagner is that the edge-iteratorin practicehas the best running time of all the algorithms (includ-ing the ones not mentioned here). This is fortunate since we want to know the number of triangles that every edge is part of and this is easy to count with the approach of the edge-iterator.

A more detailed presentation of the edge-iterator algorithm is given in Al-gorithm 9. The alAl-gorithm assumes that the input graph is given both as an unsorted adjacency list and as an edge list. Since the adjacency list is not sorted, the algorithm starts by sorting the neighbor list of every vertex. The number of triangles an edge is part of is stored in a HashMap. The total number of triangles in the graph is also kept track of and the algorithm can easily be changed to return this number instead of the HashMap that it actually returns.

The default load factor of the Java class HashMap is0.75 and we will use this load factor in our algorithm since it gives a good tradeoff between time and space costs. We are so fortunate that we know the number of key-value pairs,

|E|, that we will store in the HashMap. By setting the initial capacity of the

HashMap to a number slightly larger than|E|/0.75, we ensure that no rehash operations will occur during the course of the algorithm.

NumTriangles 9:NumTriangles is an edge-iterator that iterates over every edgeeof the graph and counts the the number of triangles thateis part of.

input : A graphG= (V, E)represented both by an adjacency listAand an edge listEL.

output: A HashMap storing the number of triangles that every edge of the graph is part of.

forv∈V do SortA[v];

totalN umT riangles←0;

LetN T be a HashMap storing the number of triangles that every edge is part of;

for{u, v} ∈ELdo

numEdgeT riangles←0;

pu←0;

pv←0;

whilepu< A[u].size()orpv < A[v].size()do if A[u][pu] =A[v][pv] then

+ +totalN umT riangles;

+ +numEdgeT riangles;

+ +pu; + +pv;

else if A[u][pu]< A[v][pv] then + +pu;

else if A[u][pu]> A[v][pv] then + +pv;

N T.put({u, v}, numEdgeT riangles);

ReturnNT