• No results found

Improving DBLP Communities

10.2 Further research and application improvements

10.2.2 Improving DBLP Communities

Many improvements can be made to DBLP Communities. However, we think that the following three improvements are the most important.

• A database supporting community detection in both unweighted and weighted collaboration networks in the DBLP database.

• An API offering researchers to easily implement and incorporate their algorithms into DBLP Communities. Thus DBLP Communities could be made into a collaborative testing ground for the community detection community.

• A rating mechanism that harvests user ratings on the combination of al-gorithm, settings, and collaboration network. Thus one can get a clearer picture on which algorithms and which settings that achieves the best results on each network.

Appendices

Appendix A

Graph theory

In this chapter we present the very basic graph theory – nothing more than is needed to keep pace with the flow of this master thesis.

A.1 Basic terminology

Figure A.1: An example of a graph.

A graph is a pair G = (V, E) of sets, where V is a finite set of vertices (or nodes) and E ⊆ V2

is a collection of edges. Thus, E is a collection of 2-element subsets of V. The edge between vertices 1 and 2 is denoted {1,2} and is equivalent to {2,1}. Note that Ebeing a collection allows multiple edges be-tween a fixed pair of vertices. Vertices are usually drawn as dots and edges as lines be-tween two of these dots if the corresponding vertices form an edge, see Figure A.1. If an edgeejoins two verticesxandy, thenxandy are said to beadjacent andeisincident with bothxand y. An edge joining a vertexxto itself is called aloop. A graph without loops

or multiple edges is called asimple graph. The number of edges incident with a vertexv is called the degree ofv and is denotedd(v). A vertex of degree 1 is called aleaf.

A simple graph withnvertices and where every pair of vertices is connected by a single edge is called acomplete graph and is denoted by Kn. K3 is the complete graph on three vertices and is sometimes called atriangle.

If G = (V, E) and G0 = (V0, E0) are two graphs such that V0 ⊆ V and E0⊆E, thenG0 is asubgraphofGand we write this asG0⊆G. IfG0⊆Gand G0 contains every edgeuv∈E whereu, v∈V0, thenG0 is aninduced subgraph ofG.

Figure A.2: A graph with three connected components.

A sequence of edges in a graph G of the form v0v1, v1v2, v2v3, ..., vn−1vn or equiv-alently v0 → v1 → v2 → ... → vn is called a walk in G. v0 is the initial vertex of the walk, vn is the final vertex of the walk, and nis thelength of the walk. A walk where no two edges are equal is called a trail. A trail where no two vertices are equal, except pos-siblyv0=vn, is called apath. A path where v0 = vn is called a cycle. The shortest path between two verticesuandv is the minimum length path betweenuand v. The length of such a path is called thegeodesic distance be-tweenuandv. When computing the shortest path between every pair of vertices, the diam-eter of the graph is the maximum length shortest path. A graphGisconnected if there is a path between every pair of vertices in G. A graph that is not connected consists of a number of connected components, see Figure A.2.

Two vertices areneighbors if they are connected by an edge. The neighbor-hood of a vertexv is denotedN(v)and is the set of every neighbor ofv.

Bibliography

[1] A Arenas, A Fernández, S Fortunato, and S Gómez. Motif-based com-munities in complex networks. Journal of Physics A: Mathematical and Theoretical, 41(22):224001, 2008.

[2] Stephen P. Borgatti, Martin G. Everett, and Paul R. Shirey. {LS} sets, lambda sets and other cohesive subsets. Social Networks, 12(4):337 – 357, 1990.

[3] Ulrik Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 2001.

[4] C. Castellano, F. Cecconi, V. Loreto, D. Parisi, and F. Radicchi. Self-contained algorithms to detect communities in networks. The European Physical Journal B - Condensed Matter and Complex Systems, 38(2):311–

319, 2004.

[5] Aaron Clauset, M. E. J. Newman, and Cristopher Moore. Finding com-munity structure in very large networks. Phys. Rev. E, 70:066111, Dec 2004.

[6] Reuven Cohen and Shlomo Havlin. Complex Networks: structure, robust-ness, and function. Cambridge University Press, New York, 2010.

[7] Leon Danon, Albert Díaz-Guilera, and Alex Arenas. The effect of size heterogeneity on community identification in complex networks. Journal of Statistical Mechanics: Theory and Experiment, 2006(11):P11010, 2006.

[8] David Easley and Jon Kleinberg. Networks, Crowds, and Markets: Rea-soning About a Highly Connected World. Cambridge University Press, New York, 1st edition, 2010.

[9] Erlend Eindride Fasmer. Cnm on github. https://github.com/erlfas/

CNM.

[10] Erlend Eindride Fasmer. Dblp communities on github. https://github.

com/erlfas/DBLPCommunities.

[11] Santo Fortunato. Community detection in graphs. 2010.

[12] Santo Fortunato and Marc Barthélemy. Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1):36–41, 2007.

[13] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Physical Sciences - Applied Mathematics, 2002.

[14] Benjamin H. Good, Yves-Alexandre de Montjoye, and Aaron Clauset. Per-formance of modularity maximization in practical contexts. Phys. Rev. E, 81:046106, Apr 2010.

[15] Roger Guimerà, Marta Sales-Pardo, and Lu´A. Nunes Amaral. Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E, 70:025101, Aug 2004.

[16] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partition-ing graphs. Bell System Technical Journal, 1970.

[17] Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. Bench-mark graphs for testing community detection algorithms. Phys. Rev. E, 78:046110, Oct 2008.

[18] F. Luccio and M. Sami. On the decomposition of networks in mini-mally interconnected subnetworks. Circuit Theory, IEEE Transactions on, 16(2):184–188, May 1969.

[19] Duncan Luce. Connectivity and generalized cliques in sociometric group structure. 1950.

[20] R. Duncan Luce and Albert D. Perry. A method of matrix analysis of group structure. Psychometrika, 14(2):95–116, 1949.

[21] H. Matsuda, T. Ishihara, and A. Hashimoto. Classifying molecular se-quences using a linkage graph with their pairwise similarities. Theoretical Computer Science, 1999.

[22] Robert J. Mokken. Cliques, clubs and clans. Quality and Quantity, 13(2):161 – 173, 1979.

[23] M. E. J. Newman. Fast algorithm for detecting community structure in networks. Phys. Rev. E, 69:066133, Jun 2004.

[24] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69(2):026113, February 2004.

[25] M.E.J. Newman. Detecting community structure in networks. The Euro-pean Physical Journal, 2004.

[26] Vreda Pieterse and Paul E. Black. Sparse graph — dictionary of algorithms and data structures, August 2008.

[27] M. A. Porter. Small-world network. 7(2):1739, 2012. revision 142132.

[28] Filippo Radicchi, Claudio Castellano, Federico Cecconi, Vittorio Loreto, and Domenico Parisi. Defining and identifying communities in networks.

Proceedings of the National Academy of Sciences of the United States of America, 101(9):2658–2663, 2004.

[29] Thomas Schank and Dorothea Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In Proceedings of the 4th International Conference on Experimental and Efficient Algorithms, WEA’05, pages 606–609, Berlin, Heidelberg, 2005. Springer-Verlag.

[30] Philipp Schuetz and Amedeo Caflisch. Efficient modularity optimization by multistep greedy algorithm and vertex mover refinement. Phys. Rev. E, 77:046112, Apr 2008.

[31] Robert Sedgewick and Kevin Wayne. http://algs4.cs.princeton.edu/

24pq/IndexMinPQ.java.html. [Online; accessed April 2015].

[32] Stephen B. Seidman. Network structure and minimum degree. Social Net-works, 5(3):269 – 287, 1983.

[33] Stephen B. Seidman and Brian L. Foster. A graph-theoretic generalization of the clique concept. The Journal of Mathematical Sociology, 6(1), 1978.

[34] Stanford.edu. Single-link and complete-link clustering, 2015.

[35] Ken Wakita and Toshiyuki Tsurumi. Finding community structure in mega-scale social networks: [extended abstract]. In Proceedings of the 16th In-ternational Conference on World Wide Web, WWW ’07, pages 1275–1276, New York, NY, USA, 2007. ACM.

[36] Craig Walls. Spring in Action. Manning Publications Co, Shelter Island, 4th edition, 2014.

[37] Stanley Wasserman and Katherine Faust. Social Network Analysis, Meth-ods and Applications. Cambridge University Press, 1994.

[38] Wikipedia. Complete-linkage clustering, 2015.

[39] Wikipedia. Single-linkage clustering, 2015.