Leaders, followers, and community detection
[摘要] Communities in social interaction networks or graphs are sets of well-connected, and very often overlapping vertices. Formally, we view any maximal clique of the social network graph as a community. The problem of finding maximal cliques is known to be computationally hard. The goal of this work is to identify structural conditions in social network graphs that lead to efficient identification of maximal cliques, i.e. overlapping communities. We propose an evolutionary model called sequential community graphs for community formation in social networks. In a sequential community graph, each node enters the graph by either joining an existing community, or creating its own. To discover communities, i.e. maximal cliques, in such graphs, we present the non-parametric Iterative Leader-Follower Algorithm (ILFA). We establish that the ILFA finds all the communities/maximal cliques correctly in the sequential community graph model in polynomial time in the number of vertices in the graph. To scale to very large data sets, we propose a minor simplification of the ILFA, called the fast leader-follower algorithm (FLFA) which effectively runs in linear time in the input data size, and finds all communities correctly for sequential community graphs with an additional constraint. Empirically, the FLFA and IFLA perform nearly the same in terms of accuracy, but the FLFA runs nearly three orders of magnitude faster. We find that the sequential community graph model is a good fit for a wide variety of social networks where users can be modeled as entering the graph by joining existing communities or creating their own. In such social networks, we demonstrate that the FLFA and ILFA outperform other state of the art algorithms both in terms of speed and accuracy. For example, in the Internet Movie Database (IMDB) graph where communities naturally correspond to actors in the same movie, our algorithms finds nearly all ground truth communities correctly while all other known community detection algorithms do very poorly. Similar empirical results are found for various other social data sets. This supports our hypothesis that we can model many social graphs as sequential community graphs and accurately detect their communities using the ILFA or FLFA.
[发布日期] [发布机构] Massachusetts Institute of Technology
[效力级别] [学科分类]
[关键词] [时效性]