Human Language Technology Center of Excellence
Department of Applied Mathematics and Statistics
Center for Imaging Science
Johns Hopkins University
V. Lyzinski, M. Tang, A. Athreya, Y. Park, C.E. Priebe, “Community Detection and Classification in Hierarchical Stochastic Blockmodels,” IEEE Transactions on Network Science and Engineering, vol. 4, no. 1, pp 13-26, 2017. publisher site
We propose a robust, scalable, integrated methodology for community detection and community comparison in graphs. In our procedure, we first embed a graph into an appropriate Euclidean space to obtain a low-dimensional representation, and then cluster the vertices into communities. We next employ nonparametric graph inference techniques to identify structural similarity among these communities. These two steps are then applied recursively on the communities, allowing us to detect more fine-grained structure. We describe a hierarchical stochastic blockmodel — namely, a stochastic blockmodel with a natural hierarchical structure — and establish conditions under which our algorithm yields consistent estimates of model parameters and motifs, which we define to be stochastically similar groups of sub-graphs. Finally, we demonstrate the effectiveness of our algorithm in both simulated and real data. Specifically, we address the problem of locating similar subcommunities in a partially reconstructed Drosophila connectome and in the social network Friendster.
The following algorithm is our methodology for identifying and estimating the structural properties of repeated motifs.
Detecting hierarchical structure for graphs
Input: Adjacency matrix \(A \in \{0,1\}^{n\times n}\) for a latent position random graph.
Output: Subgraphs and characterization of their dissimilarity
while Cluster size exceeds threshold do
Step 1: Compute the adjacency spectral embedding \(\hat{X}\);
Step 2: Project the rows of \(\hat{X}\) onto the sphere yielding \(\hat{Y}\); i.e., for each \(i \in [n]\), \(\hat{Y}_i := \hat{X}_i / \|\hat{X}_i\|_2\);
Step 3: Cluster \(\hat{Y}\) to obtain subgraphs \(\hat{H}_1,\cdots,\hat{H}_R\);
Step 4: For each \(i\in R\), use ASE to re-embed \(\hat{H}_i\), obtaining \(\hat{X}_{\hat{H}_i}\);
Step 5: Compute \(\hat{S} :=[T_{\hat{n}_{r},\hat{n}_{s}}(\hat{X}_{\hat{H}_r},\hat{X}_{\hat{H}_s})]\) producing a pairwise dissimilarity matrix on induced subgraphs;
Step 6: Cluster induced subgraphs into motifs according to \(\hat{S}\);
Step 7: Recurse on each motif;
end while
Definition 5: Let \((A,X) \sim RDPG(F)\) and \((B,Y) \sim RDPG(G)\). We say that \(A\) and \(B\) are of the same motif if there exists a unitary transformation \(U\) such that \(F = G \circ U\).
Corollary 8: Clustering \(\hat{S}\) matrix yields a consistent clustering of \(\{\hat{H}_i\}^{R}_{i=1}\) into motifs.
vlclust
version is here.vlclust
version is here.Friendster
network data: Motif detection in the Friendster network data in Section 5.2 in the manuscript.R
PackageThe latest R
source package can be downloaded from here.
It can be installed via:
if (!require(hsbm)) {
install.packages("http://www.cis.jhu.edu/~parky/HSBM/hsbm_0.1.0.tar.gz",type="source",method="wget")
library(hsbm)
}
and the simulation demo can be run by typing
demo(hsbm)
library(help='hsbm')
## Information on package 'hsbm'
##
## Description:
##
## Package: hsbm
## Type: Package
## Title: Hierarchical Stochastic Blockmodels
## Version: 0.1.0
## Depends: R (>= 3.0)
## Imports: igraph, Matrix, lattice, irlba
## Author: Minh Tang, Youngser Park
## Maintainer: Youngser Park <[email protected]>
## Description: Routines to perform community detection and
## classification in Hierarchical Stochastic
## Blockmodels
## License: GPL (>= 2)
## URL: http://www.cis.jhu.edu/~parky/HSBM/
## LazyData: TRUE
## RoxygenNote: 5.0.0
## Built: R 3.3.2; ; 2016-12-14 15:55:56 UTC; unix
##
## Index:
##
## computeS auxiliary functions to compute the kernel test
## statistic.
## find.transform auxiliary functions to compute the kernel test
## statistic.
## getElbows Dimensionality selection for singular values
## using profile likelihood.
## kernel.stat auxiliary functions to compute the kernel test
## statistic.
## my.image2 auxiliary functions for visualization
## plotmemb auxiliary functions for visualization
## rect.dist auxiliary functions to compute the kernel test
## statistic.
## reembed auxiliary functions to compute the kernel test
## statistic.
## rg.sample Make a random graph
## stfp Spectral Embedding of Adjacency Matrices
## vlclust Angle based clustering
prepared by [email protected] on Sat Mar 25 12:11:05 2017