diff --git a/topobenchmark/transforms/liftings/graph2graph/latent_clique.py b/topobenchmark/transforms/liftings/graph2graph/latent_clique.py
index 5d98c116..0643dc52 100644
--- a/topobenchmark/transforms/liftings/graph2graph/latent_clique.py
+++ b/topobenchmark/transforms/liftings/graph2graph/latent_clique.py
@@ -1,7 +1,108 @@
+r"""This module implements the LatentGraphLifting class.
+
+**Background**
+
+A graph is sparse if its number of edges grows proportionally to the number
+of nodes. Many real-world graphs are sparse, yet they contain many densely
+connected subgraphs and exhibit high clustering coefficients. Moreover,
+such real-world graphs frequently exhibit the small-world property, where
+any two nodes are connected by a short path of length proportional to the
+logarithm of the number of nodes. For instance, these are well-known properties
+of social networks, biological networks, and the Internet.
+
+
+**Contributions**
+
+In this module, we present a novel random lifting procedure from graphs to
+graphs. The procedure is based on a recently proposed Bayesian
+nonparametric random graph model for random clique covers [WT2020]_.
+Specifically, the model can learn latent clique complexes that are consistent
+with the input graph. The model can capture a power-law degree distribution,
+global sparsity, and a non-vanishing local clustering coefficient.
+The small-world property is also guaranteed, which is very attractive
+for Topological Deep Learning (TDL).
+
+In the original work [WT2020]_, the distribution is used as a prior on an
+observed input graph. In particular, in the Bayesian setting, the model is used
+to obtain a distribution on latent clique complexes, i.e., a specific class of
+simplicial complexes, whose 1-skeleton structural properties are consistent with
+those of the input graph used to compute the likelihood. Indeed, one of the
+features of the posterior distribution from which the latent complex is sampled
+is that the set of latent 1-simplices (edges) is a superset of the set of edges of
+the input graph.
+
+
+**The random clique cover model**
+
+Let :math:`G = (V, E)` be a graph with :math:`V` the set of vertices and
+:math:`E` the set of edges.
+Denote the number of nodes as :math:`N=|V|`.
+A clique cover can be described as a binary matrix :math:`Z` of size
+:math:`K \times N`, where :math:`K` is the number of cliques, such that
+:math:`Z_{k,i}=1` if node :math:`i` is in clique :math:`k`
+and :math:`Z_{k,i}=0` otherwise.
+The Random Clique Cover (RCC) model, defined in [WT2020]_, is a probabilistic
+model for the matrix :math:`Z`.
+This matrix can have an infinite number of rows and columns, but only a
+finite number of them will be active. The model is based on the Indian Buffet
+Process (IBP), a distribution over binary matrices with a possibly
+infinite number of rows and columns; more specifically, it uses the Stable Beta IBP
+described in [TG2009]_. While the mathematics behind the IBP is complex, the model
+admits a highly intuitive representation, described below.
+
+First, recall that a clique is a fully connected subset of vertices.
+Therefore, a clique cover :math:`Z` induces an adjacency matrix by the formula
+:math:`A = \min(Z^\top Z - \mathrm{diag}(Z^\top Z), 1)`,
+where :math:`\min` is the element-wise minimum.
+The IBP model can be described recursively as follows.
+
+Let :math:`Z_j` denote the :math:`j`-th row of :math:`Z`. Conditional on
+:math:`Z_1, Z_2, \ldots, Z_{K-1}`, the row :math:`Z_K` (i.e., the
+:math:`K`-th clique) is drawn as follows:
+
+#. :math:`Z_K` contains :math:`\tilde{N}_K` new, previously unobserved nodes, where
+
+   .. math::
+
+      \tilde{N}_K \mid Z_1, \ldots, Z_{K-1} \sim \mathrm{Poisson}\left(\alpha \frac{\Gamma(1+c)\,\Gamma(K+c+\sigma-1)}{\Gamma(K+c)\,\Gamma(c+\sigma)}\right)
+
+#. Each previously observed node :math:`i` joins clique :math:`K` with a
+   probability that increases with the number of cliques it already belongs to.
+   Specifically, letting :math:`m_i = \sum_{k=1}^{K-1} Z_{k,i}`, we have
+   :math:`P(Z_{K,i}=1 \mid Z_1, \ldots, Z_{K-1}) = \frac{m_i - \sigma}{K + c - 1}`.
+
+The last expression is intuitive: the more cliques a node already belongs to,
+the more likely it is to join the next one (a rich-get-richer effect).
+
+The RCC model depends on four parameters :math:`\alpha`, :math:`c`,
+:math:`\sigma`, and :math:`\pi`.
+The first three parameters are part of the IBP. Explaining them in detail is
+beyond the scope of this module.
+The interested reader may refer to [TG2009]_.
+Fortunately, the learned (posterior) values of :math:`\alpha`, :math:`\sigma`,
+and :math:`c` are strongly determined by the data itself.
+By contrast, :math:`\pi` is approximately the probability that an edge is missing
+from the graph. Generally, the lower :math:`\pi` is, the lower the number of
+cliques will be and the less interconnected the nodes of the clique will be.
+
+Importantly, by leveraging latent inferred edges, one can
+impose the small-world property on the graph.
+
+
+References
+----------
+.. [WT2020] Williamson, S.A., Tec, M., 2020. Random Clique Covers for Graphs with Local
+   Density and Global Sparsity, in: Proceedings of The 35th Uncertainty in Artificial
+   Intelligence Conference.
+   Presented at the Uncertainty in Artificial Intelligence, PMLR, pp. 228--238.
+   http://proceedings.mlr.press/v115/williamson20a/williamson20a.pdf
+.. [TG2009] Teh, Y., Gorur, D., 2009. Indian Buffet Processes with Power-law Behavior,
+   in: Advances in Neural Information Processing Systems. Curran Associates, Inc.
+"""
+
 import networkx as nx
 import numpy as np
 from scipy import stats
-from scipy.sparse import csr_matrix
 from scipy.special import gammaln, logsumexp
 from tqdm.auto import tqdm
 
@@ -87,20 +188,19 @@ def lift(self, domain):
 
 class _LatentCliqueModel:
     """Latent clique cover model for network data corresponding to the
-    Partial Observability Setting of the Random Clique Cover (Williamson & Tec, 2020) paper.
+    Partial Observability Setting of the Random Clique Cover model of [WT2020]_.
 
-    Williamson & Tec (2020). "Random clique covers for graphs with local density and global sparsity". UAI 2020.
-    http://proceedings.mlr.press/v115/williamson20a/williamson20a.pdf
-    The model is based on the Stable Beta-Indian Buffet Process (SB-IBP). See Teh and Gorur (2010),
-    "Indian Buffet Processes with Power-Law Behavior", NIPS 2010 for additional reference.
+    The model is based on the Stable Beta Indian Buffet Process (SB-IBP) [TG2009]_.
 
-    The model depends on four parameters: alpha, sigma, c, and pie. The parameters
-    alpha, sigma and c arepart of the SB-IBP and are described in Williamson & Tec (2020) and
-    Teh & Gorur (2010) with the same names. The parameter pie is was introduced by Williamson & Tec (2020)
-    and is a parameter for the model that determines the prior probability that an edge is unobserved.
+    The model depends on four parameters: alpha, sigma, c, and pie.
+    The parameters alpha, sigma, and c are part of the SB-IBP and are described in
+    [WT2020]_ and [TG2009]_ with the same names.
+    The parameter pie was introduced by [WT2020]_
+    and determines the prior probability that
+    an edge is unobserved.
 
-    The following properties of a Random Clique Cover model are useful to interpret the
-    parameters alpha, c, and sigma.
+    The following properties of a Random Clique Cover model are useful to
+    interpret the parameters alpha, c, and sigma.
 
     Parameters
     ----------
@@ -144,6 +244,16 @@ class _LatentCliqueModel:
         likelihood computation.
 
     **Note**: The values of (K, N) are used interchanged from the paper notation.
+
+    References
+    ----------
+    .. [WT2020] Williamson, S.A., Tec, M., 2020. Random Clique Covers for Graphs with Local
+       Density and Global Sparsity, in: Proceedings of The 35th Uncertainty in Artificial
+       Intelligence Conference.
+       Presented at the Uncertainty in Artificial Intelligence, PMLR, pp. 228--238.
+       http://proceedings.mlr.press/v115/williamson20a/williamson20a.pdf
+    .. [TG2009] Teh, Y., Gorur, D., 2009. Indian Buffet Processes with Power-law Behavior,
+       in: Advances in Neural Information Processing Systems. Curran Associates, Inc.
     """
 
     def __init__(
diff --git a/topobenchmark/transforms/liftings/graph2simplicial/latent_clique.py b/topobenchmark/transforms/liftings/graph2simplicial/latent_clique.py
index 49e26212..f6923527 100755
--- a/topobenchmark/transforms/liftings/graph2simplicial/latent_clique.py
+++ b/topobenchmark/transforms/liftings/graph2simplicial/latent_clique.py
@@ -1,3 +1,44 @@
+r"""This module implements the LatentCliqueLifting class.
+
+In the context of Topological Deep Learning [PBB2024]_, [HZP2023]_,
+and the recently emerged paradigm of Latent Topology Inference
+(LTI) [BST2023]_, it is natural to view the model in [WT2020]_ as a
+novel LTI method able to infer a random latent simplicial complex from an
+input graph. In other words, [WT2020]_ can be used as a novel random lifting
+procedure from graphs to simplicial complexes.
+
+To summarize, this is:
+
+* a non-deterministic lifting,
+* not present in the literature as a lifting procedure,
+* based on connectivity,
+* modifying the initial connectivity of the graph by
+  adding edges (thus, it can also be considered a graph rewiring method).
+
+The lifting ensures both 1) the small-world property and 2) edge/cell sparsity.
+Combining these two properties is very attractive for Topological Deep Learning (TDL):
+sparsity keeps computation efficient by reducing the number of higher-order
+connections, while the small-world property means only a few message-passing layers connect any two nodes.
+
+References
+----------
+.. [WT2020] Williamson, S.A., Tec, M., 2020. Random Clique Covers for Graphs with Local
+   Density and Global Sparsity, in: Proceedings of The 35th Uncertainty in Artificial
+   Intelligence Conference.
+   Presented at the Uncertainty in Artificial Intelligence, PMLR, pp. 228--238.
+   http://proceedings.mlr.press/v115/williamson20a/williamson20a.pdf
+.. [PBB2024] Papamarkou, T., Birdal, T., Bronstein, M., et al., 2024.
+   Position Paper: Challenges and Opportunities in Topological Deep Learning.
+   https://doi.org/10.48550/arXiv.2402.08871
+.. [HZP2023] Hajij, M., Zamzmi, G., Papamarkou, T., et al., 2023.
+   Topological Deep Learning: Going Beyond Graph Data.
+   https://doi.org/10.48550/arXiv.2206.00606
+.. [BST2023] Battiloro, C., Spinelli, I., Telyatnikov, L., et al., 2023.
+   From Latent Graph to Latent Topology Inference: Differentiable Cell
+   Complex Module. Presented at the Twelfth International Conference
+   on Learning Representations.
+"""
+
 from topobenchmark.transforms.liftings.base import ComposedLiftingMap
 from topobenchmark.transforms.liftings.graph2graph.latent_clique import (
     LatentGraphLifting,
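The generative process described in the `graph2graph/latent_clique.py` module docstring can be sketched in a few lines of NumPy/SciPy. Cliques (rows of Z) are drawn one at a time. Each existing node i joins clique K with probability (m_i - sigma) / (K + c - 1). A Poisson number of new nodes is added with the stable beta IBP rate. The adjacency matrix is then A = min(Z^T Z - diag(Z^T Z), 1).

This is a minimal sketch under stated assumptions:

- alpha, sigma, and c are fixed, and the number of cliques is fixed.
- It samples only from the prior.
- It ignores the edge-observation parameter pi.
- It is not the posterior sampler implemented in `_LatentCliqueModel`.
- The function names `new_node_rate`, `sample_clique_cover`, and `clique_cover_to_adjacency` are hypothetical.

```python
import numpy as np
from scipy.special import gammaln

# Hypothetical helper names; a prior-only sketch of the RCC / stable beta IBP
# generative process, not the posterior inference code of this module.


def new_node_rate(k, alpha, sigma, c):
    """Poisson rate of new nodes in the k-th clique (k >= 1):
    alpha * Gamma(1 + c) * Gamma(k + c + sigma - 1) / (Gamma(k + c) * Gamma(c + sigma))."""
    log_rate = (
        gammaln(1 + c) + gammaln(k + c + sigma - 1) - gammaln(k + c) - gammaln(c + sigma)
    )
    return alpha * np.exp(log_rate)


def sample_clique_cover(num_cliques, alpha, sigma, c, rng):
    """Draw a binary (num_cliques x N) clique cover Z, one clique (row) at a time."""
    rows = []
    m = np.zeros(0)  # m[i]: number of earlier cliques that contain node i
    for k in range(1, num_cliques + 1):
        # Each existing node joins clique k with probability (m_i - sigma) / (k + c - 1).
        p_old = np.clip((m - sigma) / (k + c - 1), 0.0, 1.0)
        old = (rng.random(m.size) < p_old).astype(int)
        # A Poisson number of previously unseen nodes also joins clique k.
        n_new = rng.poisson(new_node_rate(k, alpha, sigma, c))
        row = np.concatenate([old, np.ones(n_new, dtype=int)])
        m = np.concatenate([m, np.zeros(n_new)]) + row
        rows.append(row)
    # Earlier rows are shorter; pad them with zeros for nodes that appeared later.
    z = np.zeros((num_cliques, m.size), dtype=int)
    for k, row in enumerate(rows):
        z[k, : row.size] = row
    return z


def clique_cover_to_adjacency(z):
    """A = min(Z^T Z - diag(Z^T Z), 1): distinct nodes are adjacent iff they share a clique."""
    co = z.T @ z  # co[i, j] = number of cliques containing both node i and node j
    return np.minimum(co - np.diag(np.diag(co)), 1)


rng = np.random.default_rng(0)
z = sample_clique_cover(20, alpha=3.0, sigma=0.5, c=1.0, rng=rng)
adj = clique_cover_to_adjacency(z)
print(z.shape, int(adj.sum()) // 2)  # (cliques, nodes) and number of edges
```

The first clique receives a Poisson(alpha) number of nodes, because the rate above reduces to alpha when k = 1. With sigma > 0, the number of distinct nodes grows roughly like K**sigma, which is the power-law behaviour described in [TG2009].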