Add docstrings to LatentCliqueLifting and related modules
luisfpereira committed Jan 28, 2025
1 parent 8197646 commit a3be6a4
Showing 2 changed files with 163 additions and 12 deletions.
134 changes: 122 additions & 12 deletions topobenchmark/transforms/liftings/graph2graph/latent_clique.py
@@ -1,7 +1,108 @@
r"""This module implements the LatentGraphLifting class.
**Background**
A graph is sparse if its number of edges grows proportionally to the number
of nodes. Many real-world graphs are sparse, yet they contain many densely
connected subgraphs and exhibit high clustering coefficients. Moreover,
such graphs frequently exhibit the small-world property: any two nodes are
connected by a short path whose length is proportional to the
logarithm of the number of nodes. For instance, these are well-known properties
of social networks, biological networks, and the Internet.
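These three properties can be measured directly. The sketch below is purely illustrative and not part of this module: it uses the networkx Watts-Strogatz generator, a standard sparse small-world model, and reports edges per node, average clustering, and average shortest path length. The helper name graph_statistics and the parameter values are hypothetical choices.

```python
import networkx as nx

# Illustrative helper (hypothetical, not part of this module): measure sparsity,
# clustering and path lengths on a Watts-Strogatz small-world graph.
def graph_statistics(n, k=6, p=0.1, seed=0):
    # Each node starts joined to its k nearest ring neighbors; each edge is
    # then rewired with probability p.
    g = nx.connected_watts_strogatz_graph(n, k, p, seed=seed)
    return {
        # Rewiring preserves the edge count, so this ratio stays at k / 2.
        "edges_per_node": g.number_of_edges() / g.number_of_nodes(),
        "avg_clustering": nx.average_clustering(g),
        "avg_path_length": nx.average_shortest_path_length(g),
    }

for n in (100, 200, 400):
    stats = graph_statistics(n)
    print(n, {key: round(val, 2) for key, val in stats.items()})
```

As n doubles, the number of edges per node stays constant (sparsity) while the clustering coefficient remains high and the average path length grows only slowly.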
**Contributions**
In this module, we present a novel random lifting procedure from graphs to
graphs. The procedure is based on a relatively recently proposed Bayesian
nonparametric random graph model for random clique covers [WT2020]_.
Specifically, the model can learn latent clique complexes that are consistent
with the input graph. The model can capture a power-law degree distribution,
global sparsity, and a non-vanishing local clustering coefficient.
Its small-world property is also guaranteed, which is a very attractive property
for Topological Deep Learning (TDL).
In the original work [WT2020]_, the distribution has been used as a prior on an
observed input graph. In particular, in the Bayesian setting, the model is useful
to obtain a distribution on latent clique complexes, i.e. a specific class of
simplicial complexes, whose 1-skeleton structural properties are consistent with
the ones of the input graph used to compute the likelihood. Indeed, one of the
features of the posterior distribution from which the latent complex is sampled
is that the set of latent 1-simplices (edges) is a superset of the set of edges of
the input graph.
**The random clique cover model**
Let :math:`G = (V, E)` be a graph with :math:`V` the set of vertices and
:math:`E` the set of edges.
Denote the number of nodes as :math:`N=|V|`.
A clique cover can be described as a matrix :math:`Z` of size
:math:`K \times N` where :math:`K` is the number of cliques such that
:math:`Z_{k,i}=1` if node :math:`i` is in clique :math:`k`
and :math:`Z_{k,i}=0` otherwise.
The Random Clique Cover (RCC) Model, defined in [WT2020]_, is a probabilistic
model for the matrix :math:`Z`.
This matrix can have an infinite number of rows and columns, but only a
finite number of them will be active. The model is based on the Indian Buffet
Process (IBP), which is a distribution over binary matrices with a possibly
infinite number of rows and columns, or, more specifically, on the Stable Beta IBP
as described in [TG2009]_. While the mathematics behind the IBP is complex, the model
admits a highly intuitive representation, described below.
First, recall that a clique is a fully connected subset of vertices.
Therefore, a clique cover :math:`Z` induces an adjacency matrix by the formula
:math:`A = \min(Z^\top Z - \operatorname{diag}(Z^\top Z), 1)`,
where :math:`\min` is the element-wise minimum.
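The adjacency formula above translates directly into NumPy. In this minimal sketch, the helper name clique_cover_to_adjacency is hypothetical; entry (i, j) of the product of the transpose of Z with Z counts the cliques shared by nodes i and j, the diagonal is removed, and the counts are clipped to 1.

```python
import numpy as np

def clique_cover_to_adjacency(Z):
    # Z is a K x N binary matrix: Z[k, i] = 1 iff node i belongs to clique k.
    Z = np.asarray(Z, dtype=np.int64)
    shared = Z.T @ Z  # (N x N): number of cliques containing both i and j
    # Drop the diagonal (per-node clique counts), then clip counts to 0/1.
    return np.minimum(shared - np.diag(np.diag(shared)), 1)

# Two cliques, {0, 1, 2} and {2, 3}, over N = 4 nodes.
Z = np.array([[1, 1, 1, 0],
              [0, 0, 1, 1]])
A = clique_cover_to_adjacency(Z)
print(A)
# [[0 1 1 0]
#  [1 0 1 0]
#  [1 1 0 1]
#  [0 0 1 0]]
```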
The IBP model can be described recursively. Let :math:`Z_j` denote the
:math:`j`-th row of :math:`Z`. Conditional on :math:`Z_1, Z_2, \ldots, Z_{K-1}`,
the row :math:`Z_K` is drawn as follows:

#. :math:`Z_K` contains new, previously unobserved nodes, whose number follows

   .. math::

      \mathrm{Poisson}\left(\alpha \frac{\Gamma(1+c)\,\Gamma(K+c+\sigma-1)}{\Gamma(K+c)\,\Gamma(c+\sigma)}\right).

#. A previously observed node :math:`i` belongs to clique :math:`K` with a
   probability that grows with the number of cliques it is already in.
   Specifically, letting :math:`m_i = \sum_{k=1}^{K-1} Z_{k,i}`,

   .. math::

      P(Z_{K,i} = 1 \mid Z_1, Z_2, \ldots, Z_{K-1}) = \frac{m_i - \sigma}{K + c - 1}.
The last expression is highly intuitive: a node that already belongs to many
cliques is more likely to join the next one.
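The sequential construction can be sketched as a small sampler. This is a minimal sketch, assuming the stable beta IBP of [TG2009]_ with cliques as rows ("customers") and nodes as columns ("dishes"); as a simplification, the number of cliques to draw is fixed by the caller. The function name sample_clique_cover is hypothetical and is not the sampler used by this module.

```python
import numpy as np
from scipy.special import gammaln

def sample_clique_cover(num_cliques, alpha, sigma, c, seed=None):
    # Hypothetical sketch of the sequential stable beta IBP draw; rows are
    # cliques, columns are nodes. Assumes 0 <= sigma < 1 and c > -sigma.
    rng = np.random.default_rng(seed)
    rows = []                        # node indices of each clique
    m = np.zeros(0, dtype=np.int64)  # m[i]: number of cliques containing node i
    for K in range(1, num_cliques + 1):
        # Existing node i joins clique K with prob. (m_i - sigma) / (K + c - 1).
        probs = np.clip((m - sigma) / (K + c - 1), 0.0, 1.0)
        joined = np.flatnonzero(rng.random(m.size) < probs)
        # New nodes: Poisson(alpha G(1+c) G(K+c+sigma-1) / (G(K+c) G(c+sigma))),
        # computed in log space for numerical stability.
        log_rate = (np.log(alpha) + gammaln(1 + c) + gammaln(K + c + sigma - 1)
                    - gammaln(K + c) - gammaln(c + sigma))
        n_new = rng.poisson(np.exp(log_rate))
        new = np.arange(m.size, m.size + n_new)
        members = np.concatenate([joined, new]).astype(np.int64)
        m = np.concatenate([m, np.zeros(n_new, dtype=np.int64)])
        m[members] += 1
        rows.append(members)
    Z = np.zeros((num_cliques, m.size), dtype=np.int64)
    for k, members in enumerate(rows):
        Z[k, members] = 1
    return Z

Z = sample_clique_cover(20, alpha=3.0, sigma=0.5, c=1.0, seed=0)
print(Z.shape)  # (number of cliques, number of nodes that appeared)
```

Passing the sampled Z through the adjacency formula above yields a random graph from the prior; the rich-get-richer behaviour of the second step is visible in the column sums of Z.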
The RCC model depends on four parameters: :math:`\alpha`, :math:`c`,
:math:`\sigma`, and :math:`\pi`.
The first three parameters are part of the IBP. Explaining them in detail is
beyond the scope of this module; the interested reader may refer to [TG2009]_.
Fortunately, the learned (posterior) values of :math:`\alpha`, :math:`\sigma`,
:math:`c` are strongly determined by the data itself.
By contrast, :math:`\pi` is approximately the probability that an edge is missing
from the graph. Generally, the lower :math:`\pi` is, the lower the number of
cliques will be and the less interconnected the nodes of the clique will be.
Importantly, by allowing latent (inferred) edges, the lifting superimposes the
small-world property on the graph.
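The role of :math:`\pi` can be illustrated with a deliberately simplified observation model, which is an assumption of this sketch and not the exact likelihood of [WT2020]_: every latent edge is hidden independently with probability :math:`\pi`. Under this model the observed edges are always a subset of the latent ones, matching the superset property of the latent 1-simplices mentioned above. The function name observe_edges is hypothetical.

```python
import numpy as np

def observe_edges(latent_adjacency, pi, seed=None):
    # Simplified observation model (assumption of this sketch): each latent
    # edge is hidden independently with probability pi.
    rng = np.random.default_rng(seed)
    A = np.asarray(latent_adjacency, dtype=np.int64)
    upper = np.triu(A, k=1)                # consider each undirected edge once
    keep = rng.random(upper.shape) >= pi   # True with probability 1 - pi
    observed_upper = upper * keep
    return observed_upper + observed_upper.T  # symmetrize

# Latent graph: complete graph on 6 nodes.
latent = np.ones((6, 6), dtype=np.int64) - np.eye(6, dtype=np.int64)
observed = observe_edges(latent, pi=0.3, seed=1)
print(observed.sum() // 2, "of", latent.sum() // 2, "latent edges observed")
```

Inference runs in the opposite direction: given the observed graph, the posterior over clique covers places mass on latent graphs that contain every observed edge plus, possibly, some missing ones.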
References
----------
.. [WT2020] Williamson, S.A., Tec, M., 2020. Random Clique Covers for Graphs with Local
Density and Global Sparsity, in: Proceedings of The 35th Uncertainty in Artificial
Intelligence Conference.
Presented at the Uncertainty in Artificial Intelligence, PMLR, pp. 228--238.
http://proceedings.mlr.press/v115/williamson20a/williamson20a.pdf
.. [TG2009] Teh, Y., Gorur, D., 2009. Indian Buffet Processes with Power-law Behavior,
in: Advances in Neural Information Processing Systems. Curran Associates, Inc.
"""

import networkx as nx
import numpy as np
from scipy import stats
from scipy.sparse import csr_matrix
from scipy.special import gammaln, logsumexp
from tqdm.auto import tqdm

@@ -87,20 +188,19 @@ def lift(self, domain):

class _LatentCliqueModel:
"""Latent clique cover model for network data corresponding to the
- Partial Observability Setting of the Random Clique Cover (Williamson & Tec, 2020) paper.
+ Partial Observability Setting of the Random Clique Cover of [WT2020]_.
- Williamson & Tec (2020). "Random clique covers for graphs with local density and global sparsity". UAI 2020.
- http://proceedings.mlr.press/v115/williamson20a/williamson20a.pdf
- The model is based on the Stable Beta-Indian Buffet Process (SB-IBP). See Teh and Gorur (2010),
- "Indian Buffet Processes with Power-Law Behavior", NIPS 2010 for additional reference.
+ The model is based on the Stable Beta-Indian Buffet Process (SB-IBP) [TG2009]_.
- The model depends on four parameters: alpha, sigma, c, and pie. The parameters
- alpha, sigma and c arepart of the SB-IBP and are described in Williamson & Tec (2020) and
- Teh & Gorur (2010) with the same names. The parameter pie is was introduced by Williamson & Tec (2020)
- and is a parameter for the model that determines the prior probability that an edge is unobserved.
+ The model depends on four parameters: alpha, sigma, c, and pie.
+ The parameters alpha, sigma, and c are part of the SB-IBP and are described in
+ [WT2020]_ and [TG2009]_ with the same names.
+ The parameter pie was introduced by [WT2020]_
+ and determines the prior probability that an edge is unobserved.
- The following properties of a Random Clique Cover model are useful to interpret the
- parameters alpha, c, and sigma.
+ The following properties of a Random Clique Cover model are useful to
+ interpret the parameters alpha, c, and sigma.
Parameters
----------
@@ -144,6 +244,16 @@ class _LatentCliqueModel:
likelihood computation.
**Note**: The roles of (K, N) are interchanged relative to the paper notation.
References
----------
.. [WT2020] Williamson, S.A., Tec, M., 2020. Random Clique Covers for Graphs with Local
Density and Global Sparsity, in: Proceedings of The 35th Uncertainty in Artificial
Intelligence Conference.
Presented at the Uncertainty in Artificial Intelligence, PMLR, pp. 228--238.
http://proceedings.mlr.press/v115/williamson20a/williamson20a.pdf
.. [TG2009] Teh, Y., Gorur, D., 2009. Indian Buffet Processes with Power-law Behavior,
in: Advances in Neural Information Processing Systems. Curran Associates, Inc.
"""

def __init__(
@@ -1,3 +1,44 @@
r"""This module implements the LatentCliqueLifting class.
In the context of Topological Deep Learning [PBB2024]_ [HZP2023]_,
and the recently emerged paradigm of Latent Topology Inference
(LTI) [BST2023]_, it is natural to view the model in [WT2020]_ as a
novel LTI method able to infer a random latent simplicial complex from an
input graph. In other words, [WT2020]_ can be used as a novel random lifting
procedure from graphs to simplicial complexes.
To summarize, this is:
* a non-deterministic lifting
* not present in the literature as a lifting procedure
* based on connectivity
* modifying the initial connectivity of the graph by adding edges
  (thus, it can also be considered a graph rewiring method).
The lifting ensures both 1) the small-world property and 2) edge/cell sparsity.
Combining these two properties is very attractive for Topological Deep Learning (TDL):
sparsity keeps the number of higher-order connections small, which ensures
computational efficiency, while the small-world property means that only a few
message-passing layers are needed to connect any two nodes.
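To make the graph-to-simplicial-complex step concrete, the sketch below assumes that the latent graph produced by the graph-to-graph lifting is then lifted to its clique complex, in which every clique becomes a simplex. This composition is an assumption suggested by the imports of ComposedLiftingMap and LatentGraphLifting, not a description of this module's implementation. The function name clique_complex and its max_dim parameter are hypothetical; networkx.find_cliques enumerates maximal cliques, and all of their faces up to max_dim are collected.

```python
from itertools import combinations

import networkx as nx

def clique_complex(graph, max_dim=2):
    # Collect every simplex (a sorted tuple of nodes) of dimension <= max_dim
    # whose vertices form a clique in `graph`.
    simplices = set()
    for clique in nx.find_cliques(graph):  # maximal cliques
        clique = sorted(clique)
        # A simplex with `size` vertices has dimension size - 1.
        for size in range(1, min(len(clique), max_dim + 1) + 1):
            simplices.update(combinations(clique, size))
    return simplices

# Triangle {0, 1, 2} plus a pendant edge {2, 3}.
g = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])
cx = clique_complex(g)
print(sorted(cx, key=lambda s: (len(s), s)))
# [(0,), (1,), (2,), (3,), (0, 1), (0, 2), (1, 2), (2, 3), (0, 1, 2)]
```

Because the latent graph only adds edges to the input graph, every simplex of the input graph's clique complex also appears in the lifted complex.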
References
----------
.. [WT2020] Williamson, S.A., Tec, M., 2020. Random Clique Covers for Graphs with Local
Density and Global Sparsity, in: Proceedings of The 35th Uncertainty in Artificial
Intelligence Conference.
Presented at the Uncertainty in Artificial Intelligence, PMLR, pp. 228--238.
http://proceedings.mlr.press/v115/williamson20a/williamson20a.pdf
.. [PBB2024] Papamarkou, T., Birdal, T., Bronstein, M., et al., 2024.
Position Paper: Challenges and Opportunities in Topological Deep Learning.
https://doi.org/10.48550/arXiv.2402.08871
.. [HZP2023] Hajij, M., Zamzmi, G., Papamarkou, T., et al., 2023.
Topological Deep Learning: Going Beyond Graph Data.
https://doi.org/10.48550/arXiv.2206.00606
.. [BST2023] Battiloro, C., Spinelli, I., Telyatnikov, L., et al., 2023.
From Latent Graph to Latent Topology Inference: Differentiable Cell
Complex Module. Presented at The Twelfth International Conference
on Learning Representations.
"""

from topobenchmark.transforms.liftings.base import ComposedLiftingMap
from topobenchmark.transforms.liftings.graph2graph.latent_clique import (
LatentGraphLifting,
