example with data from csv file or pandas? #180

Guidosalimbeni · 2021-02-11T19:34:59Z

I wonder if you can provide an example with data input from a CSV file? I understand the ease of providing tutorials using your custom dataset, but it wasn't very easy for me to understand how to use my own data. If I need to create a Spektral Dataset and save it to disk, this is probably not a good alternative for me.

It would be great if you could also provide an example of how to do regression instead of classification.

thanks
Guido

danielegrattarola (Maintainer) · 2021-02-12T08:50:22Z

Hi Guido,

You don't have to use datasets at all. All layers in Spektral take simple TensorFlow tensors as input, so as long as you know how to create the proper input you can completely ignore the dataset/loader interface.

If you don't want to deal with tensors manually, datasets are just there to interface whatever data you have with the rest of Spektral easily; they are not meant to be saved to disk.
So, if you have a CSV, you just need to create a dataset that reads the CSV and you're done.

In any case, your CSV should be structured as follows.

Let's say that you have a graph with N nodes. Depending on the graph, you can have:

- an adjacency matrix: N rows, N columns, with A[i, j] == 1 if there's an edge between node i and node j, and 0 otherwise;
- a matrix of node attributes: N rows, F columns (one for each node attribute), with real values;
- a matrix of edge attributes: it can take different forms, so we'll ignore it for now.

Most graphs will have at least the first two.
So to represent our graph with CSVs you could do something like this.

Adjacency matrix CSV

node_0, node_1, ..., node_n
0, 1, ..., 1
1, 0, ..., 0
...
1, 0, ..., 0

In this case, to create an adjacency matrix from this file you can do:

import pandas as pd
adjacency = pd.read_csv("path/to/adjacency.csv").values  # You might want to convert to a sparse matrix afterwards
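
As a minimal sketch of that conversion, the snippet below reads a small dense adjacency CSV and turns it into a SciPy CSR matrix. The 3-node graph is made up, and an in-memory string stands in for a real file; the cast to float32 is an assumption, on the grounds that TensorFlow layers generally expect float inputs.

```python
import io

import pandas as pd
import scipy.sparse as sp

# Hypothetical 3-node graph, inlined so the example runs without a file on disk.
csv_text = """node_0,node_1,node_2
0,1,1
1,0,0
1,0,0
"""

# The header row becomes the column names, so .values is the 3 x 3 matrix.
dense = pd.read_csv(io.StringIO(csv_text)).values

# CSR stores only the non-zero entries.
adjacency = sp.csr_matrix(dense.astype("float32"))

print(adjacency.shape, adjacency.nnz)
```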

Note that storing the full matrix like this is very expensive in terms of memory (N x N entries, most of them zero for a typical graph), and that a much better way is to store only the sparse indices where your adjacency matrix is non-zero, i.e., the pairs i, j such that A[i, j] != 0. In this case, you have as many rows as there are edges:

i, j
0, 1  # edge from node 0 to node 1
0, 2
0, 5
1, 0
1, 2
...
5, 0

And to create a sparse adjacency matrix you can do:

import numpy as np
import pandas as pd
import scipy.sparse

indices = pd.read_csv("path/to/sparse_adjacency.csv").values
row, col = indices.T
data = np.ones_like(row)
adjacency = scipy.sparse.csr_matrix((data, (row, col)))

Now you have the adjacency matrix of the graph.
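
Going the other way, if you already have a dense adjacency matrix, a sketch like the following writes it out in edge-list form and reads it back. The 4-node matrix is invented and an in-memory buffer stands in for the CSV file. Passing `shape` explicitly is an assumption added here: without it, SciPy infers the size from the largest index, so a trailing node with no edges (node 3 below) would be dropped.

```python
import io

import numpy as np
import pandas as pd
import scipy.sparse as sp

# Made-up 4-node graph; node 3 has no edges.
dense = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])
n_nodes = dense.shape[0]

# Write one row per non-zero entry: the (i, j) pairs with A[i, j] != 0.
row, col = np.nonzero(dense)
buffer = io.StringIO()  # stands in for a file such as sparse_adjacency.csv
pd.DataFrame({"i": row, "j": col}).to_csv(buffer, index=False)

# Read the pairs back and rebuild the sparse matrix.
buffer.seek(0)
indices = pd.read_csv(buffer).values
row, col = indices.T
data = np.ones(len(row), dtype=np.float32)
adjacency = sp.csr_matrix((data, (row, col)), shape=(n_nodes, n_nodes))
```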

If you want to use it in a model, you might also want to convert it to a sparse Tensor (not needed if you use a Dataset + Loader):

from spektral.layers import ops
adjacency = ops.sp_matrix_to_sp_tensor(adjacency)

Node attributes CSV

attr_1, attr_2, ..., attr_f
0.4, 23, ..., 1
0.5, 12, ..., 0
...
0.7, 99, ..., 1

Again, you can read it as:

node_attributes = pd.read_csv("path/to/node_attributes.csv").values

And that's it, you have the node attributes of the graph.

Targets

Note that in most cases you will also have the targets for training your model. Let's just create dummy targets for the sake of example:

y = np.random.rand(node_attributes.shape[0], 1)
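
With real data, the targets often sit in the same table as the node attributes. As a sketch, assuming a hypothetical column named `target` (the column names and values below are invented), you could split them like this:

```python
import io

import pandas as pd

# Invented table: two attribute columns plus a hypothetical "target" column.
csv_text = """attr_1,attr_2,target
0.4,23,1.5
0.5,12,0.3
0.7,99,2.1
"""

table = pd.read_csv(io.StringIO(csv_text))

# Double brackets keep y two-dimensional, with shape (N, 1).
y = table[["target"]].values.astype("float32")

# Everything except the target column is a node attribute, shape (N, F).
node_attributes = table.drop(columns="target").values.astype("float32")
```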

Using the matrices

Now that you have adjacency, node_attributes, and y, you can do two things.

Let's say that you want to do node regression (i.e., for each node you want to predict a continuous variable) and you have a Keras model based on Spektral:

from spektral.models import GCN
model = GCN(1, output_activation=None)  # no activation to do regression

If you want to write your own training loop, you only need to know how to call the model:

output = model([node_attributes, adjacency])

and from here you can write your own training script (see here).

Otherwise, you can use the spektral.data API:

from spektral.data import Graph, Dataset, SingleLoader

Create a custom dataset with your inputs and targets:

class MyDataset(Dataset):
    def read(self):
        return [Graph(x=node_attributes, a=adjacency, y=y)]

dataset = MyDataset()
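
If the arrays come from CSV files, the reading can live in one helper that `read()` calls. The sketch below uses a hypothetical `load_graph_from_csv` function (not part of Spektral) and assumes an edge-list file with columns `i, j` plus a node-attribute file with one row per node; in-memory buffers stand in for real paths.

```python
import io

import numpy as np
import pandas as pd
import scipy.sparse as sp


def load_graph_from_csv(edges_csv, attributes_csv):
    """Return (node_attributes, adjacency) read from two CSV sources.

    Hypothetical helper: both arguments can be paths or file-like objects.
    """
    node_attributes = pd.read_csv(attributes_csv).values.astype(np.float32)
    n_nodes = node_attributes.shape[0]

    # One row per edge; transpose to get the row and column index arrays.
    row, col = pd.read_csv(edges_csv).values.T
    data = np.ones(len(row), dtype=np.float32)
    adjacency = sp.csr_matrix((data, (row, col)), shape=(n_nodes, n_nodes))
    return node_attributes, adjacency


# Invented example data: 3 nodes, 2 attributes, 4 directed edges.
edges = io.StringIO("i,j\n0,1\n0,2\n1,0\n2,0\n")
attributes = io.StringIO("attr_1,attr_2\n0.4,23\n0.5,12\n0.7,99\n")

x, a = load_graph_from_csv(edges, attributes)
```

Inside `read()`, the returned arrays would then be passed to `Graph(x=x, a=a, y=y)` in the same way as above.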

Train the model using a SingleLoader:

loader = SingleLoader(dataset)
model.compile("adam", "mse")
model.fit(loader.load(), steps_per_epoch=loader.steps_per_epoch)

And that's all.
Let me know if this helps.

Cheers
Daniele
