Add datasets module to load and generate toy datasets #345

lewtun · 2020-03-02T09:31:41Z

Description

scikit-learn has a datasets module that provides handy utility functions to load and generate toy datasets. These functions feature prominently in the scikit-learn examples, and it would be nice to have similar functionality in giotto-tda.

Suggestions for synthetic datasets include:

make_point_clouds: Generate an array of spheres and tori in 3 dimensions with corresponding labels (useful for showing persistent homology + shape classification).
make_time_series: Generate an array of periodic and non-periodic time series with corresponding labels (useful for showing sliding window embeddings and time series classification).
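A minimal sketch of what these two generators could look like, using only NumPy. The signatures, parameter names, default radii, and label conventions below are assumptions made for illustration; they are not an existing giotto-tda API.

```python
import numpy as np


def make_point_clouds(n_samples_per_shape=10, n_points=100, noise=0.05,
                      random_state=None):
    """Hypothetical sketch: noisy spheres (label 0) and tori (label 1) in 3D."""
    rng = np.random.default_rng(random_state)
    R, r = 2.0, 1.0  # assumed major and minor radii of the torus
    clouds, labels = [], []
    for _ in range(n_samples_per_shape):
        # Sphere: normalised Gaussian vectors are uniform on the unit sphere.
        v = rng.normal(size=(n_points, 3))
        sphere = v / np.linalg.norm(v, axis=1, keepdims=True)
        clouds.append(sphere + noise * rng.normal(size=(n_points, 3)))
        labels.append(0)

        # Torus: angles drawn uniformly (simple, but not uniform in area).
        theta, phi = rng.uniform(0, 2 * np.pi, size=(2, n_points))
        torus = np.column_stack([
            (R + r * np.cos(theta)) * np.cos(phi),
            (R + r * np.cos(theta)) * np.sin(phi),
            r * np.sin(theta),
        ])
        clouds.append(torus + noise * rng.normal(size=(n_points, 3)))
        labels.append(1)
    return np.stack(clouds), np.array(labels)


def make_time_series(n_samples_per_class=10, n_timesteps=200, noise=0.1,
                     random_state=None):
    """Hypothetical sketch: noisy sinusoids (label 1) and random walks (label 0)."""
    rng = np.random.default_rng(random_state)
    t = np.linspace(0, 1, n_timesteps)
    series, labels = [], []
    for _ in range(n_samples_per_class):
        # Periodic: sinusoid with random frequency and phase plus noise.
        freq = rng.uniform(2, 10)
        phase = rng.uniform(0, 2 * np.pi)
        series.append(np.sin(2 * np.pi * freq * t + phase)
                      + noise * rng.normal(size=n_timesteps))
        labels.append(1)

        # Non-periodic: rescaled Gaussian random walk.
        series.append(np.cumsum(rng.normal(size=n_timesteps))
                      / np.sqrt(n_timesteps))
        labels.append(0)
    return np.stack(series), np.array(labels)
```

Returning a stacked array plus a label vector mirrors the `(X, y)` convention of the scikit-learn generators, so the output could be passed straight to a classification pipeline.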

Point cloud and graph datasets could take inspiration from PyTorch Geometric's datasets module.

ammedmar · 2020-03-03T11:13:39Z

This would be good. @gtauzin and I started working on something along the lines of the make_point_clouds methods you are envisioning, and we managed to get a few nice spaces and constructions on spaces. It was never completed because the sampling was not uniform. To do this well, the probability density has to be modified by a Hessian term associated with the parametrization of the curved space. Maybe we can revisit this point sometime.
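To illustrate the uniformity issue on a concrete case: drawing both torus angles uniformly over-samples the inner ring, because the parametrization compresses area there. For the standard torus with major radius R and minor radius r, the area element is proportional to R + r cos(theta), so the poloidal angle theta must be drawn from that density rather than uniformly. The sketch below does this with rejection sampling; the function name and default radii are assumptions, and this is not the code referred to in the comment above.

```python
import numpy as np


def sample_torus_uniform(n_points, R=2.0, r=1.0, random_state=None):
    """Hypothetical sketch: points uniform in area on a torus in 3D."""
    rng = np.random.default_rng(random_state)
    thetas = np.empty(0)
    while thetas.size < n_points:
        # Propose theta uniformly, accept with probability proportional
        # to the area element (R + r cos(theta)), normalised by its max.
        cand = rng.uniform(0, 2 * np.pi, size=2 * n_points)
        u = rng.uniform(0, 1, size=2 * n_points)
        accepted = cand[u < (R + r * np.cos(cand)) / (R + r)]
        thetas = np.concatenate([thetas, accepted])
    theta = thetas[:n_points]
    # The area element does not depend on phi, so phi stays uniform.
    phi = rng.uniform(0, 2 * np.pi, size=n_points)
    return np.column_stack([
        (R + r * np.cos(theta)) * np.cos(phi),
        (R + r * np.cos(theta)) * np.sin(phi),
        r * np.sin(theta),
    ])
```

Rejection sampling works here because the area element is bounded and cheap to evaluate; for a general parametrized surface the same idea applies with the corresponding area element in place of R + r cos(theta).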

lewtun · 2020-03-03T18:48:10Z

Cool, it seems you guys went for the hardcore version :) All I had in mind were spheres and tori with Gaussian noise added, but perhaps this is too limiting.

If you have some Python code lying around, you could make a GitHub gist and link it in these comments.

ammedmar · 2020-03-03T20:01:37Z

The code is not so important, especially since it doesn't do what one would really like it to do, but since you asked, I am sending code that samples a point cloud near the real projective plane embedded in R4.

To do this properly, what we need is a method that can sample an interval according to a custom, not necessarily uniform, probability density function. Any leads on something like this?

The first part of this notebook has the sampling functions for S2 and RP2. I just ran it and the plotting still works.
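The notebook's code is not reproduced in this thread. As an independent, rough sketch of what such sampling functions could look like: points on S2 can be drawn uniformly by normalising Gaussian vectors, and RP2 can be realised in R4 through the classical map (x, y, z) -> (xy, xz, y^2 - z^2, 2yz), which identifies antipodal points of the sphere. The function names and the noise parameter are assumptions, and the image of a uniform sample on S2 is not claimed to be uniform on the embedded surface.

```python
import numpy as np


def sample_s2(n_points, random_state=None):
    """Hypothetical sketch: uniform points on the unit sphere S2 in R3."""
    rng = np.random.default_rng(random_state)
    v = rng.normal(size=(n_points, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def s2_to_rp2(points):
    """Map points of S2 to R4; antipodal points have the same image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.column_stack([x * y, x * z, y ** 2 - z ** 2, 2 * y * z])


def sample_rp2(n_points, noise=0.0, random_state=None):
    """Hypothetical sketch: point cloud near RP2 embedded in R4."""
    rng = np.random.default_rng(random_state)
    v = rng.normal(size=(n_points, 3))
    sphere = v / np.linalg.norm(v, axis=1, keepdims=True)
    cloud = s2_to_rp2(sphere)
    # Optional Gaussian noise so the cloud lies near, not on, the surface.
    return cloud + noise * rng.normal(size=cloud.shape)
```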

wreise · 2020-03-04T10:12:52Z

I wanted to have a look at the notebook, but I do not have access rights; you should receive an email requesting them.

For sampling from arbitrary densities, how about something like Metropolis-Hastings? Or, if the density is represented as a discretized array, maybe inverse transform sampling?
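A minimal sketch of the inverse transform option, assuming the density is given as non-negative values on an increasing grid over the interval. The function name and arguments are hypothetical. The CDF is built with the trapezoidal rule and inverted by linear interpolation, so the result is an approximation whose quality depends on the grid resolution.

```python
import numpy as np


def sample_from_discretized_density(density, grid, n_samples,
                                    random_state=None):
    """Hypothetical sketch: inverse transform sampling on an interval.

    `density` holds unnormalised, non-negative density values at the
    increasing points `grid`; it is assumed positive almost everywhere.
    """
    rng = np.random.default_rng(random_state)
    density = np.asarray(density, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Cumulative integral of the density via the trapezoidal rule.
    increments = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(increments)])
    cdf /= cdf[-1]  # normalise so that the CDF ends at 1
    # Push uniform draws through the (interpolated) inverse CDF.
    u = rng.uniform(0, 1, size=n_samples)
    return np.interp(u, cdf, grid)
```

For example, passing `grid = np.linspace(0, 2 * np.pi, 1001)` and `density = 2 + np.cos(grid)` draws the poloidal angle of a torus with radii 2 and 1 from its area-weighted distribution, without any rejected proposals.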
