Add datasets module to load and generate toy datasets #345
Comments
This would be good. @gtauzin and I started doing something along the
Cool, it seems you guys went for the hardcore version :) All I had in mind were spheres and tori with Gaussian noise added, but perhaps this is too limiting. If you have some Python code lying around, you could make a GitHub gist and link it in these comments.
The code is not so important, especially since it doesn't do what one would really like it to do, but since you asked, I am sending code that samples a point cloud near the real projective plane embedded in R4. To get this done properly, what we need is a method that can sample an interval according to a custom, not necessarily uniform, probability density function. Any leads on something like this? The first part of this notebook has the sampling functions for S2 and RP2. I just ran it and the plotting still works.
I wanted to have a look at the notebook, but I do not have access rights; you should receive an email requesting them. For sampling from arbitrary densities, something like Metropolis-Hastings? Or, if the density is represented as a discretized array, maybe inverse transform sampling?
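To make the inverse transform suggestion concrete, here is a minimal sketch, assuming the density is given as non-negative values on an increasing 1D grid. The function name `sample_from_discretized_density` and its parameters are hypothetical and not part of giotto-tda; the CDF is approximated with the trapezoidal rule and inverted by linear interpolation.

```python
import numpy as np


def sample_from_discretized_density(grid, density, n_samples, random_state=None):
    """Draw samples on an interval from a density tabulated on a grid.

    Hypothetical helper: `grid` is an increasing 1D array and `density`
    holds non-negative (not necessarily normalized) values at those points.
    """
    rng = np.random.default_rng(random_state)
    grid = np.asarray(grid, dtype=float)
    density = np.asarray(density, dtype=float)

    # Approximate the CDF with the trapezoidal rule, then normalize to [0, 1].
    increments = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(increments)])
    cdf /= cdf[-1]

    # Inverse transform: map uniform draws through the interpolated inverse CDF.
    u = rng.uniform(size=n_samples)
    return np.interp(u, cdf, grid)


# Example: a density proportional to x on [0, 1].
grid = np.linspace(0.0, 1.0, 201)
samples = sample_from_discretized_density(grid, grid, n_samples=20000, random_state=0)
```

Metropolis-Hastings would avoid tabulating the density, but for a one-dimensional interval the tabulated approach is simpler and produces independent samples.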
Description
scikit-learn has a datasets module that provides handy utility functions to load and generate toy datasets. These functions feature prominently in the scikit-learn examples, and it would be nice to have similar functionality in giotto-tda.
Suggestions for synthetic datasets include:
- make_point_clouds: Generate an array of spheres and tori in 3 dimensions with corresponding labels (useful for showing persistent homology + shape classification).
- make_time_series: Generate an array of periodic and non-periodic time series with corresponding labels (useful for showing sliding window embeddings and time series classification).

Suggestions for point cloud and graph datasets could take inspiration from PyTorch Geometric's dataset module.
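A rough sketch of what the two proposed generators might look like, using only NumPy. The signatures, default values, torus radii, and the choice of a random walk as the non-periodic class are all assumptions made for illustration; none of this is existing giotto-tda API.

```python
import numpy as np


def _sample_sphere(n_points, rng):
    # Normalizing Gaussian vectors gives points uniform on the unit sphere.
    points = rng.standard_normal((n_points, 3))
    return points / np.linalg.norm(points, axis=1, keepdims=True)


def _sample_torus(n_points, rng, R=2.0, r=1.0):
    # Uniform angles; note this is not uniform with respect to surface area.
    theta = rng.uniform(0.0, 2.0 * np.pi, n_points)
    phi = rng.uniform(0.0, 2.0 * np.pi, n_points)
    x = (R + r * np.cos(theta)) * np.cos(phi)
    y = (R + r * np.cos(theta)) * np.sin(phi)
    z = r * np.sin(theta)
    return np.column_stack([x, y, z])


def make_point_clouds(n_samples_per_shape=10, n_points=100, noise=0.1,
                      random_state=None):
    """Hypothetical generator: noisy spheres (label 0) and tori (label 1)."""
    rng = np.random.default_rng(random_state)
    clouds, labels = [], []
    for label, sampler in enumerate((_sample_sphere, _sample_torus)):
        for _ in range(n_samples_per_shape):
            points = sampler(n_points, rng)
            points = points + noise * rng.standard_normal(points.shape)
            clouds.append(points)
            labels.append(label)
    return np.stack(clouds), np.array(labels)


def make_time_series(n_samples_per_class=10, n_timestamps=200, noise=0.1,
                     random_state=None):
    """Hypothetical generator: random walks (label 0) and noisy sines (label 1)."""
    rng = np.random.default_rng(random_state)
    t = np.linspace(0.0, 1.0, n_timestamps)
    series, labels = [], []
    for _ in range(n_samples_per_class):
        # Non-periodic class: a scaled Gaussian random walk.
        walk = np.cumsum(rng.standard_normal(n_timestamps)) / np.sqrt(n_timestamps)
        series.append(walk)
        labels.append(0)
        # Periodic class: a sine with random frequency and phase.
        freq = rng.uniform(2.0, 10.0)
        phase = rng.uniform(0.0, 2.0 * np.pi)
        series.append(np.sin(2.0 * np.pi * freq * t + phase))
        labels.append(1)
    X = np.stack(series) + noise * rng.standard_normal((len(series), n_timestamps))
    return X, np.array(labels)


X_pc, y_pc = make_point_clouds(n_samples_per_shape=3, n_points=50,
                               noise=0.0, random_state=0)
X_ts, y_ts = make_time_series(n_samples_per_class=4, n_timestamps=100,
                              random_state=0)
```

Returning an `(X, y)` pair mirrors the convention of the scikit-learn `make_*` generators, so the output could be passed directly to a classification pipeline.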