Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concat dataset #1411

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

concat dataset #1411

wants to merge 10 commits into from

Conversation

yueyinqiu
Copy link
Contributor

@yueyinqiu yueyinqiu commented Nov 15, 2024

  • add support for torch.utils.data.ConcatDataset
    • a new interface IDataset<out T> is added
      • now Dataset<T> implements IDataset<T>
    • more overloads of DataLoader() has been added, to accept IDataset datasets
      • those overloads directly return a DataLoader<T, S>, rather than its subclasses
    • datasets supported by DataLoader<T, S> have been relaxed to IDataset<T>
  • parameter of collate functions in DataLoader<T, S> has been relaxed to IReadOnlyList

@yueyinqiu yueyinqiu changed the title concat dataset (WIP) concat dataset Nov 15, 2024
@yueyinqiu
Copy link
Contributor Author

yueyinqiu commented Nov 15, 2024

related to #1348 (comment) #1354 #1358

@yueyinqiu yueyinqiu changed the title (WIP) concat dataset concat dataset Nov 15, 2024
@yueyinqiu
Copy link
Contributor Author

yueyinqiu commented Nov 15, 2024

As mentioned here #1357 (comment), the current version has no API changes (except the type of dataset and collate_fn in DataLoader<T, S>).

However we don't actually need Dataset<T>, Dataset and IterableDataset. Do we have any plan to remove them later? They may also occupy the position of some other classes in PyTorch, like #1353.

(But I find it hard to use [Obsolete] on them since some methods are using them as the return type.)

/// </summary>
/// <param name="index">Index for tensor</param>
/// <returns>Tensors of index. DataLoader will catenate these tensors into batches.</returns>
T this[long index] { get; }
Copy link
Contributor Author

@yueyinqiu yueyinqiu Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are using Int64. However actually most .net containers does not support Int64 indices. And actually it is impractical to have so much data. Should we switch to Int32 instead?

@yueyinqiu yueyinqiu marked this pull request as ready for review November 15, 2024 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant