Skip to content

Commit

Permalink
2024-11-07 nightly release (8636054)
Browse files Browse the repository at this point in the history
  • Loading branch information
pytorchbot committed Nov 7, 2024
1 parent 2c82874 commit 4686780
Show file tree
Hide file tree
Showing 7 changed files with 102 additions and 55 deletions.
3 changes: 2 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to
creating your pull request.

- Note that there is a section on requirements related to adding a new DataPipe.
- If you are adding a new node, ensure you read that section in the contribution guide, as it includes requirements for
functionality and testing.

Fixes #{issue number}

Expand Down
2 changes: 2 additions & 0 deletions .github/workflows/_build_test_upload.yml
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ jobs:
- "3.10"
- "3.11"
- "3.12"
- "3.13"
- pure
exclude:
- os: windows-latest
Expand Down Expand Up @@ -227,6 +228,7 @@ jobs:
- "3.10"
- "3.11"
- "3.12"
- "3.13"
steps:
- name: Checkout Source Repository
uses: actions/checkout@v3
Expand Down
6 changes: 6 additions & 0 deletions .github/workflows/nodes_ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,12 @@ jobs:
- "3.10"
- "3.11"
- "3.12"
- "3.13"
exclude:
- os: macos-latest
python-version: "3.13"
- os: windows-latest
python-version: "3.13"
steps:
- name: Get PyTorch Channel
shell: bash
Expand Down
6 changes: 6 additions & 0 deletions .github/workflows/stateful_dataloader_ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,12 @@ jobs:
- "3.10"
- "3.11"
- "3.12"
- "3.13"
exclude:
- os: macos-latest
python-version: "3.13"
- os: windows-latest
python-version: "3.13"
steps:
- name: Get PyTorch Channel
shell: bash
Expand Down
119 changes: 75 additions & 44 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,16 @@ description is clear and has sufficient instructions to be able to reproduce the
For question related to the usage of this library, please post a question on the
[PyTorch forum, under the "data" category](https://discuss.pytorch.org/c/data/37).

## Pull Requests

We actively welcome your pull requests.

1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation and examples.
4. Ensure the test suite passes.
5. If you haven't already, complete the Contributor License Agreement ("CLA").

## Development installation

### Install PyTorch Nightly
Expand All @@ -46,66 +56,87 @@ conda install pytorch -c pytorch-nightly
# pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
```

### Install TorchData
### Install TorchData and Test Requirements

```bash
git clone https://github.com/pytorch/data.git
cd data
pip install -e .
pip install flake8 typing mypy pytest expecttest
pip install -r test/requirements.txt
```

## Pull Requests

We actively welcome your pull requests.

1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation and examples.
4. Ensure the test suite passes.
5. If you haven't already, complete the Contributor License Agreement ("CLA").

### Code style

`torchdata` enforces a fairly strict code format through [`pre-commit`](https://pre-commit.com). You can install it with

```shell
pip install pre-commit
```bash
conda install -c conda-forge pre-commit
# or pip install pre-commit
cd data
with-proxy conda install pre-commit
pre-commit install --install-hooks
```

or
### Running mypy and unit-tests locally

```shell
conda install -c conda-forge pre-commit
Currently we don't run mypy as part of pre-commit hooks

```bash
mypy --config-file=mypy.ini
```

```bash
pytest --durations=0 --no-header -v test/nodes/
```

To check and in most cases fix the code format, stage all your changes (`git add`) and run `pre-commit run`. To perform
the checks automatically before every `git commit`, you can install them with `pre-commit install`.

### Adding a new DataPipe

When adding a new DataPipe, there are few things that need to be done to ensure it is working and documented properly.

1. Naming - please follow the naming convention as
[described here](https://pytorch.org/data/main/dp_tutorial.html#naming).
2. Testing - please add unit tests to ensure that the DataPipe is functioning properly. Here are the
[test requirements](https://github.com/pytorch/data/issues/106) that we have.
- One test that is commonly missed is the serialization test. Please add the new DataPipe to
[`test_serialization.py`](https://github.com/pytorch/data/blob/main/test/test_serialization.py).
- If your test requires interacting with files in the file system (e.g. opening a `csv` or `tar` file, we prefer
those files to be generated during the test (see `test_local_io.py`). If the file is on a remote server, see
`test_remote_io.py`.
3. Documentation - ensure that the DataPipe has docstring, usage example, and that it is added to the right category of
the right RST file to be rendered.
- If your DataPipe has a functional form (i.e. `@functional_datapipe(...)`), include at the
[end of the first sentence](https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/util/combining.py#L25)
of your docstring. This will make sure it correctly shows up in the
[summary table](https://pytorch.org/data/main/torchdata.datapipes.iter.html#archive-datapipes) of our
documentation.
4. Import - import the DataPipe in the correct `__init__.py` file.
5. Interface - if the DataPipe has a functional form, make sure that is generated properly by `gen_pyi.py` into the
relevant interface file.
- You can re-generate the pyi files by re-running `pip install -e .`, then you can examine the new outputs.
### Adding a new Node

When adding a new Node, there are few things that need to be done to ensure it is working and documented properly.

The following simplifying assumptions are made of node implementations:

- state is managed solely by the BaseNode, not through any iterators returned from them.
- state_dict() returns the state of the most recently requested iterator.
- load_state_dict() will set the state for the next iterator.

1. Functionality - Nodes must subclass BaseNode and implement the required methods.

- `.iterator(self, initial_state: Optional[Dict[str, Any]])` - return a new iterator/generator that is properly
initialized with the optional initial_state
- `.get_state(self) -> Dict[str, Any]` - return a dictionary representing the state of the most recently returned
iterator, or if not yet requested, the initial state.
- ensure you're calling `state_dict()/load_state_dict()` on ancestor BaseNodes. Here is a simple example of a pretty
useless node:

```python
class MyNode(BaseNode[T]):
def __init__(self, parent: BaseNode[T]):
self.parent = parent
self.idx = 0 # not very useful state

def iterator(self, initial_state: Optional[Dict[str, Any]]) -> Iterator[T]
if initial_state is not None:
self.parent.load_state_dict(initial_state["parent"])
self.idx = initial_state["idx"]

for item in self.parent:
self.idx += 1
yield item

def get_state(self) -> Dict[str, Any]:
return {
"parent": self.parent.state_dict(), # note we call state_dict() and not get_state() here
"idx": self.idx,
}
```

2. Typing - Include type-hints for all public functions and methods
3. Testing - please add unit tests to ensure that the Node is functioning properly.
- In addition to testing basic functionatity, state management must also be tested.
- For basic state testing, you may use `test.nodes.utils.run_test_save_load_state`. See `test/nodes/test_batch.py`
for an example.
4. Documentation - ensure that the Node has a docstring, and a usage example.
5. Import - import the Node in the correct `__init__.py` file.

## Contributor License Agreement ("CLA")

Expand Down
20 changes: 10 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,16 +41,16 @@ examples

The following is the corresponding `torchdata` versions and supported Python versions.

| `torch` | `torchdata` | `python` |
| -------------------- | ------------------ | ----------------- |
| `master` / `nightly` | `main` / `nightly` | `>=3.9`, `<=3.12` |
| `2.5.0` | `0.9.0` | `>=3.9`, `<=3.12` |
| `2.4.0` | `0.8.0` | `>=3.8`, `<=3.12` |
| `2.0.0` | `0.6.0` | `>=3.8`, `<=3.11` |
| `1.13.1` | `0.5.1` | `>=3.7`, `<=3.10` |
| `1.12.1` | `0.4.1` | `>=3.7`, `<=3.10` |
| `1.12.0` | `0.4.0` | `>=3.7`, `<=3.10` |
| `1.11.0` | `0.3.0` | `>=3.7`, `<=3.10` |
| `torch` | `torchdata` | `python` |
| -------------------- | ------------------ | --------------------------------------- |
| `master` / `nightly` | `main` / `nightly` | `>=3.9`, `<=3.12` (`3.13` experimental) |
| `2.5.0` | `0.9.0` | `>=3.9`, `<=3.12` |
| `2.4.0` | `0.8.0` | `>=3.8`, `<=3.12` |
| `2.0.0` | `0.6.0` | `>=3.8`, `<=3.11` |
| `1.13.1` | `0.5.1` | `>=3.7`, `<=3.10` |
| `1.12.1` | `0.4.1` | `>=3.7`, `<=3.10` |
| `1.12.0` | `0.4.0` | `>=3.7`, `<=3.10` |
| `1.11.0` | `0.3.0` | `>=3.7`, `<=3.10` |

### Local pip or conda

Expand Down
1 change: 1 addition & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@ def remove_extension(pattern):
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Programming Language :: Python :: Implementation :: CPython",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
],
Expand Down

0 comments on commit 4686780

Please sign in to comment.