Recently, some of us have started tracking performance metrics for Dask clusters under a variety of usage patterns. The goal is to catch performance regressions before they are released, especially ones at scale that might not show up in a unit-test context.
An example of these metrics is at this static site. It's only been collecting results for a few days, but already we seem to have come across a significant regression in cluster memory usage. Here is a test which measures array rechunking:
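and a screenshot of average cluster memory usage for that operation over the last week or so:
![image](https://user-images.githubusercontent.com/5728311/182952487-345018c3-c059-4031-880b-8078617a1a17.png)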
(I encourage folks to click through, this same behavior appears on a lot of tests around July 26)
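For anyone who wants to flag a jump like this automatically rather than by eye, here is a rough sketch of one way to do it. It is not how the benchmark site works; the function name `flag_regression`, the window size, the threshold, and the sample numbers are all made up for illustration.

```python
from statistics import mean


def flag_regression(values, window=3, threshold=1.5):
    """Return the first index whose value exceeds `threshold` times the
    mean of the preceding `window` values, or None if no value does."""
    for i in range(window, len(values)):
        baseline = mean(values[i - window:i])
        if baseline > 0 and values[i] > threshold * baseline:
            return i
    return None


# Made-up daily averages in GiB, for illustration only (not real benchmark data).
daily_avg_memory = [4.1, 4.0, 4.2, 4.1, 9.8, 10.1, 9.9]
print(flag_regression(daily_avg_memory))  # index of the first day that jumps
```

A trailing-window baseline like this is deliberately crude: it reacts to a sustained step but can also fire on a single noisy run, so the threshold would need tuning per test.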
The above is based on a Coiled cluster, but I've reproduced it using a `LocalCluster` with the following procedure:
Create a software environment with nightly dask versions from the dask conda channel:
conda create -n memory-regression python=3.9 dask distributed numpy
conda activate memory-regression
# Install nightly from July 22nd
conda install https://conda.anaconda.org/dask/label/dev/noarch/dask-2022.7.1a220722-py_ga55bfd36_21.tar.bz2 https://conda.anaconda.org/dask/label/dev/noarch/distributed-2022.7.1a220722-py_ga55bfd36_21.tar.bz2
# Or install nightly from July 25th
conda install https://conda.anaconda.org/dask/label/dev/noarch/dask-2022.7.2a220725-py_g55cc1a50_1.tar.bz2 https://conda.anaconda.org/dask/label/dev/noarch/distributed-2022.7.2a220725-py_g55cc1a50_1.tar.bz2
Timing-wise, this suggests to me that #6728 might have had some unintended side-effects in cluster memory usage, but I have not verified, nor do I know how it could be so drastic.
Edit, see below