OpenML outage because of the TU/e cyberattack #20
-
I was using a scikit-learn example that stopped working, and used the example on this page to create a workaround, which is provided below.
Code I was attempting to run from scikit-learn:
Workaround based on the example on this page, which now downloads:
-
Should R users wait until the primary server is back online?
-
For those who want to use image datasets (e.g. MetaAlbum), the cache paths are a bit weird at the moment, but should be normal again after the DNS updates. For now, here's a usage example: https://colab.research.google.com/drive/1aEZmcmSFFLPFhW7lVAtHGDsbIpqktu5H?usp=sharing
-
I am getting privacy errors with |
-
The edited discussion above says that the DNS was updated, but I am still seeing errors in builds when calling the `fetch_openml` API without specifying the server directly (that is, without `server=...`):
See: https://github.com/microsoft/responsible-ai-toolbox/actions/runs/12795480111/job/36020797462?pr=2594
-
Hello @joaquinvanschoren,
```python
import os

import openml

# Satellite dataset https://www.openml.org/d/182
# Switch to the read-only server
openml.config.server = "http://145.38.195.79/api/v1/xml"
# Set the cache dir (for colab)
openml.config.set_root_cache_directory(os.path.expanduser('/content/cache'))
dataset = openml.datasets.get_dataset(182, download_data=True, download_all_files=True)  # version 2 of the dataset
```
I get:
-
Hi everyone, and thanks for the support. I am now able to download datasets but still unable to load runs if they are not cached:
Example:
Error:
-
If you are using openml-python, make sure to upgrade to the latest version (released minutes ago, January 25th, 2025). You can upgrade with `python -m pip install --upgrade openml`. With 0.15.1 installed, set the environment variable `OPENML_SKIP_PARQUET`, i.e., on Linux/macOS: `export OPENML_SKIP_PARQUET=true`. This stops openml-python from attempting to download parquet files. We recommend using this feature only until the MinIO production server is operational again.
-
OpenML went down on Sunday, January 12th, 2025, due to a cyberattack on the Eindhoven University of Technology, which even reached the international press. The OpenML servers were not affected or breached. However, the university took its entire network offline out of caution, so the OpenML servers were not accessible, not even to the OpenML system admins. While OpenML is fully containerized, has redundancy against failing servers, and keeps multiple backups, everything was hosted in data centers on the university network, so we could not reach any of our servers. Below is a list of updates and plans to prevent this from happening again.
📢 Updates:
- https://www.openml.org and https://api.openml.org (used in `sklearn.fetch_openml`). We'll fix that as soon as possible.
- `python -m pip install --upgrade openml`.
🚧 Known remaining issues:
- https://openml1.win.tue.nl) are still blocked and won't be unblocked before Monday. This means that .pq versions of datasets are not available.
💻 Use cases:
- `sklearn.fetch_openml`: Works :)
🐍 OpenML Python:
If you are using openml-python, make sure to upgrade to the latest version (released January 25th, 2025).
With 0.15.1 installed, set the environment variable `OPENML_SKIP_PARQUET`, i.e., `export OPENML_SKIP_PARQUET=true` on Linux/macOS and `set OPENML_SKIP_PARQUET=true` on Windows, to avoid attempting to download parquet files. We recommend using this feature only until the MinIO production server is operational again.
Without the environment variable set, openml-python is completely functional, but slower.
If you have to use openml-python 0.15.0 or earlier, you will experience the following issues:
- `get_dataset`: Works, but slowly, because it first tries to download the .pq version of each dataset. This will likely be fixed by Monday. A workaround is to patch requests like this:
- Downloading datasets with `dataset.get_data` results in an error due to a bug in the fallback logic.
Note
Workarounds specifying the IP-based URL (http://145.38.195.79/) are no longer needed and may lead to errors.
In case you experience issues, make sure to try again with a clear cache. If the problem persists, please reach out to us.
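Clearing the cache can be done by hand, or with a few lines of Python. The sketch below is only an illustration: `clear_cache` is a hypothetical helper, and the cache location is an assumption you must supply yourself (for example the `/content/cache` directory used in the Colab snippet in this thread, or whatever directory your openml-python configuration points to).

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir) -> bool:
    """Delete cache_dir and everything inside it.

    Returns True if the directory existed and was removed, False otherwise.
    """
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True


# Usage (the path is an assumption; point it at your own cache directory):
# clear_cache("/content/cache")
```

Double-check the path before running it, since the directory is removed recursively.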
Functionality of the read-only server (no longer relevant):
Future plans:
We apologize for the outage and for any inconvenience this may have caused. This was an unprecedented series of events that we did not foresee and had very little control over. We will be better prepared if this ever happens again.
The OpenML team