File upload #3

Closed · 2 of 3 tasks
eboileau opened this issue May 12, 2022 · 3 comments

eboileau commented May 12, 2022

  • File upload is too complex. We need to think about how this can be simplified, harmonized and/or standardised.

  • The current 3-tab or 10X-like formats (or, even worse, Excel) are not viable for large datasets.

Added minimal h5ad upload support. See below.

  • I do not understand how public datasets are handled. If I am logged in as admin with curator privilege, the data that I upload is not seen as public unless I select this option in the dataset explorer, but then the same holds for any user. The data is located under www/analyses/by_user, and not under www/analyses/by_dataset, where it should be according to the documentation. Obviously, if I change the setting in the DB, e.g. UPDATE dataset SET is_public = 1 WHERE id = "id"; this only affects the status of the dataset, not its actual location. So how are datasets correctly uploaded for public access?

I think I understand better how this works now...


I also document a few more minor issues:

  1. To avoid PHP Warning: failed to open stream: Permission denied, we need to make sure that the relevant directories are writable:
cd www
sudo chmod 777 datasets analyses/* uploads/files/

This was mentioned in the gEAR documentation, but was somehow overlooked. It has now been added to the Ansible playbook.

  2. To avoid PHP Warning: POST Content-Length of 16172687560 bytes exceeds the limit of 3145728000 bytes, the limits were raised in apache2/php.ini (not in cli/php.ini). We ran into this when trying to upload tar files (as per the documentation). In fact, the files should be compressed (tar.gz)! But even then, I could not upload large compressed files.

  3. Minor changes to lib/gear/metadata.py, see the CHANGELOG. In particular, these relate to cgi.escape being removed in Python 3.8 (see the sketch below).
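
For reference, the fix for the cgi.escape removal is to switch to html.escape from the standard library, which takes the same quote argument. A minimal sketch (the variable names are illustrative, not the actual code in lib/gear/metadata.py):

from html import escape

raw_label = 'Heart & lung <pilot>'
# cgi.escape(raw_label, quote=True) no longer works on Python >= 3.8;
# html.escape is the drop-in replacement and escapes quotes by default.
safe_label = escape(raw_label, quote=True)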

eboileau commented

Added an option to upload h5ad files. @adkinsrs in case you're interested: ace4a71

However, to avoid errors such as scverse/scanpy#1351, I had to upgrade:

anndata==0.8.0
h5py==3.7.0
scanpy==1.9.1

So far this does not seem to cause other issues, but let's see. Having a recent scanpy will also allow us to add more functionality in the near future.
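
For context, a rough sketch of the kind of sanity check the minimal h5ad support relies on, assuming the upgraded anndata above; the function name and the exact checks are illustrative, not what the upload code currently does:

import anndata

def looks_like_valid_h5ad(path):
    # Try to read the file with anndata; anything unreadable is rejected.
    try:
        adata = anndata.read_h5ad(path)
    except Exception as exc:
        return False, "could not be read as h5ad: {}".format(exc)
    # Require non-empty observation and variable axes.
    if adata.n_obs == 0 or adata.n_vars == 0:
        return False, "empty observation or variable axis"
    return True, "OK ({} obs x {} vars)".format(adata.n_obs, adata.n_vars)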

There are a few things to think about:

  • In www/cgi/load_dataset_finalize.cgi, data is moved from '../uploads/files' to '../datasets'. I believe this is for data download, but I haven't gotten that far yet. For h5ad, this step is redundant. A quick fix is to just skip it (see the sketch after this list), but I need to set aside some time to look at data download (do we actually need to keep the tar/xlsx files on disk? Couldn't we rely on h5ad only?)

  • More generally, I need to think about how to handle observations consistently across formats, and also, for h5ad, what to do if obsm and/or uns are present.
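
A rough sketch of the quick fix mentioned in the first point above, assuming the finalize step can branch on the uploaded file type; the function signature and paths are assumptions, not the actual load_dataset_finalize.cgi code:

import shutil
from pathlib import Path

def finalize_upload(upload_path, datasets_dir, file_type):
    # For h5ad the copy under ../datasets is redundant, so skip it until
    # data download is sorted out.
    if file_type == "h5ad":
        return None
    # For the other formats, keep the current behaviour and move the
    # uploaded archive from ../uploads/files to ../datasets.
    dest = Path(datasets_dir) / Path(upload_path).name
    shutil.move(str(upload_path), str(dest))
    return dest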

eboileau commented Jun 3, 2022

More things to think about:

  • The upload directory www/uploads/files is not cleaned up afterwards (at least when h5ad files are uploaded)... besides, this leaves files with their original names, which is a potential security issue (a cleanup sketch follows this list)...
  • threetabuploader.py and mexuploader.py additionally write to /tmp... this can eventually fill up...
  • See "Back up adata fills up disk space" #6. This should probably not happen once we move to the server, but with the current setup we noticed that if disk space is full, the upload (PHP) fails silently without an error (at least I could not find anything in the logs).
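
A sketch of the cleanup that is currently missing, assuming it can run once a dataset has been finalized; the directory constant, age threshold, and helper name are made up for illustration:

import os
import time

UPLOAD_DIR = "www/uploads/files"
MAX_AGE_SECONDS = 24 * 60 * 60  # treat anything older than a day as stale

def purge_stale_uploads(upload_dir=UPLOAD_DIR, max_age=MAX_AGE_SECONDS):
    # Remove leftover upload files (which still carry their original names)
    # so the directory does not grow without bound.
    now = time.time()
    for entry in os.scandir(upload_dir):
        if entry.is_file() and now - entry.stat().st_mtime > max_age:
            os.remove(entry.path)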

eboileau commented

I will collect important points in a new issue.
