diff --git a/README.md b/README.md
index 4619e9f..a620f21 100644
--- a/README.md
+++ b/README.md
@@ -96,7 +96,7 @@ Track the status of each repository here:
 | Repository | Programmatic submission | Development status | Deployed? | Source code |
 |---|---|---|---|---|
 | [BioSamples](https://www.ebi.ac.uk/biosamples/) | yes | Ready to be tested | no | [GitHub](repository-services/isajson-biosamples) |
-| [ENA](https://www.ebi.ac.uk/ena/browser/) | yes | Ready to be tested | no | [GitHub](repository-services/isajson-json) |
+| [ENA](https://www.ebi.ac.uk/ena/browser/) | yes | Ready to be tested | no | [GitHub](repository-services/isajson-ena) |
 | [MetaboLights](https://www.ebi.ac.uk/metabolights/) | NA | Not started | no | |
 | [BioStudies/ArrayExpress](https://www.ebi.ac.uk/biostudies/arrayexpress) | yes, in dev | Not started | no | |
 | [e!DAL-PGP](https://edal-pgp.ipk-gatersleben.de/) | NA | Not started | no | |
diff --git a/mars-cli/README.md b/mars-cli/README.md
index 2a4683e..875d8df 100644
--- a/mars-cli/README.md
+++ b/mars-cli/README.md
@@ -1,4 +1,8 @@
-# Installing the mars-cli
+# MARS-CLI
+
+The MARS-CLI tool is a command-line interface for submitting metadata and associated data files to biological repository services such as ENA, BioSamples, and MetaboLights. It is useful for managing and validating metadata submissions in ISA-JSON format, and for automating parts of the repository submission process.
+
+## Installation

 This installation procedure describes a typical Linux installation.
 This application can perfectly work on Windows and MacOS but some of the steps might be different.
@@ -37,7 +41,7 @@ echo 'export MARS_SETTINGS_DIR=' >> $HOM

 Once installed, the CLI application will be available from the terminal.

-# Configuration
+## Configuration

 Installing this application will also generate a `settings.ini` file in `$HOME/.mars/`.
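The settings lookup described here can be sketched in Python. This is a minimal, hypothetical sketch and not the mars-cli implementation. It makes two assumptions. First, `MARS_SETTINGS_DIR`, when set, replaces `$HOME/.mars` as the folder that holds `settings.ini`. Second, each repository section uses the `development-*` / `production-*` key pairs from the `settings.ini` example in this README. The helper names `settings_path` and `repository_url` are illustrative only.

```python
import configparser
import os
from pathlib import Path


def settings_path() -> Path:
    """Locate settings.ini (hypothetical helper, not part of mars-cli).

    Assumes MARS_SETTINGS_DIR, when set to a non-empty value, replaces the
    default $HOME/.mars folder created at installation time.
    """
    settings_dir = os.environ.get("MARS_SETTINGS_DIR") or Path.home() / ".mars"
    return Path(settings_dir) / "settings.ini"


def repository_url(config: configparser.ConfigParser, repository: str,
                   kind: str = "url", development: bool = False) -> str:
    """Return e.g. 'production-url' or 'development-submission-url' from a section.

    Assumes every repository section uses the development-*/production-*
    key pairs shown in the settings.ini example.
    """
    prefix = "development" if development else "production"
    return config[repository][f"{prefix}-{kind}"]


if __name__ == "__main__":
    config = configparser.ConfigParser()
    # config.read() silently skips files that do not exist yet.
    config.read(settings_path())
    if config.has_section("biosamples"):
        print(repository_url(config, "biosamples", "submission-url", development=True))
    else:
        print(f"No [biosamples] section in {settings_path()}")
```

Here `development=True` plays the role of the development flag. Without it, the production keys are used, which matches the CLI's default of targeting production servers.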
@@ -49,7 +53,34 @@ log_max_size = 1024
 log_max_files = 5
 ```

-## Logging
+### Repository services
+
+To configure MARS for submissions, modify the configuration file `settings.ini` located at `~/.mars/settings.ini`. Ensure the following content is set:
+
+```ini
+[webin]
+development-url = https://wwwdev.ebi.ac.uk/ena/dev/submit/webin/auth
+development-token-url = https://wwwdev.ebi.ac.uk/ena/dev/submit/webin/auth/token
+production-url = https://www.ebi.ac.uk/ena/submit/webin/auth
+production-token-url = https://www.ebi.ac.uk/ena/submit/webin/auth/token
+
+[ena]
+development-url = http://localhost:8042/isaena
+development-submission-url = http://localhost:8042/isaena/submit
+development-data-submission-url = webin2.ebi.ac.uk
+production-url = https://www.ebi.ac.uk/ena/submit/webin-v2/
+production-submission-url = https://www.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA
+production-data-submission-url = webin2.ebi.ac.uk
+
+[biosamples]
+development-url = http://localhost:8032/isabiosamples
+development-submission-url = http://localhost:8032/isabiosamples/submit
+production-url = https://www.ebi.ac.uk/biosamples/samples/
+production-submission-url = https://www.ebi.ac.uk/biosamples/samples/
+```
+
+
+### Logging

 The MARS-CLI will automatically log events to a `.log` file.

@@ -67,7 +98,7 @@ __log_max_size__: The maximum size in kB for the log file. By default the maximu

 __log_max_files__: The maximum number of old log files to keep. By default, this is set to 5

-## Target repository settings
+### Target repository settings

-Each of the target repositories have a set of settings:
+Each of the target repositories has a set of settings:

@@ -76,7 +107,7 @@ Each of the target repositories have a set of settings:
 - production-url: URL to the production server when performing a health-check
-- production-submission-url: URL to the production server when performing a submissionW
+- production-submission-url: URL to the production server when performing a submission

-# Using the MARS-CLI
+## Usage

-If you wish to use a different location for the `.mars' folder:
+If you wish to use a different location for the `.mars` folder:

@@ -85,8 +116,6 @@ export MARS_SETTINGS_DIR=
 mars-cli [options] ARGUMENT
 ```

-## Help
-
 The mars-cli's help text can be found from the command line as such:

 ```sh
@@ -107,6 +136,7 @@ Options:

 Commands:
   health-check       Check the health of the target repositories.
+  set-password       Store a password in the keyring.
   submit             Start a submission to the target repositories.
   validate-isa-json  Validate the ISA JSON file.
 ```
@@ -121,32 +151,37 @@ Output:

 ```
 ➜ mars-cli submit --help
-############# Welcome to the MARS CLI. #############
-Running in Production environment
-Usage: mars-cli submit [OPTIONS] CREDENTIALS_FILE ISA_JSON_FILE
+Usage: mars-cli submit [OPTIONS] ISA_JSON_FILE

 Start a submission to the target repositories.

 Options:
-  -d, --development  Boolean indicating the usage of the development
-                     environment of the target repositories. If not present,
-                     the production instances will be used.
-  --help             Show this message and exit.
-
-Commands:
-  health-check       Check the health of the target repositories.
-  set-password       Store a password in the keyring.
-  submit             Start a submission to the target repositories.
-  validate-isa-json  Validate the ISA JSON file.
+  --output TEXT
+  --investigation-is-root BOOLEAN
+                                  Boolean indicating if the investigation is
+                                  the root of the ISA JSON. Set this to True
+                                  if the ISA-JSON does not contain a
+                                  'investigation' field.
+  --submit-to-metabolights BOOLEAN
+                                  Submit to Metabolights.
+  --data-files FILENAME           Path of files to upload
+  --file-transfer TEXT            provide the name of a file transfer
+                                  solution, like ftp or aspera
+  --submit-to-ena BOOLEAN         Submit to ENA.
+  --submit-to-biosamples BOOLEAN  Submit to BioSamples.
+  --credentials-file FILENAME     Name of a credentials file
+  --username-credentials TEXT     Username from the keyring
+  --credential-service-name TEXT  service name from the keyring
+  --help                          Show this message and exit.
 ```

-## Development
+### Development vs production

 By default, the mars-CLI will try to submit the ISA-JSON's metadata towards the repositories' production servers.
 Passing the development flag will run it in development mode and substitute the production servers with the development servers.

-## Health check repository services
+### Health check repository services

 You can check whether the supported repositories are healthy, prior to submission,
 by doing a health-check.

@@ -161,17 +196,16 @@ Output:
 ############# Welcome to the MARS CLI. #############
 Running in Production environment
 Checking the health of the target repositories.
-Checking production instances.
-Webin (https://www.ebi.ac.uk/ena/submit/webin/auth) is healthy.
-ENA (https://www.ebi.ac.uk/ena/submit/webin-v2/) is healthy.
-Biosamples (https://www.ebi.ac.uk/biosamples/samples/) is healthy.
+Service 'webin' healthy and ready to use!
+Service 'ena' healthy and ready to use!
+Service 'biosamples' healthy and ready to use!
 ```

-## using the keychain
+### Credential management

-This CLI application comes with functionality to interact with your device's keychain backend.
+This CLI application can interact with your device's keychain backend to fetch the credentials it needs.

-### Store a password
+#### Store a password

 You can add a password to keychain:

@@ -189,21 +223,41 @@ Options:
   --help                  Show this message and exit.
 ```

-## Submitting to repository services
-
-TODO

 ### Options

-- `--submit-to-ena`: By default set to `True`. Will try submit ISA-JSON metadata towards ENA. Setting it to `False` will skip sending the ISA-JSON's metadata to ENA.
+#### BioSamples submissions
+`--submit-to-biosamples`: By default set to `True`. Will try to submit the ISA-JSON metadata to BioSamples. Setting it to `False` will skip sending the ISA-JSON's metadata to BioSamples.
+> **Note**: The following command skips submission to BioSamples:
+```sh
+mars-cli submit --submit-to-biosamples False --credentials-file my-credentials.json my-isa-json.json
+```
+
+#### ENA submissions
+`--submit-to-ena`: By default set to `True`. Will try to submit the ISA-JSON metadata to ENA. Setting it to `False` will skip sending the ISA-JSON's metadata to ENA.
+
+> **Note**: The following command skips submission to ENA:

 ```sh
-mars-cli submit --submit-to-ena False my-credentials my-isa-json.json
+mars-cli submit --submit-to-ena False --credentials-file my-credentials.json my-isa-json.json
 ```

-- `--submit-to-metabolights`: By default set to `True`. Will try to submit ISA-JSON metadata towards Metabolights.
+`--file-transfer`: Name of the file transfer solution used to upload data files, such as `ftp` or `aspera`.
+
+`--data-files`: Paths of the data files to upload.
+
+> **Note**: The following command submits the ISA-JSON and a data file, uploaded over FTP, to BioSamples and ENA:
+```sh
+mars-cli submit --submit-to-metabolights False --file-transfer ftp --data-files ../data/file_to_upload.fastq.gz --credentials-file my-credentials.json my-isa-json.json
+```
+
+#### MetaboLights submissions
+> **Status**: 🚧 To Be Developed
+
+`--submit-to-metabolights`: By default set to `True`. Will try to submit the ISA-JSON metadata to MetaboLights. Setting it to `False` will skip sending the ISA-JSON's metadata to MetaboLights.

+The following command skips submission to MetaboLights:
 ```sh
-mars-cli submit --submit-to-metabolights False my-credentials my-isa-json.json
+mars-cli submit --submit-to-metabolights False --credentials-file my-credentials.json my-isa-json.json
 ```
@@ -216,7 +270,16 @@ the flag `--investigation-is-root` to `True` in order to validate the ISA-JSON.
-mars-cli submit --investigation-is-root True my-credentials my-isa-json.json
+mars-cli submit --investigation-is-root True --credentials-file my-credentials.json my-isa-json.json
 ```

-## Validation of the ISA JSON
+`--output`: Name of the final ISA-JSON output. Defaults to `output_{datetime.now()}`.
+
+```sh
+mars-cli submit --output final_isa --credentials-file my-credentials.json my-isa-json.json
+```
+
+## Feature: Validation of the ISA JSON
+> **Status**: 🚧 To Be Developed
+
+This feature is planned but not yet implemented. Further details will be provided as development progresses.

 You can perform a syntactic validation of the ISA-JSON, without submitting to the target repositories.

@@ -231,8 +294,6 @@ of the ISA-JSON and, in some cases, automatically patch inconsistencies.
 This feature is implemented as a set of additional validation rules a user can customize
 according to the submission needs.

-TODO
-
 ```sh
 mars-cli validate-isa-json --investigation-is-root True ../test-data/biosamples-input-isa.json
 ```
@@ -247,194 +308,26 @@ the flag `--investigation-is-root` to `True` in order to validate the ISA-JSON.
 mars-cli validate-isa-json my-isa-investigation.json
 ```

-# Extending BioSamples' records
-The Python script ``biosamples-externalReferences.py`` defines a class BiosamplesRecord for managing biosample records. This class is designed to interact with the BioSamples database, allowing operations like fetching, updating, and extending biosample records.
+## Feature: Extending BioSamples' records
+> **Status**: 🚧 To Be Developed
+
+This part is designed to interact with the BioSamples database, allowing operations like fetching, updating, and extending biosample records.

 The script takes in a dictionary of BioSamples' accessions and their associated external references, and expands the former with the latter.

 To summarize, the steps of the code are:

-1. Takes the BioSamples' submitter credentials and an input file containing a set of BioSamples accessions and their associated external references
-    1. Validates inputs
-1. For each BioSamples' accession, it downloads its JSON record from BioSamples
-1. Extend the BioSamples' JSON with the ``externalReferences`` of the input file
-1. Submit the extended JSON to BioSamples to replace the existing one
+1. Take the BioSamples submitter credentials and a set of BioSamples accessions with their associated external references.
+2. Validate the inputs.
+3. For each BioSamples accession, download its JSON record from BioSamples.
+4. Extend the BioSamples JSON with the ``externalReferences`` of the input.
+5. Submit the extended JSON to BioSamples to replace the existing record.

-## Examples
-### BioSamples JSON
-Mock example ([``SAMEA112654119``](https://www.ebi.ac.uk/biosamples/samples/SAMEA112654119)):
-- Record (JSON) **before** extending with ``externalReferences``:
-````
-{
-  "name" : "AngH91",
-  "accession" : "SAMEA112654119",
-  ...
-}
-````
-- Record (JSON) **after** extending with ``externalReferences``:
-````
-{
-  "name" : "AngH91",
-  "accession" : "SAMEA112654119",
-  ...
-  "externalReferences" : [ {
-    "url" : "https://ega-archive.org/datasets/EGAD00010002458",
-    "duo" : [ ]
-  }, {
-    "url" : "https://ega-archive.org/metadata/v2/samples/EGAN00004248937",
-    "duo" : [ ]
-  }, {
-    "url" : "https://www.ebi.ac.uk/ena/browser/view/SAMEA112654119",
-    "duo" : [ ]
-  } ]
-  ...
-}
-````
-### Script input
-In the following example, we would be adding 3 URLs to ``SAMEA112654119`` and one to ``SAMEA419425`` as ``externalReferences``.
-````
-{
-  "biosampleExternalReferences": [
-    {
-      "biosampleAccession": "SAMEA112654119",
-      "externalReferences": [
-        {
-          "url": "https://ega-archive.org/datasets/EGAD00010002458"
-        },
-        {
-          "url": "https://ega-archive.org/metadata/v2/samples/EGAN00004248937"
-        },
-        {
-          "url": "https://www.ebi.ac.uk/ena/browser/view/SAMEA112654119"
-        }
-      ]
-    },
-    {
-      "biosampleAccession": "SAMEA419425",
-      "externalReferences": [
-        {
-          "url": "https://ega-archive.org/datasets/EGAD00010002458"
-        }
-      ]
-    }
-  ]
-}
-````
-## Usage
-### Command line
-````bash
-$ python3 biosamples-externalReferences.py --help
-usage: biosamples-externalReferences.py [-h] [--production] biosamples_credentials biosamples_externalReferences
-
-This script extends a set of existing Biosamples records with a list of provided external references.
-
-positional arguments:
-  biosamples_credentials
-                        Either a dictionary or filepath to the BioSamples credentials.
-  biosamples_externalReferences
-                        Either a dictionary or filepath to the BioSamples' accessions mapping with external references.
-
-options:
-  -h, --help            show this help message and exit
-  --production          Boolean indicating the usage of the production environment of BioSamples. If not present, the development instance will be used.
-````
-### Interfacing with BiosamplesRecord Class in Java [_By ChatGPT_]
-#### Prerequisites
-- **Jython**: A Java implementation of the Python interpreter. It allows running Python code within a Java application.
-- **Environment Setup**: Ensure Python and all necessary libraries (``requests``, ``json``, etc.) are installed and accessible to Jython.
-
-#### Basic Steps for Integration
-1. **Importing Jython in Java**: Add Jython as a dependency in your Java project.
-1. **Executing Python Script**: Use Jython's ``PythonInterpreter`` class to execute the Python script.
-1. **Creating BiosamplesRecord Instance**: Instantiate the BiosamplesRecord class through the interpreter.
-1. **Interacting with BiosamplesRecord Methods**: Utilize methods like ``fetch_bs_json``, ``extend_externalReferences``, etc., via the interpreter.
-1. **Integrating with the Main Function**:
-    - The ``main`` function in the script acts as an entry point for command-line usage.
-    - In Java, replicate the logic in ``main``.
-1. **Data Handling**: Data passed between Java and Python must be in a compatible format (e.g., JSON).
-1. **Error Handling**: Properly handle Python exceptions raised by the script in Java.
-
-Sample Java Integration Code:
-````java
-import org.python.util.PythonInterpreter;
-import org.python.core.*;
-
-public class BiosamplesIntegration {
-    public static void main(String[] args) {
-        PythonInterpreter interpreter = new PythonInterpreter();
-
-        // Load and execute Python script
-        interpreter.execfile("path/to/biosamples-externalReferences.py");
-
-        // Create a BiosamplesRecord instance
-        PyObject biosamplesRecordClass = interpreter.get("BiosamplesRecord");
-        PyObject biosamplesRecord = biosamplesRecordClass.__call__(new PyString("SAMPLE_ACCESSSION"));
-
-        // Use methods of BiosamplesRecord
-        PyObject result = biosamplesRecord.invoke("fetch_bs_json", new PyString("biosamples_endpoint"));
-        System.out.println(result.toString());
-
-
-
-
-        // Handle other operations similarly
-    }
-}
-````
-# Testing BioSamples submission using the local docer converter instance or a remote converter instance
-
-## Getting Started
-
-To set up and run the MARS tool locally using Docker, follow these steps:
-
-### Prerequisites
-
-- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/) installed on your system.
-
-### Running the Docker Containers
-
-1. **Navigate** to the `repository-services` directory in your cloned repository.
-
-2. **Start the Docker containers** by running the following command:
-
-   ```bash
-   docker compose up
-   ```
-
-3. **Check the BioSamples submission service** by visiting:
-
-   ```
-   http://localhost:8032/isabiosamples/swagger-ui/index.html
-   ```
-
-   This URL will indicate if the BioSamples submission Docker container is up and running.
-
-### Configuration
-
-To configure MARS for submissions, modify the configuration file `settings.ini` located at `~/.mars/settings.ini`. Ensure the following content is set:
-```ini
-[webin]
-development-url = https://wwwdev.ebi.ac.uk/ena/dev/submit/webin/auth
-development-token-url = https://wwwdev.ebi.ac.uk/ena/dev/submit/webin/auth/token
-production-url = https://www.ebi.ac.uk/ena/submit/webin/auth
-production-token-url = https://www.ebi.ac.uk/ena/submit/webin/auth/token
-
-[ena]
-development-url = http://localhost:8042/isaena
-development-submission-url = http://localhost:8042/isaena/submit
-production-url = https://www.ebi.ac.uk/ena/submit/webin-v2/
-production-submission-url = https://www.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA
-
-[biosamples]
-development-url = http://localhost:8032/isabiosamples
-development-submission-url = http://localhost:8032/isabiosamples/submit
-production-url = https://www.ebi.ac.uk/biosamples/samples/
-production-submission-url = https://www.ebi.ac.uk/biosamples/samples/
-```
+## Examples

-### Running MARS Submission
+### Submit ISA-JSON to BioSamples

-After configuring the `settings.ini` file, you can run the MARS CLI tool to submit data:
+After configuring the `settings.ini` file, you can run the MARS CLI tool to submit the ISA-JSON:

 ```bash
 python mars_cli.py --development submit --submit-to-metabolights False --submit-to-ena False --credential-service-name --username-credentials ../test-data/biosamples-input-isa.json
@@ -446,8 +339,20 @@ python mars_cli.py --development submit --submit-to-metabolights False --submit-
-Aternatively, you can also use a credentials file to authenticate to the services.
+Alternatively, you can also use a credentials file to authenticate to the services.
 An example can be found here: https://github.com/elixir-europe/MARS/blob/main/mars-cli/tests/test_credentials_example.json

-Run the MARS CLI tool to submit the data:
+Run the MARS CLI tool to submit the ISA-JSON using a credentials file:

 ```bash
 python mars_cli.py --development submit --submit-to-metabolights False --submit-to-ena False --credentials-file ../test-data/biosamples-input-isa.json
-```
\ No newline at end of file
+```
+
+### Submit data files and ISA-JSON to BioSamples and ENA
+
+The following command uploads a data file over FTP and submits the ISA-JSON to BioSamples and ENA, skipping MetaboLights:
+
+```bash
+python mars_cli.py submit --credential-service-name biosamples --username-credentials --file-transfer ftp --data-files ../data/ENA_data.R1.fastq.gz --submit-to-metabolights False --output final-isa ../data/biosamples-input-isa.json
+```
+
+## Deploy repository services
+
+To set up and run the repository services locally using Docker, follow the steps in the [repository services README](../repository-services/README.md).
diff --git a/repository-services/README.md b/repository-services/README.md
index 3291ed3..0027c20 100644
--- a/repository-services/README.md
+++ b/repository-services/README.md
@@ -2,15 +2,19 @@

 In order to test the metadata submission to BioSamples and ENA, two Java Spring web services can be deployed, using docker. There is no compilation step necessary prior to deployment.

-## Deployment
+## Prerequisites

-Change to the correct directory:
+- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/) installed on your system.
+
+## Deployment - Running the Docker Containers
+
+1. **Navigate** to the `repository-services` directory in your cloned repository.

 ```sh
-cd repository-test-services
+cd repository-services
 ```

-Use docker compose to deploy both services simultaneously:
+2. **Start the Docker containers** to deploy both services simultaneously:

 ```sh
 docker compose up
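Step 4 of the "Extending BioSamples' records" procedure (extending a BioSamples JSON record with new ``externalReferences``) can be sketched as a pure function. This is a hypothetical illustration, not the code of ``biosamples-externalReferences.py``. It assumes the input mapping follows the ``biosampleExternalReferences`` layout from the removed "Script input" example. It skips URLs the record already lists, so re-running it does not create duplicates. Downloading and re-submitting the record (steps 3 and 5) are omitted because they require authenticated BioSamples requests. The mock record in this diff also shows a ``duo`` list on each reference, which this sketch does not add.

```python
import copy


def extend_external_references(record: dict, mapping: dict) -> dict:
    """Return a copy of a BioSamples record with extra externalReferences (hypothetical helper).

    `mapping` is assumed to follow the biosampleExternalReferences layout:
    {"biosampleExternalReferences": [{"biosampleAccession": ..., "externalReferences": [{"url": ...}]}]}
    """
    extended = copy.deepcopy(record)  # never mutate the downloaded record
    references = extended.setdefault("externalReferences", [])
    known_urls = {ref["url"] for ref in references}
    for entry in mapping.get("biosampleExternalReferences", []):
        if entry["biosampleAccession"] != record.get("accession"):
            continue  # this entry targets a different sample
        for ref in entry["externalReferences"]:
            if ref["url"] not in known_urls:  # skip URLs already attached
                references.append({"url": ref["url"]})
                known_urls.add(ref["url"])
    return extended


if __name__ == "__main__":
    record = {"name": "AngH91", "accession": "SAMEA112654119"}
    mapping = {"biosampleExternalReferences": [
        {"biosampleAccession": "SAMEA112654119",
         "externalReferences": [{"url": "https://ega-archive.org/datasets/EGAD00010002458"}]},
        {"biosampleAccession": "SAMEA419425",
         "externalReferences": [{"url": "https://ega-archive.org/datasets/EGAD00010002458"}]},
    ]}
    # prints [{'url': 'https://ega-archive.org/datasets/EGAD00010002458'}]
    print(extend_external_references(record, mapping)["externalReferences"])
```

The function returns a new dictionary rather than editing the record in place. This keeps the downloaded JSON available for comparison before the extended version is submitted.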