feat(pyclient): add support for GraphQL API to 'get' method (#4558)
* started creating parse function for GraphQL query from schema metadata
* moved 'parse_query' to Client class
* added column types to new constants.py
* started creating 'get_pkeys' for metadata.py
* created function 'parse_nested_pkeys'
* finished '_parse_get_table_query' method
* implemented 'columns' filter in `get`
* added check for None columns
* split filter method results for CSV and GraphQL API
* fixed logging issues
* added dtype conversion to DataFrame output
* fixed column type float to decimal
* added remaining types
* fixed server URL ending with '/'
* updated dev script for catalogue model
* implemented truncate
* fixed imports
* fixed truncate GraphQL url
* created `ReferenceException`
* refactored table names 'Collections', 'Cohorts' to 'Resources'
* small fixes
* fixed examples in dev.py
* updated README.md
* removed redundant script
* fixed GraphQL query for column type FILE
* added parser for ontology columns, top-level only
* implemented parser for ontology columns nested in ref/ref_array/refback columns
* fixed `as_df=False` to return an empty list instead of None for an empty table
* improved parsing for cases where ref columns reference ontology tables
* fixed referencing tables in other schemas
* replaced dtype 'bool' by 'boolean' for data type BOOL
* added data type LONG for conversion
* fixed issue with rounding of numeric values in string type columns
* moved the GraphQL option of `get` to a separate method `get_graphql`
* restored previous behaviour of `get(as_df=False)`
* fixed error in parsing the datetime data type
* updated docs
* corrected docstrings
* updated documentation
* updated changelog
YpeZ authored Feb 10, 2025
1 parent dd8af51 commit bc3e226
Showing 10 changed files with 540 additions and 118 deletions.
161 changes: 110 additions & 51 deletions docs/molgenis/use_usingpyclient.md
The MOLGENIS EMX2 Python client allows the user to retrieve, create, update and delete data.

## Installation
The releases of the package are hosted at [PyPI](https://pypi.org/project/molgenis-emx2-pyclient/).
The recommended way to install the latest version is through _pip_:

```commandline
pip install molgenis-emx2-pyclient
```

## Setting up the client
The Python client can be integrated in scripts authorized by either a temporary token or a username/password combination.
URLs of EMX2 instances on remote servers must start with `https://`.
It is possible to use the Pyclient on a server running on a local machine. The URL should then be passed as `http://localhost:PORT`.

The recommended method of authorization in the Pyclient is a token, which can be generated in the UI following the instructions in [Tokens](use_tokens.md).
The token can then be passed as an argument in the initialization of the Client object.
It is recommended to store the token as an environment variable, so that it can be read in and used as follows:
```python
import os
from molgenis_emx2_pyclient import Client

token = os.environ.get("MOLGENIS_TOKEN")

with Client(url='https://example.molgeniscloud.org', token=token) as client:
    # Perform tasks
    ...

```
Signing in with a username/password combination is done using the `signin` method:
```python
from molgenis_emx2_pyclient import Client

username = 'username'
password = '********'

with Client(url='https://example.molgeniscloud.org') as client:
    client.signin(username, password)

    # Perform tasks
    ...
```

If the client is only to be used for retrieving information from publicly viewable schemas, no authorization is needed.

Additionally, if the Pyclient is to be used on a particular schema, this schema can be supplied in the initialization of the client, alongside the server URL:
```python
with Client('https://example.molgeniscloud.org', schema='My Schema') as client:
...
```
or
```python
client = Client('https://example.molgeniscloud.org', schema='My Schema', token=token)
```

### Scripts and Jobs
When using the client in a script that runs as part of a job via the [Task API](use_scripts_jobs.md), it is essential
Raises the `TokenSigninException` when the client is already signed in with a username/password combination.

### get
```python
def get(self,
        table: str,
        columns: list[str] = None,
        query_filter: str = None,
        schema: str = None,
        as_df: bool = False) -> list | pandas.DataFrame:
    ...
```
Retrieves data from a table on a schema using the CSV API and returns the result either as a list of dictionaries or as a pandas DataFrame.
Use the `columns` parameter to specify which columns to retrieve. By default, all columns are returned.
Use the `query_filter` parameter to filter the results based on filters applied to the columns.
This query requires a special syntax.
Values in columns can be filtered on equality (`==`), inequality (`!=`), greater than (`>`) and smaller than (`<`).
Values within an interval can be filtered using the operand `between`, followed by a list of the lower and upper bound.
The values of reference and ontology columns can be filtered by joining the column id in the table with the column id in the referenced table with a dot, as in `countries.name`, where `countries` is a column in the table and `name` is the column in the referenced table specifying the names of countries.
Filters on multiple columns can be combined by separating the filter statements with _' and '_.
It is recommended to pass the values that are compared as variables in an f-string.
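
As a sketch of this syntax, filters can be assembled as plain strings; the column ids below (`numberOfParticipants`, `countries.name`) are illustrative and must exist in the queried table:

```python
# Illustrative query_filter strings; the column ids are examples, not a real schema.
min_participants = 10000
country = 'Netherlands'

# Two conditions joined by ' and ' (note the surrounding spaces).
query_filter = (f'numberOfParticipants > {min_participants}'
                f' and countries.name == {country}')
# -> 'numberOfParticipants > 10000 and countries.name == Netherlands'

# An interval filter using the 'between' operand with lower and upper bounds.
bounds = [5000, 10000]
between_filter = f'numberOfParticipants between {bounds}'
```

Building filters this way avoids the easy mistake of dropping the spaces around `and` when concatenating adjacent f-strings.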

Throws the `NoSuchSchemaException` if the user does not have at least _viewer_ permissions or if the schema does not exist.
Throws the `NoSuchColumnException` if the `columns` argument or query filter contains a column that is not present in the table.


| parameter | type | description | required | default |
|----------------|------|--------------------------------------------------------------------------------|----------|---------|
| `table` | str | the name of a table | True | None |
| `columns` | list | a list of column names to return | False | None |
| `schema` | str | the name of a schema | False | None |
| `query_filter` | str | a string to filter the results on | False | None |
| `as_df` | bool | if true: returns data as pandas DataFrame <br/> else as a list of dictionaries | False | False |

##### examples

```python
# Get all entries for the table 'Resources' on the schema 'MySchema'
table_data = client.get(table='Resources', schema='MySchema', columns=['name', 'collectionEvents'])

# Set the default schema to 'MySchema'
client.set_schema('MySchema')
# Get the same entries and return them as pandas DataFrame
table_data = client.get(table='Resources', columns=['name', 'collection events'], as_df=True)

# Get the entries where the value of a particular column 'number of participants' is greater than 10000
table_data = client.get(table='Resources', query_filter='numberOfParticipants > 10000')

# Get the entries where 'number of participants' is greater than 10000 and the resource type is a 'Population cohort'
# Store the information in variables, first
min_subpop = 10000
cohort_type = 'Population cohort'
table_data = client.get(table='Resources', query_filter=f'numberOfParticipants > {min_subpop}'
                                                        f' and cohortType == {cohort_type}')
```


### get_graphql
```python
def get_graphql(self,
                table: str,
                columns: list[str] = None,
                query_filter: str = None,
                schema: str = None) -> list:
    ...
```
Retrieves data from a table on a schema using the GraphQL API and returns the result as a list of dictionaries.
This method and its parameters behave similarly to `get` with the option `as_df=False`.
The results are returned in a slightly different way, however:
`get` retains the column _names_, whereas `get_graphql` returns column _ids_, which are in lower camel case.
Furthermore, the `get` method returns only the values in columns with a reference type, while the results of `get_graphql` also contain the primary keys for those columns.
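
As a rough illustration of the relationship between column names and ids (a sketch of an assumed naming convention, not EMX2's actual implementation), a spaced column name maps to a lower-camel-case id:

```python
def to_column_id(name: str) -> str:
    """Sketch: derive a lower-camel-case column id from a spaced column name."""
    first, *rest = name.split()
    return first.lower() + ''.join(word.capitalize() for word in rest)

print(to_column_id('collection events'))       # collectionEvents
print(to_column_id('number of participants'))  # numberOfParticipants
```

In practice, the column ids should be taken from the schema metadata rather than derived from the names.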

Throws the `NoSuchSchemaException` if the user does not have at least _viewer_ permissions or if the schema does not exist.
Throws the `NoSuchColumnException` if the `columns` argument or query filter contains a column that is not present in the table.


| parameter | type | description | required | default |
|----------------|------|-----------------------------------|----------|---------|
| `table` | str | the name of a table | True | None |
| `columns`      | list | a list of column ids to return    | False    | None    |
| `schema` | str | the name of a schema | False | None |
| `query_filter` | str | a string to filter the results on | False | None |

##### examples

```python
# Get all entries for the table 'Resources' on the schema 'MySchema'
table_data = client.get_graphql(table='Resources', schema='MySchema', columns=['name', 'collectionEvents'])

# Set the default schema to 'MySchema'
client.set_schema('MySchema')

# Get the entries where the value of a particular column 'number of participants' is greater than 10000
table_data = client.get_graphql(table='Resources', query_filter='numberOfParticipants > 10000')

# Get the entries where 'number of participants' is greater than 10000 and the resource type is a 'Population cohort'
# Store the information in variables, first
min_subpop = 10000
cohort_type = 'Population cohort'
table_data = client.get_graphql(table='Resources', query_filter=f'numberOfParticipants > {min_subpop}'
                                                                f' and cohortType == {cohort_type}')
```


### get_schema_metadata
```python
def get_schema_metadata(self, name: str = None) -> Schema:
    ...
```

### export
Throws the `NoSuchSchemaException` if the user does not have at least _viewer_ permissions or if the schema does not exist.
##### examples
```python

# Export the table 'Resources' on the schema 'MySchema' from the CSV API to a BytesIO object
resources_raw: BytesIO = await client.export(schema='MySchema', table='Resources')

# Export 'Resources' from the Excel API to the file 'Resources-export.xlsx'
await client.export(schema='MySchema', table='Resources', filename='Resources-export.xlsx')
```


Throws the `NoSuchSchemaException` if the schema is not found on the server.

##### examples
```python
# Save an edited table with Resources data from a CSV file to the Resources table
client.save_schema(table='Resources', file='Resources-edited.csv')

# Save an edited table with Resources data from memory to the Resources table
resources: pandas.DataFrame = ...
client.save_schema(table='Resources', data=resources)
```

### upload_file
Expand All @@ -269,8 +319,8 @@ Throws the `NoSuchSchemaException` if the schema is not found on the server.

##### examples
```python
# Upload a file containing Resources data to a schema
await client.upload_file(file_path='data/Resources.csv')

# Upload a file containing members information to a schema
await client.upload_file(file_path='molgenis_members.csv', schema='MySchema')
Expand Down Expand Up @@ -306,18 +356,27 @@ Throws the `NoSuchSchemaException` if the schema is not found on the server.

##### examples
```python
# Delete resources from a list of ids
resources = [{'name': 'Resource 1'}, {'name': 'Resource 2'}]
client.delete_records(schema='MySchema', table='Resources', data=resources)

# Delete resources from pandas DataFrame
resources_df = pandas.DataFrame(data=resources)
client.delete_records(schema='MySchema', table='Resources', data=resources_df)

# Delete resources from entries in a CSV file
client.delete_records(schema='MySchema', table='Resources', file='Resources-to-delete.csv')
```

### truncate
```python
client.truncate(table='My table', schema='My Schema')
```
Truncates the table, removing all its contents.

Throws the `ReferenceException` if entries in the table are referenced in other tables.

### create_schema
```python
async def create_schema(self,
                        ...):
    ...
```
10 changes: 10 additions & 0 deletions tools/pyclient/README.md
Releases of the Molgenis EMX2 Pyclient follow the release number of the accompanying release of the Molgenis EMX2 software.
Therefore, releases of the Pyclient are less frequent than those of EMX2 and the latest version of the Pyclient may differ from the latest version of Molgenis EMX2.

#### 11.56.2
- Added: feature 'truncate' to remove all entries from a table
- Added: option to filter results of `get` method by columns
- Added: method `get_graphql`, implementing the GraphQL API
- Improved: added additional parsing for data returned from the CSV API to pandas DataFrame in `get` method
- Fixed: the log level was hard-coded to `DEBUG`; the user can now set the log level to their preferred value again

#### 11.47.1
Fixed: updated GraphQL queries to be in line with EMX2 database metadata

#### 11.23.0
Added: an optional `job` argument to the `Client` initialization, allowing the Pyclient to run asynchronous methods within a job in EMX2.

Expand Down
