Commit

Merge remote-tracking branch 'origin/develop' into feat/mkdocs-material
philtom-ctds committed Jul 12, 2024
2 parents 29e3f80 + 93d99cc commit b42b6f1
Showing 23 changed files with 850 additions and 1,517 deletions.
8 changes: 4 additions & 4 deletions .secrets.baseline
@@ -169,23 +169,23 @@
"filename": "docs/API/Users_Guide/Submission.md",
"hashed_secret": "93f5b94e262e685fee4a419438d60e82fafaf491",
"is_verified": false,
"line_number": 2389,
"line_number": 1202,
"is_secret": false
},
{
"type": "Hex High Entropy String",
"filename": "docs/API/Users_Guide/Submission.md",
"hashed_secret": "313355a8530a54c23567f7bbedd9f804bb269820",
"is_verified": false,
"line_number": 2509,
"line_number": 1441,
"is_secret": false
},
{
"type": "Hex High Entropy String",
"filename": "docs/API/Users_Guide/Submission.md",
"hashed_secret": "b47ceb76f45ab4e8b52da270875d85fdd9b7fc33",
"is_verified": false,
"line_number": 2580,
"line_number": 1512,
"is_secret": false
}
],
@@ -285,5 +285,5 @@
}
]
},
"generated_at": "2024-07-12T14:45:44Z"
"generated_at": "2024-07-12T16:08:34Z"
}
27 changes: 0 additions & 27 deletions Data_Portal_V1_UG.yml

This file was deleted.

212 changes: 212 additions & 0 deletions docs/API/Users_Guide/Data_Analysis.md
@@ -9,6 +9,9 @@ The following data analysis endpoints are available from the GDC API:
|__Node__| __Endpoint__ | __Description__ |
|---|---|---|
|__Genes__| __/genes__ | Allows users to access summary information about each gene using its Ensembl ID. |
|__Gene Expression__|__/gene_expression/availability__|Allows users to retrieve the availability of gene expression data for specific cases and/or genes.|
||__/gene_expression/values__|Get gene expression values for specified cases and genes.|
||__/gene_expression/gene_selection__|Select the most variably expressed genes for a collection of cases and genes.|
|__SSMS__| __/ssms__ | Allows users to access information about each somatic mutation. For example, a `ssm` would represent the transition of C to T at position 52000 of chromosome 1. |
||__/ssms/`<ssm_id>`__|Get information about a specific ssm using a `<ssm_id>`, often supplemented with the `expand` option to show fields of interest. |
|| __/ssm_occurrences__ | A `ssm` entity as applied to a single instance (case). An example of a `ssm occurrence` would be that the transition of C to T at position 52000 of chromosome 1 occurred in patient TCGA-XX-XXXX. |
Expand Down Expand Up @@ -165,6 +168,215 @@ __Example 2:__ A user wants a subset of elements such as a list of coordinates f
(truncated)
```

## Gene Expression Examples

### Gene Expression Availability Endpoint

This endpoint reports whether gene expression data is available for a set of cases, genes, or both. For each case ID and gene ID in the request, the response states whether gene expression data exists, and it also returns counts of the cases and genes with and without data. Gene expression data is only available for protein-coding genes.

__Example 1__: A user wants to get the availability of gene expression data for a set of cases and genes.

```Filter
{
"case_ids": [
"6d4f38db-a97b-4dc0-8dc5-2ac7f2cc5e38",
"e3b32485-b204-43a7-93a5-601408fcdf96"
],
"gene_ids": [
"ENSG00000141510",
"ENSG00000181143"
]
}
```

```Shell
curl -X 'POST' \
'https://api.gdc.cancer.gov/gene_expression/availability' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"case_ids": [
"6d4f38db-a97b-4dc0-8dc5-2ac7f2cc5e38",
"e3b32485-b204-43a7-93a5-601408fcdf96"
],
"gene_ids": [
"ENSG00000141510",
"ENSG00000181143"
]
}'
```

```Response
{
"cases": {
"details": [
{
"case_id": "6d4f38db-a97b-4dc0-8dc5-2ac7f2cc5e38",
"has_gene_expression_values": false
},
{
"case_id": "e3b32485-b204-43a7-93a5-601408fcdf96",
"has_gene_expression_values": true
}
],
"with_gene_expression_count": 1,
"without_gene_expression_count": 1
},
"genes": {
"details": [
{
"gene_id": "ENSG00000141510",
"has_gene_expression_values": true
},
{
"gene_id": "ENSG00000181143",
"has_gene_expression_values": true
}
],
"with_gene_expression_count": 2,
"without_gene_expression_count": 0
}
}
```
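
A response like the one above can be filtered on the client side before expression values are requested. The following is a minimal Python sketch that works on a copy of the example response; the helper name `cases_with_expression` is hypothetical and not part of the GDC API.

```python
import json

# Example response copied from above (whitespace condensed).
response_text = """
{
  "cases": {
    "details": [
      {"case_id": "6d4f38db-a97b-4dc0-8dc5-2ac7f2cc5e38", "has_gene_expression_values": false},
      {"case_id": "e3b32485-b204-43a7-93a5-601408fcdf96", "has_gene_expression_values": true}
    ],
    "with_gene_expression_count": 1,
    "without_gene_expression_count": 1
  },
  "genes": {
    "details": [
      {"gene_id": "ENSG00000141510", "has_gene_expression_values": true},
      {"gene_id": "ENSG00000181143", "has_gene_expression_values": true}
    ],
    "with_gene_expression_count": 2,
    "without_gene_expression_count": 0
  }
}
"""


def cases_with_expression(availability):
    """Return the case IDs flagged as having gene expression values."""
    return [
        detail["case_id"]
        for detail in availability["cases"]["details"]
        if detail["has_gene_expression_values"]
    ]


availability = json.loads(response_text)
usable_cases = cases_with_expression(availability)
print(usable_cases)
```

Only the second case is kept here, which matches `with_gene_expression_count` in the example response.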

### Gene Expression Values Endpoint

This endpoint retrieves gene expression values for the given cases and genes. The response is a TSV with one row per gene and one column per case.
The `tsv_units` parameter sets the units of the returned values and must be exactly one of the following:

* `uqfpkm` - FPKM-UQ values. More information on calculations can be found [here](/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/#calculations).
* `median_centered_log2_uqfpkm` - Median-centered log2(FPKM-UQ+1) values.

The `median_centered_log2_uqfpkm` values are calculated through the following steps:

1. Calculate the Median: Determine the median of all provided log2(uqfpkm + 1) values.
1. Compute Median-Centered Values: Subtract the median from each log2(uqfpkm + 1) value.
1. Generate the Result Sequence: Create a new sequence with the median-centered values, preserving the original order.
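
The three steps above can be sketched in Python as follows. This is an illustration, not the GDC implementation: the function name is hypothetical, the input values are made up, and it assumes the median is taken over the values of one gene across the requested cases. That assumption fits the example response in this section, where each gene row contains a `0.00000` entry.

```python
import math
from statistics import median


def median_centered_log2(uqfpkm_values):
    """Median-center log2(FPKM-UQ + 1) values, preserving input order."""
    # Transform each raw FPKM-UQ value to log2(value + 1).
    log_values = [math.log2(value + 1) for value in uqfpkm_values]
    # Step 1: median of all log2-transformed values.
    center = median(log_values)
    # Steps 2 and 3: subtract the median from each value, keeping the order.
    return [value - center for value in log_values]


# Hypothetical FPKM-UQ values for one gene across three cases.
example_values = [3.0, 7.0, 15.0]
print(median_centered_log2(example_values))  # [-1.0, 0.0, 1.0]
```

With an odd number of values, the case holding the median always maps to `0.0`.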

__Example 1__: A user wants to get expression values using case IDs and gene IDs.

```Filter
{
"case_ids": [
"6d4f38db-a97b-4dc0-8dc5-2ac7f2cc5e38",
"e3b32485-b204-43a7-93a5-601408fcdf96",
"000ead0d-abf5-4606-be04-1ea31b999840",
"001ab32d-f924-4753-ad67-4366fb845ae6"
],
"gene_ids": [
"ENSG00000141510",
"ENSG00000181143"
],
"tsv_units": "median_centered_log2_uqfpkm",
"format": "tsv"
}
```

```Shell
curl -X 'POST' \
'https://api.gdc.cancer.gov/gene_expression/values' \
-H 'accept: text/tab-separated-values' \
-H 'Content-Type: application/json' \
-d '{
"case_ids": [
"6d4f38db-a97b-4dc0-8dc5-2ac7f2cc5e38",
"e3b32485-b204-43a7-93a5-601408fcdf96",
"000ead0d-abf5-4606-be04-1ea31b999840",
"001ab32d-f924-4753-ad67-4366fb845ae6"
],
"gene_ids": [
"ENSG00000141510",
"ENSG00000181143"
],
"tsv_units": "median_centered_log2_uqfpkm",
"format": "tsv"
}'
```

```Response
gene_id 000ead0d-abf5-4606-be04-1ea31b999840 001ab32d-f924-4753-ad67-4366fb845ae6 e3b32485-b204-43a7-93a5-601408fcdf96
ENSG00000141510 -0.58248 1.75830 0.00000
ENSG00000181143 -0.02529 0.00000 3.52293
```

### Gene Expression Gene Selection Endpoint

This endpoint selects the most variably expressed genes for a collection of cases and a collection of genes. The request must define a collection of cases, a collection of genes, and a selection size. A minimum expression value may optionally be defined.

A collection of cases must be defined by case IDs.

A collection of genes must be defined by exactly one of the following:

* `gene_ids` - a list of Ensembl gene IDs.
* `gene_type` - accepts only one value: `protein_coding`.

A selection size (`selection_size`) defines the maximum number of genes to select.

An optional threshold (`min_median_log2_uqfpkm`) defines a minimum value for expression and defaults to `1`.
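
The request rules above can be checked on the client side before sending anything to the API. The following Python sketch builds a request body and enforces the "exactly one of `gene_ids` or `gene_type`" rule; `build_gene_selection_request` is a hypothetical helper, and the validation reflects only the constraints described in this section, not the server's own checks.

```python
import json


def build_gene_selection_request(case_ids, selection_size, gene_ids=None,
                                 gene_type=None, min_median_log2_uqfpkm=None):
    """Build a /gene_expression/gene_selection request body as a dict."""
    if not case_ids:
        raise ValueError("case_ids must contain at least one case ID")
    # The gene collection must be given by exactly one of the two fields.
    if (gene_ids is None) == (gene_type is None):
        raise ValueError("define exactly one of gene_ids or gene_type")
    if gene_type is not None and gene_type != "protein_coding":
        raise ValueError("gene_type only accepts 'protein_coding'")

    payload = {"case_ids": list(case_ids), "selection_size": selection_size}
    if gene_ids is not None:
        payload["gene_ids"] = list(gene_ids)
    else:
        payload["gene_type"] = gene_type
    # Leave the threshold out unless it is set, so the documented default applies.
    if min_median_log2_uqfpkm is not None:
        payload["min_median_log2_uqfpkm"] = min_median_log2_uqfpkm
    return payload


request_body = build_gene_selection_request(
    case_ids=["000ead0d-abf5-4606-be04-1ea31b999840"],
    selection_size=1,
    gene_ids=["ENSG00000141510", "ENSG00000181143"],
)
print(json.dumps(request_body, indent=2))
```

The printed JSON can be passed to `curl` with the `-d` option.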

__Example 1__: A user wants to get the most variably expressed genes for a list of case UUIDs and a list of Ensembl gene IDs.

```Filter
{
"case_ids": [
"000ead0d-abf5-4606-be04-1ea31b999840",
"001ab32d-f924-4753-ad67-4366fb845ae6",
"0024c94c-88ff-49d9-8dc4-bf77f832d85e",
"003f4f85-3244-4132-8c9d-c29f09382269",
"005d0639-c923-470f-a179-02a4dbb5cdf2",
"006931bb-f5b1-4aa4-b0a8-af517a912db0",
"0084e8b6-57fc-48b6-aa77-fec6e45161d2",
"008d3744-e7f0-41a5-a419-702960cf1ccb",
"0094e07c-1595-402e-9d38-68b9cac71e7b",
"00bd58bd-223d-433e-b60a-5bf355f342b1"
],
"gene_ids": [
"ENSG00000141510",
"ENSG00000181143"
],
"selection_size": 1
}
```

```Shell
curl -X 'POST' \
'https://api.gdc.cancer.gov/gene_expression/gene_selection' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"case_ids": [
"000ead0d-abf5-4606-be04-1ea31b999840",
"001ab32d-f924-4753-ad67-4366fb845ae6",
"0024c94c-88ff-49d9-8dc4-bf77f832d85e",
"003f4f85-3244-4132-8c9d-c29f09382269",
"005d0639-c923-470f-a179-02a4dbb5cdf2",
"006931bb-f5b1-4aa4-b0a8-af517a912db0",
"0084e8b6-57fc-48b6-aa77-fec6e45161d2",
"008d3744-e7f0-41a5-a419-702960cf1ccb",
"0094e07c-1595-402e-9d38-68b9cac71e7b",
"00bd58bd-223d-433e-b60a-5bf355f342b1"
],
"gene_ids": [
"ENSG00000141510",
"ENSG00000181143"
],
"selection_size": 1
}'
```

```Response
{
"gene_selection": [
{
"log2_uqfpkm_stddev": 0.9962971125913709,
"log2_uqfpkm_median": 2.904457107848132,
"gene_id": "ENSG00000141510",
"symbol": "TP53"
}
]
}
```

## Simple Somatic Mutation Endpoint Examples

__Example 1__: Similar to the `/genes` endpoint, a user would like to retrieve information about a mutation based on its COSMIC ID. This is accomplished by creating a JSON filter, which is then URL-encoded for the `curl` command.