A python package to annotate Class A GPCR or Kinase binding types

LindeSchoenmaker/BindingType

Package to annotate the binding type of bioactivity measurements based on a keyword search of:

  • abstracts from PubMed, PubChem assay descriptions, CrossRef or Google Patents
  • assay descriptions from ChEMBL

Annotation is currently supported for two types of targets: Class A GPCRs and kinases.

Getting started

Install

pip install git+https://github.com/sohviluukkonen/BindingType.git@main

Usage

The package has both an API and a CLI which can process either

  • Papyrus datasets
  • lists of document and/or assay IDs

Papyrus data

For a Papyrus dataframe, the annotation adds a new BindingType column to the dataframe. This can be done from the command line with

bindtype_papyrus -i <dataset.csv/.tsv> -tt <GPCR/Kinase>

or with the API with

from bindtype.papyrus import add_binding_type_to_papyrus
df = add_binding_type_to_papyrus(df, target_type="GPCR")  # or target_type="Kinase"

There is also an option to annotate all 'unknown' compounds based on their Tanimoto similarity to the annotated compounds: the -sim, --similarity flag in the CLI and similarity=True in the API.
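
As an illustration of the idea behind the similarity option, here is a minimal sketch of Tanimoto-based label propagation. It is not the package's implementation: the fingerprint representation (sets of on-bit indices), the threshold value, the nearest-neighbour rule, the label names and the helper names `tanimoto` and `propagate_labels` are all assumptions made for this example.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def propagate_labels(annotated, unknown, threshold=0.7):
    """Give each unknown compound the label of its most similar annotated compound.

    annotated: list of (fingerprint, label) pairs (hypothetical input format)
    unknown:   dict mapping compound name -> fingerprint
    threshold: assumed cut-off; compounds below it stay 'unknown'
    """
    result = {}
    for name, fp in unknown.items():
        best_label, best_sim = None, 0.0
        for ref_fp, label in annotated:
            sim = tanimoto(fp, ref_fp)
            if sim > best_sim:
                best_label, best_sim = label, sim
        result[name] = best_label if best_sim >= threshold else "unknown"
    return result


# Toy fingerprints and illustrative labels, not real data
annotated = [({1, 2, 3, 4}, "allosteric"), ({10, 11, 12}, "orthosteric")]
unknown = {"cpd1": {1, 2, 3, 5}, "cpd2": {20, 21}}
labels = propagate_labels(annotated, unknown, threshold=0.5)
```

In this toy case `cpd1` shares three of five distinct bits with the first annotated compound and inherits its label, while `cpd2` shares no bits with any annotated compound and stays 'unknown'.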

General usage

In the more general case, the annotation creates dictionaries of annotations from lists of document IDs and/or assay IDs. This can be done either from the command line with

bindtype -did <document_id_file_path> -aid <assay_id_file_path> -tt <GPCR/Kinase>

or with the API with

# for the GPCRs
from bindtype import ClassA_GPCR_HierachicalBindingTypeAnnotation
parser = ClassA_GPCR_HierachicalBindingTypeAnnotation()

# for the kinases
from bindtype import Kinase_AllostericAnnotation
parser = Kinase_AllostericAnnotation()

# Only abstracts
dct_doc_annotations = parser(document_ids=list_of_document_ids)

# Only assay descriptions
dct_assay_annotations = parser(assay_ids=list_of_assay_ids)

# Both
dct_doc_annotations, dct_assay_annotations = parser(document_ids=list_of_document_ids, assay_ids=list_of_assay_ids)

Because the scripts were developed with data from Papyrus, document and assay IDs should be in the format used in the Papyrus all_doc_ids and AID columns: PMID:<pubmed_id>, PubChemAID:<pubchem_assay_id>, DOI:, PATENT:<patent_id> and <chembl_assay_id>.
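
To check identifiers before passing them to the parser, something like the following sketch could be used. It only relies on the prefixes listed above; the helper name `split_id`, the "ChEMBL" tag for unprefixed IDs and the example values are hypothetical and not part of the package.

```python
# Prefixes taken from the ID formats listed above
KNOWN_PREFIXES = ("PMID", "PubChemAID", "DOI", "PATENT")


def split_id(identifier: str) -> tuple:
    """Split a Papyrus-style identifier into (source, value).

    Identifiers without a recognised prefix are assumed here to be
    bare ChEMBL assay IDs and are returned unchanged with a 'ChEMBL' tag.
    """
    prefix, sep, rest = identifier.partition(":")
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    return "ChEMBL", identifier


# Made-up example identifiers, for illustration only
examples = ["PMID:12345678", "PubChemAID:1234", "CHEMBL1000"]
parsed = [split_id(i) for i in examples]
```

Splitting only at the first colon keeps values that themselves contain a colon intact.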
