This repository provides a Python script for parsing and curation of chemical data. The curation process follows the procedure recommended in the following paper:
https://pubs.acs.org/doi/10.1021/ci100176x
Eventually, this repository will also include the procedures recommended in the following paper:
https://pubs.acs.org/doi/abs/10.1021/acs.jcim.6b00129
- SDF and CSV files are supported, and basic searching for desired fields is implemented
- Structure standardization currently uses the MolVS Standardizer
- InChIKeys are used to determine whether two structures are the same
- Users can set the threshold that determines whether activity values for the same molecule are close enough to be averaged; the default threshold is one log unit
- Any issues encountered during the process are written to a text file for manual review
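The duplicate-handling rule described in the list above can be sketched in plain Python. This is a minimal illustration, not the repository's actual code. It assumes that InChIKeys have already been computed (for example, after standardization), that activities are on a log scale, and that "close enough to average" means the spread (maximum minus minimum) of a molecule's values does not exceed the threshold. The function name `merge_duplicates`, its return shape, and the placeholder keys in the usage example are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def merge_duplicates(records, threshold=1.0):
    """Hypothetical sketch: group activities by InChIKey and average close duplicates.

    records: iterable of (inchikey, activity) pairs, with activity in log units.
    threshold: assumed maximum allowed spread (max - min) within a group, in log units.
    Returns (merged, issues): merged maps InChIKey -> mean activity, and issues
    lists messages for groups that were too far apart to average.
    """
    # Collect every activity value reported for the same InChIKey.
    groups = defaultdict(list)
    for key, activity in records:
        groups[key].append(activity)

    merged = {}
    issues = []
    for key, values in groups.items():
        spread = max(values) - min(values)
        if spread <= threshold:
            # Values agree within the threshold, so keep their average.
            merged[key] = mean(values)
        else:
            # Too far apart to average: leave the molecule out and flag it for manual review.
            issues.append(
                f"{key}: activities {values} differ by {spread:.2f} log units "
                f"(threshold {threshold})"
            )
    return merged, issues


# Usage with placeholder keys (real InChIKeys are 27-character strings).
records = [("KEY-A", 5.0), ("KEY-A", 5.6), ("KEY-B", 4.0), ("KEY-B", 6.0), ("KEY-C", 7.2)]
merged, issues = merge_duplicates(records)
for line in issues:
    print(line)  # these messages could be written to the review text file
```

With the default threshold of one log unit, `KEY-A` (spread of 0.6) is averaged and `KEY-B` (spread of 2.0) is flagged. Passing a larger `threshold` would merge `KEY-B` as well.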