Commit
397 add gold metadata to neon soil samples (#303)
* initial checkin with base class and basic tests
  - Base ChangeSheet write class
  - unit tests for base class
* add conftest and gold changesheet tests
  - move test fixtures to conftest.py
  - add get_biosample_name function and unit test to GoldBiosample generator
* update biosample name unit test, add explicit expected values
* Sketch out functions for gold changesheet generator
* function and test for missing GOLD ecosystem metadata
* add function and test for missing gold_biosample_identifiers
* add get_normalized_gold_biosample_identifier
* update logic with omics processing step
* skeleton find_omics_processing_set function, and updated (correct this time) test data files
* Add Omics to Biosample map
  - add omics_to_biosample map input
  - added nmdc / gold BioSample comparison logic
  - unit tests
  - stub API dependent methods
* Add changesheets.py package for common functions and classes
  - Changesheet and ChangesheetLineItem classes
  - API @op functions
* refactor to split omics processing data file read to stand-alone function
* more refactoring and code cleanup
* add test generation job
* add resource definitions and config
* refactor and code cleanup
  Simplify to just ChangeSheet and ChangeSheetLineItem classes
* Cleanup this branch to focus on getting assets working
* fix defs and fetch statement
* get basic GOLD asset generation working
* Add Api resources as ConfigurableResources
* Add asset scaffolding
* update normalizer functions to all take and return strings
* update resources, add empty click script
* fix gold ID normalization and add unit tests
* implement compare biosamples and write_changesheet
* add omics record comparison
* Add validate_changesheet method
* cleanup unused data files
* fix validate_changesheet method and add logging
* delete dagster asset based code and tests - move to a demo branch
* add changesheet_output to .gitignore
* add changesheet_output to .gitignore
* remove Dagster-related code and settings
* style: format with black
* Use TypeAlias for JSON_OBJECT
* Removed hard-coded URL from Changesheet.validate()
* remove .tsv file - should be ignored
* clarify function name and blacken formatting
* fix click options help text and blacken
* yet more blackening
* uncomment wait-for-it
* Delete get_data.ipynb
* Revert "Delete get_data.ipynb"
  This reverts commit fe3e68a.
* add docstring for generate_changesheet
* automatic reformatting
* bring get_data notebook back to original state
* add some logging
* update to use gold_sequencing_identifiers over alternative_identifiers
* Delete neon_cache.sqlite
* strip and de-tab the value in tsv output
* set default line_items in changesheet class correctly
* update output_dir type hint
* remove apply_changes option
* Dry up unfindable logging
* Clean up gold normalization and documentation
* fix: style

---------

Co-authored-by: Donny Winston <[email protected]>