Explore ICEES+ API multivariate approach #254

Closed
karafecho opened this issue Jan 4, 2023 · 4 comments

@karafecho
Contributor

Recently, I worked with several students to develop and test a new ICEES+ functionality that supports the generation of multivariate tables for applications such as regression and random forest. Those students have since moved on, leaving me as the only person who currently understands the approach and can implement it. As such, I thought it might be useful for you to explore the functionality a bit ... but only if this is of interest to you.

Here's the initial methods paper: Fecho K,* Haaland P, Krishnamurthy A, Lan B, Ramsey S, Schmitt PL, Sharma P, Sinha M, Xu H. An approach for open multivariate analysis of integrated clinical and environmental exposures data. Inform Med Unlocked 2021;26:100733. doi.org/10.1016/j.imu.2021.100733. https://pubmed.ncbi.nlm.nih.gov/35875189/.

And here's the repo we worked from: https://github.com/ExposuresProvider/icees-kp-analytics. It's private, but it's under the ExposuresProvider org, so I think you should have access. Please let me know if you don't.

@karafecho
Contributor Author

We have four additional papers that I can send you, if that would be helpful. Just let me know.

@hyi
Collaborator

hyi commented Jan 5, 2023

@karafecho This is pretty interesting, and I have access to the private repo. I have downloaded the PubMed paper and will start by reading it. Yes, it'd be great if you could send me the other four papers if you have them handy. Thanks!

@karafecho
Contributor Author

karafecho commented Jan 5, 2023

Yeah, I figured out the approach during a whiteboarding session with a very persistent student from the NC School of Science & Math who insisted on moving beyond bivariate associations. It took some creative thinking, but I soon realized that you could leverage the dynamic cohort creation functionality to generate multivariate tables through iterative requests to the ICEES+ OpenAPI for bivariate associations. The approach has limitations, to be sure (e.g., data loss), but it's completely open and approved by the CDWH Oversight Committee (I formally requested approval, as I was a bit worried about certain aspects of the multivariate functionality), so it's definitely valuable for exploratory analysis.
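One way to picture the iterative idea described above: define one cohort per combination of values of the conditioning ("stratum") variables, request a bivariate table of two features within each cohort, and stack the results into a joint table. The sketch below simulates this locally on synthetic records. The function names, the feature names, and the minimum cohort size of 10 are all hypothetical assumptions, not the ICEES+ API or the method from the paper; the suppression rule is only there to show how data loss can arise when small strata are withheld.

```python
from collections import Counter
from itertools import product

# ASSUMPTION: small cohorts are withheld; the threshold of 10 is illustrative only.
MIN_COHORT = 10


def bivariate_counts(records, cohort, x, y):
    """Hypothetical stand-in for one bivariate-association request.

    Counts (x, y) value pairs among records matching every feature=value pair
    in `cohort`. Returns None when the cohort is below MIN_COHORT, i.e. the
    whole stratum is withheld.
    """
    members = [r for r in records if all(r[k] == v for k, v in cohort.items())]
    if len(members) < MIN_COHORT:
        return None
    return Counter((r[x], r[y]) for r in members)


def multivariate_table(records, strata, levels, x, y):
    """Assemble a joint table over strata + (x, y) by issuing one cohort
    definition and one bivariate request per combination of stratum values."""
    table = {}
    for values in product(*(levels[s] for s in strata)):
        counts = bivariate_counts(records, dict(zip(strata, values)), x, y)
        if counts is None:
            continue  # suppressed stratum: these patients drop out (data loss)
        for (xv, yv), n in counts.items():
            table[values + (xv, yv)] = n
    return table


# Synthetic demo data (not ICEES data): 40 children and 10 adults.
records = [
    {
        "sex": "F" if i % 2 == 0 else "M",
        "age": "child" if i < 40 else "adult",
        "exposure": "high" if i % 3 == 0 else "low",
        "asthma": "yes" if i % 4 == 0 else "no",
    }
    for i in range(50)
]
levels = {"sex": ["F", "M"], "age": ["child", "adult"]}
table = multivariate_table(records, ["sex", "age"], levels, "exposure", "asthma")
print(sum(table.values()))  # 40: both adult strata (5 patients each) are withheld
```

The number of requests grows with the product of the stratum levels, and every additional stratum variable makes each cohort smaller and more likely to fall below the threshold. That is one concrete way the data-loss limitation shows up.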

Here are three of the papers:

https://pubmed.ncbi.nlm.nih.gov/34769911/
https://www.medrxiv.org/content/10.1101/2022.12.20.22283734v1
https://renci.org/technical-reports/tr-22-01/

Priya's paper has not yet been accepted for publication, and she didn't create a preprint, so I probably shouldn't post it to a public repo. I'll share it after it's accepted for publication (it was just resubmitted with minor revisions, per reviewer request).

A few things I've been thinking about, and would love to brainstorm with you about, include the following:

  1. How can we expose the multivariate functionality as a new ICEES+ endpoint, while maintaining all regulatory requirements and supporting user choice in, e.g., outcomes and feature variables? With a generic script and a warning to users about limitations such as data loss, my gut feeling is that this should be relatively straightforward.
  2. How can we expose the multivariate functionality as part of Translator? See NCATSTranslator/OperationsAndWorkflows#72 ("New operation suggestion: create_multivariate_table"). I think Translator folks are interested in this. In fact, it has been on the agenda for more than one meeting of the Ops & Workflows WG. I'm just not sure how to make this work. An alternative, I think, is to run prescribed analyses (e.g., regression) tailored to each use case and expose the model results.

@karafecho
Copy link
Contributor Author

Hong and I identified and implemented a solution to support open multivariate analysis using the ICEES+ OpenAPI. We have also identified, and are now implementing, an approach to support open longitudinal multivariate analysis by exposing and leveraging two key feature variables: study_period and PatientID. The longitudinal approach will be tested initially using the ICEES+ PCD instance.
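As a rough illustration of why those two variables matter, the sketch below links a patient's rows across study periods: PatientID groups the rows, and study_period orders them into a per-patient sequence. It assumes each row carries study_period and PatientID as named above; the function name and the "Outcome" feature are hypothetical placeholders, and this is not the implementation tracked in #285/#286.

```python
from collections import defaultdict


def trajectories(rows, outcome):
    """Hypothetical sketch: group rows by PatientID and order each patient's
    outcome values by study_period, giving one sequence per patient."""
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row["PatientID"]].append((row["study_period"], row[outcome]))
    return {
        pid: [value for _, value in sorted(obs, key=lambda t: t[0])]
        for pid, obs in by_patient.items()
    }


# Synthetic rows; "Outcome" is a placeholder feature name.
rows = [
    {"PatientID": "p1", "study_period": 2, "Outcome": "worse"},
    {"PatientID": "p1", "study_period": 1, "Outcome": "stable"},
    {"PatientID": "p2", "study_period": 1, "Outcome": "stable"},
]
print(trajectories(rows, "Outcome"))  # {'p1': ['stable', 'worse'], 'p2': ['stable']}
```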

Closing this issue as it has been replaced by #285 and #286.
