Improve Vaccine data normalization #41

Open
joel-oe-lacey opened this issue May 4, 2021 · 0 comments

Currently, the vaccine data fluctuates both upward and downward, which doesn't make sense: the total can't suddenly show fewer vaccinations than it did previously.

I imagine it is due to differences between reporting locations, but the online schema does not provide any details there. Our best route would be to confirm with the team that supports the data. I've corresponded with them a few times at [email protected].

Data source:
https://data-cdphe.opendata.arcgis.com/datasets/cdphe-covid19-vaccine-daily-summary-statistics

Data currently goes through heavy normalization within apiClient.js getVaccineStatistics. The raw data was not grouped in any functional way, was not sorted, and contained many duplicates. Handling was put in place to fix this; we currently just take the first data point available for each day, which is not the most robust approach.

One approach to improving normalization here may be to keep the filtering (the _groupBy and mapValues) but drop the duplicate-removal step. We would then want a more clever way of choosing the best fit from all the data points available for a given day: by best fit to a trend line, by maximum, or by some other metric. There doesn't seem to be any useful metadata, such as reporting site, that would let us differentiate one data point from another, but again, maybe the team managing the data can clarify that.
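As a rough illustration of the "by maximum" option, here is a minimal sketch in plain JavaScript (no lodash, so it can run on its own). The record shape `{ date, value }` and the function names `pickDailyMax` and `enforceNonDecreasing` are hypothetical and are not taken from apiClient.js or the CDPHE schema. The second function is an extra assumption on top of the issue: it only makes sense if the series is a cumulative total, in which case it carries the highest value seen so far forward.

```javascript
// Hypothetical record shape: { date: 'YYYY-MM-DD', value: number }.
// The real field names in the CDPHE dataset may differ.

// Group records by date and keep the maximum value reported for each day.
function pickDailyMax(records) {
  const byDate = new Map();
  for (const { date, value } of records) {
    const current = byDate.get(date);
    if (current === undefined || value > current) {
      byDate.set(date, value);
    }
  }
  // ISO-formatted dates sort chronologically when compared as strings.
  return [...byDate.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([date, value]) => ({ date, value }));
}

// Assumption: the series is cumulative, so it should never decrease.
// Replace each value with the running maximum seen so far.
function enforceNonDecreasing(series) {
  let runningMax = -Infinity;
  return series.map(({ date, value }) => {
    runningMax = Math.max(runningMax, value);
    return { date, value: runningMax };
  });
}

// Made-up sample data: unsorted, with duplicates and a dip on the last day.
const sample = [
  { date: '2021-05-02', value: 120 },
  { date: '2021-05-01', value: 100 },
  { date: '2021-05-01', value: 90 },
  { date: '2021-05-03', value: 110 },
  { date: '2021-05-02', value: 120 },
];

const daily = pickDailyMax(sample);
const smoothed = enforceNonDecreasing(daily);
console.log(daily);
console.log(smoothed);
```

Taking the maximum is the simplest of the options listed above. A trend-line fit would need more context about how the points are reported, which is another reason to confirm with the data team first.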
