Graph skewed 1958 - 1970 - Scatterplot request! #109
Yeah, I think you're probably right there. Not sure there is much we can do about that? It's simply a reflection of the data we have. Possibly a candidate for all the explanatory documentation we don't have ;)
We could double-check the number of data points in that period by querying the API?
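If the full dataset is already loaded client-side, the per-year counts can also be checked without extra API calls. A minimal TypeScript sketch, assuming each reading looks like `{ date: "YYYY-MM-DD", ppm: number }` (the `Co2Point` shape and the `countPerYear` name are hypothetical, not the project's actual types):

```typescript
// Hypothetical shape of one CO2 reading; the real API field names may differ.
interface Co2Point {
  date: string; // ISO date, e.g. "1964-03-07"
  ppm: number;
}

// Count how many data points fall in each calendar year.
function countPerYear(points: Co2Point[]): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const p of points) {
    // The first four characters of an ISO date string are the year.
    const year = Number(p.date.slice(0, 4));
    counts[year] = (counts[year] ?? 0) + 1;
  }
  return counts;
}
```

Comparing `countPerYear(data)[1964]` with `countPerYear(data)[1990]` would show how uneven the early years are.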
Like, make two queries: one that grabs all the data for the first 15 years, and another for the remaining data using the usual 1-in-10 sampling?
Well, the graph could be reconfigured to plot the points on the date axis, like a scatterplot or something. Or, since we’re sampling 1/10 of the data, could the sampling function pick a fixed number of samples each year? i.e. 1960 and 1990 would have the same number of data points.
Currently the frontend has all the data, so we could definitely update the sampling function to sample differently based on the year, e.g. show all data points for the first 15 years.
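A minimal sketch of a year-based rule like that, assuming the data is sorted by date and uses a hypothetical `{ date, ppm }` shape; `sampleByCutoffYear`, the cutoff year and the step are illustrative choices, not existing frontend code:

```typescript
// Hypothetical shape of one CO2 reading.
interface Co2Point {
  date: string; // ISO date, e.g. "1958-03-29"
  ppm: number;
}

// Keep every point dated before `cutoffYear`, and only every `step`-th
// point from `cutoffYear` onwards. Assumes `points` is sorted by date.
function sampleByCutoffYear(
  points: Co2Point[],
  cutoffYear: number,
  step: number
): Co2Point[] {
  const result: Co2Point[] = [];
  let laterIndex = 0; // counts only the points at or after the cutoff
  for (const p of points) {
    const year = Number(p.date.slice(0, 4));
    if (year < cutoffYear) {
      result.push(p);
    } else {
      if (laterIndex % step === 0) result.push(p);
      laterIndex++;
    }
  }
  return result;
}
```

`sampleByCutoffYear(data, 1973, 10)` would keep everything from 1958-1972 and one point in ten afterwards. Note that this only changes which points are drawn, not where they land on the x-axis.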
Sounds fine to me!
How weird! So in this experiment I would expect the early years to be wider than the later years on the graph (which presents its own issue). But instead the earlier years are still skinnier. Are you sure it's working as you intended? The points should really just be positioned relative to their date rather than each point getting equal spacing. The graphing function is basically ignoring the fact that each data point has a corresponding date. How might we get the graph to do this?
Is this more like the desired result? @titojankowski
@grady-lad That's headed in the right direction! Can we do a scatterplot instead? Sorry to mention it again if you've already thought it through and it's not doable, but it would make everything easy! With the current method, we have to manually pick the right data points, otherwise it skews the graph a lot. @lwm thoughts? With this method: the count is 790 weekly data points from the beginning until 1974-05-17. After that it's pretty much daily data. Conclusion: I suggest using all of the first 790 data points, and then every 7th data point after that (to keep it consistent with the weekly data). How does that look? Again, a scatterplot would not need any of this, and this count will change if we ever add more data to the early period. Let me know if there's anything I can do to help!
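The rule suggested above (all of the first 790 points, then every 7th) could be sketched like this in TypeScript. The function name is hypothetical; 790 and 7 are the numbers from the comment and would need updating if early data is added:

```typescript
// Keep all of the first `weeklyCount` points (the weekly-data era),
// then every `step`-th point after that. Defaults are the numbers
// from the comment above: 790 weekly points, then every 7th daily point.
function sampleWeeklyThenEveryNth<T>(points: T[], weeklyCount = 790, step = 7): T[] {
  const head = points.slice(0, weeklyCount);
  const tail = points.slice(weeklyCount).filter((_, i) => i % step === 0);
  return head.concat(tail);
}
```

Taking every 7th daily point roughly matches the weekly spacing of the early data, which is what stops an index-spaced line chart from compressing 1958-1974.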
I thought the sampling was there to address the chart's performance issue? @titojankowski I have added your suggestion for the sampling (the first 790 items, then every 7th item) and it's looking good! As for the scatterplot, wouldn't we still have to sample the data? I'm still trying to understand the full benefits of switching over to a scatterplot.
@grady-lad These are 2 separate issues. The sampling is to cut the total amount of data. Sampling is good overall, i.e. pulling 1/10 of all samples. But the issue here is skewing, and it happens whether we sample or not. It was an issue on the ALL chart forever; we just didn’t notice it. A normal line graph works great if the spacing between every pair of points is the same, e.g. Coinbase has one data point for every day. But that's not our case: we have weekly data at the beginning of our dataset and daily data towards the end. An X-Y scatterplot is useful because it places the data points on the x-axis based on their date. We shouldn’t need the whole “treat the first 790 data points this way, and the rest this other way”. With a scatterplot we just put the data in and it positions the data points correctly. Does that help? Or is it more confusing?
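To make the difference concrete, here is a small sketch of how an x-coordinate is computed in each case: a plain line chart spaces points by their index, while a scatterplot (or any time-based axis) spaces them by date. The function names and the `{ date, ppm }` shape are hypothetical; a charting library's time scale would normally do the second calculation for you.

```typescript
// Hypothetical shape of one CO2 reading.
interface Co2Point {
  date: string; // ISO date, e.g. "1958-03-29"
  ppm: number;
}

// Line-chart style: every point gets the same horizontal space,
// no matter how far apart the dates are.
function xByIndex(points: Co2Point[], width: number): number[] {
  const n = points.length;
  return points.map((_, i) => (n === 1 ? 0 : (i / (n - 1)) * width));
}

// Scatterplot style: horizontal position is proportional to the date,
// so weekly and daily readings both land on the correct part of the axis.
function xByDate(points: Co2Point[], width: number): number[] {
  const times = points.map((p) => Date.parse(p.date)); // ms since epoch
  const min = Math.min(...times);
  const max = Math.max(...times);
  return times.map((t) => (max === min ? 0 : ((t - min) / (max - min)) * width));
}
```

With three readings on 2000-01-01, 2000-01-02 and 2000-01-11 on a 100px-wide chart, `xByIndex` places them at 0, 50 and 100, while `xByDate` places them at 0, 10 and 100, so the nine-day gap actually looks like a gap.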
And yes! The current fix does look great, just saw it! Though it will break if the number of early data points ever changes from 790. @grady-lad
The reason this skew issue matters is that it made it look like CO2 rose really fast from 1958 to 1975 or so, and then slowed down. But that’s not true at all! CO2 has risen steadily every year since 1958, and maybe now it’s speeding up a bit. The skew made it look like today's CO2 increase isn't as bad as it was around 1958. That’s why I noticed it recently: I was wondering if CO2 was accelerating vs. just steadily rising, and that’s when I saw the issue with the 1958-1975 data. So that’s why it’s important we convey the data correctly, and a scatterplot makes that easy.
@grady-lad I'm taking another look at the sampling. One major outlier is the year 1964. There are only 31 data points for 1964[1], so with the current graph it ends up looking really skinny. Instead, we should just take an equal number of data points for each year. Thoughts on how to fix this? Maybe we could make our sampling function smarter so it picks the same number of points from every year? That way you wouldn't need separate code for 1958-1974 and >1974. [1]: 31 data points in 1964: http://api.carbondoomsday.com/api/co2/?date__range=1964-01-01%2C1965-01-01
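One way to make the sampling function pick the same number of points from every year: group points by calendar year, then take up to `perYear` evenly spaced points from each group. This is a sketch assuming sorted `{ date, ppm }` data; `sampleEqualPerYear` and `perYear` are hypothetical names. Since 1964 only has 31 points, `perYear` would need to be 31 or less for every year to end up with the same count.

```typescript
// Hypothetical shape of one CO2 reading.
interface Co2Point {
  date: string; // ISO date, e.g. "1964-03-07"
  ppm: number;
}

// Pick up to `perYear` evenly spaced points from each calendar year.
// Years with `perYear` points or fewer keep all of them.
// Assumes `points` is sorted by date.
function sampleEqualPerYear(points: Co2Point[], perYear: number): Co2Point[] {
  // Group points by year, preserving their original order.
  const byYear: Record<number, Co2Point[]> = {};
  for (const p of points) {
    const year = Number(p.date.slice(0, 4));
    if (!byYear[year]) byYear[year] = [];
    byYear[year].push(p);
  }

  const result: Co2Point[] = [];
  const years = Object.keys(byYear).map(Number).sort((a, b) => a - b);
  for (const year of years) {
    const bucket = byYear[year];
    if (bucket.length <= perYear) {
      result.push(...bucket);
      continue;
    }
    // floor(k * n / perYear) gives perYear distinct, evenly spaced indices.
    for (let k = 0; k < perYear; k++) {
      result.push(bucket[Math.floor((k * bucket.length) / perYear)]);
    }
  }
  return result;
}
```

Because every year then contributes the same number of points, even the current index-spaced line chart would give each year about the same width.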
Briefly coming in here, didn't read everything 🦅 BUT is this the status of this issue? A feature request for a new plot?
Yeah, make the x-axis position based on the date, rather than spacing the data equally. Right now, if we had a data point from 2025, it would simply appear right next to our existing data, rather than over at 2025 on the x-axis.
On Sun, Jan 21, 2018, at 2:30 AM, Luke Murphy wrote:
> Briefly coming in here, didn't read everything 🦅 BUT is this the status of this issue? A feature request for a new plot?
The 10-15 year period at the beginning of the graph is much shorter than another 15-year period, e.g. 1980-1995. I think this is because there are fewer data points in the early 1958 period? @lwm @grady-lad
The effect is that it looks like levels rocketed up from 1958-1970, but it's just that the graph is compressed.