
Graph skewed 1958 - 1970 - Scatterplot request! #109

Open
titojankowski opened this issue Nov 19, 2017 · 18 comments

@titojankowski
Contributor

The 10-15 year period at the beginning of the graph is much narrower than another 15-year period, i.e. 1980-1995. I think this is because there are fewer data points in the early 1958 period? @lwm @grady-lad
The effect is it looks like levels rocketed up fast from 1958-1970 but it’s just that the graph is compressed.

@decentral1se
Member

Yeah, I think you're probably right there.

Not sure there is much we can do about that? It's simply a reflection of the data we have.

Possibly a candidate for all the explanatory documentation we don't have ;)

@grady-lad
Contributor

We could double-check the number of data points in that period by querying the API?
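A minimal sketch of that check, assuming the readings have already been fetched into an array of `{ date, ppm }` objects with plain `YYYY-MM-DD` dates. The `Reading` shape and both helper names are hypothetical; they are not the project's actual API response or code.

```typescript
// Hypothetical record shape; the real API's field names may differ.
interface Reading {
  date: string; // plain ISO date, e.g. "1964-03-15"
  ppm: number;
}

// Count how many readings fall in each calendar year.
function countPerYear(readings: Reading[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const r of readings) {
    const year = Number(r.date.slice(0, 4)); // "YYYY-MM-DD" -> YYYY
    counts.set(year, (counts.get(year) ?? 0) + 1);
  }
  return counts;
}

// Count readings in an inclusive date range.
// Plain YYYY-MM-DD strings sort the same way as the dates they represent.
function countInRange(readings: Reading[], from: string, to: string): number {
  return readings.filter((r) => r.date >= from && r.date <= to).length;
}

// e.g. countInRange(readings, "1958-01-01", "1974-05-17")
```

Comparing ISO strings avoids parsing every date; if the data ever carries timestamps, parse with `Date.parse` instead.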

@decentral1se
Member

Like, make two queries, where you grab all the data for the first 15 years?

And another for the remaining data using your usual 10 points per whatever query?

@titojankowski
Contributor Author

Well, the graph could be reconfigured to plot the points against a date axis, like a scatterplot or something. Or, since we’re sampling 1/10 of the data, could the sampling function pick a fixed # of samples each year? i.e. 1960 and 1990 would have the same # of data points
@lwm @grady-lad

@grady-lad
Contributor

Currently we have all the data in the frontend, so we could definitely update the sampling function to sample data differently based on the year.

E.g. first 15 years -> show all data points
Remaining data -> sample every 10th item.
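The rule above could look roughly like this. It's only a sketch: `sampleSegmented` is a hypothetical helper, and it uses a count cutoff (keep the first `cutoff` items as-is) as a stand-in for "first 15 years".

```typescript
// Keep every item before index `cutoff`, then every `step`-th item after it.
// Hypothetical helper; works on any array already sorted by date.
function sampleSegmented<T>(items: T[], cutoff: number, step: number): T[] {
  if (step < 1) throw new RangeError("step must be >= 1");
  const head = items.slice(0, cutoff); // early data: keep everything
  const tail = items.slice(cutoff).filter((_, i) => i % step === 0); // later data: thin out
  return head.concat(tail);
}

// e.g. sampleSegmented(readings, 1500, 10)
```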

@titojankowski
Contributor Author

Sounds fine to me!

@grady-lad grady-lad self-assigned this Nov 30, 2017
@grady-lad
Contributor

This is what the graph looks like when we show all the data for the first 1500 data points (up to 1976-07-16) and sample 1/10 of the data after.
[Screenshot from 2017-11-29 20:33:24]

And here is what the data looks like currently when we sample every 10th data point
[Screenshot from 2017-11-29 20:35:11]

Not much of a difference =/

@titojankowski
Copy link
Contributor Author

How weird!
Every 10th data point = 36.5 data points per year (365 days / 10)
1500 data points over 18 years (1958 - 1976) ≈ 83 data points per year

So in this experiment I would expect the early years to be wider than the later years on the graph (which presents its own issue). Instead, the earlier years are still skinnier. Are you sure it's working as you intended?

The points should really just be positioned according to their date, rather than every point getting equal spacing. The graphing function is basically ignoring the fact that each data point has a corresponding date. How might we get the graph to do this?
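One way to get there, sketched under assumptions: compute each point's x coordinate from its timestamp instead of its array index, so a one-week gap takes seven times the width of a one-day gap. `toPixels` and the `Reading` shape are hypothetical; charting libraries usually offer this as a time scale (for example D3's `scaleTime`).

```typescript
// Hypothetical record shape; the real API's field names may differ.
interface Reading {
  date: string; // ISO date, e.g. "1974-05-17"
  ppm: number;
}

interface Point {
  x: number;
  y: number;
}

// Map readings to pixel coordinates, placing x by timestamp rather than by index.
function toPixels(readings: Reading[], width: number, height: number): Point[] {
  if (readings.length === 0) return [];
  const ts = readings.map((r) => Date.parse(r.date)); // ms since epoch (UTC for YYYY-MM-DD)
  const ys = readings.map((r) => r.ppm);
  // reduce instead of Math.min(...arr) to avoid argument limits on large arrays
  const tMin = ts.reduce((a, b) => Math.min(a, b));
  const tMax = ts.reduce((a, b) => Math.max(a, b));
  const yMin = ys.reduce((a, b) => Math.min(a, b));
  const yMax = ys.reduce((a, b) => Math.max(a, b));
  const tSpan = tMax - tMin || 1; // avoid division by zero for a single point
  const ySpan = yMax - yMin || 1;
  return readings.map((_, i) => ({
    x: ((ts[i] - tMin) / tSpan) * width,
    y: height - ((ys[i] - yMin) / ySpan) * height, // SVG/canvas y grows downward
  }));
}
```

With x tied to time, weekly and daily stretches of the record get their true widths no matter how many points each one has.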

Here's the gold standard, the Keeling Curve for reference!
[Image: mlo_full_record]

@grady-lad
Contributor

Ah I've just realised I was not updating the values for the x axis when doing the sampling 🤦‍♂️

So here is what the sampling logic looks like now.

For the first 1000 items -> take every 2nd data point
For the remaining items -> take every 5th data point.
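The x-axis bug mentioned above is the kind of thing that happens when labels and values are sampled separately. A sketch that avoids it by choosing the indices once and applying them to both arrays. `sampleIndices`, `samplePaired`, and the parallel `labels`/`values` arrays are assumptions about how the chart data is stored, not the project's actual code.

```typescript
// Indices to keep: every `headStep`-th of the first `cutoff` items,
// then every `tailStep`-th item from `cutoff` onward.
function sampleIndices(n: number, cutoff: number, headStep: number, tailStep: number): number[] {
  if (headStep < 1 || tailStep < 1) throw new RangeError("steps must be >= 1");
  const keep: number[] = [];
  for (let i = 0; i < Math.min(n, cutoff); i += headStep) keep.push(i);
  for (let i = cutoff; i < n; i += tailStep) keep.push(i);
  return keep;
}

// Apply the same indices to x labels and y values so they stay aligned.
function samplePaired(
  labels: string[],
  values: number[],
  keep: number[],
): { labels: string[]; values: number[] } {
  return { labels: keep.map((i) => labels[i]), values: keep.map((i) => values[i]) };
}

// e.g. samplePaired(dates, ppms, sampleIndices(dates.length, 1000, 2, 5))
```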

[Screenshot from 2017-11-30 13:41:16]

[Screenshot from 2017-11-30 13:41:56]

@grady-lad
Contributor

grady-lad commented Nov 30, 2017

Is this more like the desired result? @titojankowski

@titojankowski
Contributor Author

@grady-lad That's headed in the right direction!

Can we do a scatterplot instead? Sorry to mention it again if you've already thought it through and it's not doable, but it would make everything easy! With the current method, we have to manually pick the right data points, otherwise the graph gets badly skewed. @lwm thoughts?

With this method:
Looking at our raw data, we have roughly weekly data up until 1974-05-17. (API data: http://api.carbondoomsday.com/api/co2/?date__lte=1974-05-17)

The count is 790 data points of weekly data from the beginning until 1974-05-17. After that it's pretty much daily data.

Conclusion: I suggest trying all of the first 790 data points, and then every 7th data point after that (keep it consistent with weekly data). How's that look? Again, a scatter plot would not need any of this, and this count will change if we ever add more data to the early period. Let me know if there's anything I can do to help!
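A sketch of the suggestion, keyed on the date 1974-05-17 from the API link above rather than on the count 790, so it wouldn't depend on the exact number of early readings. `sampleByCutoffDate` and the `Reading` shape are hypothetical, and the readings are assumed to be sorted by date.

```typescript
// Hypothetical record shape; the real API's field names may differ.
interface Reading {
  date: string; // plain ISO date, e.g. "1974-05-17"
  ppm: number;
}

// Keep every reading up to and including `cutoffDate` (the weekly era),
// then every `step`-th reading after it (the daily era).
function sampleByCutoffDate(readings: Reading[], cutoffDate: string, step: number): Reading[] {
  if (step < 1) throw new RangeError("step must be >= 1");
  const early = readings.filter((r) => r.date <= cutoffDate);
  const late = readings.filter((r) => r.date > cutoffDate);
  return early.concat(late.filter((_, i) => i % step === 0));
}

// e.g. sampleByCutoffDate(readings, "1974-05-17", 7)
```

Taking every 7th daily reading keeps the later data roughly weekly, matching the spacing of the early record.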

@grady-lad
Contributor

I thought the sampling was there to address the chart's performance issues?
The reason for the chart being skewed was an error on my part (not updating the x axis values correctly).

@titojankowski I have added your suggestion for the sampling (1st 790 items & every 7th item) and it's looking good!

Regarding the scatter plot, wouldn't we still have to sample the data? I'm still trying to understand the full benefits of switching over to a scatter plot.

@titojankowski
Contributor Author

@grady-lad These are 2 separate issues.

The sampling is to cut the total amount of data. Sampling is good overall, i.e. pulling 1/10 of all samples.

But the issue here is skewing, and it happens whether we sample or not. It was an issue on the ALL chart forever, we just didn’t notice it.

A normal line graph works great if the spacing between every point is the same, i.e. Coinbase has one data point for every day.

But it’s not in our case. We have weekly data at the beginning of our dataset, and daily data towards the end.
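To see where the spacing changes without hard-coding a count, one could compute the median gap between consecutive readings for each year: about 7 days in weekly years, about 1 day in daily ones. A hypothetical sketch (`medianGapByYear` and the `Reading` shape are assumptions), expecting readings sorted by plain ISO date.

```typescript
// Hypothetical record shape; the real API's field names may differ.
interface Reading {
  date: string; // plain ISO date, e.g. "1974-05-17"
  ppm: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// For each year, the median number of days between a reading and the one before it.
// The median ignores the single large gap that can straddle a year boundary.
function medianGapByYear(readings: Reading[]): Map<number, number> {
  const gaps = new Map<number, number[]>();
  for (let i = 1; i < readings.length; i++) {
    const days = (Date.parse(readings[i].date) - Date.parse(readings[i - 1].date)) / DAY_MS;
    const year = Number(readings[i].date.slice(0, 4));
    const list = gaps.get(year) ?? [];
    list.push(days);
    gaps.set(year, list);
  }
  const medians = new Map<number, number>();
  gaps.forEach((list, year) => {
    const sorted = list.slice().sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    medians.set(year, sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2);
  });
  return medians;
}
```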

X-Y Scatterplot is useful because it places the datapoints on the x-axis based on their date. We shouldn’t need the whole “treat the first 790 datapoints this way, and the rest this other way”. A scatterplot would just make it all work without that, put the data in, and it positions the data points correctly. Does that help? Or more confusing?

@titojankowski
Contributor Author

And yes! The current fix does look great, just saw it! Though it will break if the number of early data points ever changes from 790. @grady-lad

@titojankowski
Contributor Author

@grady-lad

The reason this skew issue matters is that it made it look like CO2 rose really fast from 1958-1975 or so, and then slowed down. But that’s not true at all! CO2 has risen steadily every year since 1958, and maybe now it’s speeding up a bit. But the skew issue made it look like the CO2 increase today isn't as bad as it was around 1958.

That’s why I noticed it recently, I was wondering if CO2 was accelerating vs just steadily rising. That’s when I saw the issue with the 1958-1975 data.

So that’s why it’s important we convey the data correctly. And a scatterplot makes that easy.

@titojankowski
Contributor Author

@grady-lad I'm taking another look at the sampling. One major outlier is the year 1964. There are only 31 data points for 1964[1], so with the current graph it ends up looking really skinny. But we should just take an equal number of data points for each year. Thoughts on how to fix this? Maybe we could make our sampling function smarter so it picks the same number of points from every year? That way you wouldn't need separate code for 1958-1974 and >1974.

[1]: 31 datapoints in 1964: http://api.carbondoomsday.com/api/co2/?date__range=1964-01-01%2C1965-01-01

[Screenshot from 2018-01-04 21:04:15]
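A sketch of the "same number of points from every year" idea. `sampleEqualPerYear` and the `Reading` shape are hypothetical; the helper picks evenly spaced readings within each year and keeps every reading in sparse years (like 1964's 31), so it needs no separate branch for 1958-1974.

```typescript
// Hypothetical record shape; the real API's field names may differ.
interface Reading {
  date: string; // plain ISO date, e.g. "1964-03-15"
  ppm: number;
}

// Pick up to `perYear` readings from each calendar year, evenly spaced through that year's data.
function sampleEqualPerYear(readings: Reading[], perYear: number): Reading[] {
  const byYear = new Map<number, Reading[]>();
  for (const r of readings) {
    const year = Number(r.date.slice(0, 4));
    const list = byYear.get(year) ?? [];
    list.push(r);
    byYear.set(year, list);
  }
  const out: Reading[] = [];
  const years = Array.from(byYear.keys()).sort((a, b) => a - b);
  for (const year of years) {
    const list = byYear.get(year)!;
    if (list.length <= perYear) {
      out.push(...list); // sparse year: keep everything
      continue;
    }
    // Indices floor(k * n / perYear) are distinct and increasing when n > perYear.
    for (let k = 0; k < perYear; k++) {
      out.push(list[Math.floor((k * list.length) / perYear)]);
    }
  }
  return out;
}
```

Equal counts per year make each year equally wide on an index-based axis, but a year with fewer readings than `perYear` will still come out narrower; a date-based x axis sidesteps both cases.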

@decentral1se
Member

decentral1se commented Jan 21, 2018

> And a scatterplot makes that easy.

Briefly coming in here, didn't read everything 🦅 BUT is this the status of this issue?

A feature request for a new plot?

@decentral1se decentral1se added this to the P3: Optional milestone Jan 21, 2018
@titojankowski
Contributor Author

titojankowski commented Jan 22, 2018 via email

@decentral1se decentral1se changed the title Graph skewed 1958 - 1970 Graph skewed 1958 - 1970 - Scatterplot request! Jan 22, 2018