-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple example w/ real time-series data? #74
Comments
This sounds like a good idea. If you're interested in contributing, I'm happy to review a PR. I'll try and add an example or two myself over the next couple of days though. |
I'll be happy to add an example once I see an example of how to use this package. I created the following simple dataset (US annual GDP growth) which I'd like to use w/ your package. using HTTP, DataFrames, CSV
link = "https://bit.ly/3vtTaW5"
r = HTTP.get(link)
df = CSV.read(r.body, DataFrame, missingstring="NA") Suppose I have GDP data from 1948-1968. |
Great. TemporalGPs implements the AbstractGPs.jl API, so using it is essentially the same as using that. We've also not managed to document that fantastically, so here's a worked example. It's not a great model (you'd want to optimise the kernel if you were doing this seriously), but it demonstrates how to use the package: # Load dataset.
using CSV
using DataFrames
using HTTP
link = "https://bit.ly/3vtTaW5";
r = HTTP.get(link);
df = CSV.read(r.body, DataFrame; missingstring="NA");
using AbstractGPs
using Plots
using TemporalGPs
# Specify training data. AbstractGPs doesn't explicitly support dates, although this is
# certainly something we could (probably should) add. It would be straightforward to do so.
x = RegularSpacing(0.0, 1.0, length(df.date));
y = df.value;
# Build a GP with fixed kernel. You would want to learn them in practice in any serious setting.
# See the README for an example using Optim + Zygote for this.
# This probably won't be a very good model as-is -- it's the most basic thing you could do.
f = to_sde(GP(Matern52Kernel()), SArrayStorage(Float64));
f_post = posterior(f(x, 1e-3), y);
# Specify locations at which to make predictions. I've chosen to make predictions at
# all of the training data, and for 5 years into the future. If you just wanted a single year
# into the future, that would be fine too.
x_predict = RegularSpacing(0.0, 1.0, length(df.date) + 5);
# If you just want the posterior marginals, you would write something like
marginals(f_post(x_predict))
# If you want a sample from the posterior, you would write something like
rand(f_post(x_predict))
# We have a plotting recipe that will plot the marginals.
plot(f_post(x_predict))
scatter!(x, y)
# For the length of time series you've provided, it's probably not necessary to use
# TemporalGPs -- the standard approach to inference in a GP will do just fine.
# You can do this is in almost exactly the same way as using TemporalGPs.
# The only thing that has changed is there's no need to call `to_sde`.
f = GP(Matern52Kernel());
# Everything else from above is the same. Let me know whether this is clear! edit: |
Thanks! I got the following to work: # Load dataset.
using HTTP, CSV, DataFrames, Dates
link = "https://bit.ly/3vtTaW5";
r = HTTP.get(link);
df = CSV.read(r.body, DataFrame; missingstring="NA");
# plot ts data
using Plots
plot(year.(df.date), df.value, lab = "");
scatter!(year.(df.date), df.value, lab = "US Real GDP growth, pch1");
hline!([0], color = "red", lab = "") using AbstractGPs, TemporalGPs
# Specify training data.
x = RegularSpacing(0.0, 1.0, length(df.date));
y = df.value;
# Specify locations at which to make predictions.
x_predict = RegularSpacing(0.0, 1.0, length(df.date) + 5);
# Build a GP with fixed kernel.
f_naive = GP(Matern52Kernel());
f = to_sde(f_naive, SArrayStorage(Float64));
f_post = posterior(f(x, 1e-3), y); # get posterior dist having observed y @ x.
marginals(f_post(x_predict)) # Posterior marginals
rand(f_post(x_predict)) # Sample from posterior
logpdf(f_post(x), y) # 183.95 posterior log predictive prob of `y`
# Posterior marginals for out-of sample GDP
# Observe: bigger forecast horizon => bigger σ
marginals(f_post(x_predict))[end-4:end]
"""
5-element Vector{Distributions.Normal{Float64}}:
Distributions.Normal{Float64}(μ=-2.5304528896537226, σ=0.8354470986323949)
Distributions.Normal{Float64}(μ=-0.7158798705732248, σ=0.9888083046711964)
Distributions.Normal{Float64}(μ=-0.14706868329661496, σ=0.9995439581936363)
Distributions.Normal{Float64}(μ=-0.02571225001856308, σ=0.9999862705923213)
Distributions.Normal{Float64}(μ=-0.004078352668396576, σ=0.9999996576590127)
"""
px = x_predict[end-4:end]
pm = mean.(marginals(f_post(x_predict))[end-4:end])
ps = std.(marginals(f_post(x_predict))[end-4:end])
plb = pm - 1.96*ps
pub = pm + 1.96*ps
# Plotting recipe
plot(f_post(x_predict), lab = "");
xticks!((0:8:72, string.(year.(df.date))[1:8:73]));
scatter!(x, y, lab = "US GDP growth: observed");
scatter!(px, pm, lab = "US GDP growth: mean prediction");
#scatter!(px,plb,lab="")
#scatter!(px,pub,lab="")
hline!([0], color = "red", lab = "")
My background, I know a bit about forecasting & ML. Some questions (me thinking out loud):
|
Yes, it's a multivariate Gaussian for any (finite) collection of inputs, and given by
Hmm that's not something we've built in -- it's the kind of thing I would hope a generic time series analysis package would offer, and that we could hook into. Do you happen to know if this kind of functionality exists elsewhere in Julia's time series analysis ecosystem? It's not something I've looked into as much as I should have.
Yes and no. If you use the basic functionality (as shown above) then yes. But the things in this package (and the AbstractGPs ecosystem more generally) are best thought of as a toolkit that you can plug into other things -- for example, you can plug AbstractGPs into Turing and Soss to produce more complicated, non-Gaussian, models. In addition to generic probabilistic programming tools, there are lots of GP-specific tools for handling non-Gaussian likelihoods. They'll hopefully appear in other packages within the JuliaGaussianProcesses org at some point, but I don't have a particular timeline for when they're likely to be available / mature unfortunately.
I suspect that the best way to do this is to get the
It will really depend on the problem -- I've actually not seen NGBoost before, so thanks for pointing it out. Unfortunately, the best advice I can give is to try both out on your problem and see which wins. |
I've actually been pretty excited about the emerging lit on probabilistic forecasting. Python: NGBoost, CatBoostLSS. The NGBoost paper compares the performance of several methods in Table 1: I feel like probabilistic forecasting could work nicely in Julia. I wanna benchmark some of the Julia options as well. |
Hi and thank you for this package!
Is it possible to provide a simple example w/ real time-series data in the README?
For example: TSAnalysis.jl gives several real data examples in the readme...
The text was updated successfully, but these errors were encountered: