
Add state space model based on SpeedyWeather.jl #9

Draft · wants to merge 24 commits into base: main
Conversation

@matt-graham matt-graham commented Sep 12, 2024

Some initial work on integrating SpeedyWeather.jl global atmospheric models with ParticleDA, as an alternative to the existing integration with the (Fortran 90) SPEEDY implementation.

So far this deals with initialising a SpeedyWeather model, mapping between a flat state vector and the PrognosticVariables data structure representing the model state in SpeedyWeather (corresponding to the real and imaginary components of spectral coefficients in a spherical harmonic basis for the layered variables - a subset of vorticity, divergence, temperature and humidity - and a single surface-only variable, pressure), and performing the deterministic component of the state updates.

TODO:

  • Implement some form of stochastic update to the state. In spectral space, independently perturbing the spectral coefficients with noise that scales with the degree of the basis function (such that higher spatial frequencies have smaller-scale perturbations) may be a reasonable first pass.
  • Figure out how to update the diagnostic variables in the model after updating the prognostic variables, to allow the former to be used for simulating / extracting observations from the gridded fields.
  • Allow use of Julia multithreading by creating instances of the SpeedyWeather model and diagnostic / prognostic variable data structures for each task.
  • Add some tests!
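For the first item, here is a minimal sketch of what degree-scaled spectral noise could look like. It uses a plain complex matrix indexed by degree and order rather than SpeedyWeather's LowerTriangularMatrix, and the function name, the decay law (1 + l)^(-alpha) and the parameters sigma0, alpha are all illustrative assumptions, not anything from the PR:

```julia
using Random

# Sketch: perturb spherical harmonic coefficients with noise whose scale
# decays with total wavenumber (degree) l, so higher spatial frequencies
# receive smaller perturbations. Rows are degrees l = 0..lmax-1, columns
# are orders m = 0..mmax-1; only the lower triangle l >= m is populated.
function perturb_coefficients!(coeffs::AbstractMatrix{<:Complex}, sigma0, alpha;
                               rng=Random.default_rng())
    lmax, mmax = size(coeffs)
    for m in 0:mmax-1, l in m:lmax-1
        scale = sigma0 / (1 + l)^alpha                     # std decays with degree l
        # m = 0 coefficients stay real for a real-valued field
        noise = scale * (randn(rng) + (m == 0 ? 0.0 : randn(rng)) * im)
        coeffs[l+1, m+1] += noise
    end
    return coeffs
end

coeffs = zeros(ComplexF64, 6, 6)
perturb_coefficients!(coeffs, 0.1, 1.0)
```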


sguillas commented Nov 29, 2024

@matt-graham @DanGiles @giordano
Let's try pseudo-code here at RIKEN with Hideyuki Sakamoto -- we are looking at the code in SpeedyWeather.jl and the spherical harmonics are indeed great ideas.
Julia multithreading might not be needed as SpeedyWeather.jl is very fast, and we are only running it over short lengths of time.

(Outdated, resolved review comments on SpeedyWeather/Project.toml and SpeedyWeather/src/speedy.jl)

milankl commented Nov 29, 2024

Just to chip in here: I've taken multi-threading out of SpeedyWeather.jl in favour of development towards GPUs with KernelAbstractions.jl, but there will always be a single-CPU version that can run fast on a laptop, used for development, testing, teaching etc. Actually, on Arm chips with SVE, SpeedyWeather.jl can be 8x faster than Fortran SPEEDY, so Fugaku's A64FX might be very exciting!! I believe you want to run large ensembles anyway, so that might be a very good fit! Here are some code snippets showing how we've run ensembles using Julia's Distributed: SpeedyWeather/SpeedyWeather.jl#380 (comment)

There are probably a few things I'd adapt now, but it's a good starting point. Feel free to reach out any time if you need more input/support from our side. We (and I!!) would be very excited to push this forward. Our interface for running ensemble simulations isn't set in stone yet, so we're very happy to make the changes needed to fit such a project!


sguillas commented Dec 1, 2024

Wow, @milankl, this is great. 8x faster!
We are aiming for ensembles of size 1 million to target extreme events with (some) confidence. We have the time on Fugaku. @DanGiles, Hideyuki and I have scheduled a meeting for 8am UK / 5pm Japan on Monday 2 Dec, and I am sending an invite to all in case some of you can make it (or part of it) @matt-graham @milankl @giordano


end

function update_spectral_coefficients_from_vector!(

I think you can use copyto! for that, e.g.

julia> using SpeedyWeather

julia> L = rand(LowerTriangularMatrix, 6, 5)
20-element, 6x5 LowerTriangularMatrix{Float64}
 0.739126   0.0         0.0       0.0       0.0
 0.0209722  0.00954583  0.0       0.0       0.0
 0.554566   0.259313    0.698014  0.0       0.0
 0.362604   0.515809    0.25572   0.12875   0.0
 0.732036   0.612449    0.503511  0.511416  0.757548
 0.517978   0.0764747   0.905544  0.594695  0.142411

julia> v = ones(15)

julia> L2 = LowerTriangularMatrix(v, 5, 5)
15-element, 5x5 LowerTriangularMatrix{Float64}
 1.0  0.0  0.0  0.0  0.0
 1.0  1.0  0.0  0.0  0.0
 1.0  1.0  1.0  0.0  0.0
 1.0  1.0  1.0  1.0  0.0
 1.0  1.0  1.0  1.0  1.0

julia> copyto!(L, L2)
20-element, 6x5 LowerTriangularMatrix{Float64}
 1.0       0.0        0.0       0.0       0.0
 1.0       1.0        0.0       0.0       0.0
 1.0       1.0        1.0       0.0       0.0
 1.0       1.0        1.0       1.0       0.0
 1.0       1.0        1.0       1.0       1.0
 0.517978  0.0764747  0.905544  0.594695  0.142411

Member Author

Ah thanks, should have looked a bit harder for something built in 😁. Just to check I am understanding your snippet correctly: copyto!(L, L2), when L has one more row than L2, will just leave the final row unchanged? And for scalar fields in the prognostic variables, is it generally safe to assume we can completely ignore the final row in the spectral coefficients, as these are only used for computing meridional derivatives of vector-valued fields?

As well as mapping a flat vector to lower triangular coefficients, the current update_spectral_coefficients_from_vector! also deals with mapping from the all-real values in the vector to complex-valued coefficients. Using copyto! and the equivalent of your snippet, I think also handling the real-to-complex map would look something like:

using SpeedyWeather

T = 4  # spectral truncation
# complex spectral coefficients, with one extra row for meridional derivatives
L = randn(LowerTriangularMatrix, T + 2, T + 1) + randn(LowerTriangularMatrix, T + 2, T + 1) * 1im
v = randn((T + 1)^2)  # flat all-real state vector
# first T+1 entries are the (real) m = 0 coefficients, rest reinterpreted as complex
x = [v[1:T+1]; reinterpret(Complex{eltype(v)}, v[T+2:end])]
L2 = LowerTriangularMatrix(x, T + 1, T + 1)
copyto!(L, L2)  # fills all but the final row of L

Does that look reasonable?

for layer_index in 1:model.parameters.n_layers
end_index = start_index + dim_spectral - 1
update_spectral_coefficients_from_vector!(
layered_spectral_coefficients[:, layer_index],

this shouldn't work because [:, layer_index] triggers a deep copy


there is also a 3D (or n-D) version of this

julia> L = rand(LowerTriangularArray, 6, 5, 2)
20×2 LowerTriangularArray{Float64, 2, Matrix{Float64}}:
 0.61282   0.135929
 0.41022   0.873429
 0.2886    0.583336
 0.222706  0.890217
 0.314537  0.132059
 0.915097  0.00720557
 0.103113  0.716123
 0.646855  0.407609
 0.678361  0.0550529
 0.527747  0.0590204
 0.737875  0.907845
 0.563023  0.90874
 0.509839  0.400994
 0.719782  0.508627
 0.945752  0.184539
 0.186379  0.173045
 0.243441  0.108564
 0.333588  0.824159
 0.260858  0.255723
 0.291348  0.332528

julia> L2 = zeros(LowerTriangularArray, 5, 5, 2)
15×2 LowerTriangularArray{Float64, 2, Matrix{Float64}}:
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0

julia> copyto!(L, L2)
20×2 LowerTriangularArray{Float64, 2, Matrix{Float64}}:
 0.0       0.0
 0.0       0.0
 0.0       0.0
 0.0       0.0
 0.0       0.0
 0.915097  0.00720557
 0.0       0.0
 0.0       0.0
 0.0       0.0
 0.0       0.0
 0.737875  0.907845
 0.0       0.0
 0.0       0.0
 0.0       0.0
 0.945752  0.184539
 0.0       0.0
 0.0       0.0
 0.333588  0.824159
 0.0       0.0
 0.291348  0.332528

end

function ParticleDA.get_state_dimension(model::SpeedyModel)
(model.parameters.spectral_truncation + 1)^2 * (

do you count complex as 2? because in T31 there's

julia> L = rand(LowerTriangularMatrix, 32, 32)
528-element, 32x32 LowerTriangularMatrix{Float64}
 0.51529    0.0        0.0        0.0          0.0       0.0       0.0
 0.610801   0.217856   0.0        0.0           0.0       0.0       0.0
 0.967231   0.422571   0.267194   0.0           0.0       0.0       0.0
 0.402639   0.0535502  0.536866   0.564967      0.0       0.0       0.0
 0.668567   0.546218   0.594456   0.789578      0.0       0.0       0.0
 0.599468   0.240229   0.528021   0.55347      0.0       0.0       0.0
 0.631066   0.0457941  0.390009   0.308129      0.0       0.0       0.0
 0.735746   0.846006   0.937956   0.419002      0.0       0.0       0.0
 0.626658   0.0581199  0.318926   0.767743      0.0       0.0       0.0
 0.415032   0.297547   0.0915254  0.792665      0.0       0.0       0.0
 ⋮                                            ⋱
 0.140726   0.53703    0.236005   0.723116      0.0       0.0       0.0
 0.435199   0.523604   0.21269    0.0154463     0.0       0.0       0.0
 0.560295   0.238302   0.0797404  0.619366      0.0       0.0       0.0
 0.0754052  0.715296   0.35138    0.940574     0.0       0.0       0.0
 0.237397   0.617006   0.649118   0.167015      0.0       0.0       0.0
 0.905415   0.0107095  0.651298   0.0121397     0.0       0.0       0.0
 0.846485   0.761528   0.920698   0.278254      0.0       0.0       0.0
 0.983794   0.0482784  0.205777   0.839342      0.916153  0.0       0.0
 0.595536   0.697397   0.329433   0.277305     0.638102  0.796366  0.0
 0.211984   0.302623   0.44609    0.608055      0.495768  0.765237  0.233045

julia> length(L)
528

that many (complex) spherical harmonic coefficients (excluding the last row) which you can also compute using

julia> LowerTriangularMatrices.triangle_number(32)
528


Your formula of (trunc + 1)^2 would yield 1024, if you count complex as 2 then it would be 2*528 = 1056

Member Author

I was assuming the order $m = 0$ coefficients in the first column will always be real valued, from the conjugate symmetry / real-valuedness of the signal (even though they are stored as complex), which I think would account for the difference, as $1056 - 1024 = 32$: the 1056 includes an extra 32 imaginary coefficients for the first column which will always be zero? Or is that not the case?
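A quick check of this counting argument (triangle_number below is a local stand-in for LowerTriangularMatrices.triangle_number, so the snippet runs without SpeedyWeather):

```julia
# For truncation T = 31: 528 complex coefficients fill the lower triangle of
# a 32x32 matrix; dropping the 32 always-zero imaginary parts of the m = 0
# column leaves (T + 1)^2 real degrees of freedom.
T = 31
triangle_number(n) = n * (n + 1) ÷ 2
n_complex = triangle_number(T + 1)   # complex spherical harmonic coefficients
n_real = 2n_complex - (T + 1)        # drop imaginary parts of the m = 0 column
```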

end
end

function update_vector_from_spectral_coefficients!(

copyto! also works the other way and you can always do L.data to just get the vector (or matrix for 3D) back
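For the reverse direction, a small sketch of unpacking complex coefficients back into an all-real flat vector with reinterpret, the inverse of the packing in the earlier snippet (plain vectors are used here instead of LowerTriangularMatrix, and the variable names are illustrative):

```julia
T = 4
coeffs_m0 = randn(T + 1)                            # m = 0 coefficients (real)
coeffs_rest = randn(ComplexF64, (T * (T + 1)) ÷ 2)  # m > 0 coefficients (complex)
# Flatten back to an all-real state vector: reals first, then the complex
# coefficients viewed as interleaved (real, imag) pairs.
v = [coeffs_m0; reinterpret(Float64, coeffs_rest)]
```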


milankl commented Dec 2, 2024

Also, as said earlier, I don't really like the idea of just "adding" noise to the model state $X$ on a given time step; there are more sophisticated ways to do this. Adding noise $r$ to the diagnostic variables in grid space is not too bad though, as it only applies to a copy of the prognostic variables. Say $S$ is the grid-to-spectral transform with $\tilde{X} = S(X)$; then conceptually we start in spectral space with the prognostic variables $\tilde{X}$, transform them to grid-space diagnostic variables, compute the RHS $F(X)$ from those, transform that back, and do a time step in spectral space

$$\begin{align} X &= S^{-1}(\tilde{X}) \\\ \partial_t \tilde{X} &= S(F(X)) \end{align}$$

adding noise in grid space would now mean you do

$$\partial_t \tilde{X} = S(F(X + r))$$

which is, from a "solving a PDE" perspective, not the worst, as you essentially add additive noise to a copy of the state which you only use to evaluate the RHS $F$. This means you don't mess with the state itself but only with the fluxes that change that state. In physics that's good, because otherwise you can screw up conservation of mass, energy, momentum etc. However, additive noise is pretty ad hoc in the sense that there isn't much physical basis for why one would want to do just $X + r$. There are backscatter schemes that try to reinject energy close to the grid scale, but many of those first analyse where energy is lost and then put it back in in a physically consistent way that mimics an upward cascade of energy. However, there's another simple way that's actually still used operationally by ECMWF's model. It is based on the idea that on the right-hand side $F$ there are dynamics $D$ with less uncertainty and physics $P$ with more uncertainty, which is where the random numbers should hit, and in a multiplicative way: when the physics are negligible there shouldn't be much of a stochastic perturbation, but when they are large they contribute more uncertainty. In a way one trusts the sign and magnitude of the physics $P$ but not the exact value

$$\partial_t \tilde{X} = S(D(X) + (1+r)P(X))$$

these are the "stochastically perturbed parameterization tendencies" (SPPT), as only the tendencies from the parameterizations aka physics are perturbed, with $r \in [-1, 1]$ to preserve the sign of $P(X)$ and make the perturbed tendencies at most 2x the unperturbed tendency.

So I'm wondering, would this type of randomness suffice here?
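A minimal grid-point sketch of the SPPT perturbation described above, with D, P and r as plain illustrative arrays rather than SpeedyWeather's data structures:

```julia
# SPPT idea: perturb only the physics tendency P multiplicatively, leaving
# the dynamics tendency D untouched. Since r is clamped to [-1, 1], the
# factor (1 + r) is in [0, 2]: the sign of P is preserved and the perturbed
# physics tendency is at most twice the unperturbed one.
function sppt_tendency(D, P, r)
    @assert all(-1 .<= r .<= 1)
    return D .+ (1 .+ r) .* P
end

D = [0.5, -0.2]; P = [0.1, 0.4]; r = [1.0, -1.0]
tendency = sppt_tendency(D, P, r)
```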


matt-graham commented Dec 2, 2024

So I'm wondering, would this type of randomness suffice here?

For the particle filtering algorithm to be valid, a key requirement is that, given a state vector $X_t \in \mathbb{R}^{D_X}$ at time index $t$, the distribution of the state $X_{t+1}$ at time index $t + 1$ given $X_t$ is non-degenerate - formally, that it has a density with respect to the Lebesgue measure on $\mathbb{R}^{D_X}$, or more intuitively, that there isn't any direction in the state space in which no noise is introduced.

Having the state transition distribution correspond to a deterministic map $F_t: \mathbb{R}^{D_X} \to \mathbb{R}^{D_X}$ plus additive Gaussian noise $U_t \sim \mathsf{normal}(0, Q_t)$, such that $X_{t+1} = F_t(X_t) + U_t$, will suffice to meet the non-degeneracy requirement providing $Q_t$ is strictly positive definite, though other forms of state transition distribution would also be fine providing they are non-degenerate. Having additive Gaussian state noise (plus observations which linearly depend on the state) does have some further advantages in terms of allowing use of more efficient particle filter algorithms, but this is of less importance.
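A toy sketch of such a non-degenerate additive-Gaussian transition ($F$, $L$ and the two-dimensional state here are illustrative stand-ins, not the ParticleDA interface):

```julia
using LinearAlgebra, Random

# X_{t+1} = F_t(X_t) + U_t with U_t ~ N(0, Q) and Q = L * L' strictly
# positive definite, so the transition density is non-degenerate.
function transition(F, L::LowerTriangular, x; rng=Random.default_rng())
    return F(x) + L * randn(rng, length(x))
end

F(x) = 0.9 .* x                                   # toy deterministic dynamics
Q = Symmetric([1.0 0.2; 0.2 1.0])                 # strictly positive definite
L = cholesky(Q).L                                 # Q = L * L'
x1 = transition(F, L, [1.0, 0.0])
```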

From a statistical perspective, the additive Gaussian noise model is sometimes motivated as corresponding to time discretisation of a stochastic partial differential equation (SPDE) of the form

$$\mathrm{d}X(s, \tau) = A(X(s, \tau), \tau) \mathrm{d}\tau + \kappa(s) \ast \mathrm{d}W(s, \tau)$$

where $s$ is the spatial coordinate (vector), $\tau$ the (continuous) time coordinate, $A$ defines a deterministic 'drift' component of the dynamics, $W$ is a spatiotemporal white noise process and $\kappa$ is a spatial kernel defining spatial correlation structure of the state noise (with $\ast$ convolution in space).

In terms of the spectral coefficients $\tilde{X}(k, \tau)$ with $k$ the (vector) wave number, we have something like

$$\mathrm{d}\tilde{X}(k, \tau) = \tilde{A}(\tilde{X}(k, \tau), \tau) \mathrm{d}\tau + \tilde{\kappa}(k) \mathrm{d}\tilde{W}(k, \tau)$$

where $\tilde{A}(\tilde{X}(k, \tau), \tau) = S(A(S^{-1}(\tilde{X}(k, \tau)), \tau))$, and $\tilde{\kappa}$ and $\tilde{W}$ are the kernel and white noise process in spectral space respectively.

A state update $\tilde{X}_{t+1} = F_t(\tilde{X}_t) + U_t, U_t \sim \mathsf{normal}(0, Q_t)$ can then (somewhat heuristically) be thought of as applying an operator splitting type approach, with $F_t$ corresponding to solving $\mathrm{d}\tilde{X}(k, \tau) = \tilde{A}(\tilde{X}(k, \tau), \tau) \mathrm{d}\tau$ over the time interval corresponding to $t \to t + 1$ with some appropriate numerical method, and $U_t$ corresponding to a realisation of the solution of the linear SPDE $\mathrm{d}\tilde{X}(k, \tau) = \tilde{\kappa}(k) \mathrm{d}\tilde{W}(k, \tau)$ over the time interval corresponding to $t \to t + 1$.

For the stochastically perturbed parameterization tendencies scheme you describe,

$$\partial_t \tilde{X} = S \left(D\left(S^{-1}(\tilde{X})\right) + (1 + r) P\left(S^{-1}(\tilde{X})\right)\right)$$

by linearity of $S$ and assuming a convolution theorem $S(Y Z) = S(Y) \ast S(Z)$ for grid-space fields $Y$ and $Z$, this is equivalent to

$$\partial_t \tilde{X} = S \left(D\left(S^{-1}(\tilde{X})\right) + P\left(S^{-1}(\tilde{X})\right)\right) + S\left(P\left(S^{-1}(\tilde{X})\right) \right) \ast S\left(r\right)$$

What's not entirely clear in this formulation is how to interpret the $r$ term - for example is it continuous in time?

I think this would however be roughly analogous to the SPDE

$$\mathrm{d}\tilde{X} = S\left(D\left(S^{-1}(\tilde{X})\right) + P\left(S^{-1}(\tilde{X})\right)\right) \mathrm{d}\tau + \sigma S\left(P\left(S^{-1}(\tilde{X})\right)\right) \ast \mathrm{d}\tilde{W}$$

where $\tilde{W}$ is a white noise process in spectral space and $\sigma$ is a scalar chosen such that, over a chosen time interval, something akin to the condition

physics are perturbed with $r \in [-1, 1]$ to preserve the sign of $P(X)$ and make the tendencies with perturbation at most 2x of the unperturbed tendency.

is maintained with high probability.

The main difference from the additive noise model would then be that the state noise is scaled multiplicatively.


milankl commented Dec 3, 2024

more intuitively there isn't any direction in the state space where there is no noise introduced

So the idea of SPPT is to introduce noise where the physics P are non-zero (regardless of the dynamics D, which are considered much more certain). You can have regions/times where/when P is close to zero (e.g. nighttime, where the radiative cooling to space and the surface fluxes cancel), so you wouldn't perturb in that situation. But maybe these situations are in practice rare/irrelevant for the particle filter?

how to interpret the r term - for example is it continuous in time?

the random $r$ is usually a spectral AR1 process with autocorrelation in time (typically 6h) and space (500km). It needs to have some autocorrelation to actually hit the model hard and consistently enough in a given direction, and not to end up with random numbers cancelling each other out in the time stepping (we don't use Euler forward...). A snapshot of r looks like

[Figure: snapshot of the SPPT perturbation field r on a longitude-latitude grid]

which has a std of 1/3 and is clamped to [-1, 1] to prevent outliers from swapping the sign of the physics P. Given we create that AR1 process in spectral space, it's aware of the spherical domain; the length scales only seem stretched out towards the poles here due to the lon-lat projection.
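A sketch of a single such AR(1) coefficient process (the discretisation and the parameter names are assumptions; the operational scheme evolves one such process per spectral coefficient to also obtain the spatial correlation):

```julia
using Random

# One AR(1) step with decorrelation time tau and stationary std sigma,
# clamped to [-1, 1] so outliers cannot flip the sign of the physics P.
function ar1_step(r, dt, tau, sigma; rng=Random.default_rng())
    phi = exp(-dt / tau)                                    # autocorrelation over dt
    r_new = phi * r + sigma * sqrt(1 - phi^2) * randn(rng)  # preserves std sigma
    return clamp(r_new, -1.0, 1.0)
end

# evolve for 100 hourly steps with a 6 h decorrelation time and std 1/3
function simulate(nsteps)
    r = 0.0
    for _ in 1:nsteps
        r = ar1_step(r, 3600.0, 6 * 3600.0, 1 / 3)
    end
    return r
end

r_final = simulate(100)
```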

The main difference from the additive noise model would then be that the state noise is scaled multiplicatively.

Which should be a much better way of doing things. From an atmospheric modelling perspective it doesn't really make sense to hit the model state with additive noise: why would you suddenly hit an area with very certain and smooth temperatures with +3 K (or whatever your amplitude)?

I've implemented SPPT into SpeedyWeather now as outlined here SpeedyWeather/SpeedyWeather.jl#629
