
Object Oriented Design: which pattern(s)? #2

Closed
szaghi opened this issue Aug 6, 2015 · 14 comments

@szaghi
Member

szaghi commented Aug 6, 2015

Hi all,

I have started this new exciting (for me) project.

I started by studying Jacob's work. Then, after reading Rouson's book, I decided to try a more formal approach.

Presently I have implemented the Abstract Calculus Pattern, providing also the same example as Rouson's book: the integration of the Lorenz equations by means of an explicit, one-step, first-order Euler scheme.

The pattern is very simple:

the abstract integrand

This defines the abstract methods necessary for building a FOODiE integrator.

type, abstract :: integrand
  !< Abstract type for building FOODiE ODE integrators.
  contains
    procedure(time_derivative     ), pass(self), deferred :: t
    procedure(asymmetric_operator ), pass(lhs),  deferred :: multiply
    procedure(symmetric_operator  ), pass(lhs),  deferred :: add
    procedure(symmetric_assignment), pass(lhs),  deferred :: assign
    generic :: operator(+) => add
    generic :: operator(*) => multiply
    generic :: assignment(=) => assign
endtype integrand
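
For readers new to the pattern, the abstract interfaces behind these deferred bindings could look like the following minimal sketch (the interface names come from the type above; the exact argument lists and the use of R_P as the working real kind are my assumptions, so the actual FOODiE sources may differ in detail):

abstract interface
  function time_derivative(self) result(dState_dt)
    !< Time derivative, i.e. the residuals function.
    import :: integrand
    class(integrand), intent(in)  :: self
    class(integrand), allocatable :: dState_dt
  endfunction time_derivative

  function asymmetric_operator(lhs, rhs) result(opr)
    !< Integrand multiplied by a real scalar.
    import :: integrand, R_P
    class(integrand), intent(in)  :: lhs
    real(R_P),        intent(in)  :: rhs
    class(integrand), allocatable :: opr
  endfunction asymmetric_operator

  function symmetric_operator(lhs, rhs) result(opr)
    !< Integrand added to integrand.
    import :: integrand
    class(integrand), intent(in)  :: lhs, rhs
    class(integrand), allocatable :: opr
  endfunction symmetric_operator

  subroutine symmetric_assignment(lhs, rhs)
    !< Integrand assigned to integrand.
    import :: integrand
    class(integrand), intent(inout) :: lhs
    class(integrand), intent(in)    :: rhs
  endsubroutine symmetric_assignment
endinterface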

a FOODiE integrator

This defines, at a very high level (by means of operator overloading), the ODE integration scheme; e.g. the explicit Euler scheme is simply

subroutine euler_explicit(field, dt)
  !---------------------------------------------------------------------------------------------------------------------------------
  !< Integrate model with explicit Euler scheme, 1st order.
  !---------------------------------------------------------------------------------------------------------------------------------
  class(integrand), intent(INOUT) :: field !< Field to be integrated.
  real(R_P),        intent(in)    :: dt    !< Time step.
  !---------------------------------------------------------------------------------------------------------------------------------

  !---------------------------------------------------------------------------------------------------------------------------------
  field = field + field%t()*dt
  return
  !---------------------------------------------------------------------------------------------------------------------------------
  endsubroutine

client side code

The user must extend the abstract integrand, defining his/her own system of equations. In the case of the Lorenz equations it is simply

type, extends(integrand) :: lorenz
  private
  real(R_P), dimension(:), allocatable :: state            !< Solution vector.
  real(R_P)                            :: sigma, rho, beta !< Lorenz parameters.
contains
   procedure, pass(self), public :: t        => dLorenz_dt
   procedure, pass(lhs),  public :: add      => add_lorenz
   procedure, pass(lhs),  public :: multiply => multiply_lorenz
   procedure, pass(lhs),  public :: assign   => assign_lorenz
   procedure, pass(self), public :: output
end type
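
To give an idea of what the client-side overridings look like, here is a minimal sketch of add_lorenz and assign_lorenz (the select type handling and the component copies are my assumptions; the actual FOODiE test code may differ):

function add_lorenz(lhs, rhs) result(opr)
  !< Sum two Lorenz integrands.
  class(lorenz),    intent(in)  :: lhs !< Left-hand side.
  class(integrand), intent(in)  :: rhs !< Right-hand side.
  class(integrand), allocatable :: opr !< Operator result.

  allocate(lorenz :: opr)
  select type(opr)
  class is(lorenz)
    select type(rhs)
    class is(lorenz)
      opr%state = lhs%state + rhs%state
      opr%sigma = lhs%sigma ; opr%rho = lhs%rho ; opr%beta = lhs%beta
    endselect
  endselect
endfunction add_lorenz

subroutine assign_lorenz(lhs, rhs)
  !< Assign one integrand to a Lorenz integrand.
  class(lorenz),    intent(inout) :: lhs !< Left-hand side.
  class(integrand), intent(in)    :: rhs !< Right-hand side.

  select type(rhs)
  class is(lorenz)
    lhs%state = rhs%state
    lhs%sigma = rhs%sigma ; lhs%rho = rhs%rho ; lhs%beta = rhs%beta
  endselect
endsubroutine assign_lorenz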

More details here

First considerations

The approach seems to work well. For the Lorenz test I have added the generation of a plot of the solution by means of Jacob's great pyplot-fortran.

My main concerns are now related to the efficiency of the approach, but this question will be answered in the future (or maybe @rouson can reply, having used such approaches extensively in HPC codes).

@szaghi
Member Author

szaghi commented Aug 7, 2015

The time derivative abstract function may not be well designed. I have opened another discussion here.

@szaghi
Member Author

szaghi commented Oct 21, 2015

Well,
after some months of tests I can say that Damian's pattern works exceptionally well.

I will go with this pattern.

@szaghi szaghi closed this as completed Oct 21, 2015
@rouson

rouson commented Oct 21, 2015

On Oct 21, 2015, at 9:54 AM, Stefano Zaghi [email protected] wrote:

Well,
after some months of tests I can say that Damian's pattern works exceptionally well.

I will go with this pattern

That’s exciting! Of course, the key thing to be concerned about is performance. I saw that some performance data has been posted, but I haven’t scrutinized it. The key thing will be to compare FOODIE codes to existing open-source procedural codes that implement the same algorithms. Parallelism might be extremely important in such a comparison. If most of the open-source codes to which a user might compare you are serial codes, then I would at least hope FOODIE would be faster when it is allowed to utilize all available cores.

Damian

@szaghi
Member Author

szaghi commented Oct 22, 2015

Hi Damian, you are right. Milan and I (at the same time and independently) reached the same conclusion. I have already done the test you proposed for the OpenMP analysis (the CAF one will follow the same approach). Essentially, I have compared 4 different codes implementing the identical algorithm (LSRK(5,4)) for solving the 1D Euler PDE problem:

  • serial FOODIE based code;
  • parallel FOODIE based code;
  • serial procedural code without FOODIE;
  • parallel procedural code without FOODIE;

The procedural no-FOODIE codes do not have overloaded operators and the ODE scheme is hard-coded without any abstraction for the Euler problem (the integrate procedure here is a simple type-bound procedure of the Euler type).

The serial codes are used for computing the scaling (strong and weak). The analysis of this benchmark reveals that the scaling is essentially the same for both the FOODIE and no-FOODIE codes, but when the size of the problem increases (namely the space resolution) the FOODIE version scales better! What is missing is a comparison of the absolute computing times, but the scaling seems very good! I guess that, having parallelized also the operators, the FOODIE-aware version can scale better, while in the procedural no-FOODIE code only the main stages/time-integration loop is parallelized (inside it there are sum and multiplication operations on the conservative variables U), thus resulting in a bigger loop that is maybe less scalable than simpler operations like a single sum/multiplication. This is only a speculation; the conclusions will be yours and, I hope, @francescosalvadore's.
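
Just to illustrate what I mean by that speculation (this is a schematic sketch with hypothetical names, not the actual benchmark code), the two parallelization strategies roughly differ as follows:

subroutine add_conservatives_foodie_style(opr, lhs, rhs)
  !< FOODIE-aware style (sketch): the parallel loop lives inside each overloaded operator.
  real(R_P), intent(out) :: opr(:,:) !< Result conservative variables [1:Nc,1:Ni].
  real(R_P), intent(in)  :: lhs(:,:) !< Left-hand side conservative variables.
  real(R_P), intent(in)  :: rhs(:,:) !< Right-hand side conservative variables.
  integer                :: i        !< Cells counter.

  !$omp parallel do
  do i = 1, size(opr, dim=2)
    opr(:, i) = lhs(:, i) + rhs(:, i)
  enddo
endsubroutine add_conservatives_foodie_style

subroutine euler_stage_procedural_style(U, residual, dt, rk_coef)
  !< Procedural no-FOODIE style (sketch): one bigger fused loop per Runge-Kutta stage.
  real(R_P), intent(inout) :: U(:,:)        !< Conservative variables [1:Nc,1:Ni].
  real(R_P), intent(in)    :: residual(:,:) !< Residuals (RHS evaluations).
  real(R_P), intent(in)    :: dt            !< Time step.
  real(R_P), intent(in)    :: rk_coef       !< Runge-Kutta stage coefficient.
  integer                  :: i             !< Cells counter.

  !$omp parallel do
  do i = 1, size(U, dim=2)
    U(:, i) = U(:, i) + dt * rk_coef * residual(:, i)
  enddo
endsubroutine euler_stage_procedural_style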

See you soon.

@rouson

rouson commented Oct 22, 2015

Fascinating. It has long been known that it is easier to reason about parallelism in functional programming. I often make the point that an expression composed of side-effect-free operators can be evaluated asynchronously: one need never worry that one image, while evaluating one operator on its own portion of the data, could modify the data that a second image needs in order to evaluate another operator on the second image's portion of the data (because either image is simply ahead of or behind the other in its evaluation).

Of course, the one place where this is not true is in an assignment, so there is some need for coordination between images at the point of assignment, which probably necessitates defined assignments. Does FOODIE use defined assignments?

In the Burgers solver that is in the code archive for my book (which has evolved some since the publication of the book), each image synchronizes with its nearest neighbors at the end of each assignment. Another even more asynchronous approach will be available once we have Fortran 2015 "events."
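
For readers unfamiliar with that design, a minimal sketch of a defined assignment that coordinates with the nearest-neighbor images could look like the following (the field type, its values component, and the one-dimensional image decomposition are hypothetical; the actual Burgers solver in the book's archive may be organized differently):

subroutine assign_field(lhs, rhs)
  !< Defined assignment that also synchronizes with the nearest-neighbor images.
  class(field), intent(inout) :: lhs !< Left-hand side field.
  class(field), intent(in)    :: rhs !< Right-hand side field.

  lhs%values = rhs%values
  ! Ensure neighbors are done with the halo data before anyone proceeds past the assignment.
  if (this_image() > 1) then
    sync images(this_image() - 1)
  endif
  if (this_image() < num_images()) then
    sync images(this_image() + 1)
  endif
endsubroutine assign_field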

Damian

@szaghi
Member Author

szaghi commented Oct 22, 2015

@rouson

Does FOODIE use defined assignments?

Everywhere: in all solvers the defined assignment between integrands is exploited (as well as the defined operators).

However, my considerations above could be false... re-reading the codes in order to answer you, I just noticed that in the procedural code some loops are not correctly parallelized (bad me!), thus maybe the comparison is not correct. Later today I will correct the codes and repeat the analysis. Sorry.

@rouson

rouson commented Oct 22, 2015

All,

It turns out my contact is fine with me attributing the input directly to him (I hadn't asked if I could do so previously), so I'm adding the latest input and will mention that these comments are from Prof. Philip Sharp of The University of Auckland in New Zealand (see https://www.math.auckland.ac.nz/people/psha003).

The thesis mentioned below is available at http://www.davidketcheson.info/assets/papers/ketchesonphdthesis.pdf.

An excerpt of Philip's comments, with one edit in square brackets, is as follows:

------ snip ------

I have attached a thesis that gives some examples of pdes where methods of higher order than the standard methods are used. However, higher order methods are usually no higher than order four and they have special properties. I believe the DP45 pair is of limited use when solving pdes, especially on large pde problems because the system of odes is typically stiff. For stiff odes you need an implicit method such as an implicit Runge-Kutta method.

Are the developers mainly interested in producing OO code for applications in fluid dynamics, or are they interested in producing OO ode integrators and then applying them to pdes? If the former, I suggest they restrict themselves to methods of order two and concentrate on getting an efficient OO implementation. If the latter, I suggest they implement one of the higher order schemes such as in the attached thesis.

I can see from reading the email to you from the developers that they are excited about their work. The accuracy of DP45 and other variable-stepsize methods can be measured in the way they suggest but of far greater importance is the efficiency of the methods. Efficiency is measured by the amount of work required to achieve a prescribed error. For explicit methods, the amount of work is usually measured by the number of derivative evaluations or the CPU time. The error is usually the end-point global error or the maximum global error. The end-point global error is the difference between the numerical solution and a reference solution. The reference solution can either be a very accurate numerical solution or the true solution. The size of the error is the norm of the error. The maximum global error is the maximum of the norm of the difference between the numerical solution and a reference solution across a prescribed interval. For example, you might be integrating from t = 0 to t = 100. If y(t) is the reference solution and y_n(t) is the numerical solution, the maximum global error is

max_{0 <= t <= 100} ||y_n(t)-y(t)||

This maximum [is] often estimated by sampling ||y(t)-y_n(t)|| at N values of t on [0,100].

----- snip ---------

Damian

@szaghi
Member Author

szaghi commented Oct 22, 2015

@rouson

Firstly, please give Prof. Philip Sharp my humblest apologies if my previous comments were disrespectful.

The thesis mentioned below is available at http://www.davidketcheson.info/assets/papers/ketchesonphdthesis.pdf.

I have been following David Ketcheson's (@ketch) work since 2010; his work greatly drives mine.

I have attached a thesis that gives some examples of pdes where methods of higher order than the standard methods are used. However, higher order methods are usually no higher than order four and they have special properties. I believe the DP45 pair is of limited use when solving pdes, especially on large pde problems because the system of odes is typically stiff. For stiff odes you need an implicit method such as an implicit Runge-Kutta method.

As I mentioned before, in solving PDEs (like the ones we are interested in) the explicit and implicit (for stiff equations) RK schemes are typically xxRK(5,4), which is ubiquitous. Generally, the other sources of error discourage the use of more accurate (thus more expensive) schemes. Moreover, in general the time step is limited by CFL-like conditions, thus dynamically changing the time step size by means of embedded RK is often not so useful (more computations that may not be useful). Our workflow is to employ efficient explicit RK schemes for non-stiff problems (or for quasi-stiff ones, where extremely fast transient phenomena, namely very small physical time scales, could affect the solution at the asymptotic regime due to the non-linearity) and backward formulae (with efficient LU decomposition) for stiff ones. It is difficult (I am not aware of any) to see a Navier-Stokes solution of fully appended moving bodies employing schemes of higher than 2nd/3rd formal order of accuracy.

Are the developers mainly interested in producing OO code for applications in fluid dynamics, or are they interested in producing OO ode integrators and then applying then to pdes? If the former, I suggest they restrict themselves to methods of order two and concentrate on getting an efficient OO implementation. If the latter, I suggest they implement one of the higher order schemes such as in the attached thesis.

FOODIE (I hope) should have two main aims:

  • provide a robust set of ODE solvers ready to be applied to a wide range of different problems, including, but not limited to, PDEs;
  • provide a simple framework/environment for the rapid development of new ODE solvers.

I hope to be comprehensive and I am inspired by https://github.com/ketch/nodepy, but with the efficiency of Fortran. We want cutting-edge solvers (highest accuracy, stability, ...), but we also want PDE-tailored (efficient and not over-engineered) solvers. We think we are going in the right direction because your ADT is very powerful. The roadmap I am following is: a) implement solvers with clearness and conciseness in mind, b) only then make them efficient.

I can see from reading the email to you from the developers that they are excited about their work.

I am happy to devote my lunch breaks to FOODIE; this workflow (small KISS libraries) will, I hope, help me to refactor OFF (or whatever my new CFD code will be named).

Efficiency is measured by the amount of work required to achieve a prescribed error. For explicit methods, the amount of work is usually measured by the number of derivative evaluations or the CPU time.

Sure, I am aware of that.

The error is usually the end-point global error or the maximum global error. The end-point global error is the difference between the numerical solution and a reference solution. The reference solution can either be a very accurate numerical solution or the true solution. The size of the error is the norm of the error. The maximum global error is the maximum of the norm of the difference between the numerical solution and a reference solution across a prescribed interval. For example...

This hint is very much appreciated (as are all the above ones): I was wondering whether the standard approach to measuring the order of accuracy of adaptive time step schemes could be based on the end-point error. Happy to read that it could be reasonable.
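
For concreteness, the maximum global error defined above could be computed from the sampled solutions with something as simple as the following sketch (the function name and the sampled-solution layout are my own assumptions, R_P being the working real kind):

pure function maximum_global_error(y_num, y_ref) result(error)
  !< Maximum, over the sampled times, of the norm of the numerical-minus-reference difference.
  real(R_P), intent(in) :: y_num(:,:) !< Numerical solution, shape [state size, samples number].
  real(R_P), intent(in) :: y_ref(:,:) !< Reference solution, same shape.
  real(R_P)             :: error      !< Maximum global error.
  integer               :: n          !< Samples counter.

  error = 0._R_P
  do n = 1, size(y_num, dim=2)
    error = max(error, norm2(y_num(:, n) - y_ref(:, n)))
  enddo
endfunction maximum_global_error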

Let me thank you both for your great help; these kinds of discussions make me more aware.

My best regards.

@szaghi szaghi reopened this Oct 22, 2015
@szaghi
Member Author

szaghi commented Oct 22, 2015

@rouson

I have added to the manuscript a preliminary analysis of the DPRK(5,4) errors, computing the global errors as indicated by Prof. Sharp: they are in very good agreement with expectations (close to the theoretical 4th order accuracy). If we compare with an equivalently accurate solver, say LSRK(5,4) or SSPRK(5,4), we find that the number of time steps consumed for achieving almost the same error (about 10^-6) is 3 times lower: a mean Dt of about 300 for DPRK, whereas SSPRK needs Dt = 100. Nice.
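
As a side note (this is my own sketch of the estimate, not necessarily how the manuscript computes it), the observed order of accuracy can be obtained from the errors measured with two different time step sizes as

p_observed = log( error(Dt_1) / error(Dt_2) ) / log( Dt_1 / Dt_2 )

which should approach 4 as the time steps are refined for a formally 4th-order accurate scheme.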

@zbeekman
Member

Just trying to catch up.

Here is a comment that I'm sure everyone (@szaghi etc.) is aware of, but that I just wanted to point out:

With multi-step methods it's also important to factor in the cost in CPU time or in expensive RHS evaluations... if the number of time steps is 3 times lower, but each time step is 3 times more work, then you haven't accomplished anything... Since all of these algorithms are RK(5,4), the expense per time step is probably quite comparable and my observation irrelevant, but I haven't looked at the algorithms in detail yet.

@szaghi
Member Author

szaghi commented Nov 17, 2015

@zbeekman yes, absolutely right: when choosing a scheme you should consider the cost of the residual function (RHS). However, assuming we know how much it costs, the selection between a multi-step and a multi-stage scheme should also be done on the basis of other criteria. For example, I prefer the more expensive RK family (which involves many RHS evaluations per time step) over an equivalently accurate multi-step solver (which involves only 1 RHS evaluation per time step) just because I like to vary the time step size at each time step (that is also possible with multi-step methods, but with more effort).

@szaghi
Member Author

szaghi commented Nov 17, 2015

@zbeekman I realize now that your comment is more strictly related to the analysis of the embedded RK and not to the comparison of multi-step/multi-stage families, I am sorry. I can confirm that the cost of DPRK is comparable to the equivalent RK (they have the same number of RHS evaluations).

@szaghi
Member Author

szaghi commented Jan 19, 2016

It seems that the ACP is not an anti-pattern; Damian's idea seems to work very well. I am closing this issue because I guess you all agree to continue with the current pattern. I will reopen it if necessary.

@szaghi szaghi closed this as completed Jan 19, 2016
@rouson

rouson commented Jan 19, 2016

👏

😄
