Object Oriented Design: which pattern(s)? #2
Comments
The time derivative abstract function might not be well designed. I have opened another discussion here.
Well, I will go with this pattern.
That’s exciting! Of course, the key thing to be concerned about is performance. I saw that some performance data has been posted, but I haven’t scrutinized it. The key comparison will be between FOODIE codes and existing open-source procedural codes that implement the same algorithms. Parallelism might be extremely important in such a comparison. If most of the open-source codes to which a user might compare you are serial codes, then I would at least hope FOODIE would be faster when it is allowed to utilize all available cores. Damian
Hi Damian, you are right. Milan and I (at the same time and independently) reached the same conclusion. I have already done the test you proposed for the OpenMP analysis (the CAF one will follow the same approach). Essentially, I have compared 4 different codes implementing the identical algorithm (LSRK(5,4)) for solving the 1D Euler PDE problem:
The procedural no-FOODIE codes do not have overloaded operators, and the ODE scheme is hard-coded without any abstraction for the Euler problem (the integrate procedure is here a simple type-bound procedure of the Euler type). The serially compiled codes are used for computing the scaling (strong and weak). The analysis of this benchmark reveals that the scaling is essentially the same for both FOODIE/no-FOODIE codes, but when the size of the problem increases (namely the space resolution), the FOODIE version scales better! What is missing is a comparison of the absolute values of computing time, but the scaling seems very good! I guess that, having parallelized also the operators, the FOODIE-aware version can scale better, while in the procedural no-FOODIE code only the main stages/time integration loop is parallelized (inside which there are sum and multiplication operations on the conservative variables U), thus resulting in a bigger loop that maybe is less scalable than simpler operations like a single sum/multiplication. This is only a speculation; the conclusions will be yours and, I hope, @francescosalvadore's. See you soon.
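To make the speculation concrete, here is a minimal, illustrative sketch of the two OpenMP strategies being compared (these are hypothetical stand-ins, not the benchmark sources): the procedural code parallelizes one fused loop that hard-codes the whole stage update, while the FOODIE-aware version parallelizes each overloaded operator separately.

```fortran
! Illustrative sketch only: simplified stand-ins for the two parallelization strategies.
module parallel_strategies
  implicit none
contains
  subroutine stage_update_procedural(U, R, dt, n)
    ! (a) procedural no-FOODIE style: a single "fat" OpenMP loop hard-coding the stage update
    integer, intent(in)    :: n
    real,    intent(inout) :: U(n)      ! conservative variables
    real,    intent(in)    :: R(n), dt  ! residuals and time step
    integer :: i
    !$omp parallel do
    do i = 1, n
      U(i) = U(i) + dt * R(i)
    end do
  end subroutine stage_update_procedural

  function operator_sum(a, b, n) result(c)
    ! (b) FOODIE style: each overloaded operator carries its own simple parallel loop
    !     (multiplication by a real is analogous)
    integer, intent(in) :: n
    real,    intent(in) :: a(n), b(n)
    real :: c(n)
    integer :: i
    !$omp parallel do
    do i = 1, n
      c(i) = a(i) + b(i)
    end do
  end function operator_sum
end module parallel_strategies
```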
Fascinating. It has long been known that it is easier to reason about parallelism in functional programming. I often make the point that an expression composed of side-effect-free operators can be evaluated asynchronously: one never need worry that one image, evaluating an operator on its own portion of the data, could modify the data that a second image (which is either ahead of or behind the first image in its evaluation) needs in order to evaluate another operator on its own portion of the data. Of course, the one place where this is not true is in an assignment, so there is some need for coordination between images at the point of assignment, which probably necessitates defined assignments. Does FOODIE use defined assignments? In the Burgers solver that is in the code archive for my book (which has evolved some since the publication of the book), each image synchronizes with its nearest neighbors at the end of each assignment. Another, even more asynchronous, approach will be available once we have Fortran 2015 "events." Damian
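A minimal coarray sketch of that idea (hypothetical names, not the actual Burgers solver code) could look like this: the defined assignment copies this image's portion of the data and then synchronizes only with the images owning adjacent sub-domains.

```fortran
! Minimal, illustrative coarray sketch of a defined assignment with nearest-neighbor synchronization.
module field_module
  implicit none
  type :: field
    real, allocatable :: values(:)   ! this image's portion of the data
  contains
    procedure :: assign_field
    generic :: assignment(=) => assign_field
  end type field
contains
  subroutine assign_field(lhs, rhs)
    class(field), intent(inout) :: lhs
    class(field), intent(in)    :: rhs
    lhs%values = rhs%values
    ! coordinate only with the images owning adjacent sub-domains
    if (this_image() > 1) then
      sync images (this_image() - 1)
    end if
    if (this_image() < num_images()) then
      sync images (this_image() + 1)
    end if
  end subroutine assign_field
end module field_module
```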
Everywhere: in all solvers the defined assignment between integrands is exploited (as well as the defined operators). However, my considerations above could be false... re-reading the codes to answer you, I just noticed that in the procedural code some loops are not correctly parallelized (bad me!), thus maybe the comparison is not correct. Later today I will correct the codes and repeat the analysis. Sorry.
All, it turns out my contact is fine with me attributing the input directly to him (I hadn't asked if I could do so previously), so I'm adding the latest input and will mention that these comments are from Prof. Philip Sharp of The University of Auckland in New Zealand (see https://www.math.auckland.ac.nz/people/psha003). The thesis mentioned below is available at http://www.davidketcheson.info/assets/papers/ketchesonphdthesis.pdf. An excerpt of Philip's comments, with one edit in square brackets, is as follows:

------ snip ------

I have attached a thesis that gives some examples of PDEs where methods of higher order than the standard methods are used. However, higher order methods are usually no higher than order four and they have special properties. I believe the DP45 pair is of limited use when solving PDEs, especially on large PDE problems, because the system of ODEs is typically stiff. For stiff ODEs you need an implicit method such as an implicit Runge-Kutta method.

Are the developers mainly interested in producing OO code for applications in fluid dynamics, or are they interested in producing OO ODE integrators and then applying them to PDEs? If the former, I suggest they restrict themselves to methods of order two and concentrate on getting an efficient OO implementation. If the latter, I suggest they implement one of the higher order schemes such as in the attached thesis.

I can see from reading the email to you from the developers that they are excited about their work. The accuracy of DP45 and other variable-stepsize methods can be measured in the way they suggest, but of far greater importance is the efficiency of the methods. Efficiency is measured by the amount of work required to achieve a prescribed error. For explicit methods, the amount of work is usually measured by the number of derivative evaluations or the CPU time. The error is usually the end-point global error or the maximum global error.

The end-point global error is the difference between the numerical solution and a reference solution. The reference solution can either be a very accurate numerical solution or the true solution. The size of the error is the norm of the error. The maximum global error is the maximum of the norm of the difference between the numerical solution and a reference solution across a prescribed interval. For example, you might be integrating from t = 0 to t = 100. If y(t) is the reference solution and y_n(t) is the numerical solution, the maximum global error is

max_{0 <= t <= 100} ||y_n(t) - y(t)||

This maximum [is] often estimated by sampling ||y(t) - y_n(t)|| at N values of t on [0,100].

----- snip ---------

Damian
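As a minimal illustration of the sampling Philip describes, the maximum global error can be estimated as follows (the reference and numerical solutions used here are hypothetical placeholders just to make the sketch compilable):

```fortran
! Minimal sketch of estimating the maximum global error by sampling at N values of t on [t0, t1].
program max_global_error_sampling
  implicit none
  integer, parameter :: N = 1000                ! number of sample points
  real,    parameter :: t0 = 0., t1 = 100.
  real    :: t, err, max_err
  integer :: i

  max_err = 0.
  do i = 0, N
    t = t0 + (t1 - t0) * real(i) / real(N)
    ! norm of the difference between numerical and reference solutions at time t
    err = norm2(y_numerical(t) - y_reference(t))
    max_err = max(max_err, err)
  end do
  print '(A,ES12.5)', 'estimated maximum global error: ', max_err
contains
  function y_reference(t) result(y)   ! placeholder: true or very accurate reference solution
    real, intent(in) :: t
    real :: y(3)
    y = [cos(t), sin(t), t]           ! dummy values, for the sketch only
  end function y_reference
  function y_numerical(t) result(y)   ! placeholder: numerical solution (here a perturbed reference)
    real, intent(in) :: t
    real :: y(3)
    y = y_reference(t) + 1.e-6
  end function y_numerical
end program max_global_error_sampling
```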
Firstly, please convey to Prof. Philip Sharp my humblest apologies if my previous comments were disrespectful.
I have followed David Ketcheson's (@ketch) work since 2010; it greatly drives mine.
As I mentioned above, in solving PDEs (like the ones we are interested in) the typical explicit and implicit (for stiff equations) RK schemes belong to the ubiquitous xxRK(5,4) family. Generally, the other sources of error discourage the use of more accurate (thus more expensive) schemes. Moreover, in general the time step is limited by CFL-like conditions, thus dynamically changing the time step size by means of embedded RK is often not so useful (more computations that could be wasted). Our workflow is to employ efficient explicit RK schemes for non-stiff problems (or for quasi-stiff ones, where extremely fast transient phenomena, namely very small physical time scales, could affect the solution at the asymptotic regime due to the non-linearity) and backward formulae (with efficient LU decomposition) for stiff ones. It is difficult (I am not aware of any) to see a Navier-Stokes solution of fully appended moving bodies employing schemes of higher than 2nd/3rd formal order of accuracy.
FOODIE (I hope) should have two main aims:
I hope to be comprehensive, and I am inspired by https://github.com/ketch/nodepy, but with the efficiency of Fortran. We want cutting-edge solvers (highest accuracy, stability, ...), but we also want PDE-tailored (efficient and not over-engineered) solvers. We think that we are going in the right direction because your ADT is very powerful. The roadmap I am following is: a) implement solvers with clarity and conciseness in mind, b) only then make them efficient.
I am happy to devote my lunch breaks to FOODIE; I hope this workflow (small KISS libraries) will help me to refactor OFF (or whatever my new CFD code will be named).
Sure, I am conscious of that.
This hint is very much appreciated (as are all the above ones): I was wondering whether the standard approach to measuring the accuracy order of adaptive time step schemes could be based on the end-point approximation. Happy to read that it can be reasonable. Let me say thanks to you both for your great help; these kinds of discussions make me more aware. My best regards.
I have added to the manuscript a preliminary analysis of the DPRK(5,4) errors, computing the global errors as indicated by Prof. Sharp: they are in very good agreement with the expectations (very close to the theoretical 4th order accuracy). If we compare with an equally accurate solver, say LSRK(5,4) or SSPRK(5,4), we find that the number of time steps consumed to achieve almost the same error (about 10^-6) is 3 times lower: a mean Dt of about 300 for DPRK, whereas SSPRK needs Dt = 100. Nice.
Just trying to catch up. Here is a comment that I'm sure everyone, @szaghi etc., is aware of, but that I just wanted to point out: it's also important to factor in the cost in CPU time or expensive RHS evaluations for multi-step methods... if the number of time steps is 3 times lower, but each time step is 3 times more work, then you haven't accomplished anything. Since all of these algorithms are RK(5,4), the expense per time step is probably quite comparable, and my observation irrelevant, but I haven't looked at the algorithms in detail yet.
@zbeekman Yes, absolutely right: when choosing a scheme you should consider the cost of the residual function (RHS). However, assuming we know how much it costs, the selection between a multi-step or a multi-stage method should also be made on the basis of other criteria. For example, I prefer the more expensive RK family (which involves many RHS evaluations per time step) over an equally accurate multi-step solver (which involves only 1 RHS evaluation per time step) just because I like to vary the time step size at each time step (which is also possible with multi-step methods, but with more effort).
@zbeekman I realize now that your comment is more strictly related to the analysis of the embedded RK and not to the comparison of multi-step/multi-stage families, I am sorry. I can confirm that the cost of DPRK is comparable to that of the equivalent RK schemes (they have the same number of RHS evaluations).
It seems that ACP is not an anti-pattern: Damian's idea seems to work very well. I am closing this issue, because I guess you agree to continue with the current pattern. I will reopen it if necessary.
👏 😄
Hi all,
I have started this new exciting (for me) project.
I started studying the work of Jacob. Then, after reading Rouson's book, I decided to try a more formal approach.
Presently I have implemented the Abstract Calculus Pattern, providing also the same example as Rouson's book: the integration of the Lorenz equations by means of the explicit, one-step, first-order Euler scheme.
The pattern is very simple
the abstract integrand
This defines the abstract methods necessary to build a FOODiE integrator.
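A minimal sketch of such an abstract integrand is below; the names are illustrative and the type is simplified with respect to the actual sources:

```fortran
! Minimal, illustrative sketch of the abstract integrand (simplified; names are placeholders).
module type_integrand
  implicit none
  private
  public :: integrand

  type, abstract :: integrand
  contains
    procedure(time_derivative),      deferred :: t             ! residual function, dU/dt
    procedure(integrand_op_real),    deferred :: multiply_real ! U * dt
    procedure(symmetric_operator),   deferred :: add           ! U1 + U2
    procedure(assignment_interface), deferred :: assign_       ! U1 = U2 (defined assignment)
    generic :: operator(*)   => multiply_real
    generic :: operator(+)   => add
    generic :: assignment(=) => assign_
  end type integrand

  abstract interface
    function time_derivative(self) result(dState_dt)
      import :: integrand
      class(integrand), intent(in)  :: self
      class(integrand), allocatable :: dState_dt
    end function time_derivative

    function integrand_op_real(lhs, rhs) result(product_)
      import :: integrand
      class(integrand), intent(in)  :: lhs
      real,             intent(in)  :: rhs
      class(integrand), allocatable :: product_
    end function integrand_op_real

    function symmetric_operator(lhs, rhs) result(sum_)
      import :: integrand
      class(integrand), intent(in)  :: lhs
      class(integrand), intent(in)  :: rhs
      class(integrand), allocatable :: sum_
    end function symmetric_operator

    subroutine assignment_interface(lhs, rhs)
      import :: integrand
      class(integrand), intent(inout) :: lhs
      class(integrand), intent(in)    :: rhs
    end subroutine assignment_interface
  end interface
end module type_integrand
```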
a FOODiE integrator
Defines, at a very high level (by means of operator overloading), the ODE integration scheme;
e.g. the explicit Euler is simply
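A minimal sketch of such an integrator, built on the abstract integrand above (again, names are illustrative), might read:

```fortran
! Minimal, illustrative sketch of a FOODiE-style explicit Euler integrator.
module euler_explicit_integrator
  use type_integrand, only : integrand
  implicit none
  private
  public :: euler_explicit_step
contains
  subroutine euler_explicit_step(field, dt)
    class(integrand), intent(inout) :: field ! field being integrated, U
    real,             intent(in)    :: dt    ! time step size
    ! thanks to the overloaded operators, the scheme reads like the formula U^(n+1) = U^n + Dt * dU/dt
    field = field + field%t() * dt
  end subroutine euler_explicit_step
end module euler_explicit_integrator
```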
client side code
The user must extend the abstract integrand, defining his/her own system of equations. In the case of the Lorenz equations it is simply
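An illustrative sketch of the client-side extension (parameter values and names are placeholders; the actual implementation may differ) could be:

```fortran
! Minimal, illustrative sketch of a client-side extension of the abstract integrand for the Lorenz system.
module type_lorenz
  use type_integrand, only : integrand
  implicit none
  private
  public :: lorenz

  type, extends(integrand) :: lorenz
    real :: state(3) = [1., 1., 1.]                ! current solution [x, y, z]
    real :: sigma = 10., rho = 28., beta = 8./3.   ! classical Lorenz parameters
  contains
    procedure :: t             => dLorenz_dt
    procedure :: multiply_real => lorenz_multiply_real
    procedure :: add           => add_lorenz
    procedure :: assign_       => lorenz_assign_lorenz
  end type lorenz
contains
  function dLorenz_dt(self) result(dState_dt)
    ! the Lorenz equations system, i.e. the residual function
    class(lorenz), intent(in)     :: self
    class(integrand), allocatable :: dState_dt
    allocate(lorenz :: dState_dt)
    select type(dState_dt)
    type is(lorenz)
      dState_dt%state(1) = self%sigma * (self%state(2) - self%state(1))
      dState_dt%state(2) = self%state(1) * (self%rho - self%state(3)) - self%state(2)
      dState_dt%state(3) = self%state(1) * self%state(2) - self%beta * self%state(3)
    end select
  end function dLorenz_dt

  function lorenz_multiply_real(lhs, rhs) result(product_)
    class(lorenz), intent(in)     :: lhs
    real,          intent(in)     :: rhs
    class(integrand), allocatable :: product_
    allocate(product_, source=lhs)
    select type(product_)
    type is(lorenz)
      product_%state = lhs%state * rhs
    end select
  end function lorenz_multiply_real

  function add_lorenz(lhs, rhs) result(sum_)
    class(lorenz),    intent(in)  :: lhs
    class(integrand), intent(in)  :: rhs
    class(integrand), allocatable :: sum_
    allocate(sum_, source=lhs)
    select type(sum_)
    type is(lorenz)
      select type(rhs)
      type is(lorenz)
        sum_%state = lhs%state + rhs%state
      end select
    end select
  end function add_lorenz

  subroutine lorenz_assign_lorenz(lhs, rhs)
    class(lorenz),    intent(inout) :: lhs
    class(integrand), intent(in)    :: rhs
    select type(rhs)
    type is(lorenz)
      lhs%state = rhs%state
      lhs%sigma = rhs%sigma ; lhs%rho = rhs%rho ; lhs%beta = rhs%beta
    end select
  end subroutine lorenz_assign_lorenz
end module type_lorenz
```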
More details here
First considerations
The approach seems to work well. For the Lorenz test I have added the generation of a plot of the solution by means of Jacob's great pyplot-fortran.
My main concerns are now related to the efficiency of the approach, but this question will be answered in the future (or maybe @rouson can reply, having deeply used such approaches in HPC codes).