I'm seeing very low acceptance rates when running the RHMC for SU(2) with one adjoint flavour when compared to what I believe are exactly the same run parameters for HiRep. With HiRep this is around 90%, while with the same parameters in Grid it is very close to 0%.
What I've checked:
Same fermion and gauge action (Wilson in both cases)
Same value for beta and bare fermion mass
Same lattice volume
The integrator in both cases is MinimumNorm2 (called O2MN_multistep in HiRep)
MD trajectory length is the same
Same number of MD time steps
Two levels of integration, with a multiplier of 1 at the top (fermionic) level and a multiplier of 5 at the bottom (gauge) level
Same numerical precision (double precision throughout)
Even-odd preconditioning is used in both cases
Boundary conditions are the same
Initial conditions are the same (hot start)
We previously found that using 2/4 as the exponent gives better performance than 1/2; I have tried adjusting Grid/qcd/action/pseudofermion/OneFlavourEvenOddRational.h to this effect, but saw no obvious change in acceptance (or in time per trajectory)
I've also tried adjusting the parameters of the rational approximation to increase its order and precision, but this doesn't noticeably affect the acceptance, and it does make the update take longer.
The code appears to be simulating the correct theory, as a scan of the phase diagram on a 4^4 lattice very closely reproduces the plot of arXiv:1412.5994
I've tested CPU and GPU builds (with and without MPI for the latter) and see the same issue in both.
Increasing the number of MD steps per trajectory increases the acceptance, but makes each trajectory take correspondingly longer.
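To make the integrator setup in the checklist concrete, here is a minimal toy sketch of a two-level second-order minimum-norm (Omelyan) scheme. It is not Grid or HiRep code: the two forces are stand-ins for the fermionic and gauge forces, all function names are hypothetical, and the way the multiplier of 5 maps to five inner steps per outer half-step is an assumption that may differ from either code's convention.

```python
import math

LAMBDA = 0.1931833275037836  # standard second-order minimum-norm parameter

def force_fermion(q):
    # Stand-in for the expensive "fermionic" force, from S_f = q^4 / 4.
    return [-x ** 3 for x in q]

def force_gauge(q):
    # Stand-in for the cheap, stiff "gauge" force, from S_g = 12.5 * q^2.
    return [-25.0 * x for x in q]

def hamiltonian(q, p):
    kinetic = 0.5 * sum(v * v for v in p)
    action = sum(12.5 * x * x + 0.25 * x ** 4 for x in q)
    return kinetic + action

def kick(q, p, force, eps):
    # Momentum update from one force, in place.
    f = force(q)
    for i in range(len(p)):
        p[i] += eps * f[i]

def drift(q, p, eps):
    # Field update, in place.
    for i in range(len(q)):
        q[i] += eps * p[i]

def omf2(q, p, levels, tau, level=0):
    """Integrate for MD time tau.

    levels is a list of (force, n_steps) pairs, outermost level first.
    Each half-step of a level is integrated by the next level down;
    the innermost level uses a plain drift.
    """
    force, n_steps = levels[level]
    h = tau / n_steps

    def inner(eps):
        if level + 1 < len(levels):
            omf2(q, p, levels, eps, level + 1)
        else:
            drift(q, p, eps)

    for _ in range(n_steps):
        kick(q, p, force, LAMBDA * h)
        inner(0.5 * h)
        kick(q, p, force, (1.0 - 2.0 * LAMBDA) * h)
        inner(0.5 * h)
        kick(q, p, force, LAMBDA * h)

def delta_h(n_outer, tau=1.0):
    # Energy violation of one trajectory from a fixed toy starting point.
    q = [1.0, -0.5, 0.3]
    p = [0.2, 0.1, -0.4]
    h0 = hamiltonian(q, p)
    omf2(q, p, [(force_fermion, n_outer), (force_gauge, 5)], tau)
    return hamiltonian(q, p) - h0

if __name__ == "__main__":
    for n in (5, 10, 20):
        dh = delta_h(n)
        print(n, dh, min(1.0, math.exp(-dh)))  # steps, dH, Metropolis probability
```

For a second-order scheme the energy violation dH should shrink roughly quadratically with the step size, which is the usual reason more MD steps raise the acceptance at proportionally higher cost per trajectory.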
Does anyone have any idea what might be going on, and how I could fix it, please?
@LupoA pointed out that there is a normalisation factor of HMC_MOMENTUM_DENOMINATOR that is by default set to 2, while in HiRep this factor is not included. Setting this to 1 (via removing the #define CPS_MD_TIME in Grid/qcd/action/gauge/GaugeImplTypes.h and Grid/qcd/action/scalar/ScalarImpl.h) does not remove the discrepancy.
HiRep multiplies the step size by beta / NG, which as far as I can see isn't done in Grid. Removing this factor further exacerbates the discrepancy.
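One reason these normalisation factors matter is that, for a leapfrog-type integrator, a change in the kinetic-term normalisation is equivalent to a rescaling of the step size. The toy sketch below illustrates this for a single variable, assuming the denominator D enters as a kinetic term p^2/D; whether Grid's HMC_MOMENTUM_DENOMINATOR enters in exactly this way is an assumption that should be checked against the source. The function names and the potential are hypothetical.

```python
import math

def grad_v(q):
    # Gradient of the toy potential V(q) = q^2 / 2 + q^4 / 4.
    return q + q ** 3

def leapfrog(q, p, dt, n_steps, denom=2.0):
    """Kick-drift-kick leapfrog for H = p^2 / denom + V(q).

    Hamilton's equations give dq/dt = 2 p / denom and dp/dt = -V'(q).
    denom = 2 is the conventional H = p^2 / 2 + V(q).
    """
    for _ in range(n_steps):
        p -= 0.5 * dt * grad_v(q)
        q += dt * 2.0 * p / denom
        p -= 0.5 * dt * grad_v(q)
    return q, p

def compare(denom=1.0, dt=0.1, n_steps=20, q0=0.7, p0=0.3):
    # Trajectory with a non-standard denominator and step size dt ...
    q_a, p_a = leapfrog(q0, p0, dt, n_steps, denom=denom)
    # ... matches the standard normalisation with rescaled momentum and step.
    s = math.sqrt(2.0 / denom)
    q_b, p_b = leapfrog(q0, p0 * s, dt * s, n_steps, denom=2.0)
    return q_a, q_b, p_a * s, p_b

if __name__ == "__main__":
    print(compare())
```

Under this assumption, switching the denominator from 2 to 1 at fixed dt behaves like multiplying the step size (and the trajectory length) by sqrt(2), so the two codes need their MD time units matched before their acceptances can be compared directly.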
Some more testing shows that with a thermalised configuration (the same one for both codes), and controlling for all the factors above, the acceptances match much more closely between HiRep and Grid. Additionally, setting --Thermalizations to a number larger than zero (I've been using 20) immediately gets past this initial barrier, after which the acceptance stabilises with the same parameters that work for HiRep (which does not do this thermalisation step, as far as I am aware). This raises three possibilities that I can see:
HiRep does some thermalisation that I'm not aware of (although I have searched and haven't found evidence of this)
Grid's integrator behaves differently on very far-from-equilibrium configurations
Grid initialises a hot start differently from HiRep
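As an illustration of how a far-from-equilibrium start can suppress acceptance, and how a thermalisation phase can get past it, here is a toy sketch. It is not Grid or HiRep code: it uses a free Gaussian action with a drift-kick-drift leapfrog, all names (including no_metropolis_until) are hypothetical, and it assumes that thermalisation trajectories are accepted without the Metropolis test, which is an assumption about what --Thermalizations does.

```python
import math
import random

def dkd_leapfrog(q, p, dt, n_steps):
    """Drift-kick-drift leapfrog for S(q) = sum(q_i^2) / 2, i.e. force = -q."""
    for _ in range(n_steps):
        q = [x + 0.5 * dt * v for x, v in zip(q, p)]
        p = [v - dt * x for x, v in zip(q, p)]
        q = [x + 0.5 * dt * v for x, v in zip(q, p)]
    return q, p

def energy(q, p):
    return 0.5 * sum(x * x for x in q) + 0.5 * sum(v * v for v in p)

def run_hmc(q, n_traj, no_metropolis_until, dt=0.5, n_steps=2, seed=1):
    """Return the acceptance rate measured after the thermalisation phase.

    Trajectories with index below no_metropolis_until are always accepted.
    """
    rng = random.Random(seed)
    accepted = 0
    for traj in range(n_traj):
        p = [rng.gauss(0.0, 1.0) for _ in q]  # momentum heat bath
        q_new, p_new = dkd_leapfrog(q, p, dt, n_steps)
        dh = energy(q_new, p_new) - energy(q, p)
        if traj < no_metropolis_until:
            q = q_new  # forced acceptance while thermalising
            continue
        if dh <= 0.0 or rng.random() < math.exp(-dh):
            q = q_new
            accepted += 1
    return accepted / (n_traj - no_metropolis_until)

if __name__ == "__main__":
    far_start = [50.0] * 100  # far from the equilibrium scale q ~ 1
    print("no thermalisation:", run_hmc(far_start, 50, 0))
    print("20 thermalisation trajectories:", run_hmc(far_start, 220, 20))
```

In this toy model the energy violation grows with the square of the field amplitude, so a start far from equilibrium is rejected essentially every time and the chain never moves, while the same step size gives a healthy acceptance once the first trajectories are accepted unconditionally. Whether the same mechanism is responsible for the Grid behaviour is not established by this sketch.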
If it's useful, I've attached an example grid.configure.summary, the program I'm running, an example submit script to see the parameters being used, and the equivalent input file used with HiRep.
Many thanks in advance for any advice.