Origin/reweighting #185

pittlerf · 2019-09-30T20:56:55Z

Hi,

I started to add functionality to be able to handle reweighting for correlation functions.
I use the 'cfrw_boot' label for correlation functions that have been reweighted.
These correlation functions should not be resampled, so I removed the 'cf_orig' label from them.
The next step is to make this consistent with the other functions in hadron.



… samples included


urbach · 2021-04-09T10:00:12Z

It's not totally clear to me how the reweighting is supposed to work here? I had thought that a function like bootstrap_and_rw.cf was sufficient, with a cf and reweighting factors as input. How does it work here?

Why is it not allowed to resample a reweighted cf?

urbach · 2021-04-09T10:03:01Z

devtools::check output:

❯ checking examples ... ERROR
  Running examples in ‘hadron-Ex.R’ failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: is_empty.rw
  > ### Title: Checks whether the cf object contains no data
  > ### Aliases: is_empty.rw
  > 
  > ### ** Examples
  > 
  > # The empty rw object must be empty:
  > is_empty.rw(rw())
  Error in is_empty.rw(rw()) : could not find function "is_empty.rw"
  Execution halted

❯ checking examples with --run-donttest ... ERROR
  Running examples in ‘hadron-Ex.R’ failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: is_empty.rw
  > ### Title: Checks whether the cf object contains no data
  > ### Aliases: is_empty.rw
  > 
  > ### ** Examples
  > 
  > # The empty rw object must be empty:
  > is_empty.rw(rw())
  Error in is_empty.rw(rw()) : could not find function "is_empty.rw"
  Execution halted

❯ checking for missing documentation entries ... WARNING
  Undocumented code objects:
    ‘rw_unit’ ‘samplerw’ ‘samplerw_inverse’
  Undocumented data sets:
    ‘samplerw’ ‘samplerw_inverse’
  All user-level objects in a package should have documentation entries.
  See chapter ‘Writing R documentation files’ in the ‘Writing R
  Extensions’ manual.

❯ checking Rd \usage sections ... WARNING
  Undocumented arguments in documentation object 'read.rw'
    ‘monomial_id’
  
  Undocumented arguments in documentation object 'rw_orig'
    ‘rw’

  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter ‘Writing R documentation files’ in the ‘Writing R
  Extensions’ manual.

❯ checking package dependencies ... NOTE
  Package suggested but not available for checking: ‘rhdf5’

❯ checking DESCRIPTION meta-information ... NOTE
  Package listed in more than one of Depends, Imports, Suggests, Enhances:
    ‘dplyr’
  A package should be listed in only one of these fields.

❯ checking R code for possible problems ... NOTE
  *.rw: no visible binding for global variable ‘cf1’
  *.rw: no visible binding for global variable ‘cf2’
  read.rw: no visible binding for global variable ‘monomialid’
  Undefined global functions or variables:
    cf1 cf2 monomialid

2 errors ✖ | 2 warnings ✖ | 3 notes ✖

urbach · 2021-04-09T13:13:04Z

fixed most of the check problems.

where can I find an example for this? I'm still not convinced all of this is needed...!?

urbach · 2021-04-09T13:16:52Z

this is left:

   read.rw: no visible binding for global variable ‘monomialid’
   Undefined global functions or variables:
     monomialid

which I don't understand yet.
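For reference, a minimal sketch of how this kind of NOTE can arise. It is only a guess that this applies to read.rw: the Rd warning above names the argument ‘monomial_id’ while the NOTE names ‘monomialid’, so a spelling mismatch between argument and body is one plausible cause. The function `read_toy` is hypothetical; `codetools::findGlobals` is the kind of analysis R CMD check relies on for this message.

```r
# Hypothetical toy function, not the real read.rw: the argument is called
# monomial_id, but the body refers to monomialid, which is defined nowhere.
read_toy <- function(file, monomial_id) {
  paste0(file, monomialid)
}

# List the symbols the function body uses but does not define locally.
globals <- codetools::findGlobals(read_toy, merge = FALSE)
print(globals$variables)
```

If this is the cause, using the same spelling in the argument list and in the body removes the NOTE.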

urbach · 2021-04-09T13:17:43Z

also, the data object will mean we can no longer install for R < 3.5.0

     NB: this package now depends on R (>= 3.5.0)
     WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects: ‘hadron/data/samplerw.RData’  ‘hadron/data/samplerw_inverse.RData’

pittlerf · 2021-04-09T13:56:30Z

> fixed most of the check problems.
>
> where can I find an example for this? I'm still not convinced all of this is needed...!?

Yes, the reading is actually quite format dependent. In the beta12 project I analysed just the tmLQCD output for the reweighting factors, which looked like the following:

00 00000 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9715302949e+01
00 00001 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9523762274e+01
00 00002 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9776317102e+01
00 00003 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9501797443e+01
00 00004 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9453382954e+01

In the PLNG project I got the reweighting factors from Marco, in an entirely different format.
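A minimal sketch of reading the excerpt above with base R, to make the format dependence concrete. This is not the read.rw implementation. The meaning of the columns is not stated in this thread, so treating the second column as a running index and the last column as the quantity of interest is purely an assumption, and the variable names are hypothetical.

```r
# The five lines quoted above, embedded as text so the sketch is self-contained.
txt <- "00 00000 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9715302949e+01
00 00001 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9523762274e+01
00 00002 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9776317102e+01
00 00003 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9501797443e+01
00 00004 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9453382954e+01"

# Read the first two columns as character to keep the zero padding,
# the remaining five as numeric. Columns get the default names V1..V7.
raw <- read.table(text = txt, header = FALSE,
                  colClasses = c("character", "character", rep("numeric", 5)))

# Assumption: V2 is a running index, V7 the value of interest.
index <- as.integer(raw$V2)
value <- raw$V7
```

A reader for a different project would need its own column mapping.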

urbach · 2021-04-09T14:24:06Z

> > fixed most of the check problems.
> >
> > where can I find an example for this? I'm still not convinced all of this is needed...!?
>
> Yes, the reading is actually quite format dependent. In the beta12 project I analysed just the output of tmLQCD for the reweighting factors: (that looked like the following):
>
> 00 00000 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9715302949e+01
> 00 00001 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9523762274e+01
> 00 00002 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9776317102e+01
> 00 00003 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9501797443e+01
> 00 00004 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9453382954e+01
>
> In the PLNG project I got the reweighting factor from Marco, entirely different format.

I had in mind an example of the whole thing working?

kostrzewa · 2021-04-09T14:44:09Z

> It's not totally clear to me how the reweighting is supposed to work here? I had thought that a function like bootstrap_and_rw.cf was sufficient, with a cf and reweighting factors as input. How does it work here?
>
> Why is it not allowed to resample a reweighted cf?

I understood this to originate from the fact that the normalisation needs to be recomputed (the average of the weights). In other words, the data and the reweighting factors both need to be resampled consistently and separately, such that for each bootstrap resample, the normalisation and the corresponding reweighted data can be generated.

There are of course ways to handle this: reweighted data could be stored unnormalised:

d^{rw}_i = d_i * w_i

which can be resampled any way one wants. However, when the reweighted data (and resampling thereof) is used, the corresponding normalisations need to be available and correctly applied to the central value and bootstrap samples. In other words, w_i need to be resampled too, giving boot.R values for the normalisation factor. The normalisation factor for the central value is of course just sum_i w_i.

Does the above sound reasonable and describe correctly, why one can't "blindly" resample the reweighted data?
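To make the argument above concrete, here is a minimal sketch in plain base R. It does not use the hadron API, the data are toy numbers, and all variable names are hypothetical. The same bootstrap indices are applied to the unnormalised reweighted data and to the weights, so every sample gets its own normalisation.

```r
set.seed(1)
N <- 200                                   # number of gauge configurations (toy)
boot.R <- 400                              # number of bootstrap samples
d <- rnorm(N, mean = 1.0, sd = 0.1)        # toy observable, one value per configuration
w <- exp(rnorm(N, mean = 0.0, sd = 0.2))   # toy positive reweighting factors

# Unnormalised reweighted data d^{rw}_i = d_i * w_i
drw <- d * w

# Central value: normalise by the sum of the weights.
central <- sum(drw) / sum(w)

# One row of configuration indices per bootstrap sample. The same row is
# used for the reweighted data and for the weights.
idx <- matrix(sample.int(N, size = N * boot.R, replace = TRUE), nrow = boot.R)
num <- apply(idx, 1, function(ii) sum(drw[ii]))
den <- apply(idx, 1, function(ii) sum(w[ii]))
boot <- num / den                          # boot.R normalised bootstrap values

err <- sd(boot)                            # bootstrap estimate of the error
```

Resampling already normalised values would keep the normalisation fixed at its full-ensemble value in every sample, which is the "blind" resampling the flag is meant to prevent.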

kostrzewa · 2021-04-09T15:04:08Z

> Does the above sound reasonable and describe correctly, why one can't "blindly" resample the reweighted data?

Let me add another qualifying remark: we also don't deal with just a single reweighting factor, but sequences of factors which move us along in parameter space. For this, some sort of solution was required (such as supporting the multiplication of two sets of reweighting factors to form a third).
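A minimal sketch of such a multiplication, again in plain base R and not the hadron implementation. The helper `multiply_rw` is hypothetical, and matching the two sets on their common gauge configuration indices is an assumption about how the combination should behave.

```r
# Hypothetical helper: multiply two sets of reweighting factors on the
# gauge configurations that both sets have in common.
multiply_rw <- function(conf1, w1, conf2, w2) {
  stopifnot(length(conf1) == length(w1), length(conf2) == length(w2))
  stopifnot(!anyDuplicated(conf1), !anyDuplicated(conf2))
  common <- intersect(conf1, conf2)
  list(conf.index = common,
       w = w1[match(common, conf1)] * w2[match(common, conf2)])
}

step1 <- c(1.10, 0.95, 1.02, 0.99)   # toy factors on configurations 0, 4, 8, 12
step2 <- c(0.90, 1.05, 1.00)         # toy factors on configurations 4, 8, 16
res <- multiply_rw(c(0, 4, 8, 12), step1, c(4, 8, 16), step2)
```

Here only configurations 4 and 8 survive, with one combined factor each. Silently multiplying two vectors of equal length measured on different configurations would give a wrong result, which is one argument for a dedicated data type that carries the configuration indices.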

urbach · 2021-04-09T20:43:20Z

> I understood this to originate from the fact that the normalisation needs to be recomputed (the average of the weights). In other words, the data and the reweighting factors *both* need to be resampled consistently and separately, such that for each bootstrap resample, the normalisation and the corresponding reweighted data can be generated.
>
> There are of course ways to handle this: reweighted data could be stored unnormalised:
>
> ```
> d^{rw}_i = d_i * w_i
> ```
>
> which can be resampled any way one wants. However, when the reweighted data (and resampling thereof) is used, the corresponding normalisations need to be available and correctly applied to the central value and bootstrap samples. In other words, `w_i` need to be resampled too, giving `boot.R` values for the normalisation factor. The normalisation factor for the central value is of course just `sum_i w_i`.
>
> Does the above sound reasonable and describe correctly, why one can't "blindly" resample the reweighted data?

I fully agree here. So it's merely a safety feature that no one resamples "again"? Because you clearly want to resample to compute statistical errors. And you want the object, which is the result of what I'd call `bootstrap_and_reweight` or so, to be a `cf` again, because that is convenient. Like it is for principal correlators, correct?

urbach · 2021-04-09T20:45:17Z

> > Does the above sound reasonable and describe correctly, why one can't "blindly" resample the reweighted data?
>
> Let me add another qualifying remark: we also don't deal with just a single reweighting factor, but sequences of factors which move us along in parameter space. For this, some sort of solution was required (such as supporting the multiplication of two sets of reweighting factors to form a third).

sure. But reweighting factors are always complex (if not real) valued vectors, aren't they? Of course, I agree, it's better to have the data type properly defined...

urbach · 2021-04-11T10:15:39Z

> > fixed most of the check problems.
> >
> > where can I find an example for this? I'm still not convinced all of this is needed...!?
>
> Yes, the reading is actually quite format dependent. In the beta12 project I analysed just the output of tmLQCD for the reweighting factors: (that looked like the following):
>
> 00 00000 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9715302949e+01
> 00 00001 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9523762274e+01
> 00 00002 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9776317102e+01
> 00 00003 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9501797443e+01
> 00 00004 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9453382954e+01
>
> In the PLNG project I got the reweighting factor from Marco, entirely different format.
>
> I had in mind an example of the whole thing working?

@pittlerf In other words, is there a rmarkdown file which explains how to use this? Are there some tests?

pittlerf · 2021-04-11T15:26:09Z

> > fixed most of the check problems.
> >
> > where can I find an example for this? I'm still not convinced all of this is needed...!?
>
> Yes, the reading is actually quite format dependent. In the beta12 project I analysed just the output of tmLQCD for the reweighting factors: (that looked like the following):
>
> 00 00000 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9715302949e+01
> 00 00001 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9523762274e+01
> 00 00002 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9776317102e+01
> 00 00003 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9501797443e+01
> 00 00004 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9453382954e+01
>
> In the PLNG project I got the reweighting factor from Marco, entirely different format.
>
> I had in mind an example of the whole thing working?
>
> @pittlerf In other words, is there a rmarkdown file which explains how to use this? Are there some tests?

Hi @urbach, I uploaded a how-to-use cheat sheet in rmarkdown.

urbach · 2021-04-13T09:25:10Z

thanks.
There are still changes requested...

…he gauge configuration indices for the x axis

urbach · 2021-04-27T13:59:55Z

check(cran=TRUE) gives

❯ checking package dependencies ... NOTE
  Package suggested but not available for checking: ‘rhdf5’

❯ checking R code for possible problems ... NOTE
  read.rw: no visible binding for global variable ‘monomialid’
  Undefined global functions or variables:
    monomialid

❯ checking Rd line widths ... NOTE
  Rd file 'rw_orig.Rd':
    \examples lines wider than 100 characters:
       rw_factor <- rw_orig( rw=rw_data, conf.index=seq(1,20), max_value= max(rw_data),stochastic_error=rep(0,20))
  
  These lines will be truncated in the PDF manual.

thanks!

urbach · 2021-04-27T14:02:46Z

The comment on rhdf5 is on my side...

urbach · 2021-05-26T14:37:49Z

hmm?

pittlerf · 2021-05-26T17:26:02Z

> hmm?

ah, sorry I will do it now.

… to two rows

pittlerf added 17 commits September 26, 2019 22:50

start working on adding reweighting options for hadron

9932c84

rw_orig added

b0ed038

reading function for reweighting factor

9b67c8b

Correcting errors

b678374

correcting errors

3dc4bfa

adding possibility to reverse correlation functions

c1eb13c

correcting errors

754d287

renaming and error search

4b26f9f

corrected errors

12f0654

after multiplying two reweighting factors, you could not increase statistics by addStat

add1d1a

correcting error

32e577a

Implementing reweighting: setting the bootstrap or jackknife samples for the generated cf objects, and invalidating cf_orig, introduces a new class property cfrw_boot, if this option is on, than resampling the correlation function should not be allowed

bed0b43

removing testing print statements

b88a01a

correcting errors

95e7ce7

Performing gauge conf list check in reweighting

7275044

Including it in jackknife as well

c9aa2a5

Typo corrected

a871557