Unweighting implementation #2083
base: master
Conversation
Why is this a CLI command and not just a vp action?

@RoyStegeman, I followed the suggestion of @scarlehoff to do it this way.
pyproject.toml (Outdated)
@@ -106,7 +107,7 @@ qed = ["fiatlux"]
 nolha = ["pdfflow", "lhapdf-management"]

 [tool.poetry-dynamic-versioning]
-enable = true
+enable = false
Why did you disable versioning?
(It needs to be re-enabled, but I'd like to know the exact problem that led to needing to disable it.)
I think this happened because I built NNPDF locally. I reverted it.
But in principle you should be able to install with pip install -e . (or without -e) and it should not modify this file.
Strange. It did happen in my case...
Is your environment managed by poetry?
import pandas as pd
from scipy.special import xlogy
from typing import Tuple
from tqdm import tqdm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too fond of tqdm because it usually makes logs and debugging harder to read.
But if you are set on using it, you must add it to the dependencies as well.
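For reference, a minimal sketch of a logging-based progress report, as one alternative that keeps logs readable; the helper name and the reporting interval are illustrative, not part of the PR.

import logging

log = logging.getLogger(__name__)


def iterate_with_progress(items, every=100):
    """Yield items from a sized sequence, logging progress every `every` steps."""
    total = len(items)  # assumes a list or other sized sequence, not a generator
    for i, item in enumerate(items, start=1):
        if i % every == 0:
            log.info("processed %d/%d", i, total)
        yield item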
parser.add_argument("chi2_name", help="Add the filename of the chi2 dataset (.csv)")
parser.add_argument("N", help="Add the amount of experimental datapoints that the chi2 is based on")
args = parser.parse_args()
chi2 = pd.read_csv(args.chi2_name).values
How is the csv generated? This should be part of the script.
Ideally the script should have two steps (see the sketch below):
- Check whether the csv exists and, if it doesn't, generate it
- Do the actual computation
In the validphys language these would be two actions, where the second depends on the first. We can turn them into proper validphys actions at the end; it can remain a script for the time being.
For the first action you can use a lot of vp machinery (e.g., functions to compute the chi2, or to load PDFs).
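A minimal sketch of that two-step structure. The function names (ensure_chi2_csv, compute_weights) and the random placeholder generation are illustrative assumptions, not the PR's code; the weight formula is the standard NNPDF reweighting one, w_k proportional to chi2_k**((N-1)/2) * exp(-chi2_k/2).

from pathlib import Path

import numpy as np
import pandas as pd


def ensure_chi2_csv(csv_path: Path, nrep: int = 100) -> np.ndarray:
    """Step 1: check whether the csv exists and, if it doesn't, generate it."""
    if not csv_path.exists():
        # Placeholder generation: in the real workflow the per-replica chi2
        # would come from the theory predictions, not random numbers.
        rng = np.random.default_rng(0)
        pd.DataFrame({"chi2": rng.uniform(10.0, 60.0, size=nrep)}).to_csv(
            csv_path, index=False
        )
    return pd.read_csv(csv_path)["chi2"].to_numpy()


def compute_weights(chi2: np.ndarray, ndata: int) -> np.ndarray:
    """Step 2: the actual computation, done in log space to avoid overflow."""
    logw = (ndata - 1) / 2 * np.log(chi2) - chi2 / 2
    logw -= logw.max()
    w = np.exp(logw)
    return w * len(w) / w.sum()  # normalised so the weights sum to Nrep


if __name__ == "__main__":
    chi2 = ensure_chi2_csv(Path("chi2.csv"))
    print(compute_weights(chi2, ndata=30))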
I could not find a better solution for this. The calculation of the chi2 per replica is done in the nnpdf/external/ poljet code, and if I were to shift the calculation of the chi2 to validphys, I'm afraid I would have to migrate more functionality. And I don't think the resulting solution would be more elegant.
Well, you are already using validphys in that code.
So a first stage can be a simple port. Then, later on, instead of doing self.l.check_commondata(data_name)
or API.dataset_inputs_covmat_from_systematics(**inp),
you would have a function that takes results as input;
according to the definition of dataset_inputs
in the runcard / input, you would get a results
object already populated with the dataset, covariance matrix, statistical / systematic errors, etc.
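A hedged sketch of what such a results-based action might look like. The attribute names used here (central_value, covmat, error_members) are assumptions about the validphys results objects for illustration, not verified API.

import numpy as np


def chi2_per_replica(results):
    """Per-replica chi2 from a validphys-style (data, theory) results pair."""
    data, theory = results
    # residuals of each replica's predictions w.r.t. the experimental data;
    # shapes assumed: error_members (ndata, nrep), central_value (ndata,)
    diff = theory.error_members - data.central_value[:, np.newaxis]
    invcov = np.linalg.inv(data.covmat)
    # chi2_r = diff_r^T C^-1 diff_r for each replica r
    return np.einsum("ir,ij,jr->r", diff, invcov, diff)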
return Nps, entropies, Nopt


def main(chi2, N, store=True):
Please also add some docstrings here explaining what it does.
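For context, a hedged guess at the quantity behind the entropies/Nopt return values, inferred from the xlogy import rather than from the PR itself: the effective number of replicas surviving unweighting can be estimated as exp(H) from the Shannon entropy H of the normalised weights.

import numpy as np
from scipy.special import xlogy


def effective_replicas(weights: np.ndarray) -> float:
    """Effective number of replicas, exp(H), from the Shannon entropy H."""
    p = weights / weights.sum()
    # xlogy returns 0 where p == 0, avoiding nan from 0 * log(0)
    entropy = -np.sum(xlogy(p, p))
    return float(np.exp(entropy))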
""" | ||
Initialize the Unweight class. | ||
|
||
Args: |
We are trying to follow the numpy conventions https://numpydoc.readthedocs.io/en/latest/format.html#parameters at least for new code (and especially for parameters, returns, etc.). It helps consistency.
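For illustration, the quoted Args: block rewritten in the numpydoc style linked above; the parameter names and types are assumptions based on the snippets earlier in this review.

def __init__(self, chi2, N):
    """Initialize the Unweight class.

    Parameters
    ----------
    chi2 : np.ndarray
        Per-replica chi2 values, as read from the input csv.
    N : int
        Number of experimental data points the chi2 is based on.
    """
    self.chi2 = chi2
    self.N = N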
@toonhasenack Just to echo what has been said above about the re/un-weighting being self-consistent: everything except the generation of the theory predictions should be here, i.e. from the computation of the chi2 onwards. Also agreed on this being vp actions instead of scripts. If you want, I could take over this PR from this point.
Calculation of the unweighted weights in the vp framework. The module can be called with:
vp-unweight "path to chi2".csv "number of datapoints"