Reproduction of experiments #7
Could you provide some more details? What were the per-task results that you got? Did you use the quality filter that filters for length, numeric ratio, etc.? Did you preprocess the data into chunks?
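(As an aside, a minimal sketch of the kind of length/numeric-ratio quality filter being asked about; the function name and thresholds below are hypothetical, not the repo's actual values.)

```python
import re

def passes_quality_filter(text: str,
                          min_words: int = 40,
                          max_numeric_ratio: float = 0.15) -> bool:
    """Illustrative quality filter: keep examples that are long enough
    and not dominated by numeric tokens. Thresholds are hypothetical."""
    words = text.split()
    if len(words) < min_words:
        return False
    numeric = sum(1 for w in words if re.fullmatch(r"[\d.,%$-]+", w))
    return numeric / len(words) <= max_numeric_ratio
```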
Ah, just found a typo that was introduced when fixing the
Could you try running the resampling step again?
Hi, I have preprocessed the data by running
Actually, I believe your work is sound and I have been following it for a long time. I find that the algorithms described in the 'v1' and 'v3' versions released on arXiv are quite different. However, I am puzzled by the fact that the results reported in Table 4 of the 'v1' version are identical to those in Table 3 of the 'v2' version.
Did you try running the resampling again after your first post on this issue? Basically, this line was mistakenly moved above the for loop, which made the selection-by-domain not work (with the typo, the selected indices were the same for every domain). This affects the experiment since we treat the wikipedia and books domains differently. Regarding the different arXiv versions: the algorithm has stayed the same across all versions; any differences are due to clarification or improvement of the presentation.
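(For readers following along, here is a minimal sketch of the kind of bug described above, with illustrative names rather than the repo's actual code: when the selection line sits above the per-domain for loop, every domain reuses the same indices.)

```python
import numpy as np

rng = np.random.default_rng(0)
domains = ["wikipedia", "books", "web"]
log_weights = {d: rng.normal(size=1000) for d in domains}  # toy weights
k = 100

# Buggy version (the typo): the selection line was hoisted above the
# for loop, so the indices are computed once and reused for every
# domain -- selection-by-domain silently stops working.
chosen = np.argsort(-(log_weights["wikipedia"] + rng.gumbel(size=1000)))[:k]
buggy = {d: chosen for d in domains}

# Fixed version: compute the selection inside the loop, per domain.
fixed = {}
for d in domains:
    noisy = log_weights[d] + rng.gumbel(size=len(log_weights[d]))
    fixed[d] = np.argsort(-noisy)[:k]
```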
Hi, thanks very much.
Thank you for clarifying my confusion. Are you saying that in 'v1' you use the token distributions to compute the weights, rather than learning two generative models as 'v1' suggests?
BTW, I am also confused about the different results of top-k selection and resampled selection. In my experiments, the performance of resampled selection often falls between that of top-k selection and random selection, but the paper reports the opposite.
When you print
Generative models are just models of the data distribution - bag-of-words ("token distributions") is a simple generative model. I suppose the recent "generative AI" stuff has made it seem like generative = transformers/GPT/diffusion models.
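(To make that concrete, here is a minimal sketch of a unigram bag-of-words generative model and the resulting log importance weight. The names and smoothing are illustrative, and the actual implementation differs in details such as the features used, e.g. hashed n-grams.)

```python
import math
from collections import Counter

def fit_unigram(docs, vocab, alpha=1.0):
    """Fit a bag-of-words (unigram) generative model: smoothed token
    frequencies define a probability distribution over tokens."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}

def log_prob(doc, model):
    """Log-likelihood of a tokenized document (assumed in-vocab)."""
    return sum(math.log(model[tok]) for tok in doc)

def log_importance_weight(doc, target_model, raw_model):
    """Importance weight of an example x: log p_target(x) - log p_raw(x)."""
    return log_prob(doc, target_model) - log_prob(doc, raw_model)
```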
To clarify, by top-k here do you mean not perturbing the importance weights with Gumbel noise before taking the top k? I've run the resampling a couple of times before and haven't seen this, but I can take a look when I get a chance soon.
Thank you very much. Yes, the number matches 1745766302, and by top-k I mean not perturbing the importance weights with Gumbel noise. I'm excited to see the further experiments.
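(For anyone comparing the two selection schemes discussed above, a minimal sketch of both, with illustrative function names: the Gumbel-top-k trick samples k examples without replacement with probability proportional to exp(log-weight), while plain top-k deterministically keeps the k largest weights.)

```python
import numpy as np

def gumbel_top_k(log_weights, k, rng):
    """Resampling: perturb each log-weight with independent Gumbel(0, 1)
    noise, then take the k largest. This samples k items without
    replacement, proportionally to exp(log_weights)."""
    noisy = log_weights + rng.gumbel(size=len(log_weights))
    return np.argsort(-noisy)[:k]

def plain_top_k(log_weights, k):
    """Deterministic selection: just the k largest importance weights."""
    return np.argsort(-log_weights)[:k]

rng = np.random.default_rng(0)
log_w = rng.normal(size=10_000)
print(gumbel_top_k(log_w, k=100, rng=rng)[:5])
print(plain_top_k(log_w, k=100)[:5])
```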
Hi, we followed the training pipeline in
experimental
to replicate the DSIR results. However, our average performance reached only 81.05, significantly below the reported benchmark of 82.30. Are there any additional techniques or optimizations that we might have overlooked?