Add support for sampling with the combination of splitratio and replace = FALSE #48

Open
theo-s opened this issue Jan 20, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@theo-s
Contributor

theo-s commented Jan 20, 2023

No description provided.

@edwardwliu
Collaborator

edwardwliu commented Mar 1, 2023

Related to sampling: when doubleBootstrap is applied to honest random forests without groups and folds, it looks like we ignore observationWeights and sample the averaging set uniformly with replacement (see code). When groups and folds are used, it looks like we sample with replacement according to observationWeights (see code). As a corollary, we always take a doubleBootstrap when using groups/folds.

  1. Is this understanding correct?
  2. If so, is this intended behavior?

@theo-s
Contributor Author

theo-s commented Mar 2, 2023

I believe this understanding is correct.

Right now this is intended behavior, but I could see a reason to change it so that the second bootstrap uses the weights as well. The only complication is that the meaning of the weights would then depend on the sample taken in the first bootstrap: those observations are removed from sampling, so their weights are also removed from the total set of weights used in the next step.
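
To make the complication concrete, here is a minimal Python sketch of the two options for the second bootstrap. It is not the package's implementation. The function name `double_bootstrap`, its parameters, and the flag `weighted_second` are hypothetical. It also assumes strictly positive weights, and that the averaging set is drawn only from observations not selected in the first bootstrap, as described above.

```python
import random


def double_bootstrap(weights, n_split, n_avg, weighted_second=False, seed=0):
    """Hypothetical sketch of a double bootstrap with observation weights.

    Returns (split_idx, avg_idx, oob_probs), where oob_probs maps each
    out-of-bag index to its renormalized weight.
    """
    rng = random.Random(seed)
    indices = list(range(len(weights)))

    # First bootstrap: draw the splitting set with replacement, by weight.
    split_idx = rng.choices(indices, weights=weights, k=n_split)

    # Observations drawn in the first step are removed from further sampling.
    in_bag = set(split_idx)
    oob = [i for i in indices if i not in in_bag]
    if not oob:
        return split_idx, [], {}

    # The remaining weights no longer sum to the original total, so the
    # probability attached to each observation depends on the first draw.
    oob_total = sum(weights[i] for i in oob)
    oob_probs = {i: weights[i] / oob_total for i in oob}

    if weighted_second:
        # Possible change: the second bootstrap also respects the weights.
        avg_idx = rng.choices(oob, weights=[oob_probs[i] for i in oob], k=n_avg)
    else:
        # Behavior as described in this thread: uniform, weights ignored.
        avg_idx = rng.choices(oob, k=n_avg)

    return split_idx, avg_idx, oob_probs
```

With `weighted_second=True`, an observation's chance of entering the averaging set is its weight divided by the out-of-bag total, and that total changes from tree to tree. The same `observationWeights` vector therefore implies different sampling probabilities depending on the first draw.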

3 participants