Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

separating interfaces #1

Open
topepo opened this issue Aug 14, 2018 · 3 comments
Open

separating interfaces #1

topepo opened this issue Aug 14, 2018 · 3 comments
Labels
enhancement New feature or request

Comments

@topepo
Copy link

topepo commented Aug 14, 2018

Would you be open to a PR for refactoring rpart intorpart.formula and rpart.default methods? I think I may have mentioned this back in January at the Rstudio conference.

@bethatkinson
Copy link
Owner

bethatkinson commented Aug 14, 2018 via email

@topepo
Copy link
Author

topepo commented Aug 16, 2018

Thanks. Just to be clear, my goals would be to avoid process the formula calculations (especially if a suitable data set consistent with rpart.matrix's output exists). For bagging and some other ensembles, a lot of the final model footprint contains redundant terms and formula-related components.

I can see that the change from function to generic for rpart might be problematic. Exposing rpart.matrix and internal functions below that call would allow us to reproduce rpart internals. That might be a good fallback if changing rpart to be an S3 wouldn't work (but this approach is not preferable for obvious reasons).

@bethatkinson
Copy link
Owner

Beth and Max,

This is exactly what is done by the division between coxph and coxph.fit, survreg and survreg.fit, etc in the survival package. That has been a successful strategy to allow others access to the computations without using formulas. I prefer doing it this way rather than creating a rpart method along with rpart.formula and rpart.default since a. the method doesn't buy us anything more and b. the rpart.fit strategy works better wrt documentation and .Rd files. It is easier for us, and more importantly easier for the user, who won't need the "rpart:::" secret handshake to find the routines.

So yes, I agree we should go ahead. The primary decisions will be exactly where to split it (how little or much should the formula part do), and how many of the error checks to replicate. The coxph.fit routine assumes for instance that the data is good, e.g. X and y match in dimension, as it will normally be called by a front end routine that has taken care of such checks.

Terry Therneau


I think that would be fine.

I would suggest exporting rpart.matrix and having rpart.fit start right after that.

My reasoning is that people should only have to call model.matrix (contained in rpart.matrix) once so that that potential inefficacy (for large $p$) might be avoided.

However, I’m not the expert here and there are some terms and similar objects that are needed.

Let me know if I can help.

Thanks,

Max

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants