separating interfaces #1

topepo · 2018-08-14T14:29:41Z

Would you be open to a PR for refactoring rpart into rpart.formula and rpart.default methods? I think I may have mentioned this back in January at the RStudio conference.


bethatkinson · 2018-08-14T18:44:59Z

I’ll discuss this with Terry and get back to you.

Beth

topepo · 2018-08-16T14:30:19Z

Thanks. Just to be clear, my goal would be to avoid processing the formula (especially if a suitable data set consistent with rpart.matrix's output already exists). For bagging and some other ensembles, much of the final model footprint consists of redundant terms and formula-related components.

I can see that changing rpart from a function to a generic might be problematic. Exposing rpart.matrix and the internal functions below that call would allow us to reproduce rpart internals. That might be a good fallback if changing rpart to be an S3 generic wouldn't work (but this approach is not preferable for obvious reasons).
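As a rough illustration of the split proposed above, here is a minimal base-R sketch of a generic with a formula method that delegates to a default (matrix) method. The names `my_fit`, `my_fit.formula`, and `my_fit.default` are hypothetical and are not part of rpart, and a least-squares fit stands in for the tree-growing code purely so the example runs.

```r
# Hypothetical generic; `my_fit` is an illustrative name, not an rpart function.
my_fit <- function(x, ...) UseMethod("my_fit")

# Default method: takes an already-built predictor matrix and a response.
my_fit.default <- function(x, y, ...) {
  x <- as.matrix(x)
  stopifnot(nrow(x) == length(y))
  # Placeholder computation (least squares) standing in for the real fitter.
  coefs <- qr.coef(qr(cbind(1, x)), y)
  structure(list(coefficients = coefs), class = "my_fit")
}

# Formula method: does the formula processing once, then delegates.
my_fit.formula <- function(formula, data, ...) {
  mf <- model.frame(formula, data)
  y <- model.response(mf)
  x <- model.matrix(attr(mf, "terms"), mf)[, -1, drop = FALSE]  # drop intercept
  fit <- my_fit.default(x, y, ...)
  fit$terms <- attr(mf, "terms")  # formula-related component kept only here
  fit
}

# Both entry points reach the same computation.
fit_formula <- my_fit(mpg ~ wt + hp, data = mtcars)
fit_matrix  <- my_fit(as.matrix(mtcars[, c("wt", "hp")]), mtcars$mpg)
```

In this sketch only the formula method attaches `terms`, so a caller who already has a matrix skips the formula-related components entirely.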

bethatkinson · 2021-10-06T21:13:14Z

Beth and Max,

This is exactly what is done by the division between coxph and coxph.fit, survreg and survreg.fit, etc. in the survival package. That has been a successful strategy to allow others access to the computations without using formulas. I prefer doing it this way rather than creating an rpart method along with rpart.formula and rpart.default since a. the method doesn't buy us anything more and b. the rpart.fit strategy works better wrt documentation and .Rd files. It is easier for us, and more importantly easier for the user, who won't need the "rpart:::" secret handshake to find the routines.

So yes, I agree we should go ahead. The primary decisions will be exactly where to split it (how little or much should the formula part do), and how many of the error checks to replicate. The coxph.fit routine assumes for instance that the data is good, e.g. X and y match in dimension, as it will normally be called by a front end routine that has taken care of such checks.

Terry Therneau
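Base R has the same front-end/workhorse division in `lm()` and `lm.fit()`, which may serve as a small runnable analogue of the coxph/coxph.fit pattern described above. This is only an analogy: it says nothing about how an eventual rpart.fit would look. The sketch compares the two entry points on the built-in `mtcars` data.

```r
# Front end: parses the formula, builds the model frame, keeps terms and call.
full <- lm(mpg ~ wt + hp, data = mtcars)

# Workhorse: takes a ready-made design matrix and response, no formula handling.
# The caller is responsible for x and y being consistent.
x <- cbind(Intercept = 1, as.matrix(mtcars[, c("wt", "hp")]))
y <- mtcars$mpg
bare <- lm.fit(x, y)

# The estimates agree; the bare fit carries no terms, call, or model frame.
same_coefs <- isTRUE(all.equal(unname(coef(full)), unname(bare$coefficients)))
size_full <- as.numeric(object.size(full))
size_bare <- as.numeric(object.size(bare))
```

Here the object returned by the workhorse is smaller because it omits the formula-related components, which is the footprint concern raised earlier in the thread for bagging and other ensembles.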

I think that would be fine.

I would suggest exporting rpart.matrix and having rpart.fit start right after that.

My reasoning is that people should only have to call model.matrix (contained in rpart.matrix) once, so that the potential inefficiency (for large $p$) can be avoided.

However, I’m not the expert here and there are some terms and similar objects that are needed.

Let me know if I can help.

Thanks,

Max
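To make the "build the matrix once" point concrete, here is a hedged base-R sketch of a bagging-style loop. The design matrix is created a single time and each bootstrap resample only subsets its rows. `lm.fit()` is used as a stand-in for a matrix-level tree fitter, since no exported rpart.fit is assumed to exist; `n_boot` and the model formula are arbitrary choices for illustration.

```r
set.seed(1)

# Build the design matrix a single time.
x <- model.matrix(mpg ~ wt + hp + factor(cyl), data = mtcars)
y <- mtcars$mpg

n_boot <- 25  # illustrative number of bootstrap resamples
fits <- lapply(seq_len(n_boot), function(i) {
  idx <- sample(nrow(x), replace = TRUE)
  # Row-subsetting the matrix avoids re-running model.frame()/model.matrix().
  lm.fit(x[idx, , drop = FALSE], y[idx])$coefficients
})

# One row of coefficients per resample.
coef_mat <- do.call(rbind, fits)
```

With a formula interface, each of the `n_boot` fits would repeat the formula processing and store its own copy of the terms.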

bethatkinson mentioned this issue Sep 3, 2020

rpart has undefined behavior when a predictor has many categories. #22

Open

bethatkinson added the enhancement label Oct 6, 2021
