[R-package] add a tree plotting function #6729
Let's simplify this, please.
Let's please make this 1-based, as that's a direction we eventually want to move in the package: #4970 (review)
I'm not totally convinced about this idea... it should be possible to recover the feature names from the model directly.
But before you remove this... can you please expand this doc and add examples and tests showing what this would look like? Right now, it's hard for me to understand what the content of `rules` is supposed to be.
I understand how `min_data = 1L` is related (growing a deeper tree makes the resulting plot more interesting). But I think we can safely remove `metric = "l2"` (that will be the default for the `regression` objective) and any customization of the learning rate (since here we're only interested in showing the structure of one tree).
Let's simplify this, please.
We do not need to repeat in a comment here the same information that's already in the roxygen comments.
I can't think of any situation where it would be ok for `model` or `tree` to be `NULL`, can you?
If not, let's please require callers to provide values explicitly.
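A small sketch of what requiring both arguments could look like. The function name `plot_tree_sketch` and its body are hypothetical, not the PR's code; the only point is that arguments without defaults fail loudly when a caller omits them.

```r
# Hypothetical stand-in for the plotting function: no NULL defaults,
# so callers have to pass both arguments explicitly.
plot_tree_sketch <- function(model, tree) {
    # force() evaluates each argument; a missing one raises an error here
    force(model)
    force(tree)
    return(sprintf("plotting tree %d", tree))
}

res <- plot_tree_sketch(model = list(), tree = 1L)

# Omitting 'tree' fails with R's standard missing-argument error
err <- tryCatch(
    plot_tree_sketch(model = list())
    , error = function(e) conditionMessage(e)
)
```

With `NULL` defaults, the same omission would only surface later, somewhere inside the function body.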
Please follow the patterns used elsewhere in the library for this:
LightGBM/R-package/R/lgb.restore_handle.R
Lines 42 to 44 in 83c0ff3
Please don't use the name `dt`. That is a function in the `{stats}` package (for finding the density of a t-distribution)... try `?dt` to see that.
Shadowing names from the standard library can lead to confusing errors. Please use `modelDT` as the name for this `data.table` instead.
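To make the name clash concrete, here is a short base-R sketch. It uses a plain `data.frame` in place of a `data.table` so that it runs without extra packages; that substitution is an assumption made for illustration only.

```r
# 'dt' already names a function exported by {stats}
# (the density of the t-distribution)
is_fun <- is.function(stats::dt)

# Assigning a data.frame to 'dt' masks that function for value lookups
dt <- data.frame(x = 1:3)
masked <- !is.function(dt)

# A distinct name such as 'modelDT' avoids the clash
rm(dt)
modelDT <- data.frame(x = 1:3)
still_fun <- is.function(dt)
```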
Please modify this error message so that it has enough information for someone to quickly debug the issue, like the provided value of `tree` and the number of trees in the model. And please combine it with the other check that the value is `>=0`.
Something like this:
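A minimal sketch of one combined bounds check, not the reviewer's own snippet. The helper name `.check_tree_index` and the argument `num_trees` are hypothetical, and the lower bound of 1 assumes the 1-based indexing requested elsewhere in this review; with 0-based indexing the bounds would shift accordingly.

```r
# Hypothetical internal helper; 'tree' is the requested index and
# 'num_trees' the number of trees in the model (1-based by assumption).
.check_tree_index <- function(tree, num_trees) {
    is_valid <- is.numeric(tree) &&
        length(tree) == 1L &&
        !is.na(tree) &&
        tree >= 1L &&
        tree <= num_trees
    if (!is_valid) {
        # report both the allowed range and the value that was provided
        stop(sprintf(
            "'tree' must be between 1 and %d (the number of trees in the model). Got: %s"
            , num_trees
            , toString(tree)
        ))
    }
    return(invisible(NULL))
}

msg <- tryCatch(
    .check_tree_index(tree = 5L, num_trees = 3L)
    , error = function(e) conditionMessage(e)
)
```

Here `msg` carries both the allowed range and the offending value, which is what makes the failure quick to debug.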
I don't understand this... what's the purpose of setting all rows to `0.0` and then immediately overwriting them? It seems to me that the `0.0` could probably be removed.
Can you please add some comments to make it a bit easier to understand what's happening in this wall of code? It's very difficult to read (at least for me) as currently written.
Let's please avoid re-defining internal helper functions every time `lgb.plot.tree()` is called. This is a little bit expensive, and makes the code harder to read and develop.
Please move this up near the top of the file, and give it a name beginning with a `.` to clarify that it's internal-only, like `.zero_present`.
Similar to my previous comment, please move this up out of the definition of `lgb.plot.tree()` and give it a name beginning with a `.`, and without any other inner `.`, like `.levels_to_names`.
Avoiding the inner dots is useful to reduce the risk of that function accidentally being interpreted as an S3 method in the future.
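The S3 risk can be shown with a toy example. Everything below (`describe`, `describe.levels`, `.describe_levels`) is hypothetical and exists only to illustrate how a `generic.class`-shaped name gets picked up by dispatch.

```r
# A toy generic with a default method
describe <- function(x, ...) {
    UseMethod("describe")
}
describe.default <- function(x, ...) {
    return("default method")
}

# Meant as a plain helper, but the inner dot makes the name look like
# the 'describe' method for class "levels"
describe.levels <- function(x, ...) {
    return("helper picked up as a method")
}

obj <- structure(list(), class = "levels")
dispatched <- describe(obj)

# A leading dot plus underscores cannot be read as generic.class
.describe_levels <- function(x) {
    return("plain helper")
}
```

Calling `describe(obj)` reaches `describe.levels()` even though nobody registered it as a method, which is the kind of accident the naming rule guards against.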
In this project, we prefer having an explicit `return()` statement in every function... to make the intention clearer and to avoid returning data unintentionally. See #3352 for some background.
Please add an explicit return statement to every function you're defining here.
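As an illustration of the difference, here is a short sketch with hypothetical helper names. R returns the value of the last evaluated expression when no `return()` is given, which is easy to miss when that expression is an assignment.

```r
# Implicit: the value of the last expression is returned, which here is
# the assigned vector, handed back invisibly
scale_implicit <- function(x) {
    out <- x / max(x)
}

# Explicit: the returned value is stated outright
scale_explicit <- function(x) {
    out <- x / max(x)
    return(out)
}

# A function used only for its side effect can say so as well
log_message <- function(msg) {
    message(msg)
    return(invisible(NULL))
}
```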
In this project, by convention we:
`%>%` operator
Please update this code and all the other code you're adding to follow that. Keeping all of the code looking the same across the codebase helps us to develop and review changes.
`xgboost`'s implementation of similar functionality might be useful as a reference. See https://github.com/dmlc/xgboost/blob/e988b7cf1515b08ad0f949c26beb043ce0b33fe8/R-package/R/xgb.plot.tree.R#L159-L181
Please also add tests for the other types of machine learning tasks LightGBM can be used for:
`num_classes` trees produced per iteration)
And for the following model situations:
These are all cases that could affect the code as written... for example, categorical features have different splitting rules.
This is part of some suggested changes, please apply other changes following from it and to other examples and tests.
`lgb.train()` should set `num_threads = .LGB_MAX_THREADS` in `params`, to avoid using too many CPUs on the CRAN check machines (see [R-package] limit number of threads used in tests and examples (fixes #5987) #5988 for background)
`lightgbm` functions should set verbosity to `.LGB_VERBOSITY` to allow globally controlling the amount of log messages produced across all tests (see https://github.com/microsoft/LightGBM/blob/master/R-package/README.md#running-the-tests)
`params` is small and only being used once in this test code, just define it inline
`params` which are necessary for the test to be effective (e.g., no need to set `learning_rate` to a non-default value)
Can you please remove all these uses of a validation set? This feature is about plotting the trained model, and you are not using early stopping, so all of this work to create validation sets is unnecessary.
Keeping the tests and examples as small and simple as possible makes the code easier to read / develop, and makes it clearer how test cases differ from each other.
Similar to my comments on the docs... I strongly suspect we could just use default parameters here.
This looks like it was included accidentally?
For every use of `expect_error()` here, please check for the specific error you are expecting, like this:
https://github.com/microsoft/LightGBM/blob/83c0ff3de1925b0e2d4831a9ccb6ffc196aa795b/R-package/tests/testthat/test_lgb.importance.R#L33-35
That way, the test will be able to catch the case where some other unexpected issue causes this code path to fail.
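A short sketch of the difference, assuming `{testthat}` is installed. The function `check_positive` and its message are hypothetical stand-ins and are not taken from the linked test file.

```r
library(testthat)

# Hypothetical function under test
check_positive <- function(x) {
    if (x <= 0) {
        stop("x must be positive")
    }
    return(x)
}

# Too loose: passes on any error, including unrelated ones
expect_error(check_positive(-1))

# Specific: passes only if the error message matches
expect_error(
    check_positive(-1)
    , regexp = "x must be positive"
    , fixed = TRUE
)
```

With the second form, an unrelated failure (for example a typo that raises "object not found") no longer satisfies the expectation.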