Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setDT() should gain envir= argument #6754

Open
MichaelChirico opened this issue Jan 23, 2025 · 3 comments
Open

setDT() should gain envir= argument #6754

MichaelChirico opened this issue Jan 23, 2025 · 3 comments

Comments

@MichaelChirico
Copy link
Member

Currently, setDT() spends some effort trying to suss out where the input x lives so that it can overwrite that value with the newly-created data.table if need be:

data.table/R/data.table.R

Lines 2893 to 2898 in f72e46b

home = function(x, env) {
if (identical(env, emptyenv()))
stopf("Cannot find symbol %s", cname)
else if (exists(x, env, inherits=FALSE)) env
else home(x, parent.env(env))
}

(this is only part of the logic)

It would be more robust (especially in package code) for the caller to provide precisely in which argument they'd like to convert an object to data.table.

That would help, for example, with #6735 (if we also plan to retain the .shallow() behavior of #6551).

@Divendra2006
Copy link

Hi @MichaelChirico ,

In response to the post you raised, regarding setDT(), the dependence on home() when trying to get the environment of the input object, I agree that this approach would tend to be weak and inefficient, particularly in nested environments or package workflows. I think there needs to be an alternative approach to solve this.

What I propose here is,

1- Explicit Environment Passing:

Make setDT() take an argument env, which defaults to parent.frame(), so that users can explicitly mention the environment the input object resides in, for example:

setDT(x, env = parent.frame())

2- Direct Argument Specification:

Add a target argument in setDT() that makes it possible for users to directly specify what variable or argument needs to be set as data.table. This ensures that there is no ambiguity in the event of existence of multiple objects with the same name from different scopes.

3- Limit home():

If env or target is provided, the home() logic can be completely bypassed. For backward compatibility, home() could remain as a fallback but emit warnings when its recursive search is triggered.

Better Error Handling
If neither env nor target is provided and ambiguity arises, throw a meaningful error or warning to make the behavior transparent.

I believe these changes would make setDT() more robust, efficient, and predictable for users, especially in package development contexts. If you’re open to this approach, I’d be happy to start working on a pull request.

Let me know your thoughts!

@MichaelChirico
Copy link
Member Author

I think this is not an ideal issue for first-time contributors; this part of the code base is particularly sensitive and bug-prone.

@rikivillalba
Copy link
Contributor

Subtly related (and with #6726) note that calls like setDT(((((x))))) will fail, as () call is not detected as a symbol.
A possible solution here:

while(colsub %iscall% "(") colsub = as.list(colsub)[[-1L]]

But... IMHO there is no safe way to detect ALL possible manners of passing a data.frame that can be bound to a symbol somewhere ( i.e. get0, eval(parse(,,'x'))... ), i think the best is to warn whether x is not a symbol, or whether it is not one of the accepted possible ways to specify a data.table that should be bound to a symbol.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants