Commit

corrected the error
venom1204 committed Dec 29, 2024
1 parent b3efcdb commit 99e7898
Showing 12 changed files with 61 additions and 56 deletions.
2 changes: 1 addition & 1 deletion NEWS.0.md
@@ -1072,7 +1072,7 @@
query once and will never have noticed these, but anyone looping calls to grouping (such as when running in parallel, or benchmarking) may have suffered. Tests added. Thanks to many including vc273 and Y T for reporting [here](https://stackoverflow.com/questions/20349159/memory-leak-in-data-table-grouped-assignment-by-reference) and [here](https://stackoverflow.com/questions/15651515/slow-memory-leak-in-data-table-when-returning-named-lists-in-j-trying-to-reshap) on SO.
2. In long running computations where data.table is called many times repetitively the following error could sometimes occur, #2647: *"Internal error: .internal.selfref prot is not itself an extptr"*. Now fixed. Thanks to theEricStone, StevieP and JasonB for (difficult) reproducible examples [here](https://stackoverflow.com/questions/15342227/getting-a-random-internal-selfref-error-in-data-table-for-r).
for more info about internal.selfref Refer to [internal.selfref](../man/internal.selfref.Rd) for additional information.
for more info about internal.selfref.
3. If `fread` returns a data error (such as no closing quote on a quoted field) it now closes the file first rather than holding a lock open, a Windows only problem.
2 changes: 1 addition & 1 deletion NEWS.md
@@ -744,7 +744,7 @@ rowwiseDT(
27. `as.data.frame(DT)`, `setDF(DT)` and `as.list(DT)` now remove the `"index"` attribute which contains any indices (a.k.a. secondary keys), as they already did for other `data.table`-only attributes such as the primary key stored in the `"sorted"` attribute. When indices were left intact, a subsequent subset, assign, or reorder of the `data.frame` by `data.frame`-code in base R or other packages would not update the indices, causing incorrect results if then converted back to `data.table`, [#4889](https://github.com/Rdatatable/data.table/issues/4889). Thanks @OfekShilon for the report and the PR.
28. `dplyr::arrange(DT)` uses `vctrs::vec_slice` which retains `data.table`'s class but uses C to bypass `[` method dispatch and does not adjust `data.table`'s attributes containing the index row numbers, [#5042](https://github.com/Rdatatable/data.table/issues/5042). `data.table`'s long-standing `.internal.selfref` mechanism to detect such operations by other packages was not being checked by `data.table` when using indexes, causing `data.table` filters and joins to use invalid indexes and return incorrect results after a `dplyr::arrange(DT)`. Thanks to @Waldi73 for reporting; @avimallu, @tlapak, @MichaelChirico, @jangorecki and @hadley for investigating and suggestions; and @mattdowle for the PR. The intended way to use `data.table` is `data.table::setkey(DT, col1, col2, ...)` which reorders `DT` by reference in parallel, sets the primary key for automatic use by subsequent `data.table` queries, and permits rowname-like usage such as `DT["foo",]` which returns the now-contiguous-in-memory block of rows where the first column of `DT`'s key contains `"foo"`. Multi-column-rownames (i.e. a primary key of more than one column) can be looked up using `DT[.("foo",20210728L), ]`. Using `==` in `i` is also optimized to use the key or indices, if you prefer using column names explicitly and `==`. An alternative to `setkey(DT)` is returning a new ordered result using `DT[order(col1, col2, ...), ]`.
Refer to [internal.selfref](../man/internal.selfref.Rd) for additional information.
29. A segfault occurred when `nrow/throttle < nthread`, [#5077](https://github.com/Rdatatable/data.table/issues/5077). With the default throttle of 1024 rows (see `?setDTthreads`), at least 64 threads would be needed to trigger the segfault since there needed to be more than 65,535 rows too. It occurred on a server with 256 logical cores where `data.table` uses 128 threads by default. Thanks to Bennet Becker for reporting, debugging at C level, and fixing. It also occurred when the throttle was increased so as to use fewer threads; e.g. at the limit `setDTthreads(throttle=nrow(DT))`.
4 changes: 2 additions & 2 deletions man/assign.Rd
@@ -92,12 +92,12 @@ Since \code{[.data.table} incurs overhead to check the existence and type of arg
\value{
\code{DT} is modified by reference and returned invisibly. If you require a copy, take a \code{\link{copy}} first (using \code{DT2 = copy(DT)}).
}
\seealso{ \code{\link{data.table}}, \code{\link{copy}}, \code{\link{setalloccol}}, \code{\link{truelength}}, \code{\link{set}}, \code{\link{.Last.updated}},\code{\link{internal.selfref}}
\seealso{ \code{\link{data.table}}, \code{\link{copy}}, \code{\link{setalloccol}}, \code{\link{truelength}}, \code{\link{set}}, \code{\link{.Last.updated}},\code{\link{.internal.selfref}}
}
\examples{
DT = data.table(a = LETTERS[c(3L,1:3)], b = 4:7)
DT[, c := 8] # add a numeric column, 8 for all rows
DT[, d := 9L] # add an integer column, 9L for all rows\code{\link{.Last.updated}}
DT[, d := 9L] # add an integer column, 9L for all rows
DT[, c := NULL] # remove column c
DT[2, d := -8L] # subassign by reference to d; 2nd row is -8L now
DT # DT changed by reference
2 changes: 1 addition & 1 deletion man/copy.Rd
@@ -24,7 +24,7 @@ A \code{copy()} may be required when doing \code{dt_names = names(DT)}. Due to R
Returns a copy of the object.
}
\seealso{
\code{\link{data.table}}, \code{\link{address}}, \code{\link{setkey}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{set}} \code{\link{:=}}, \code{\link{setorder}}, \code{\link{setattr}}, \code{\link{setnames}},\code{\link{internal.selfref}}
\code{\link{data.table}}, \code{\link{address}}, \code{\link{setkey}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{.internal.selfref}}
}
\examples{
# Type 'example(copy)' to run these at prompt and browse output
3 changes: 1 addition & 2 deletions man/data.table-class.Rd
@@ -17,9 +17,8 @@

\author{ Steve Lianoglou }
\seealso{
\code{\link{data.table}},\code{\link{internal.selfref}}
\code{\link{data.table}}, \code{\link{tables}}, \code{\link{J}}, \code{\link[base:order]{sort.list}}, \code{\link{copy}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{chorder}}, \code{\link{setNumericRounding}}, \code{\link{.internal.selfref}}
}

\examples{
## Used in inheritance.
setClass('SuperDataTable', contains='data.table')
4 changes: 3 additions & 1 deletion man/datatable-optimize.Rd
@@ -102,7 +102,9 @@ Auto indexing can be switched off with the global option
\code{options(datatable.auto.index = FALSE)}. To switch off using existing
indices set global option \code{options(datatable.use.index = FALSE)}.
}
\seealso{ \code{\link{setNumericRounding}}, \code{\link{getNumericRounding}},\code{\link{internal.selfref}} }
\seealso{
\code{\link{setNumericRounding}}, \code{\link{getNumericRounding}}, \code{\link{.internal.selfref}}
}
\examples{
\dontrun{
old = options(datatable.optimize = Inf)
44 changes: 44 additions & 0 deletions man/internal.selfref.Rd
@@ -0,0 +1,44 @@
\name{.internal.selfref}
\alias{.internal.selfref}
\title{Internal Self-Reference Attribute in data.table}
\description{
The \code{.internal.selfref} attribute is an internal mechanism used by \code{data.table} to optimize memory management and performance. It acts as a pointer that allows \code{data.table} objects to reference their own memory location. While the \code{.internal.selfref} attribute may appear to always point to \code{NULL} when inspected directly, this is a result of its implementation in R's memory management system. The true significance of this attribute lies in its role in supporting reference semantics, which enables efficient in-place modification of \code{data.table} objects without unnecessary copying.
The \code{.internal.selfref} attribute is deliberately structured so that \code{identical()} checks return \code{TRUE} for two \code{data.table} objects with identical contents, even when their attributes point to the same memory address. This behavior is achieved by storing the actual self-reference pointer in the \code{prot} part of an external pointer, wrapped in another external pointer to avoid creating visible reference loops. When a \code{data.table} is duplicated, its memory address changes, making it possible to detect the copy and handle it accordingly.
}
\details{
The \code{.internal.selfref} attribute is a pointer that ensures that \code{data.table} objects can be modified by reference without redundant memory allocation. This avoids copying when performing in-place modifications such as adding or updating columns, filtering rows, or performing joins.
Key details about the \code{.internal.selfref} attribute:
\itemize{
\item \code{p=NULL} is used instead of \code{R_NilValue}, allowing \code{data.table} to detect objects loaded from disk and ensure correct behavior.
\item Wrapping the self-reference in another external pointer prevents infinite loops during \code{object.size} calculations.
\item If the attribute is removed or corrupted, the next operation involving \code{:=} triggers a warning and creates a new self-reference after copying.
}
The \code{_selfrefok} function verifies the validity of the \code{.internal.selfref} attribute. It checks whether the attribute correctly references the current \code{data.table} object by comparing memory addresses. If the attribute is invalidated (e.g., due to duplication or corruption), \code{_selfrefok} triggers a repair mechanism to restore reference semantics, ensuring that in-place operations remain efficient.
}
\value{
The \code{.internal.selfref} attribute is an internal implementation detail and does not produce a value that users would typically interact with. It is invisible during regular \code{data.table} operations.
}
\seealso{
\code{\link{data.table}}, \code{\link{setkey}}, \code{\link{merge}}, \code{\link{[.data.table}}
}
\examples{
# Create a data.table
dt <- data.table(A = 1:5, B = letters[1:5])
# Trace memory to check for reference semantics
tracemem(dt) # Outputs the memory address of the data.table
# Perform an in-place operation
dt[, C := A * 2] # Add a new column in place
# Verify no copying has occurred
# (The output of tracemem should show no memory change)
# Example of losing .internal.selfref (hypothetical, for illustration)
dt_copy <- copy(dt) # Copy the data.table
.Internal(inspect(dt_copy)) # Shows .internal.selfref attribute no longer matches
}
\keyword{internal}
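
The reference semantics described in this new help page can be checked from an R session. The following is a minimal sketch and is not part of the commit. It assumes the data.table package is installed and relies only on its exported helpers `address()`, `truelength()` and `copy()`; the variable names are illustrative.

```r
library(data.table)

DT <- data.table(A = 1:5, B = letters[1:5])

# The self-reference is stored as an external pointer attribute.
selfref_type <- typeof(attr(DT, ".internal.selfref"))

# data.table over-allocates column slots, which is what lets := add
# a column without reallocating the table itself.
spare_slots <- truelength(DT) - length(DT)

addr_before <- address(DT)
DT[, C := A * 2L]            # add a column by reference
addr_after <- address(DT)

# copy() makes a deep copy at a new address; changing the copy
# leaves the original untouched.
DT2 <- copy(DT)
DT2[, D := 0L]

same_address_after_assign <- identical(addr_before, addr_after)
copy_has_new_address <- !identical(address(DT2), address(DT))
original_unchanged <- !("D" %in% names(DT))
```

Under these assumptions, the address of `DT` should be the same before and after `:=`, while `DT2` should sit at a different address and carry the extra column `D` on its own.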
37 changes: 0 additions & 37 deletions man/internal.selfref.rd

This file was deleted.

3 changes: 2 additions & 1 deletion man/setDT.Rd
@@ -25,7 +25,8 @@ setDT(x, keep.rownames=FALSE, key=NULL, check.names=FALSE)
The input is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., \code{setDT(X)[, sum(B), by=A]}. If you require a copy, take a copy first (using \code{DT2 = copy(DT)}). See \code{?copy}.
}
\seealso{ \code{\link{data.table}}, \code{\link{as.data.table}}, \code{\link{setDF}}, \code{\link{copy}}, \code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}},\code{\link{internal.selfref}}
\seealso{
\code{\link[base]{transform}}, \code{\link[base:with]{within}}, \code{\link{:=}}, \code{\link{.internal.selfref}}
}
\examples{
7 changes: 2 additions & 5 deletions man/setkey.Rd
@@ -107,11 +107,8 @@ reference.
\url{https://cran.r-project.org/package=bit64}\cr
\url{https://github.com/Rdatatable/data.table/wiki/Presentations}
}
\seealso{ \code{\link{data.table}}, \code{\link{tables}}, \code{\link{J}},
\code{\link[base:order]{sort.list}}, \code{\link{copy}}, \code{\link{setDT}},
\code{\link{setDF}}, \code{\link{set}} \code{\link{:=}}, \code{\link{setorder}},
\code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}},
\code{\link{chorder}}, \code{\link{setNumericRounding}},\code{\link{internal.selfref}}
\seealso{
\code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{copy}}, \code{\link{setNumericRounding}}, \code{\link{.internal.selfref}}
}
\examples{
# Type 'example(setkey)' to run these at the prompt and browse output
5 changes: 1 addition & 4 deletions man/setorder.Rd
@@ -111,10 +111,7 @@ If you require a copy, take a copy first (using \code{DT2 = copy(DT)}). See
\url{https://medium.com/basecs/getting-to-the-root-of-sorting-with-radix-sort-f8e9240d4224}
}
\seealso{
\code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}},
\code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setDT}},
\code{\link{setDF}}, \code{\link{copy}}, \code{\link{setNumericRounding}},
\code{\link{internal.selfref}}
\code{\link{data.table}}, \code{\link{as.data.table}}, \code{\link{setDF}}, \code{\link{copy}}, \code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}}, \code{\link{.internal.selfref}}
}
\examples{
4 changes: 3 additions & 1 deletion man/transform.data.table.Rd
@@ -33,7 +33,9 @@ columns that appear in \ldots) are not in the key of the data.table.
\value{
The modified value of a copy of \code{data}.
}
\seealso{ \code{\link[base]{transform}}, \code{\link[base:with]{within}} and \code{\link{:=}},\code{\link{internal.selfref}} }
\seealso{
\code{\link{data.table}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{.internal.selfref}}
}
\examples{
DT <- data.table(a=rep(1:3, each=2), b=1:6)
