Start replacing `TRUELENGTH` markers with a hash #6694

aitap · 2024-12-26T15:52:14Z

With apologies to Matt Dowle, who had poured a lot of effort into making data.table go fast.

Ongoing work towards #6180. Unfortunately, doesn't completely remove any uses of non-API entry points by itself. Detailed motivation here in a pending blog post. Can't start implementing stretchy ALTREP vectors until data.table stops using TRUELENGTH to mark them.

Currently implemented:

An open-addressing hash table with Knuth's linear multiplication hashing
- Deliberately no removal operation, only insertion and update
- Can be made more performant (collisions much less likely) by switching to tables of power-of-two length and double hashing
Almost all uses of TRUELENGTH to mark CHARSXPs or columns replaced with a hash

Needs more work:

the uses in rbindlist() and forder() pre-allocate memory for the worst-case usage
- may be noticeably bad for cache performance
- a dynamically growing hash table would take even more CPU time and cache misses and maybe system calls to resize
forder.c, the last remaining user of savetl
- SET_TRUELENGTH is atomic, hash_set is not, will need additional care in multi-threaded environment
savetl machinery in assign.c

Let's just see how much worse is the performance going to get.

codecov · 2024-12-26T15:59:05Z

Codecov Report

Attention: Patch coverage is 98.85714% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.60%. Comparing base (9875906) to head (d7a9a17).
Report is 5 commits behind head on master.

Files with missing lines	Patch %	Lines
src/hash.c	98.13%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6694      +/-   ##
==========================================
- Coverage   98.61%   98.60%   -0.02%     
==========================================
  Files          79       80       +1     
  Lines       14558    14595      +37     
==========================================
+ Hits        14357    14392      +35     
- Misses        201      203       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

github-actions · 2024-12-26T16:07:30Z

Generated via commit d7a9a17

Download link for the artifact containing the test results: ↓ atime-results.zip

Task	Duration
R setup and installing dependencies	4 minutes and 43 seconds
Installing different package versions	8 minutes and 12 seconds
Running and plotting the test cases	2 minutes and 30 seconds

Also avoid crashing when creating a 0-size hash.

This may likely require a dynamically growing hash of TRUELENGTHs instead of the current pre-allocation approach with a very conservative over-estimate.

The hash needs O(n) memory (actually 2*n/load_factor entries) which isn't great.

tdhock · 2024-12-29T16:38:51Z

hi, thanks for this. Can you please propose one or two performance test cases that you think may be adversely affected by these changes? Is it when we create a table with one column, and then use := or set to add a lot more columns?

aitap · 2024-12-29T17:27:17Z

The rbindlist() code pre-allocates O(sum(lengths(lapply(l, names))) memory. In the best case, only O(length(names(l[[1]]))) is needed, length(l) times less. In the worst case, all names are different and the code needs all that memory for fill = TRUE.

The forder() code pre-allocates O(nrow(x)) memory.

I'll try giving many data.tables to rbindlist() and long data.tables with string columns to setorder().

SebKrantz · 2024-12-30T12:50:30Z

Perhaps you can also try a fast 3rd party hash map: https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-01-overview/

I particular Google's Abseil hash is pretty fast: https://abseil.io/docs/cpp/guides/container https://abseil.io/docs/cpp/guides/hash

aitap · 2024-12-31T10:28:36Z

It's pretty bad. For typical cases, the current hash table eats >1 order of magnitude more R* memory, and it's similarly slower in forder():

The hash table is only on par by time in the worst case for forder() (all strings different) and might be on par in the worst case for rbindlist(fill = TRUE) (all column names different), but the latter is completely unrealistic as a use case.

# may need 16G of RAM to run comfortably due to pathological memory allocation patterns
library(atime)

pkg.path <- '.'
limit <- 1
# taken from .ci/atime/tests.R
pkg.edit.fun <- function(old.Package, new.Package, sha, new.pkg.path) {
  pkg_find_replace <- function(glob, FIND, REPLACE) {
    atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
  }
  Package_regex <- gsub(".", "_?", old.Package, fixed = TRUE)
  Package_ <- gsub(".", "_", old.Package, fixed = TRUE)
  new.Package_ <- paste0(Package_, "_", sha)
  pkg_find_replace(
    "DESCRIPTION",
    paste0("Package:\\s+", old.Package),
    paste("Package:", new.Package))
  pkg_find_replace(
    file.path("src", "Makevars.*in"),
    Package_regex,
    new.Package_)
  pkg_find_replace(
    file.path("R", "onLoad.R"),
    Package_regex,
    new.Package_)
  pkg_find_replace(
    file.path("R", "onLoad.R"),
    sprintf('packageVersion\\("%s"\\)', old.Package),
    sprintf('packageVersion\\("%s"\\)', new.Package))
  pkg_find_replace(
    file.path("src", "init.c"),
    paste0("R_init_", Package_regex),
    paste0("R_init_", gsub("[.]", "_", new.Package_)))
  pkg_find_replace(
    "NAMESPACE",
    sprintf('useDynLib\\("?%s"?', Package_regex),
    paste0('useDynLib(', new.Package_))
}

versions <- c(
  master   = '70c64ac08c6becae5847cd59ab1efcb4c46437ac',
  truehash = '24e81785669e70caac31501bf4424ba14dbc90f9'
)

N <- 10^seq(2, 8.5, .25)
# expected case: a few distinct strings
forderv1_work <- lapply(setNames(nm = N), \(N)
  sample(letters, N, TRUE)
)
forderv1 <- atime_versions(
  pkg.path, N,
  expr = data.table:::forderv(forderv1_work[[as.character(N)]]),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(forderv1_work); gc(full = TRUE)

# worst case: all strings different
# (a challenge for the allocator too due to many small immovable objects)
N <- 10^seq(2, 7.5, .25)
forderv2_work <- lapply(setNames(nm = N), \(N)
  format(runif(N), digits = 16)
)
forderv2 <- atime_versions(
  pkg.path, N,
  expr = data.table:::forderv(forderv2_work[[as.character(N)]]),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(forderv2_work); gc(full = TRUE)

# expected case: all columns named the same
N <- 10^seq(1, 6.5, .25) # number of data.tables in the list
k <- 10 # number of columns per data.table
rbindlist1_work <- lapply(setNames(nm = N), \(N)
  rep(list(setNames(as.list(1:k), letters[1:k])), N)
)
rbindlist1 <- atime_versions(
  pkg.path, N,
  expr = data.table::rbindlist(rbindlist1_work[[as.character(N)]]),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(rbindlist1_work); gc(full = TRUE)

# worst case: all columns different
N <- 10^seq(1, 5.5, .25) # number of data.tables in the list
k <- 10 # number of columns per data.table
rbindlist2_work <- lapply(setNames(nm = N), \(N)
  replicate(N, setNames(as.list(1:k), format(runif(k), digits = 16)), FALSE)
)
rbindlist2 <- atime_versions(
  pkg.path, N,
  expr = data.table::rbindlist(rbindlist2_work[[as.character(N)]], fill = TRUE),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(rbindlist2_work); gc(full = TRUE)

save(forderv1, forderv2, rbindlist1, rbindlist2, file = 'times.rda')

* Edit: Some of the memory use in data.table is invisible to Rprofmem(). For forder(), data.table is said to allocate as much memory as one extra column, i.e. 8 bytes per CHARSXP element on a 64-bit computer. The hash table preallocates enough space for 2*N (original + UTF-8) cells at 50% load factor (so twice more), with each cell consisting of a SEXP pointer and an R_xlen_t mark, so 2 * 2 * 16 = 64 bytes more per element, in addition to normal memory use. Measuring forder()'s memory use as getrusage() + ru_maxrss and subtracting the value before the function call gives coef(lm(forderuse_kb*1024 ~ N:version, x))[-1] as N:versionmaster = 8.993967 and N:versiontruehash = 72.178404, which is close to the theory.

I'll try profiling the code.

Thanks @SebKrantz for the link, a newer benchmark by the same author is also very instructive.

tdhock · 2024-12-31T19:50:05Z

thanks for proposing the performance test cases and sharing the atime benchmark graphs. I agree that we should try to avoid an order of magnitude constant factor increase in time/memory usage.

In forder() and rbindlist(), there is no good upper boundary on the number of elements in the hash known ahead of time. Grow the hash table dynamically. Since the R/W locks are far too slow and OpenMP atomics are too limited, rely on strategically placed flushes, which isn't really a solution.

aitap · 2025-01-01T18:16:10Z

Since profiling has shown that a noticeable amount of time is wasted initialising the giant pre-allocated hash tables, I was able to make the slowdown factor closer to 2 by dynamically re-allocating the hash table:

The memory use is significantly reduced (except for the worst cases), but cannot be measured with atime (just like other uses of calloc/malloc/realloc in data.table are invisible to Rprofmem).

library(atime)

pkg.path <- '.'
limit <- 1
# taken from .ci/atime/tests.R
pkg.edit.fun <- function(old.Package, new.Package, sha, new.pkg.path) {
  pkg_find_replace <- function(glob, FIND, REPLACE) {
    atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
  }
  Package_regex <- gsub(".", "_?", old.Package, fixed = TRUE)
  Package_ <- gsub(".", "_", old.Package, fixed = TRUE)
  new.Package_ <- paste0(Package_, "_", sha)
  pkg_find_replace(
    "DESCRIPTION",
    paste0("Package:\\s+", old.Package),
    paste("Package:", new.Package))
  pkg_find_replace(
    file.path("src", "Makevars.*in"),
    Package_regex,
    new.Package_)
  pkg_find_replace(
    file.path("R", "onLoad.R"),
    Package_regex,
    new.Package_)
  pkg_find_replace(
    file.path("R", "onLoad.R"),
    sprintf('packageVersion\\("%s"\\)', old.Package),
    sprintf('packageVersion\\("%s"\\)', new.Package))
  pkg_find_replace(
    file.path("src", "init.c"),
    paste0("R_init_", Package_regex),
    paste0("R_init_", gsub("[.]", "_", new.Package_)))
  pkg_find_replace(
    "NAMESPACE",
    sprintf('useDynLib\\("?%s"?', Package_regex),
    paste0('useDynLib(', new.Package_))
}

versions <- c(
  master   = '70c64ac08c6becae5847cd59ab1efcb4c46437ac',
  static_hash = '24e81785669e70caac31501bf4424ba14dbc90f9',
  dynamic_hash = 'd7a9a1707ec94ec4f2bd86a5dfb5609207029ba4'
)

N <- 10^seq(2, 7.5, .25)
# expected case: a few distinct strings
forderv1_work <- lapply(setNames(nm = N), \(N)
  sample(letters, N, TRUE)
)
forderv1 <- atime_versions(
  pkg.path, N,
  expr = data.table:::forderv(forderv1_work[[as.character(N)]]),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(forderv1_work); gc(full = TRUE)

# worst case: all strings different
# (a challenge for the allocator too due to many small immovable objects)
N <- 10^seq(2, 7.5, .25)
forderv2_work <- lapply(setNames(nm = N), \(N)
  format(runif(N), digits = 16)
)
forderv2 <- atime_versions(
  pkg.path, N,
  expr = data.table:::forderv(forderv2_work[[as.character(N)]]),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(forderv2_work); gc(full = TRUE)

# expected case: all columns named the same
N <- 10^seq(1, 5.5, .25) # number of data.tables in the list
k <- 10 # number of columns per data.table
rbindlist1_work <- lapply(setNames(nm = N), \(N)
  rep(list(setNames(as.list(1:k), letters[1:k])), N)
)
rbindlist1 <- atime_versions(
  pkg.path, N,
  expr = data.table::rbindlist(rbindlist1_work[[as.character(N)]]),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(rbindlist1_work); gc(full = TRUE)

# worst case: all columns different
N <- 10^seq(1, 4.5, .25) # number of data.tables in the list
k <- 10 # number of columns per data.table
rbindlist2_work <- lapply(setNames(nm = N), \(N)
  replicate(N, setNames(as.list(1:k), format(runif(k), digits = 16)), FALSE)
)
rbindlist2 <- atime_versions(
  pkg.path, N,
  expr = data.table::rbindlist(rbindlist2_work[[as.character(N)]], fill = TRUE),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(rbindlist2_work); gc(full = TRUE)

#save(forderv1, forderv2, rbindlist1, rbindlist2, file = 'times.rda')

The main problem with the current approach is that since the parallel loop in src/forder.c:range_str(...) needs to read the hash table from outside the critical section while hash elements are being set inside the critical section, it's possible that dhash_set() will need to reallocate the table while dhash_lookup() is using it. I have already tried using pthread R/W locks and OpenMP critical sections; they were too slow. I wasn't able to use OpenMP atomics to make atomic reads/writes through a pointer (maybe that's asking for too much). Any third-party hash table will also need to be outfitted with appropriate locking.

The current code keeps one previous hash table until the next reallocation cycle and hopes that dhash_enlarge writes the new table pointer into the object roughly at the same time as the metadata, so that when dhash_lookup reads the object, it will hopefully receive a consistent view of the object. This is not good enough for production because (1) the race window still exists and (2) there's no proof that there won't be a stale reader still using the previous table array when the current writer deallocates it. What can be done with it:

Don't deallocate the previous hash tables until the function is completely done with them. This will double the memory use (because the code doubles the buffer when re-allocating it and $\sum_{i=0}^{n} 2^i = 2^{n+1} - 1$). We'll be able to switch back from malloc to R_alloc and make the allocations visible to R's allocator.
Devise a read-copy-update scheme (see also: 1 2) to properly expire the previous versions of the hash table. These may require synchronisation primitives not easily available through OpenMP. Reference-counting the hash tables may result in contention and slowdowns similar to pthread R/W locks.

SebKrantz · 2025-01-01T20:23:48Z

In case it helps, {collapse}'s hash functions (https://github.com/SebKrantz/collapse/blob/master/src/kit_dup.c and https://github.com/SebKrantz/collapse/blob/master/src/match.c) are pretty fast as well - inspired by base R -> multiplication hash using unsigned integer prime number. It's bloody fast but requires a large table. But Calloc() is quite efficient. Anyway, would be great if you'd test the Google Hash function, curious to see it it can do much better.

PS: you can test collapse::fmatch() against your current attempts rewriting chmatch().

aitap · 2025-01-08T21:41:44Z

The abseil hash function is very slightly slower in my tests, although the difference may be not significant. Perhaps that's because my C port fails to inline some of the things that naturally inline in the original C++ with templates. I can try harder, but that's a lot of extra code to bring in properly.

library(atime)

pkg.path <- '.'
limit <- 1
# taken from .ci/atime/tests.R
pkg.edit.fun <- function(old.Package, new.Package, sha, new.pkg.path) {
  pkg_find_replace <- function(glob, FIND, REPLACE) {
    atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
  }
  Package_regex <- gsub(".", "_?", old.Package, fixed = TRUE)
  Package_ <- gsub(".", "_", old.Package, fixed = TRUE)
  new.Package_ <- paste0(Package_, "_", sha)
  pkg_find_replace(
    "DESCRIPTION",
    paste0("Package:\\s+", old.Package),
    paste("Package:", new.Package))
  pkg_find_replace(
    file.path("src", "Makevars.*in"),
    Package_regex,
    new.Package_)
  pkg_find_replace(
    file.path("R", "onLoad.R"),
    Package_regex,
    new.Package_)
  pkg_find_replace(
    file.path("R", "onLoad.R"),
    sprintf('packageVersion\\("%s"\\)', old.Package),
    sprintf('packageVersion\\("%s"\\)', new.Package))
  pkg_find_replace(
    file.path("src", "init.c"),
    paste0("R_init_", Package_regex),
    paste0("R_init_", gsub("[.]", "_", new.Package_)))
  pkg_find_replace(
    "NAMESPACE",
    sprintf('useDynLib\\("?%s"?', Package_regex),
    paste0('useDynLib(', new.Package_))
}

versions <- c(
  master = '70c64ac08c6becae5847cd59ab1efcb4c46437ac',
  'Knuth_hash' = 'd7a9a1707ec94ec4f2bd86a5dfb5609207029ba4',
  'abseil_hash' = '159e1d48926b72af9f212b8c645a8bc8ab6b20be'
)

N <- 10^seq(2, 7.5, .25)
# expected case: a few distinct strings
forderv1_work <- lapply(setNames(nm = N), \(N)
  sample(letters, N, TRUE)
)
forderv1 <- atime_versions(
  pkg.path, N,
  expr = data.table:::forderv(forderv1_work[[as.character(N)]]),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(forderv1_work); gc(full = TRUE)

# worst case: all strings different
# (a challenge for the allocator too due to many small immovable objects)
N <- 10^seq(2, 7.5, .25)
forderv2_work <- lapply(setNames(nm = N), \(N)
  format(runif(N), digits = 16)
)
forderv2 <- atime_versions(
  pkg.path, N,
  expr = data.table:::forderv(forderv2_work[[as.character(N)]]),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(forderv2_work); gc(full = TRUE)

# expected case: all columns named the same
N <- 10^seq(1, 5.5, .25) # number of data.tables in the list
k <- 10 # number of columns per data.table
rbindlist1_work <- lapply(setNames(nm = N), \(N)
  rep(list(setNames(as.list(1:k), letters[1:k])), N)
)
rbindlist1 <- atime_versions(
  pkg.path, N,
  expr = data.table::rbindlist(rbindlist1_work[[as.character(N)]]),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(rbindlist1_work); gc(full = TRUE)

# worst case: all columns different
N <- 10^seq(1, 4.5, .25) # number of data.tables in the list
k <- 10 # number of columns per data.table
rbindlist2_work <- lapply(setNames(nm = N), \(N)
  replicate(N, setNames(as.list(1:k), format(runif(k), digits = 16)), FALSE)
)
rbindlist2 <- atime_versions(
  pkg.path, N,
  expr = data.table::rbindlist(rbindlist2_work[[as.character(N)]], fill = TRUE),
  sha.vec = versions, seconds.limit = limit, verbose = TRUE,
  pkg.edit.fun = pkg.edit.fun
)
rm(rbindlist2_work); gc(full = TRUE)

save(forderv1, forderv2, rbindlist1, rbindlist2, file = 'times.rda')

collapse::fmatch() works very well, faster than data.table::chmatch() even in its current form:

library(atime)
library(data.table)

limit <- 1

# assumes that atime_versions() had pre-installed the packages
# master = '70c64ac08c6becae5847cd59ab1efcb4c46437ac',
# 'Knuth_hash' = 'd7a9a1707ec94ec4f2bd86a5dfb5609207029ba4',

N <- 10^seq(2, 7.5, .25)
# expected case: a few distinct strings
chmatch_work1 <- lapply(setNames(nm = N), \(N)
  sample(letters, N, TRUE)
)
chmatch1 <- atime(
  N,
  seconds.limit = limit, verbose = TRUE,
  master = data.table.70c64ac08c6becae5847cd59ab1efcb4c46437ac::chmatch(chmatch_work1[[as.character(N)]], letters),
  Knuth_hash = data.table.d7a9a1707ec94ec4f2bd86a5dfb5609207029ba4::chmatch(chmatch_work1[[as.character(N)]], letters),
  collapse = collapse::fmatch(chmatch_work1[[as.character(N)]], letters)
)
rm(chmatch_work1); gc(full = TRUE)

save(chmatch1, file = 'times_collapse.rda')

And the real memory cost isn't even that large:

library(atime)
library(data.table)
# assumes that atime_versions() had pre-installed the packages
# master = '70c64ac08c6becae5847cd59ab1efcb4c46437ac',
# 'Knuth_hash' = 'd7a9a1707ec94ec4f2bd86a5dfb5609207029ba4',
library(data.table.70c64ac08c6becae5847cd59ab1efcb4c46437ac)
library(data.table.d7a9a1707ec94ec4f2bd86a5dfb5609207029ba4)
library(parallel)

# only tested on a recent Linux system
# measures the _maximal_ amount of memory in kB used by the current process
writeLines('
#include <sys/resource.h>
void maxrss(double * kb) {
  struct rusage ru;
  int ret = getrusage(RUSAGE_SELF, &ru);
  *kb = ret ? -1 : ru.ru_maxrss;
}
', 'maxrss.c')
tools::Rcmd('SHLIB maxrss.c')
dyn.load(paste0('maxrss', .Platform$dynlib.ext))

limit <- 1

N <- 10^seq(2, 7.5, .25)
# expected case: a few distinct strings
chmatch_work1 <- lapply(setNames(nm = N), \(N)
  sample(letters, N, TRUE)
)
versions <- expression(
  master = data.table.70c64ac08c6becae5847cd59ab1efcb4c46437ac::chmatch(chmatch_work1[[as.character(N)]], letters),
  Knuth_hash = data.table.d7a9a1707ec94ec4f2bd86a5dfb5609207029ba4::chmatch(chmatch_work1[[as.character(N)]], letters),
  collapse = collapse::fmatch(chmatch_work1[[as.character(N)]], letters)
)

plan <- expand.grid(N = N, version = names(versions))
chmatch1 <- lapply(seq_len(nrow(plan)), \(i) {
  # use a disposable child process
  mccollect(mcparallel({
    eval(versions[[plan$version[[i]]]], list(N = plan$N[[i]]))
    .C('maxrss', kb = double(1))$kb
  }))
})

rm(chmatch_work1); gc(full = TRUE)

save(chmatch1, file = 'times_collapse.rda')
chmatch1p <- lattice::xyplot(
  maxrss_kb ~ N, cbind(plan, maxrss_kb = unlist(chmatch1)), group = version,
  auto.key = TRUE, scales = list(log = 10),
  par.settings=list(superpose.symbol=list(pch=19))
)

SebKrantz · 2025-01-08T22:20:59Z

Nice! Thanks. I always wondered about this tradeoff between the size of the table and the quality of the hash function. Looks like speed + large table still wins. Anyway, if you want to adopt, feel free to copy it under your MPL license. Just mention me in the top of the file and as a contributor.

SebKrantz · 2025-01-08T22:51:00Z

PS: I believe it also depends on the size of the table in R. letters is small. I believe for much larger tables chmatch() should be faster. Would be interesting to also check those hash functions.

tdhock · 2025-01-09T14:38:48Z

excellent work thank you very much

aitap marked this pull request as ready for review December 26, 2024 15:52

aitap requested review from HughParsonage, ben-schwen, tdhock and MichaelChirico as code owners December 26, 2024 15:52

aitap added 11 commits December 29, 2024 19:06

Implement the hash table

5782a7a

memrecycle(): replace TRUELENGTH marks with a hash

3846cc9

rbindlist(): replace 1/2 TRUELENGTH with hashing

b700bf6

Also avoid crashing when creating a 0-size hash.

rbindlist(): replace 2/2 TRUELENGTH with hashing

86511ab

This may likely require a dynamically growing hash of TRUELENGTHs instead of the current pre-allocation approach with a very conservative over-estimate.

chmatchMain(): replace TRUELENGTH marks with hash

486dd7a

copySharedColumns(): hash instead of TRUELENGTH

f26d043

combineFactorLevels(): hash instead of TRUELENGTH

cae6b02

anySpecialStatic(): hash instead of TRUELENGTH

a09f39b

forder(): hash instead of TRUELENGTH

962b272

The hash needs O(n) memory (actually 2*n/load_factor entries) which isn't great.

Remove savetl()

22981ad

Add codecov suppressions

24e8178

aitap force-pushed the truehash branch from 566a49e to 24e8178 Compare December 29, 2024 16:07

aitap mentioned this pull request Dec 29, 2024

Growable ALTREP vectors #6697

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Start replacing `TRUELENGTH` markers with a hash #6694

Start replacing `TRUELENGTH` markers with a hash #6694

aitap commented Dec 26, 2024 •

edited

Loading

codecov bot commented Dec 26, 2024 •

edited

Loading

github-actions bot commented Dec 26, 2024 •

edited

Loading

tdhock commented Dec 29, 2024

aitap commented Dec 29, 2024

SebKrantz commented Dec 30, 2024

aitap commented Dec 31, 2024 •

edited

Loading

tdhock commented Dec 31, 2024

aitap commented Jan 1, 2025

SebKrantz commented Jan 1, 2025 •

edited

Loading

aitap commented Jan 8, 2025

SebKrantz commented Jan 8, 2025

SebKrantz commented Jan 8, 2025

tdhock commented Jan 9, 2025

Start replacing TRUELENGTH markers with a hash #6694

Are you sure you want to change the base?

Start replacing TRUELENGTH markers with a hash #6694

Conversation

aitap commented Dec 26, 2024 • edited Loading

codecov bot commented Dec 26, 2024 • edited Loading

Codecov Report

github-actions bot commented Dec 26, 2024 • edited Loading

tdhock commented Dec 29, 2024

aitap commented Dec 29, 2024

SebKrantz commented Dec 30, 2024

aitap commented Dec 31, 2024 • edited Loading

tdhock commented Dec 31, 2024

aitap commented Jan 1, 2025

SebKrantz commented Jan 1, 2025 • edited Loading

aitap commented Jan 8, 2025

SebKrantz commented Jan 8, 2025

SebKrantz commented Jan 8, 2025

tdhock commented Jan 9, 2025

Start replacing `TRUELENGTH` markers with a hash #6694

Start replacing `TRUELENGTH` markers with a hash #6694

aitap commented Dec 26, 2024 •

edited

Loading

codecov bot commented Dec 26, 2024 •

edited

Loading

github-actions bot commented Dec 26, 2024 •

edited

Loading

aitap commented Dec 31, 2024 •

edited

Loading

SebKrantz commented Jan 1, 2025 •

edited

Loading