Large array conversion #30
@FeatureMan2 Unfortunately, this is a Java limitation: in Java, arrays can have at most 2^31-1 elements and are indexed by signed integers, and Java does not support anything larger. You could split the dataset into chunks and then concatenate them on the R side. Put simply, Java is not a language designed for data analysis, especially not large data, so you may have more luck reading the data directly from R instead if you can.
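The chunked transfer suggested above can be sketched as follows. This is a minimal sketch, not code from REngine: the `split` helper and the variable names are illustrative, and the Rserve calls (`RConnection.assign`, `voidEval`, which do exist in the REngine API) are shown only as comments since they require a running server.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedTransfer {
    // Split a large double[] into pieces of at most maxChunk elements so
    // that each serialized piece stays well under Java's 2^31-1 limit.
    // (Hypothetical helper, not part of REngine.)
    static List<double[]> split(double[] data, int maxChunk) {
        List<double[]> chunks = new ArrayList<>();
        for (int start = 0; start < data.length; start += maxChunk) {
            int len = Math.min(maxChunk, data.length - start);
            double[] chunk = new double[len];
            System.arraycopy(data, start, chunk, 0, len);
            chunks.add(chunk);
        }
        return chunks;
    }

    // With an open RConnection c, each chunk would be assigned to its own
    // R variable and then concatenated on the R side, roughly:
    //   for (int i = 0; i < chunks.size(); i++)
    //       c.assign("chunk" + i, chunks.get(i));
    //   c.voidEval("full <- c(chunk0, chunk1)");  // sketch only

    public static void main(String[] args) {
        double[] data = new double[10];
        for (int i = 0; i < data.length; i++) data[i] = i;
        List<double[]> chunks = split(data, 4);
        System.out.println(chunks.size());        // 3 chunks: 4 + 4 + 2
        System.out.println(chunks.get(2).length); // last chunk holds 2 elements
    }
}
```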
Thanks for your insights. I think I'll use your fallback method of splitting the dataset.
I tried that. I split the large array into chunks, then reassembled it in R with `c()`. Then I switched to using the Renjin engine to export the Java dataset to their embedded R engine and store it to a file from there. I'm not quite clear what is producing these errors. I thought it was RAM usage, but then I can run

```r
a <- rep(pi, 2^31-1)
b <- rep(pi, 2^31-1)
c <- c(a,b)
```

using up >64GB of Rserve memory...
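The >64GB figure above is consistent with plain arithmetic: an R double is 8 bytes, and at the point `c(a,b)` completes, `a`, `b`, and the result (twice their length) are all alive, i.e. roughly four maximal-length vectors at once. A quick back-of-envelope check (my own calculation, not from the thread):

```java
public class VectorMemory {
    public static void main(String[] args) {
        long n = (1L << 31) - 1;   // 2^31-1 elements per vector
        long bytesPer = 8L * n;    // doubles are 8 bytes each
        // a and b are one maximal vector each; c(a,b) holds both copies,
        // so peak usage is roughly 4 such vectors simultaneously.
        long peak = 4L * bytesPer;
        System.out.printf("one vector: %.1f GiB, peak: %.1f GiB%n",
                          (double) bytesPer / (1L << 30),
                          (double) peak / (1L << 30));
    }
}
```

That peak is about 68.7 GB (just under 64 GiB), matching the observed >64GB of Rserve memory.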
Can you post your actual code? We can't really help you based on the anecdotes above. |
The code is long and unfortunately I'm not able to share it. It essentially fails at REngine/Rserve/RConnection.java, line 286 in ba09e3b (with 0.9.2 under R v3.6, but I get the same errors under 2.1.1 under R 4.2), where `ro.isOK()` returns `false`. I'm simply assigning a string to a variable here. I don't think it matters what statement I execute; it falls over regardless. I cannot manually execute any statement in debug mode after such an error.
This error comes after loading a large R dataset. Note that this same code works just fine on smaller datasets, say if I have 5 million elements... I was getting similar types of errors when splitting arrays into smaller chunks and concatenating with `c()`.
It looks as if something goes wrong with the previous statement. It could also be that R simply runs out of memory. First, simply run the debug version of Rserve (either enable `debug` when starting `Rserve()` or run the `Rserve.dbg` binary) and check the server-side output.
Our problem was solved by splitting the datasets Java -> R using the suggested approach. I would suggest making this chunked transfer part of the library itself.
Thanks. Since this is a Java limitation, different applications may use different solutions to it depending on the use-case.
Hi,
I'm facing the issue that when converting Java data to the R representation, my datasets are too large and it fails.
Specifically, it fails here: https://github.com/s-u/REngine/blame/master/Rserve/protocol/REXPFactory.java#L476, as `cont.asDoubles().length` is about 700 million, which ×8 bytes is about 5.6 billion, larger than the maximum int value of about 2.1 billion. Shouldn't we be using `long` instead of `int` throughout this package to support larger datasets? Thanks for any thoughts on this.
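The failure mode described here is ordinary Java `int` overflow: multiplying a length near 700 million by 8 in 32-bit arithmetic wraps modulo 2^32 before any size check can see the true value. A minimal demonstration (the length is illustrative, chosen near the reported size):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int length = 700_000_000;            // elements, near the reported size
        int wrongBytes = length * 8;         // int multiply wraps modulo 2^32
        long rightBytes = (long) length * 8; // widen first to get the true size
        System.out.println(wrongBytes);      // 1305032704 -- wrapped, wrong
        System.out.println(rightBytes);      // 5600000000 -- exceeds Integer.MAX_VALUE
    }
}
```

Widening to `long` before the multiply, as in `rightBytes`, is the usual fix for the computation itself, though supporting payloads that large end-to-end would also require the protocol and array handling to use `long` lengths.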