Best practices for large data set backfill #24232
Unanswered
mitchpaulus asked this question in Q&A
I have a large dataset that I am trying to backfill: 5-minute interval data from approximately 2,500 sensors over 6 years. That works out to roughly 288 pts/day * 365 days/yr * 6 yrs * 2,500 sensors ≈ 1.6 billion records.
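As a sanity check on that arithmetic, here is the count worked out, plus a rough ingest-rate figure to get a feel for how long a clean backfill would take (the rate is an assumed number for illustration, not a measured InfluxDB figure):

```python
# Back-of-the-envelope sizing for the backfill. The ingest rate below is an
# assumption for illustration, not a measured InfluxDB number.
points_per_day = 24 * 60 // 5                  # 288 five-minute intervals/day
total_points = points_per_day * 365 * 6 * 2500
print(f"total points: {total_points:,}")       # 1,576,800,000 (~1.6 billion)

assumed_rate = 250_000                         # points/sec, assumed
print(f"hours at {assumed_rate:,} pts/s: {total_points / assumed_rate / 3600:.1f}")
```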
I have tried to upload this through the CLI, but I have run into issues where the ./influxd process crashes, with no error on stderr, after uploading approximately 100 of these files (~60 million records; the writes were not rate limited).
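For context, a client-side batched write along these lines might put less pressure on the server than pushing whole files at once. This is only a sketch, assuming InfluxDB 2.x and the influxdb-client Python package; the URL, token, org, bucket, and file name are placeholders:

```python
from influxdb_client import InfluxDBClient, WriteOptions

# Stream a line-protocol file through the client's batching write API so the
# server sees bounded batches instead of one huge request. All connection
# details below are placeholders.
with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    opts = WriteOptions(batch_size=5_000, flush_interval=10_000, max_retries=5)
    with client.write_api(write_options=opts) as write_api:
        with open("data.lp") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines in the line-protocol file
                    write_api.write(bucket="my-bucket", record=line)
```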
Looking through the documentation, I could not find examples or a list of best practices for one-time backfilling of large amounts of data. I did find a couple of recommendations for optimizing writes, but neither of them helped. I don't have a retention policy, since we analyze this dataset in various ways and would like the entire data set to remain available.
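Beyond those generic write optimizations, one option I am looking at is breaking the job into small, resumable chunks and throttling between them. This is a sketch, assuming the InfluxDB 2.x `influx write` CLI, with placeholder file and bucket names:

```python
import pathlib
import subprocess
import time

# Throttled, resumable backfill over many line-protocol files. Uploading
# oldest-first keeps writes roughly in time order; the pause between files
# and the progress log are assumptions meant to reduce server pressure and
# allow restarting after a crash.
DONE = pathlib.Path("uploaded.txt")
done = set(DONE.read_text().splitlines()) if DONE.exists() else set()

for lp_file in sorted(pathlib.Path("backfill").glob("*.lp")):
    if lp_file.name in done:
        continue  # already uploaded on a previous run
    subprocess.run(
        ["influx", "write", "--bucket", "my-bucket", "--file", str(lp_file)],
        check=True,  # stop immediately if a chunk fails
    )
    with DONE.open("a") as f:
        f.write(lp_file.name + "\n")
    time.sleep(2)  # give compaction / cache flushes a chance to keep up
```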
So I have some general questions that I think would be useful to have answered in a single place.
Replies: 1 comment · 3 replies

- @mitchpaulus What version of InfluxDB are you using?
  3 replies