Memory requirements for a national run #6

Open
dabreegster opened this issue Dec 30, 2021 · 1 comment

Comments

@dabreegster
Collaborator

By reading all of the time-use files, I've counted 42,672,348 people across 21,135,110 households and 6,787 MSOAs. In the current implementation, storing the population takes somewhere between 13 and 18GB, before flows_per_activity are filled out! (I got 18GB when the code was still using HashMaps everywhere and 13GB after switching to BTreeMap, but I'll repeat the measurement a few times to make sure it's consistent.)
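A quick back-of-the-envelope check puts those totals in per-person terms. This sketch just divides each measured total by the 42,672,348 people. It assumes GB means 10^9 bytes and charges everything (households included) to people, so the result is only a rough average budget, not a breakdown of the real structs:

```rust
/// Average bytes per person, charging the whole measured total to people.
fn bytes_per_person(total_bytes: f64, people: u64) -> f64 {
    total_bytes / people as f64
}

fn main() {
    const PEOPLE: u64 = 42_672_348;
    // The two measurements above, assuming GB means 10^9 bytes.
    for gb in [13.0_f64, 18.0] {
        println!(
            "{} GB -> ~{:.0} bytes per person",
            gb,
            bytes_per_person(gb * 1e9, PEOPLE)
        );
    }
}
```

That works out to roughly 305 to 422 bytes per person. At this scale, every 8 bytes removed from each person saves about 341 MB (8 × 42,672,348 bytes).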

I have lots of ideas to squeeze this memory usage down, so we can run on a single machine and without paging stuff in and out of memory (which would hopefully simplify things). I'll record ideas in this issue.

@dabreegster
Collaborator Author

dabreegster commented Dec 30, 2021

  • Don't store PersonID and HouseholdID per struct -- it's probably more convenient later on, but we also know this from the position in the vectors.
  • Don't keep orig_pid and sic1d07 per Person after initialization. The former is probably useful for debugging; not sure about the latter yet (it's something to do with commuting).
  • age_years doesn't need to be a usize; a u8 (0 to 255) covers any human age.
  • We have lots of BTreeMap<Activity, ...>, and activity is just an enum. So logically, a fixed size array makes way more sense. Will try out the enum_map crate
  • We only keep the top 5 or 10 flows per activity. This has a huge effect on memory usage.
  • The flow data is per (activity, MSOA), but we copy it to each person, seemingly without any variation per person! If we don't actually need to do this, we'll get a huge win from not repeating it everywhere.
  • IDs are usize, which is 64 bits (8 bytes) each on a 64-bit machine. A u32 affords us 2^32 = 4,294,967,296 (~4 billion) entries, which easily covers households, venues, and people.
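Several of these ideas (deriving a person's ID from their position in the Vec, dropping initialization-only fields like orig_pid, narrowing age_years, and switching IDs to u32) can be sketched together. Apart from PersonID, HouseholdID, orig_pid and age_years, every name here (PersonWide, PersonCompact, Population, iter_with_ids) is hypothetical, and the real Person struct has more fields. Sizes assume a 64-bit target:

```rust
use std::mem::size_of;

/// Hypothetical "before" layout: the person's own ID, its household,
/// orig_pid and age_years are all usize (8 bytes each on a 64-bit target).
#[allow(dead_code)]
struct PersonWide {
    id: usize,
    household: usize,
    orig_pid: usize,
    age_years: usize,
}

/// u32 IDs: 2^32 = 4,294,967,296 possible values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PersonID(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HouseholdID(u32);

/// Hypothetical "after" layout: no self-ID (it is the Vec index), no
/// init-only fields, and an age that fits in one byte.
struct PersonCompact {
    household: HouseholdID,
    age_years: u8,
}

struct Population {
    people: Vec<PersonCompact>,
}

impl Population {
    /// Recover each PersonID from the person's position in the Vec.
    fn iter_with_ids(&self) -> impl Iterator<Item = (PersonID, &PersonCompact)> {
        self.people
            .iter()
            .enumerate()
            // Assumes fewer than 2^32 people, true for the ~42.7 million here.
            .map(|(idx, person)| (PersonID(idx as u32), person))
    }
}

fn main() {
    println!(
        "wide: {} bytes, compact: {} bytes",
        size_of::<PersonWide>(),
        size_of::<PersonCompact>()
    );
    let population = Population {
        people: vec![
            PersonCompact { household: HouseholdID(0), age_years: 34 },
            PersonCompact { household: HouseholdID(0), age_years: 7 },
        ],
    };
    for (id, person) in population.iter_with_ids() {
        println!("{:?} lives in {:?}, age {}", id, person.household, person.age_years);
    }
}
```

On a 64-bit target the wide layout takes 32 bytes and the compact one 8 (a 4-byte u32 plus a 1-byte u8, padded up to the u32's alignment).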
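Replacing a BTreeMap<Activity, ...> with a fixed-size array indexed by the enum might look like the following. The enum_map crate provides this pattern ready-made; to keep the sketch dependency-free, it hand-rolls a minimal version instead. The Activity variants, NUM_ACTIVITIES, ALL_ACTIVITIES and PerActivity are placeholders, not the project's actual definitions:

```rust
use std::ops::{Index, IndexMut};

/// Placeholder activities; the real enum may have different variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Activity {
    Home,
    Work,
    Retail,
    PrimarySchool,
    SecondarySchool,
}

const NUM_ACTIVITIES: usize = 5;

const ALL_ACTIVITIES: [Activity; NUM_ACTIVITIES] = [
    Activity::Home,
    Activity::Work,
    Activity::Retail,
    Activity::PrimarySchool,
    Activity::SecondarySchool,
];

/// One value per activity, stored inline with no heap allocation.
struct PerActivity<T>([T; NUM_ACTIVITIES]);

impl<T> Index<Activity> for PerActivity<T> {
    type Output = T;
    fn index(&self, activity: Activity) -> &T {
        // Fieldless enum discriminants are 0, 1, 2, ... in declaration order.
        &self.0[activity as usize]
    }
}

impl<T> IndexMut<Activity> for PerActivity<T> {
    fn index_mut(&mut self, activity: Activity) -> &mut T {
        &mut self.0[activity as usize]
    }
}

fn main() {
    let mut totals = PerActivity([0.0_f64; NUM_ACTIVITIES]);
    totals[Activity::Work] += 1.5;
    totals[Activity::Retail] += 0.25;
    for activity in ALL_ACTIVITIES {
        println!("{:?}: {}", activity, totals[activity]);
    }
    println!("size: {} bytes", std::mem::size_of::<PerActivity<f64>>());
}
```

A BTreeMap allocates its entries on the heap with per-node overhead, while PerActivity<f64> here is a flat 40-byte array (5 × 8 bytes) stored inline, and each lookup is just an index.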
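The two flow ideas (keeping only the top N flows, and storing each MSOA's flows once instead of copying them to every person) could combine roughly like this. MsoaID, Flow, FlowTable, keep_top_n and the venue/weight fields are all hypothetical, and the table covers a single activity; a fuller version would keep one table per activity:

```rust
/// Hypothetical compact MSOA index: 6,787 MSOAs fit easily in a u16.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MsoaID(u16);

/// Hypothetical flow entry: a destination venue and its relative weight.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Flow {
    venue: u32,
    weight: f64,
}

/// Sort flows by descending weight and keep only the top `n`.
fn keep_top_n(mut flows: Vec<Flow>, n: usize) -> Vec<Flow> {
    flows.sort_by(|a, b| b.weight.total_cmp(&a.weight));
    flows.truncate(n);
    flows
}

/// Flows for one activity, stored once per MSOA rather than once per person.
struct FlowTable {
    per_msoa: Vec<Vec<Flow>>, // indexed by MsoaID
}

impl FlowTable {
    fn flows_for(&self, msoa: MsoaID) -> &[Flow] {
        &self.per_msoa[usize::from(msoa.0)]
    }
}

/// A person only records their MSOA; flows are looked up when needed.
struct Person {
    msoa: MsoaID,
}

fn main() {
    let raw = vec![
        Flow { venue: 10, weight: 0.1 },
        Flow { venue: 11, weight: 0.5 },
        Flow { venue: 12, weight: 0.3 },
        Flow { venue: 13, weight: 0.05 },
    ];
    let table = FlowTable { per_msoa: vec![keep_top_n(raw, 2)] };
    let people = vec![Person { msoa: MsoaID(0) }, Person { msoa: MsoaID(0) }];
    for person in &people {
        // Both people share the same two flows: venues 11 and 12.
        println!("{:?}", table.flows_for(person.msoa));
    }
}
```

Each person then pays 2 bytes for the lookup key instead of carrying a private copy of the flow list, and the per-MSOA lists themselves are capped at N entries.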

dabreegster referenced this issue in alan-turing-institute/uatk-aspics Dec 30, 2021
Memory usage didn't actually budge -- still 13.52GiB. But this should
certainly help serialized size.
dabreegster referenced this issue in alan-turing-institute/uatk-aspics Dec 30, 2021
Measuring 18.07 GiB now, which is _up_ from the 13.52 earlier. I... do
not trust the memory measurement though.
dabreegster transferred this issue from alan-turing-institute/uatk-aspics Apr 7, 2022