Parallelization #18

Open
ShadenSmith opened this issue Dec 20, 2017 · 0 comments

tensor_parser.builder.build_tensor() should really be parallelized. Parallelism can come either from parsing multiple CSV files at once, or from splitting the data up front and then parallelizing over the splits.

  • Index maps can be constructed in parallel as long as the process-local sets are unioned and counts are summed.
  • Tensor non-zeros can similarly be written in parallel as long as the resulting per-split tensor files are concatenated.
  • Sorting is already parallel.
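As a rough illustration of the index-map point above, here is a minimal sketch of the union-and-sum step. The function names (`count_indices`, `build_index_map`) and the row layout (tuples of raw index values) are hypothetical and are not part of tensor_parser. Threads are used only to keep the example self-contained; a real implementation would more likely use processes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_indices(rows, mode):
    """Count how often each raw index value appears in one split (hypothetical helper)."""
    return Counter(row[mode] for row in rows)


def build_index_map(splits, mode, workers=4):
    """Build a global index map for one mode from independently processed splits."""
    # Each split is counted on its own; no shared state is needed at this stage.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda rows: count_indices(rows, mode), splits))

    # Counter.update unions the keys and sums the counts in one step.
    total = Counter()
    for partial in partials:
        total.update(partial)

    # Assumption: new indices are assigned in sorted order of the raw values.
    index_map = {key: i for i, key in enumerate(sorted(total))}
    return index_map, total


splits = [[("a", "x"), ("b", "x")], [("a", "y")]]
index_map, counts = build_index_map(splits, mode=0)
```

Because the merge is a plain union with summed counts, the result does not depend on how the data was split or on the order in which the partial results arrive.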

Merging duplicates will take some more thought, as duplicates could cross partitions.
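One possible approach, sketched below under two assumptions that the issue does not state: the non-zeros are already globally sorted by coordinate, so equal coordinates are contiguous, and duplicates are merged by summing their values. Under those assumptions, each partition can be merged on its own, and only the partition boundaries need a sequential stitch. The names `merge_local` and `merge_partitions` are hypothetical.

```python
from itertools import groupby


def merge_local(part):
    """Merge duplicates inside one sorted partition of (coord, value) pairs."""
    return [
        (coord, sum(value for _, value in group))
        for coord, group in groupby(part, key=lambda nz: nz[0])
    ]


def merge_partitions(parts):
    """Merge each partition, then stitch duplicates that cross a boundary."""
    merged = []
    for part in parts:
        # merge_local is independent per partition and could run in parallel.
        for coord, value in merge_local(part):
            if merged and merged[-1][0] == coord:
                # The same coordinate ended the previous partition: combine them.
                merged[-1] = (coord, merged[-1][1] + value)
            else:
                merged.append((coord, value))
    return merged


parts = [
    [((0, 0), 1.0), ((0, 1), 2.0), ((0, 1), 3.0)],
    [((0, 1), 4.0), ((1, 0), 5.0)],
]
result = merge_partitions(parts)
```

After the local merge, only the first entry of a partition can match the last entry already emitted, so the stitch touches one pair per boundary and still handles a coordinate that spans more than two partitions.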
