Tasks hashing for duplicates detection #5

Open
kxepal opened this issue Jul 30, 2017 · 0 comments
kxepal commented Jul 30, 2017

Since we operate on plain data, we can easily detect duplicate tasks across multiple DAGs. For instance, several of our DAGs may contain the same sensor that waits for the same partition of the same table. If the scheduler supported deduplication, we could run one sensor process instead of several. That is a good opportunity to save some slots.

Deduplication is largely the scheduler's job, but the scheduler cannot implement it without additional help. That help is what we should provide, for instance a hash of each task that identifies duplicates.
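A rough sketch of what such hashing could look like, assuming each task is a plain dict of JSON-serializable values. The names `task_hash` and `find_duplicates`, the dict layout, and the `ExamplePartitionSensor` class are hypothetical and only illustrate the idea. They are not an existing API.

```python
import hashlib
import json
from collections import defaultdict


def task_hash(task: dict) -> str:
    """Return a stable digest of a task's plain-data definition.

    Keys are sorted so that two equal definitions hash the same
    regardless of the order in which their keys were written.
    """
    canonical = json.dumps(task, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(dags: dict) -> dict:
    """Map each task hash to the (dag_id, task_id) pairs that share it.

    Only hashes that occur more than once are returned.
    """
    seen = defaultdict(list)
    for dag_id, tasks in dags.items():
        for task_id, task in tasks.items():
            seen[task_hash(task)].append((dag_id, task_id))
    return {h: refs for h, refs in seen.items() if len(refs) > 1}


# Hypothetical DAG definitions: two DAGs wait for the same partition
# of the same table under different task ids.
dags = {
    "dag_a": {
        "wait_for_partition": {
            "class": "ExamplePartitionSensor",
            "args": {"table": "events", "partition": "ds=2017-07-30"},
        },
        "load": {"class": "ExampleLoader", "args": {"table": "events"}},
    },
    "dag_b": {
        "wait_events": {
            "args": {"partition": "ds=2017-07-30", "table": "events"},
            "class": "ExamplePartitionSensor",
        },
    },
}

duplicates = find_duplicates(dags)
for digest, refs in duplicates.items():
    print(digest[:12], refs)
```

The task id is deliberately left out of the hash, so the same sensor registered under different names in different DAGs still counts as one task. Whether fields such as retries or pool should be part of the hash is an open design question.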

kxepal added this to the 1.2 milestone Jul 30, 2017