Tasks hashing for duplicates detection #5

kxepal · 2017-07-30T11:58:22Z

Since we operate on plain data, we can easily detect duplicate tasks across multiple DAGs. For instance, several of our DAGs may contain the same sensor that waits for the same partition of the same table. If the scheduler supported deduplication, we could run a single sensor process instead of several. That is a good opportunity to save some slots.

Deduplication is mostly the scheduler's job, but the scheduler can't implement it without additional help, and we should provide that help.
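One possible shape for that help, as a rough sketch only: compute a stable hash over each task's plain-data definition so that identical tasks in different DAGs produce the same key. Everything here is an assumption for illustration, not an existing API of this project: the `task_hash` and `group_duplicates` helpers, the dict layout of a task, and the choice of SHA-256 over canonical JSON.

```python
import hashlib
import json


def task_hash(task: dict) -> str:
    """Return a stable hex digest for a task described as plain data."""
    # Canonical JSON: sorted keys and fixed separators, so equal dicts
    # serialize to the same bytes regardless of key order.
    canonical = json.dumps(task, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_duplicates(dags: dict) -> dict:
    """Map each task hash to the (dag_id, task_id) pairs that share it."""
    groups = {}
    for dag_id, tasks in dags.items():
        for task_id, task in tasks.items():
            groups.setdefault(task_hash(task), []).append((dag_id, task_id))
    return groups


# Two hypothetical DAGs whose sensors wait on the same partition of the
# same table; the task layout is made up for this example.
dags = {
    "dag_a": {
        "wait": {
            "type": "partition_sensor",
            "table": "events",
            "partition": "2017-07-30",
        },
    },
    "dag_b": {
        "wait_events": {
            "partition": "2017-07-30",
            "table": "events",
            "type": "partition_sensor",
        },
    },
}

# Keep only hashes shared by more than one task.
duplicates = {h: refs for h, refs in group_duplicates(dags).items() if len(refs) > 1}
```

With such a key, a scheduler could run one sensor per hash and fan the result out to every task listed under it. Note that the hash only covers what is in the task definition, so anything that makes two tasks behave differently has to be part of that data.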

kxepal added the enhancement label Jul 30, 2017

kxepal added this to the 1.2 milestone Jul 30, 2017
