Change _insert_tasks to use add_async #112

tannermiller-wf · 2013-10-07T18:11:05Z

In _insert_tasks(), we now add each task individually using
Queue.add_async(), instead of all at once with Queue.add(). This lets
us insert each task only once, without having to split and retry
until we find the bad tasks. Unfortunately, this approach prevents us
from determining exactly how many tasks were successfully inserted.
It should, however, help when a large number of duplicate tasks are
being added and the old code keeps splitting instead of just quitting.
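To make the trade-off concrete, here is a minimal sketch that contrasts a split-and-retry batch insert with one add_async() call per task. It runs against a hypothetical in-memory `FakeQueue`, not the App Engine taskqueue API: its `add()` is all-or-nothing, its `add_async()` returns a handle whose `get_result()` re-raises any error, and `DuplicateTaskError` stands in for the real duplicate/tombstoned task errors. `insert_split` and `insert_async` are illustrative names, and `insert_split` is a simplified model of the old behaviour, not the actual furious code.

```python
class DuplicateTaskError(Exception):
    """Hypothetical stand-in for taskqueue's duplicate/tombstoned errors."""


class FakeRPC(object):
    """Hypothetical handle with get_result(); resolves eagerly for determinism."""

    def __init__(self, fn):
        self._result, self._error = None, None
        try:
            self._result = fn()
        except DuplicateTaskError as e:
            self._error = e  # surfaced later, like an RPC error

    def get_result(self):
        if self._error is not None:
            raise self._error
        return self._result


class FakeQueue(object):
    """Hypothetical in-memory queue; NOT the App Engine taskqueue API."""

    def __init__(self, existing=()):
        self.names = set(existing)
        self.add_calls = 0
        self.async_calls = 0

    def add(self, tasks):
        # All-or-nothing batch insert: any existing name fails the whole batch.
        self.add_calls += 1
        if any(t in self.names for t in tasks):
            raise DuplicateTaskError(tasks)
        self.names.update(tasks)
        return len(tasks)

    def add_async(self, task):
        # Single-task insert; a failure only shows up via get_result().
        self.async_calls += 1

        def _insert():
            if task in self.names:
                raise DuplicateTaskError(task)
            self.names.add(task)
            return 1
        return FakeRPC(_insert)


def insert_split(queue, tasks):
    """Old style: batch add, halve and retry on failure; returns count inserted."""
    if not tasks:
        return 0
    try:
        return queue.add(tasks)
    except DuplicateTaskError:
        if len(tasks) == 1:
            return 0  # isolated the bad task; drop it
        mid = len(tasks) // 2
        return insert_split(queue, tasks[:mid]) + insert_split(queue, tasks[mid:])


def insert_async(queue, tasks):
    """New style: one add_async per task. The handles are discarded, so the
    number of successful inserts is unknown to the caller."""
    for task in tasks:
        queue.add_async(task)
```

With eight tasks and one pre-existing name, `insert_split` needs seven `add()` calls to isolate the duplicate, while `insert_async` issues exactly eight single-task calls. When every task is a duplicate, the split recursion makes 2n - 1 `add()` calls and inserts nothing, which is the "keeps splitting" case described above.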

tannermiller-wf · 2013-10-07T18:11:20Z

@robertkluin-wf @beaulyddon-wf @ericolson-wf @tylertreat-wf

ericolson-wf · 2013-10-07T20:28:31Z

This looks pretty good. I assume this helps speed up some of our worst cases.

Do we want to make this optional? I'd hope we wouldn't need to make it optional if this works well.

Does the developer sometimes want to ensure the tasks have been inserted? Maybe storing the futures on the class would allow a developer to make get_result calls. I'm not sure whether keeping those handles would use more memory.
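One way to picture the "store the futures" idea is the minimal sketch below. `AsyncTaskInserter`, `FakeQueue`, `FakeRPC` and `DuplicateTaskError` are all hypothetical names for illustration only; the fake queue mimics the shape of an `add_async()` that returns a handle with `get_result()`, and is not the App Engine API.

```python
class DuplicateTaskError(Exception):
    """Hypothetical stand-in for taskqueue's duplicate/tombstoned errors."""


class FakeRPC(object):
    """Hypothetical handle with get_result(); resolves eagerly for determinism."""

    def __init__(self, fn):
        self._result, self._error = None, None
        try:
            self._result = fn()
        except DuplicateTaskError as e:
            self._error = e

    def get_result(self):
        if self._error is not None:
            raise self._error
        return self._result


class FakeQueue(object):
    """Hypothetical in-memory queue; NOT the App Engine taskqueue API."""

    def __init__(self, existing=()):
        self.names = set(existing)

    def add_async(self, task):
        def _insert():
            if task in self.names:
                raise DuplicateTaskError(task)
            self.names.add(task)
            return task
        return FakeRPC(_insert)


class AsyncTaskInserter(object):
    """Hypothetical helper: fire add_async per task, keep handles for later."""

    def __init__(self, queue):
        self.queue = queue
        self.rpcs = []

    def insert(self, tasks):
        # Non-blocking: only records the handles.
        self.rpcs.extend(self.queue.add_async(t) for t in tasks)

    def wait(self):
        """Block on every handle and return how many inserts succeeded."""
        inserted = 0
        for rpc in self.rpcs:
            try:
                rpc.get_result()
                inserted += 1
            except DuplicateTaskError:
                pass  # duplicate task; not counted
        self.rpcs = []  # drop handles so they can be garbage-collected
        return inserted
```

A caller that does not care can skip `wait()` entirely; one that does gets an exact success count back. The cost is one live handle per task until `wait()` runs, which is the memory question raised here.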

Also, is there any memory bloat from using many single add_async() calls instead of one group add()?

beaulyddon-wf · 2013-10-08T22:45:36Z

furious/context/context.py

@@ -247,21 +247,22 @@ def _insert_tasks(tasks, queue, transactional=False):
    if not tasks:
        return 0

-    try:
-        taskqueue.Queue(name=queue).add(tasks, transactional=transactional)


I'd like to see more statistics on this, from some basic scenario testing:

Old style (batch and split)
All Async
Batch and Async (take X tasks, split them into Y batches and insert those batches async)
Batch and Async on failure (do the batch insert and, if it fails, fall back to all async)
Batch and Async with Async on Failure (take X tasks, split them into Y batches and insert those batches async; if a batch fails, split it into single tasks and insert those async)
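The last scenario is the most involved, so here is a minimal sketch of it under stated assumptions. `FakeQueue.add_async()` is a hypothetical in-memory stand-in that accepts a list and is all-or-nothing (a real batch failure may be partial), `FakeRPC` resolves eagerly so the example is deterministic, and `insert_batched_async` is an illustrative name, not existing furious code.

```python
class DuplicateTaskError(Exception):
    """Hypothetical stand-in for taskqueue's duplicate/tombstoned errors."""


class FakeRPC(object):
    """Hypothetical handle with get_result(); resolves eagerly for determinism."""

    def __init__(self, fn):
        self._result, self._error = None, None
        try:
            self._result = fn()
        except DuplicateTaskError as e:
            self._error = e

    def get_result(self):
        if self._error is not None:
            raise self._error
        return self._result


class FakeQueue(object):
    """Hypothetical in-memory queue; add_async takes a list, all-or-nothing."""

    def __init__(self, existing=()):
        self.names = set(existing)
        self.async_calls = 0

    def add_async(self, tasks):
        self.async_calls += 1

        def _insert():
            if any(t in self.names for t in tasks):
                raise DuplicateTaskError(tasks)
            self.names.update(tasks)
            return len(tasks)
        return FakeRPC(_insert)


def insert_batched_async(queue, tasks, batch_size):
    """Insert batches async; retry each failed batch one task at a time."""
    batches = [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]
    # Fire every batch before waiting on any of them.
    pending = [(batch, queue.add_async(batch)) for batch in batches]
    retries = []
    inserted = 0
    for batch, rpc in pending:
        try:
            inserted += rpc.get_result()
        except DuplicateTaskError:
            # Fall back to one async add per task in the failed batch.
            retries.extend(queue.add_async([t]) for t in batch)
    for rpc in retries:
        try:
            inserted += rpc.get_result()
        except DuplicateTaskError:
            pass  # this single task was the bad one
    return inserted
```

With ten tasks, a batch size of four and one pre-existing name, this makes three batch calls plus four single-task retries and reports nine inserts, so it keeps an exact count while bounding the retries to the failed batch.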

Then run those scenarios against 1, 10, 100, 1000 tasks, etc. Also run with some tasks failing and hitting the different split scenarios.
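A small harness along these lines could drive that matrix. It is a sketch only: `benchmark` and `make_tasks` are hypothetical helpers, each strategy is assumed to be a callable `insert(queue, tasks)`, and `make_queue(existing)` is assumed to build a fresh queue pre-loaded with the names that should collide. Wall-clock numbers from an in-memory fake say little about real RPC latency, so the strategies would need to run against a real queue (for example in a test deployment) to produce meaningful statistics.

```python
from timeit import default_timer


def make_tasks(n, dup_fraction):
    """Hypothetical task names; the first dup_fraction of them pre-exist."""
    names = ["task-%d" % i for i in range(n)]
    existing = names[:int(n * dup_fraction)]
    return names, existing


def benchmark(strategies, make_queue, sizes=(1, 10, 100, 1000),
              dup_fraction=0.0, repeats=3):
    """Return {(strategy_name, size): best wall-clock seconds over repeats}."""
    results = {}
    for name, insert in sorted(strategies.items()):
        for n in sizes:
            best = None
            for _ in range(repeats):
                tasks, existing = make_tasks(n, dup_fraction)
                queue = make_queue(existing)  # fresh queue for every run
                start = default_timer()
                insert(queue, tasks)
                elapsed = default_timer() - start
                best = elapsed if best is None else min(best, elapsed)
            results[(name, n)] = best
    return results
```

Running it once with `dup_fraction=0.0` covers the normal, no-split case, and running it again with a non-zero fraction exercises the failure paths. Registering both a fire-and-forget strategy and one that calls get_result() on every handle would show what waiting for confirmation costs.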

FYI @tannermiller-wf @ericolson-wf @johnlockwood-wf

ghost · 2013-10-08T22:52:16Z

As @beaulyddon-wf noted, you need to run some more tests. Specifically I would like to see "normal" case tests, meaning there are no splits, just a single successful batch insert.

I would also like to see how async-per-task compares with one, ten, one hundred, and one thousand tasks. If you add the get_result call in, how does this compare?
