Tasks can hang indefinitely if app encounters a critical error #121

dolsysmith · 2021-07-08T19:54:45Z

The data extract for the full Coronavirus dataset appears to have hung sometime after March 25, probably either when the shared /storage drive ran out of space or when the server had to be restarted after a network outage. TweetSets still read the task as processing, although no files were being produced. To restart the task, it's necessary to delete the pertinent folder in /storage/full_datasets.

We need a way to recover gracefully from such errors.

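If we continue using Celery, look at the call to _generate_tasks.AsyncResult(task_id), which was returning a PENDING status even in the absence of a viable task. Celery reports PENDING for any task ID it has no record of, so a task that was lost looks the same as one that is still queued.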
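One possible way to tell a lost task from a queued one is to have the app record when it submitted each task and to treat a PENDING state that outlives a timeout as lost. The sketch below is only an illustration under that assumption: classify_task, PENDING_TIMEOUT, and the one-hour threshold are hypothetical names and values, not part of TweetSets or Celery. The state argument stands in for whatever AsyncResult(task_id).state returns.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long a task may stay PENDING before it is
# considered lost. The right value depends on queue depth and extract size.
PENDING_TIMEOUT = timedelta(hours=1)

# Celery's built-in terminal states.
FINISHED_STATES = ("SUCCESS", "FAILURE", "REVOKED")


def classify_task(state, submitted_at, now=None, timeout=PENDING_TIMEOUT):
    """Classify a task as 'finished', 'lost', or 'running'.

    state: state string reported by the result backend.
    submitted_at: timezone-aware datetime the app stored when it queued
        the task (assumed to be persisted by the app itself).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if state in FINISHED_STATES:
        return "finished"
    # The backend also reports PENDING for unknown task IDs, so a PENDING
    # task older than the timeout is treated as lost, not as queued.
    if state == "PENDING" and now - submitted_at > timeout:
        return "lost"
    return "running"


submitted = datetime(2021, 3, 25, 12, 0, tzinfo=timezone.utc)
later = submitted + timedelta(hours=2)
print(classify_task("PENDING", submitted, now=later))
```

A task classified as lost could then be cleaned up and resubmitted automatically instead of requiring someone to delete the folder by hand. Enabling Celery's task_track_started setting would also make workers report STARTED once they pick a task up, which narrows what PENDING can mean.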

If we are able to use Spark for extracts, consider exposing the Spark jobs UI from the container (for monitoring and disabling of jobs).

dolsysmith mentioned this issue Jul 14, 2021

Refactor ingest/extract process #122

Open

dolsysmith added bug medium effort level labels Jul 26, 2021

dolsysmith added this to the 2.2 milestone Jul 26, 2021

lwrubel modified the milestones: 2.2, 2.3 Aug 26, 2021