Error handling and retry behaviour improved #42

dmichal · 2020-07-15T12:26:45Z

Several changes to error handling:

Support for infinite retries added - this better fits the Logstash philosophy of not dropping events when errors occur, which preserves data integrity,
'retry_times' is now honoured during failovers - previously the parameter was ignored in that case, which led to infinite retries,
Unnecessary failovers caused by connection errors reduced - resolves "Unnecessary failovers for HDFS namenode" (#36),
Optional limit for the retry interval added - useful with infinite retries, because the interval increases with each attempt and can otherwise reach extremely high values,
Docs improved - better description of retry behaviour and of how retry_interval increases,
Errors during file creation are now handled - previously they were not handled at all; now they are treated like any other write error.
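To make the retry limit and the interval cap concrete, here is a rough Python sketch (not the plugin's Ruby code). The function names are hypothetical, and two details are assumptions rather than something stated in this PR: that a negative 'retry_times' means "retry forever", and that the wait grows linearly as retry_interval times the attempt number.

```python
from typing import Optional


def retries_left(failed_attempts: int, retry_times: int) -> bool:
    """Return True if another attempt is allowed.

    Assumption: a negative retry_times means infinite retries.
    """
    return retry_times < 0 or failed_attempts < retry_times


def retry_delay(
    failed_attempts: int,
    retry_interval: float,
    retry_max_interval: Optional[float] = None,
) -> float:
    """Return how long to wait before the next attempt, in seconds.

    Assumption: the wait grows linearly with the number of failed attempts.
    The optional cap keeps it bounded when retries are infinite.
    """
    delay = retry_interval * failed_attempts
    if retry_max_interval is not None:
        delay = min(delay, retry_max_interval)
    return delay
```

Without the cap, the hundredth retry at a base interval of 0.5 s would wait 50 s under this linear assumption; with a cap of 10 s it waits 10 s.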

Previously, the 'retry_times' parameter was ignored after a failover, causing unconditional and possibly infinite retries.

Previously, the plugin also performed failovers on datanode connection errors. Now it checks whether the host and port for which the connection error occurred match the namenode host and port, and fails over only if they do.
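The host and port check can be sketched in Python as follows. This is only an illustration of the idea, not the plugin's Ruby code; the function name, the host names, and the port numbers in the usage lines are hypothetical.

```python
def should_failover(
    error_host: str,
    error_port: int,
    namenode_host: str,
    namenode_port: int,
) -> bool:
    """Return True only if the failed connection targeted the namenode.

    A connection error against any other host (for example a datanode)
    says nothing about namenode health, so it should not trigger a failover.
    """
    same_host = error_host.lower() == namenode_host.lower()
    same_port = int(error_port) == int(namenode_port)
    return same_host and same_port


# Hypothetical hosts and ports, for illustration only.
namenode_failed = should_failover("nn1.example.com", 50070, "nn1.example.com", 50070)
datanode_failed = should_failover("dn3.example.com", 50075, "nn1.example.com", 50070)
```

Here `namenode_failed` is true and `datanode_failed` is false, so only the first error would lead to a failover.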

dmichal · 2020-07-15T14:35:37Z

One more change I'm considering is adding a sleep before the retry after a failover, as is already done for other errors. This may help when both namenodes are down or inaccessible: in the current implementation retries are performed immediately after the error, which wastes resources, produces a lot of unnecessary log messages and quickly uses up the retry limit. On the other hand, when one namenode is working properly, sleeping for a while after a failover won't have a significant impact on performance, since it happens only once.
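A rough Python sketch of this proposal, not the plugin's Ruby code. All names are hypothetical, and it assumes a negative 'retry_times' means infinite retries and that the wait is retry_interval times the number of failed attempts. The sleep function is passed in so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, Sequence


def write_with_failover(
    write: Callable[[str], None],
    hosts: Sequence[str],
    retry_times: int,
    retry_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call write(host), switching to the next host after each connection error.

    Sleeps before every retry, including retries that follow a failover.
    Returns the host that accepted the write. Re-raises the last error once
    more than retry_times retries would be needed (never, if retry_times < 0).
    """
    failed_attempts = 0
    index = 0
    while True:
        try:
            write(hosts[index])
            return hosts[index]
        except ConnectionError:
            failed_attempts += 1
            if 0 <= retry_times < failed_attempts:
                raise
            # Pause before the retry so that two dead namenodes do not
            # burn through the retry limit and flood the log.
            sleep(retry_interval * failed_attempts)
            index = (index + 1) % len(hosts)
```

With one healthy namenode this sleeps once, right after the failover. With both namenodes down, the retries are spread out instead of being performed back to back.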

dmichal added 7 commits July 9, 2020 14:56

Support for infinite retries added

449537e

Properly use 'retry_times' in case of failovers

06cd276

Reducing unnecessary failovers due to conn errors

f56cc7c

Optional limit for retry interval added

8aa9ce3

Minor changes in comments and docs

0c77624

Properly handle errors during file creation

58d8fb8

Minor changes

9bce911

Unit test for 'retry_max_interval' default added

aad6660
