
Unnecessary failovers for HDFS namenode #36

Open
dmichal opened this issue Feb 27, 2019 · 0 comments
dmichal commented Feb 27, 2019

Hello,
I've noticed that in my cluster the webhdfs output plugin occasionally performs a failover between my HDFS namenodes, even though the namenodes themselves have not failed over. After some investigation I found that the exception actually triggering the failover in the plugin is: "Failed to connect to host hd4.local:50075, Net::ReadTimeout.", where hd4 is one of my datanodes, i.e. the plugin fails over even when the connection error comes from a datanode. This happens because the plugin merely searches the error string for the pattern "Failed to connect". Perhaps a more specific match should be performed, e.g. one that also checks for the namenode host and port? These unnecessary failovers cause significant problems for me, as they sometimes lead to HDFS lease issues.
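
For illustration, here is a minimal sketch of the stricter check suggested above, in Ruby (the plugin's language). The namenode address (hd1.local:50070), the constants, and the method names are assumptions made up for this example, not the plugin's actual identifiers; the only detail taken from the plugin's behavior is that it currently matches the bare pattern "Failed to connect":

```ruby
# Hypothetical namenode address -- substitute the active namenode's host:port.
NAMENODE_HOST = "hd1.local"
NAMENODE_PORT = 50070

# Current behavior (simplified): any "Failed to connect" triggers a failover,
# even when the unreachable endpoint is a datanode.
def failover_on_any_connect_error?(message)
  message.include?("Failed to connect")
end

# Suggested behavior: fail over only when the unreachable endpoint is the
# namenode itself, not a datanode such as hd4.local:50075.
def failover_on_namenode_error?(message)
  message.match?(/Failed to connect to host #{Regexp.escape(NAMENODE_HOST)}:#{NAMENODE_PORT}\b/)
end

datanode_error = "Failed to connect to host hd4.local:50075, Net::ReadTimeout."
namenode_error = "Failed to connect to host hd1.local:50070, Net::ReadTimeout."

puts failover_on_any_connect_error?(datanode_error)  # => true  (the unnecessary failover)
puts failover_on_namenode_error?(datanode_error)     # => false (datanode error, no failover)
puts failover_on_namenode_error?(namenode_error)     # => true  (genuine namenode failure)
```

Anchoring the match on the namenode's own host:port would let datanode read timeouts surface as ordinary retryable errors instead of triggering a namenode switch.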

Observed with Logstash 6.6.0 and HDFS 2.7.3; the Logstash and Hadoop machines run CentOS 7.
