title	description	services	documentationcenter	author	manager	ms.service	ms.custom	ms.devlang	ms.topic	ms.tgt_pltfrm	ms.workload	ms.date	ms.author
HBase Region Server fails to Restart \| Microsoft Docs	Troubleshooting the cause of failure of HBase regionserver restart operation.	hdinsight		nitinver	ashitg	hdinsight	hdinsightactive	na	article	na	big-data	04/11/2017	nitinver

The restart operation on HBase Region Server fails to complete

This situation can usually be prevented by following best practices. Pause heavy workload activity before you restart HBase region servers. If the application keeps connecting to the region servers while shutdown is in progress, the restart operation slows down by several minutes. It is also advisable to flush all tables first; see HDInsight HBase: How to Improve HBase cluster restart time by Flushing tables for reference.
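
As a rough illustration of the flush step, the sketch below turns a list of table names into HBase shell `flush` statements. The function name `make_flush_commands` and the table names in the usage comment are hypothetical placeholders; the referenced article remains the authoritative procedure.

```shell
#!/usr/bin/env bash
# Turn table names (one per line on stdin) into HBase shell "flush" statements.
# Blank lines are skipped. The function name is illustrative only.
make_flush_commands() {
  while IFS= read -r table || [ -n "$table" ]; do
    [ -n "$table" ] || continue
    printf "flush '%s'\n" "$table"
  done
}

# Hypothetical usage on a cluster node (table1 and table2 are placeholders):
#   printf '%s\n' table1 table2 | make_flush_commands | hbase shell
```

Generating the statements separately from running them makes it easy to review what will be flushed before piping anything into the HBase shell.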

When a user initiates a restart of the HBase region servers from the Ambari UI, the region servers go down immediately but may take a long time to come back up.

Below is what happens behind the scenes:

The Ambari agent sends a stop request to the region server.
The Ambari agent then waits 30 seconds for the region server to shut down gracefully.
If the customer's application keeps connecting to the region server, the server does not shut down immediately, so the 30-second timeout expires before the shutdown completes.

After the 30 seconds expire, the Ambari agent sends a force kill (kill -9) to the region server. You can observe this in the ambari-agent log (in the /var/log/ directory of the respective worker node), as shown below:

  2017-03-21 13:22:09,171 - Execute['/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config /usr/hdp/current/hbase-regionserver/conf stop regionserver'] {'only_if': 'ambari-sudo.sh  -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh  -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1', 'on_timeout': '! ( ambari-sudo.sh  -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh  -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1 ) || ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh  -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid`', 'timeout': 30, 'user': 'hbase'}
  2017-03-21 13:22:40,268 - Executing '! ( ambari-sudo.sh  -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh  -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1 ) || ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh  -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid`'. Reason: Execution of 'ambari-sudo.sh su hbase -l -s /bin/bash -c 'export  PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/var/lib/ambari-agent ; /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config /usr/hdp/current/hbase-regionserver/conf stop regionserver was killed due timeout after 30 seconds
  2017-03-21 13:22:40,285 - File['/var/run/hbase/hbase-hbase-regionserver.pid'] {'action': ['delete']}
  2017-03-21 13:22:40,285 - Deleting File['/var/run/hbase/hbase-hbase-regionserver.pid']
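
To check quickly whether a worker node hit this timeout, you can search the agent log for the message shown above. This is a minimal sketch: the function name `find_forced_kills` is hypothetical, and the log file path is passed in by the caller because the exact file name under /var/log/ is not stated here.

```shell
#!/usr/bin/env bash
# Print the log lines that record a stop command being killed after a timeout.
# $1: path to the ambari-agent log file to inspect (supplied by the caller).
find_forced_kills() {
  grep -F 'was killed due timeout after' "$1"
}

# Hypothetical usage (replace the placeholder with the real log file):
#   find_forced_kills /var/log/<ambari-agent log file>
```

Any line it prints carries the timestamp of the forced kill, which can then be compared with the time of the failed region server start.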

Because of this abrupt shutdown, the region server process is killed but the port associated with it may not be released. This eventually leads to a java.net.BindException ("Address already in use") when the region server starts, as shown in the log below. You can verify this in region-server.log in the /var/log/hbase directory on the worker nodes where the region server start fails.

2017-03-21 13:25:47,061 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer
at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2636)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:64)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2651)

Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2634)
... 5 more

Caused by: java.net.BindException: Problem binding to /10.2.0.4:16020 : Address already in use
at org.apache.hadoop.hbase.ipc.RpcServer.bind(RpcServer.java:2497)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.<init>(RpcServer.java:580)
at org.apache.hadoop.hbase.ipc.RpcServer.<init>(RpcServer.java:1982)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:863)
at org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:632)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:532)
... 10 more

Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:463)
at sun.nio.ch.Net.bind(Net.java:455)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.apache.hadoop.hbase.ipc.RpcServer.bind(RpcServer.java:2495)
... 15 more
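
Before starting the region server again, it can help to confirm whether something is still listening on the port named in the exception (16020 in the trace above). The sketch below parses `ss -ltn` style output from standard input; the function name `port_is_listening` is hypothetical, and the column layout assumed is the usual State / Recv-Q / Send-Q / Local Address:Port one, which may differ on some systems.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if the `ss -ltn` style output on stdin shows a listener
# on port $1; fail (exit 1) otherwise.
# Assumes column 1 is the socket state and column 4 is "Local Address:Port".
port_is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")      # the port is the last colon-separated field
      if (parts[n] == port) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}

# Hypothetical usage on the affected worker node:
#   ss -ltn | port_is_listening 16020 && echo "port 16020 is still in use"
```

Reading from standard input keeps the check independent of the tool that produced the listing, so saved output can be inspected the same way.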

In such cases, try the following workaround:

Reduce the load on the HBase region servers before initiating a restart.

Alternatively (if the step above doesn't help), manually restart the region servers on the worker nodes using the following commands:

sudo su - hbase -c "/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh stop regionserver"
sudo su - hbase -c "/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh start regionserver"
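
The two commands above can be wrapped in a small script that also waits for the old process to exit before starting the new one. This is only a sketch under stated assumptions: the wait step, the `DRY_RUN` and `WAIT_SECS` variables, and the function names are additions for illustration, and the pid file path is the one that appears in the agent log earlier in this article.

```shell
#!/usr/bin/env bash
DAEMON=/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh
# Pid file path as seen in the ambari-agent log; override it if yours differs.
PID_FILE=${PID_FILE:-/var/run/hbase/hbase-hbase-regionserver.pid}

# Run a command, or only print it when DRY_RUN=1 (illustrative switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Stop the region server, wait up to WAIT_SECS (default 120) seconds for the
# old process to disappear, then start it again.
restart_regionserver() {
  local pid waited
  pid=""
  waited=0
  # Remember the pid before stopping, since the pid file may be removed.
  if [ -r "$PID_FILE" ]; then pid=$(cat "$PID_FILE"); fi
  run sudo su - hbase -c "$DAEMON stop regionserver"
  if [ -n "$pid" ]; then
    while ps -p "$pid" >/dev/null 2>&1 && [ "$waited" -lt "${WAIT_SECS:-120}" ]; do
      sleep 1
      waited=$((waited + 1))
    done
  fi
  run sudo su - hbase -c "$DAEMON start regionserver"
}

# Hypothetical usage: preview the commands first, then run them for real.
#   DRY_RUN=1 restart_regionserver
#   restart_regionserver
```

Waiting for the old process to exit is meant to reduce the chance that the start command runs while the previous region server still holds its port.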
