Troubleshooting HBase Regions in Transition (RIT)

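Problem description:

After HBase starts, the Web UI shows regions of several tables stuck in the RIT state. After a while, the HRegionServer and HMaster processes all terminate. During that period, every request to the HBase RegionServer fails with a connection timeout.

Investigation:

The HRegionServer log contains the following error: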

2023-12-14T14:03:45,390 ERROR [RS_OPEN_REGION-regionserver/172.26.58.53:16020-1] regionserver.HRegionServer: ***** ABORTING region server 172.26.58.53,16020,1702533818390: The coprocessor com.mt.hbase.chpt08.coprocessor.SumOrderEndpointV3 threw java.lang.NoSuchMethodError: com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom([Ljava/lang/String;[Lcom/google/protobuf/Descriptors$FileDescriptor;)Lcom/google/protobuf/Descriptors$FileDescriptor; *****
java.lang.NoSuchMethodError: com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom([Ljava/lang/String;[Lcom/google/protobuf/Descriptors$FileDescriptor;)Lcom/google/protobuf/Descriptors$FileDescriptor;
    at com.mt.hbase.chpt08.coprocessor.v3.generated.SumDTOV3.<clinit>(SumDTOV3.java:1454) ~[referencebook-1.0-SNAPSHOT.jar:?]
    at com.mt.hbase.chpt08.coprocessor.v3.generated.SumDTOV3$SumService.getDescriptor(SumDTOV3.java:1299) ~[referencebook-1.0-SNAPSHOT.jar:?]
    at com.mt.hbase.chpt08.coprocessor.v3.generated.SumDTOV3$SumService.getDescriptorForType(SumDTOV3.java:1303) ~[referencebook-1.0-SNAPSHOT.jar:?]
    at org.apache.hadoop.hbase.regionserver.HRegion.registerService(HRegion.java:7874) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.createEnvironment(RegionCoprocessorHost.java:413) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.createEnvironment(RegionCoprocessorHost.java:92) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.checkAndLoadInstance(CoprocessorHost.java:283) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:249) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:200) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.loadTableCoprocessors(RegionCoprocessorHost.java:388) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.<init>(RegionCoprocessorHost.java:278) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:859) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:734) ~[hbase-server-2.5.5.jar:2.5.5]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_371]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_371]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_371]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_371]
    at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:6971) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7184) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7161) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7120) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7076) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:149) ~[hbase-server-2.5.5.jar:2.5.5]
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) ~[hbase-server-2.5.5.jar:2.5.5]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_371]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_371]

The key error is java.lang.NoSuchMethodError: com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom

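The context shows that the error is triggered by the custom coprocessor com.mt.hbase.chpt08.coprocessor.SumOrderEndpointV3, and the root cause is a protobuf version conflict. The coprocessor's Java message classes were generated with protobuf 3, whereas hbase-server 2.5.x still depends on protobuf 2.5.0. In protobuf 2.5.0, internalBuildGeneratedFileFrom takes three parameters. In protobuf 3, the three-parameter version is deprecated and a two-parameter overload with the same name was introduced. The generated protocol classes used by this coprocessor call that two-parameter overload, but the protobuf 2.5.0 class that the HBase RegionServer depends on does not have it, so the call fails with NoSuchMethodError. Because the error is raised while a region is being opened (AssignRegionHandler in the stack trace), the RegionServer aborts instead of completing the assignment, which matches the regions stuck in RIT.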
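The descriptor in the error message uses the JVM's internal notation, which makes it hard to see how many parameters the missing method takes. The following is a minimal sketch, using only the JDK, that decodes the parameter part of such a descriptor. The class name DescriptorDecoder and its helper methods are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class DescriptorDecoder {

    /** Returns the parameter types of a JVM method descriptor in source-like notation. */
    static List<String> parameterTypes(String descriptor) {
        List<String> params = new ArrayList<>();
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            int dims = 0;
            while (descriptor.charAt(i) == '[') { // each '[' adds one array dimension
                dims++;
                i++;
            }
            String base;
            char c = descriptor.charAt(i);
            if (c == 'L') { // object type: 'L', binary name with '/', then ';'
                int semi = descriptor.indexOf(';', i);
                base = descriptor.substring(i + 1, semi).replace('/', '.');
                i = semi + 1;
            } else { // single-character primitive code
                base = primitiveName(c);
                i++;
            }
            StringBuilder type = new StringBuilder(base);
            for (int d = 0; d < dims; d++) {
                type.append("[]");
            }
            params.add(type.toString());
        }
        return params;
    }

    private static String primitiveName(char code) {
        switch (code) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            default: throw new IllegalArgumentException("Unknown type code: " + code);
        }
    }

    public static void main(String[] args) {
        // Descriptor copied from the NoSuchMethodError in the RegionServer log.
        String fromLog = "([Ljava/lang/String;[Lcom/google/protobuf/Descriptors$FileDescriptor;)"
                + "Lcom/google/protobuf/Descriptors$FileDescriptor;";
        System.out.println(parameterTypes(fromLog));
    }
}
```

For the descriptor from the log, the decoded list has two entries (a String array and a FileDescriptor array), which confirms that the generated code is looking for the two-parameter overload.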

(Screenshot: method signature in protobuf 2.5.0)

(Screenshot: method signature in protobuf 3.25.1)
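To see which overloads the protobuf on a given classpath actually provides, the method can be inspected with reflection. The sketch below is an illustration rather than a definitive diagnostic: OverloadInspector, overloads, and location are hypothetical names, and the result is only meaningful if the program runs with the same classpath as the RegionServer (for example, the one printed by `hbase classpath`).

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OverloadInspector {

    /** Lists every declared overload of methodName on className as "name(type, type)". */
    static List<String> overloads(String className, String methodName)
            throws ClassNotFoundException {
        // initialize=false: inspect the class without running its static initializers.
        Class<?> cls = Class.forName(className, false, OverloadInspector.class.getClassLoader());
        List<String> signatures = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            StringBuilder sb = new StringBuilder(methodName).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getTypeName());
            }
            signatures.add(sb.append(')').toString());
        }
        Collections.sort(signatures); // reflection order is unspecified, so sort for stable output
        return signatures;
    }

    /** Returns the location the class was loaded from, or a marker if the JVM does not report one. */
    static String location(String className) throws ClassNotFoundException {
        Class<?> cls = Class.forName(className, false, OverloadInspector.class.getClassLoader());
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(not reported)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        String cls = "com.google.protobuf.Descriptors$FileDescriptor";
        try {
            System.out.println("Loaded from: " + location(cls));
            for (String signature : overloads(cls, "internalBuildGeneratedFileFrom")) {
                System.out.println(signature);
            }
        } catch (ClassNotFoundException e) {
            System.out.println("protobuf is not on this classpath: " + e.getMessage());
        }
    }
}
```

If the printed list contains no overload that takes exactly a String array and a FileDescriptor array, classes generated with protobuf 3 will hit the NoSuchMethodError described above when they are loaded on that classpath.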

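Solution:

1. Dynamically loaded coprocessor: regenerate the Java classes with protobuf 2, rebuild the jar, replace the jar that the coprocessor was previously loaded from, and restart HBase. Then decide, depending on the situation, whether the coprocessor needs to be unloaded.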

-- Load the coprocessor
hbase:027:0> disable 's_order'
Took 0.9659 seconds
hbase:028:0> alter 's_order', 'coprocessor'=>'file:///Users/xupeng/dev/hbase-2.5.5/lib/referencebook-1.0-SNAPSHOT.jar|com.mt.hbase.chpt08.coprocessor.SumOrderEndpoint|100|arg1=x'
Updating all regions with the new schema...
All regions updated.
Done.
Took 1.4826 seconds
hbase:029:0> enable 's_order'
Took 0.7218 seconds


-- Unload the coprocessor
hbase:072:0> disable 's_order'
Took 0.7320 seconds
hbase:073:0> alter 's_order', METHOD => 'table_att_unset', NAME => 'COPROCESSOR$1'
Took 0.0364 seconds
hbase:074:0> enable 's_order'
Took 0.6709 seconds 
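
The 'coprocessor' value passed to alter in the load step is a single pipe-separated string. As a reading aid, the sketch below splits it into the four fields described in the HBase documentation: jar path, class name, priority, and arguments. CoprocessorSpec is a hypothetical class written for this illustration. It is a simplified parser and not the one HBase itself uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CoprocessorSpec {
    final String jarPath;                 // location of the coprocessor jar
    final String className;               // fully qualified coprocessor class
    final Integer priority;               // null when the field is omitted
    final Map<String, String> arguments;  // key=value pairs passed to the coprocessor

    private CoprocessorSpec(String jarPath, String className, Integer priority,
                            Map<String, String> arguments) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
        this.arguments = arguments;
    }

    /** Parses "path|class|priority|k1=v1,k2=v2"; priority and arguments are optional. */
    static CoprocessorSpec parse(String spec) {
        String[] parts = spec.split("\\|", -1); // -1 keeps empty trailing fields
        if (parts.length < 2 || parts[1].trim().isEmpty()) {
            throw new IllegalArgumentException("Expected at least 'path|class': " + spec);
        }
        Integer priority = null;
        if (parts.length > 2 && !parts[2].trim().isEmpty()) {
            priority = Integer.valueOf(parts[2].trim());
        }
        Map<String, String> arguments = new LinkedHashMap<>();
        if (parts.length > 3 && !parts[3].trim().isEmpty()) {
            for (String pair : parts[3].split(",")) {
                String[] kv = pair.split("=", 2);
                arguments.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
            }
        }
        return new CoprocessorSpec(parts[0].trim(), parts[1].trim(), priority, arguments);
    }

    public static void main(String[] args) {
        // Value taken from the alter command in the shell transcript.
        CoprocessorSpec spec = parse(
                "file:///Users/xupeng/dev/hbase-2.5.5/lib/referencebook-1.0-SNAPSHOT.jar"
                + "|com.mt.hbase.chpt08.coprocessor.SumOrderEndpoint|100|arg1=x");
        System.out.println("jar:       " + spec.jarPath);
        System.out.println("class:     " + spec.className);
        System.out.println("priority:  " + spec.priority);
        System.out.println("arguments: " + spec.arguments);
    }
}
```

Reading the value this way makes it easier to check that the jar path points at the rebuilt jar and that the class name matches the class compiled against protobuf 2.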

2. Statically loaded coprocessor: remove the coprocessor property configured in hbase-site.xml, and remove the jar from the lib directory.

<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>com.mt.hbase.chpt08.coprocessor.SumOrderEndpoint</value>
</property>
