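Resolving the STALE_ADMIN_WAIT state in a VCS cluster
Date: 2010-09-20 Source: guyuanli
Today I found that one of our VCS clusters had gone into the STALE_ADMIN_WAIT state. This is how I resolved it:
1. First, check the current state of both machines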
cp-etl01:/etc/VRTSvcs/conf/config # hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A cp-etl01 STALE_ADMIN_WAIT 0
A cp-etl02 STALE_ADMIN_WAIT 0
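The same check can be scripted. The following is a minimal sketch, assuming the system rows of `hastatus -sum` keep the layout shown above (a leading `A`, then system name, state, and frozen flag); `stale_systems` is a hypothetical helper name, not a VCS command:

```shell
# Print the name of every system reported as STALE_ADMIN_WAIT.
# Reads `hastatus -sum` output on stdin; system rows start with "A".
stale_systems() {
    awk '$1 == "A" && $3 == "STALE_ADMIN_WAIT" { print $2 }'
}

# Typical use on a cluster node:
#   hastatus -sum | stale_systems
```

Against the output above it would print both cp-etl01 and cp-etl02, one per line.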
2. Check the running HA processes on both machines
cp-etl01:/etc/VRTSvcs/conf/config # ps -ef |grep had
root 7243 1 0 2009 ? 00:00:00 /opt/VRTSvcs/bin/hashadow
root 4683 1 0 Aug24 ? 00:00:02 /opt/VRTSvcs/bin/had -restart
root 19294 17911 0 11:21 pts/7 00:00:00 grep had
cp-etl02:~ # ps -ef | grep had
root 7278 1 0 2009 ? 00:00:00 /opt/VRTSvcs/bin/hashadow
root 23411 1 0 Aug24 ? 00:00:01 /opt/VRTSvcs/bin/had -restart
root 7012 6981 0 11:22 pts/0 00:00:00 grep had
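The HA daemons are up on both machines, but had is running as "had -restart" rather than as a plain "had" process, so it is not in a normal state and the cluster needs to be restarted.
3. Check whether each node can see the other (the membership should end in 01)
cp-etl01:/etc/VRTSvcs/conf/config # gabconfig -a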
GAB Port Memberships
===============================================================
Port a gen 1bc510 membership 01
Port h gen 1bc51b membership 01
cp-etl02:~ # gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 1bc510 membership 01
Port h gen 1bc51b membership 01
Both machines can see each other.
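In this output, port a is GAB itself and port h is the VCS engine (had); a membership of 01 means nodes 0 and 1 are both members. As a rough sketch of how the check could be scripted, assuming the `Port <x> gen <id> membership <nodes>` line layout shown above (`gab_membership` is a hypothetical helper name):

```shell
# Print the membership string for one GAB port.
# Reads `gabconfig -a` output on stdin; $1 is the port letter, e.g. a or h.
gab_membership() {
    awk -v p="$1" '$1 == "Port" && $2 == p && $5 == "membership" { print $6 }'
}

# Typical use on a cluster node:
#   gabconfig -a | gab_membership h
```

On a two-node cluster, anything other than 01 for ports a and h suggests that one node is not seen by the other.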
4. Restart the cluster. Stop it from either machine:
cp-etl01:/etc/VRTSvcs/conf/config # hastop -all -force
cp-etl01:/etc/VRTSvcs/conf/config # ps -ef |grep had
root 20025 17911 0 11:25 pts/7 00:00:00 grep had
Then start the cluster on both machines:
cp-etl01:/etc/VRTSvcs/conf/config # hastart
cp-etl02:~ # hastart
5. Check the status
cp-etl01:/etc/VRTSvcs/conf/config # ps -ef |grep had
root 20034 1 0 11:25 ? 00:00:00 /opt/VRTSvcs/bin/had
root 20036 1 0 11:25 ? 00:00:00 /opt/VRTSvcs/bin/hashadow
root 20049 17911 0 11:26 pts/7 00:00:00 grep had
cp-etl01:/etc/VRTSvcs/conf/config # hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A cp-etl01 STALE_ADMIN_WAIT 0
6. Force the cluster to start from the first machine
cp-etl01:/etc/VRTSvcs/conf/config # hostname
cp-etl01
cp-etl01:/etc/VRTSvcs/conf/config # hasys -force cp-etl01
You have new mail in /var/spool/mail/root
cp-etl01:/etc/VRTSvcs/conf/config # hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A cp-etl01 RUNNING 0
A cp-etl02 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ETL01 cp-etl01 Y N PARTIAL
B ETL01 cp-etl02 Y N OFFLINE
B ETL02 cp-etl01 Y N OFFLINE
B ETL02 cp-etl02 Y N ONLINE
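The cluster is now back in a normal state. However, it cannot yet protect the application: if an application process dies, no failover will take place.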
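One thing worth checking at this point is which service groups are not fully online; in the output above, ETL01 is PARTIAL on cp-etl01. A minimal sketch, assuming the group rows of `hastatus -sum` keep the six-column layout shown above (a leading `B`, then group, system, probed, auto-disabled, state); `partial_groups` is a hypothetical helper name, not a VCS command:

```shell
# Print "group system" for every service group row whose state contains PARTIAL.
# Reads `hastatus -sum` output on stdin; group rows start with "B".
partial_groups() {
    awk '$1 == "B" && $6 ~ /PARTIAL/ { print $2, $3 }'
}

# Typical use on a cluster node:
#   hastatus -sum | partial_groups
```

This only reports what the status output says; it does not explain why a group is partial or bring its resources online.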