Switching PRIMARY and SECONDARY in a MongoDB Replica Set

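        When a MongoDB (mdb) replica set is used as the high-availability solution, a switch between primary and secondary is either active or passive, depending on what triggers it. An active switch is initiated on the PRIMARY. A passive switch happens when the PRIMARY cannot be reached; in the setup shown here it is a forced activation carried out directly on the SECONDARY.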

PRIMARY:192.168.56.11
SECONDARY:192.168.56.10
MDB:3.0.3

Check the current replication status
rep-test:PRIMARY> rs.status()
{
 "set" : "rep-test",
 "date" : ISODate("2016-02-26T03:15:36.810Z"),
 "myState" : 1,
 "members" : [
  {
   "_id" : 1,
   "name" : "192.168.56.11:63105",
   "health" : 1,
   "state" : 1,
   "stateStr" : "PRIMARY",
   "uptime" : 51164,
   "optime" : Timestamp(1456456239, 1),
   "optimeDate" : ISODate("2016-02-26T03:10:39Z"),
   "electionTime" : Timestamp(1456456241, 1),
   "electionDate" : ISODate("2016-02-26T03:10:41Z"),
   "configVersion" : 300282,
   "self" : true
  },
  {
   "_id" : 2,
   "name" : "192.168.56.10:63105",
   "health" : 1,
   "state" : 2,
   "stateStr" : "SECONDARY",
   "uptime" : 295,
   "optime" : Timestamp(1456456239, 1),
   "optimeDate" : ISODate("2016-02-26T03:10:39Z"),
   "lastHeartbeat" : ISODate("2016-02-26T03:15:35.483Z"),
   "lastHeartbeatRecv" : ISODate("2016-02-26T03:15:35.483Z"),
   "pingMs" : 0,
   "configVersion" : 300282
  }
 ],
 "ok" : 1
}
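
The fields worth reading in this output are stateStr and health. As a rough sketch, the same check can be scripted against whatever rs.status() returns. The function name summarizeReplStatus and the trimmed sampleStatus object are made up for illustration and are not part of MongoDB.

```javascript
// Summarize an rs.status() document using only fields shown in the output above:
// "set", "members[].name", "members[].health" and "members[].stateStr".
function summarizeReplStatus(status) {
  var summary = { set: status.set, primary: null, secondaries: [], unreachable: [] };
  (status.members || []).forEach(function (m) {
    if (m.health !== 1) {
      summary.unreachable.push(m.name); // health 0: this node cannot reach the member
    } else if (m.stateStr === "PRIMARY") {
      summary.primary = m.name;
    } else if (m.stateStr === "SECONDARY") {
      summary.secondaries.push(m.name);
    }
  });
  return summary;
}

// Trimmed copy of the output above (illustrative; most fields omitted).
var sampleStatus = {
  set: "rep-test",
  members: [
    { _id: 1, name: "192.168.56.11:63105", health: 1, stateStr: "PRIMARY" },
    { _id: 2, name: "192.168.56.10:63105", health: 1, stateStr: "SECONDARY" }
  ]
};
var statusSummary = summarizeReplStatus(sampleStatus);
// statusSummary.primary is "192.168.56.11:63105"
```

In the mongo shell the call would be summarizeReplStatus(rs.status()). The code sticks to plain ES5 so that it should also work in older shells.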


An active switch is simple: run the rs.stepDown() method on the primary.


rep-test:PRIMARY> rs.stepDown()
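
rs.stepDown() also accepts optional arguments. To my understanding, the 3.0 shell takes stepDownSecs (how long this node refuses to become PRIMARY again, 60 seconds when omitted) and secondaryCatchUpPeriodSecs, and sends a replSetStepDown command, which is the command named in the log further down. The helper buildStepDownCommand is hypothetical and only builds that command document, so treat it as a sketch of what is sent rather than the shell's actual source.

```javascript
// Build the replSetStepDown command document (assumed shape, see the note above).
// stepDownSecs: seconds during which the node will not seek re-election.
// catchUpSecs:  seconds to wait for a secondary to catch up; omitted if undefined.
function buildStepDownCommand(stepDownSecs, catchUpSecs) {
  var cmd = { replSetStepDown: stepDownSecs === undefined ? 60 : stepDownSecs };
  if (catchUpSecs !== undefined) {
    cmd.secondaryCatchUpPeriodSecs = catchUpSecs;
  }
  return cmd;
}

var defaultStepDown = buildStepDownCommand();
var tunedStepDown = buildStepDownCommand(120, 15);

// In the mongo shell, on the PRIMARY, either form could then be used:
//   rs.stepDown(120, 15)
//   db.adminCommand(tunedStepDown)
```

The server closes client connections when it steps down, so the shell may report a network error for this command before it reconnects.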

The log shows that the node has transitioned to SECONDARY

2016-02-26T11:17:08.912+0800 I COMMAND  [conn313] Attempting to step down in response to replSetStepDown command
2016-02-26T11:17:08.912+0800 I REPL     [ReplicationExecutor] transition to SECONDARY
2016-02-26T11:17:08.913+0800 I NETWORK  [conn313] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:41741]
2016-02-26T11:17:08.920+0800 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:41754 #329 (2 connections now open)
2016-02-26T11:17:09.583+0800 I NETWORK  [conn328] end connection 192.168.56.10:58104 (1 connection now open)
2016-02-26T11:17:09.584+0800 I NETWORK  [initandlisten] connection accepted from 192.168.56.10:58105 #330 (2 connections now open)
2016-02-26T11:17:10.379+0800 I REPL     [ReplicationExecutor] replSetElect voting yea for 192.168.56.10:63105 (2)
2016-02-26T11:17:11.587+0800 I REPL     [ReplicationExecutor] Member 192.168.56.10:63105 is now in state PRIMARY

rep-test:SECONDARY> rs.status()
{
 "set" : "rep-test",
 "date" : ISODate("2016-02-26T03:18:11.642Z"),
 "myState" : 2,
 "members" : [
  {
   "_id" : 1,
   "name" : "192.168.56.11:63105",
   "health" : 1,
   "state" : 2,
   "stateStr" : "SECONDARY",
   "uptime" : 51319,
   "optime" : Timestamp(1456456239, 1),
   "optimeDate" : ISODate("2016-02-26T03:10:39Z"),
   "configVersion" : 300282,
   "self" : true
  },
  {
   "_id" : 2,
   "name" : "192.168.56.10:63105",
   "health" : 1,
   "state" : 1,
   "stateStr" : "PRIMARY",
   "uptime" : 450,
   "optime" : Timestamp(1456456239, 1),
   "optimeDate" : ISODate("2016-02-26T03:10:39Z"),
   "lastHeartbeat" : ISODate("2016-02-26T03:18:11.640Z"),
   "lastHeartbeatRecv" : ISODate("2016-02-26T03:18:11.639Z"),
   "pingMs" : 1,
   "electionTime" : Timestamp(1456456239, 2),
   "electionDate" : ISODate("2016-02-26T03:10:39Z"),
   "configVersion" : 300282
  }
 ],
 "ok" : 1
}


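A passive switch takes a little more work. Suppose the current PRIMARY can no longer be reached: the unreachable member then has to be removed from the configuration by hand.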

rep-test:SECONDARY> rs.status()
{
 "set" : "rep-test",
 "date" : ISODate("2016-01-17T06:54:11.083Z"),
 "myState" : 2,
 "members" : [
  {
   "_id" : 1,
   "name" : "192.168.56.11:63105",
   "health" : 0,
   "state" : 8,
   "stateStr" : "(not reachable/healthy)",
   "uptime" : 0,
   "optime" : Timestamp(0, 0),
   "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
   "lastHeartbeat" : ISODate("2016-01-17T06:54:10.781Z"),
   "lastHeartbeatRecv" : ISODate("2016-01-17T06:54:10.772Z"),
   "pingMs" : 0,
   "lastHeartbeatMessage" : "Failed attempt to connect to 192.168.56.11:63105; couldn't connect to server 192.168.56.11:63105 (192.168.56.11), connection attempt failed",
   "configVersion" : -1
  },
  {
   "_id" : 2,
   "name" : "192.168.56.10:63105",
   "health" : 1,
   "state" : 2,
   "stateStr" : "SECONDARY",
   "uptime" : 60691,
   "optime" : Timestamp(1456456239, 1),
   "optimeDate" : ISODate("2016-02-26T03:10:39Z"),
   "configVersion" : 300282,
   "self" : true
  }
 ],
 "ok" : 1
}
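The status shows that the old PRIMARY can no longer be reached, while the remaining member is still SECONDARY. That second member has to be force-activated directly.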
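
The surviving member does not promote itself because an election needs a majority of the voting members, and one vote out of two is not a majority. A small sketch of that arithmetic follows; canElectPrimary is a hypothetical helper, not a MongoDB function.

```javascript
// An election needs a strict majority of ALL voting members in the
// configuration, not just of the members that are currently reachable.
function canElectPrimary(totalVotes, reachableVotes) {
  var majority = Math.floor(totalVotes / 2) + 1;
  return reachableVotes >= majority;
}

// This article's set: 2 voting members, only 1 still reachable.
var twoMemberSetCanElect = canElectPrimary(2, 1);   // false

// For comparison: a 3-member set that has lost one member.
var threeMemberSetCanElect = canElectPrimary(3, 2); // true
```

With three or more voting members and a majority still reachable, the remaining members can normally elect a new PRIMARY on their own, without a forced reconfiguration.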

rep-test:SECONDARY> cfg=rs.conf()

rep-test:SECONDARY> cfg.members=[cfg.members[1]]
[
 {
  "_id" : 2,
  "host" : "192.168.56.10:63105",
  "arbiterOnly" : false,
  "buildIndexes" : true,
  "hidden" : false,
  "priority" : 1,
  "tags" : {
   
  },
  "slaveDelay" : 0,
  "votes" : 1
 }
]

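Keep only this one member in the configuration. Note that array indexes start at 0: in this example the member with _id 2 is the second entry in cfg.members, so its index is 1.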
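
Picking the member by array position is easy to get wrong on a larger set. One alternative, sketched here on the assumption that the surviving hosts are known by name, is to filter cfg.members by host. keepOnlyHosts and the sample configuration object are illustrative; the sample copies only values that appear in this article.

```javascript
// Keep only the members whose "host" is in the given list.
// Throws instead of producing an empty member list, which would be unusable.
function keepOnlyHosts(cfg, hosts) {
  var kept = cfg.members.filter(function (m) {
    return hosts.indexOf(m.host) !== -1;
  });
  if (kept.length === 0) {
    throw new Error("no member matches: " + hosts.join(", "));
  }
  cfg.members = kept;
  return cfg;
}

// Trimmed stand-in for rs.conf() (illustrative; real output has more fields).
var sampleCfg = {
  _id: "rep-test",
  version: 300282,
  members: [
    { _id: 1, host: "192.168.56.11:63105" },
    { _id: 2, host: "192.168.56.10:63105" }
  ]
};
var reducedCfg = keepOnlyHosts(sampleCfg, ["192.168.56.10:63105"]);

// In the mongo shell, on the surviving member:
//   cfg = keepOnlyHosts(rs.conf(), ["192.168.56.10:63105"])
//   rs.reconfig(cfg, {force: true})
```

Selecting by host gives the same result as cfg.members[1] here, but does not depend on the order of the members array.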

rep-test:SECONDARY> rs.reconfig(cfg, {force: true});
{ "ok" : 1 }

This forces the replica set to reconfigure.

rep-test:SECONDARY> rs.status()
{
 "set" : "rep-test",
 "date" : ISODate("2016-01-17T06:56:33.838Z"),
 "myState" : 1,
 "members" : [
  {
   "_id" : 2,
   "name" : "192.168.56.10:63105",
   "health" : 1,
   "state" : 1,
   "stateStr" : "PRIMARY",
   "uptime" : 60833,
   "optime" : Timestamp(1456456239, 1),
   "optimeDate" : ISODate("2016-02-26T03:10:39Z"),
   "electionTime" : Timestamp(1456456239, 3),
   "electionDate" : ISODate("2016-02-26T03:10:39Z"),
   "configVersion" : 339433,
   "self" : true
  }
 ],
 "ok" : 1
}


The output shows that the member with _id 2 has been successfully activated and is now PRIMARY.


Note: both active and passive switches cause a brief interruption for the application.

10-10 03:13