MongoDB is designed to support distributed operations and as such can tolerate network failures. In this notebook, we'll see how a MongoDB replica set fares in the face of simulated network failures.
(More technical detail about how the MongoDB replica set is configured can be found in the Dockering notebook.)
from dockeringMongo import *
c = docker.Client(base_url='unix://var/run/docker.sock',
version='1.10',
timeout=10)
#Fire up three MongoDB containers that we'll use in a replica set
#STUB is an identifier used to label the nodes
#Superstition - short stub required?
STUB='rs3'
rsc=rs_config(c,STUB,num=3)
rsc
showContainers(c)
#tidyAwayContainers(c,['rs3_srv0','rs3_srv1','rs3_srv2'])
docker_ps(c)
#Initialise the replica set
from pymongo import MongoClient
#We'll use the 0th server in the set as the node we connect to first
mc = MongoClient('localhost', get27017tcp_port(c,STUB+'_srv0'),replicaset=STUB)
#Set up connections to the other members of the replica set
client1=MongoClient('localhost', get27017tcp_port(c,STUB+'_srv1'),replicaset=STUB)
client2=MongoClient('localhost', get27017tcp_port(c,STUB+'_srv2'),replicaset=STUB)
#In the mongo shell, we would typically use the command rs.initiate() to initialise the replica set
#Here, we use the replSetInitiate admin command, applying it with the desired configuration
mc.admin.command( "replSetInitiate",rsc);
#We may need to wait a minute or two for the configuration to come up
#If you get an error message that suggests the configuration isn't up yet, wait a few seconds then rerun the cell
mc.admin.command('replSetGetStatus')
#The statusReport() helper lets you ask a MongoDB node what it thinks the state of the world is
statusReport(mc)
#If we're quick, we can watch the machines come up into the fold
statusReport(client1)
statusReport(client1)
statusReport(client1)
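statusReport() is a helper from dockeringMongo; if you want a feel for what it might be distilling out of replSetGetStatus, here's a minimal, hypothetical sketch that summarises the member states from a document of the same shape. (The helper name and the sample document below are hand-made for illustration, not captured from a live run.)

```python
#A minimal sketch of summarising a replSetGetStatus-style document
#summarise_status is an illustrative name, not part of dockeringMongo
def summarise_status(status):
    #Each member entry carries a name and a human-readable state string
    return {m['name']: m['stateStr'] for m in status['members']}

#Hand-constructed example resembling the shape of replSetGetStatus output
sample = {'set': 'rs3',
          'members': [{'name': '172.17.0.2:27017', 'stateStr': 'PRIMARY'},
                      {'name': '172.17.0.3:27017', 'stateStr': 'SECONDARY'},
                      {'name': '172.17.0.4:27017', 'stateStr': 'SECONDARY'}]}
summarise_status(sample)
```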
In this section we'll use firewall rules between containers to model network partitions.
#The blockade library contains functions for setting up and tearing down iptable firewall rulesets
from blockade import *
#As we define blockade rules by IP address, we need to know those addresses
rsc
#Maybe worth creating an object that lists stuff like this by instance name?
memberIPaddress(rsc,'rs3_srv2')
The partition_containers() function creates a named ruleset that puts the listed IP addresses into separate partitions.
Let's start by splitting off one of the secondary machines - rs3_srv2, on IP address 172.17.0.4 - into a partition on its own:
#The machines in each partition are put into their own lists
#partition_containers(RULESET, [ PARTITION_1_LIST, PARTITION_2_LIST ])
partition_containers('test1w2s2', [ [netobj('172.17.0.4')],[netobj('172.17.0.2'),netobj('172.17.0.3')]])
Wait a few seconds for the machines to realise all is not well before proceeding...
...then check their statuses:
statusReport(mc)
statusReport(client1)
statusReport(client2)
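Why can the side with two machines sustain a primary while the lone machine cannot? A replica set member can only be elected (or remain) primary if it can see a strict majority of the voting members. As a quick sanity check on the arithmetic - a sketch, not MongoDB's actual election code:

```python
#Votes needed to elect a primary in an n-member replica set:
#a strict majority of the voting members
def majority(n):
    return n // 2 + 1

#With three voting members, two votes are needed...
#...so the two-node partition can elect a primary, while the
#isolated node, which can only see itself, cannot
majority(3)
```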
Let's fix the network...
clear_iptables('test1w2s2')
#Wait a second or two, then see how things are...
statusReport(mc)
What happens if we knock out the connection between the primary server and the secondary servers?
#Put the primary server into a partition on its own
partition_containers('test1w2s2', [ [netobj('172.17.0.2')],[netobj('172.17.0.3'),netobj('172.17.0.4')]])
Again, wait a few seconds for things to happen... then check the statuses:
statusReport(mc)
statusReport(client1)
statusReport(client2)
clear_iptables('test1w2s2')
statusReport(mc)
statusReport(client1)
statusReport(client2)
Grab a copy of the logs from each of the servers to see what happened...
#In an ssh shell, the following sort of command displays a real time stream of stdio log messages from the container
#!docker.io logs --follow=true {STUB}_srv1
!docker.io logs {STUB}_srv0 > {STUB}_srv0_log.txt
!docker.io logs {STUB}_srv1 > {STUB}_srv1_log.txt
!docker.io logs {STUB}_srv2 > {STUB}_srv2_log.txt
#?we may also be able to use mongostat - http://docs.mongodb.org/manual/reference/program/mongostat/
Maybe we need to use a log parser or highlighter to pull out interesting messages? Maybe we need to visualise the communication between the servers, eg with a combined timeline?
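Pending a proper parser, one quick way in might be a simple keyword filter over the log lines. This is a rough sketch - the keyword list is a guess at terms worth looking for, not an exhaustive one, and the exact wording of mongod's log messages may differ:

```python
#Rough sketch: pull out log lines that mention replica set state changes
#The keyword list is a guess at interesting terms, not exhaustive
def interesting_lines(logtext, keywords=('PRIMARY', 'SECONDARY', 'election', 'DOWN')):
    return [line for line in logtext.splitlines()
            if any(k in line for k in keywords)]

#Usage against one of the log files grabbed above, e.g.:
#interesting_lines(open(STUB+'_srv0_log.txt').read())
```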
See below for the log for rs3_srv0, the original primary. There are several things to note:
!tail -500 {STUB}_srv0_log.txt
How about rs3_srv1's story?
!tail -500 {STUB}_srv1_log.txt
How did things look from the perspective of rs3_srv2?
!tail -500 {STUB}_srv2_log.txt
#Tidy up
#Remove the blockade rules
clear_iptables('test1w2s2')
#Shut down the containers
tidyAwayContainers(c,['rs3_srv0','rs3_srv1','rs3_srv2'])
Need examples of writing to different servers in the replica set under different partition conditions to show what happens to the replication in each case? Eg in matters of eventual consistency, etc?
I think there's a lot to be said for giving students a transcript such as this showing a worked example. Having got their eye in, we can perhaps get them to run slightly different scenarios, eg a disconnect between rs3_srv0 and rs3_srv2, with connections still up between rs3_srv0 and rs3_srv1, and rs3_srv1 and rs3_srv2. I'm not sure if we can do one-way breaks? (eg so rs3_srv1 can see rs3_srv0 but rs3_srv0 can't see rs3_srv1?)
Also, students could run scenarios of inserting data into different machines under different network conditions?
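As a starting point for reasoning about those scenarios, whether a write can be acknowledged depends on the write concern and on how many members are reachable from the primary's partition. A sketch of that arithmetic - illustrative only, not pymongo's actual logic:

```python
#Sketch: can a write with concern w be acknowledged when only
#'reachable' of 'total' members sit in the primary's partition?
#(Illustrative helper; real write concern handling is more involved)
def write_acknowledgeable(reachable, total, w):
    if w == 'majority':
        return reachable >= total // 2 + 1
    return reachable >= w

#With the primary cut off on its own (1 of 3 reachable), a w='majority'
#write cannot be acknowledged; a w=1 write would still be acknowledged
#locally (at least until the isolated primary steps down)
write_acknowledgeable(1, 3, 'majority'), write_acknowledgeable(1, 3, 1)
```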
from pymongo import MongoReplicaSetClient
testclient = MongoReplicaSetClient('{0}:{1}'.format(getContainIPaddress(c,STUB+'_srv0'),27017), replicaSet=STUB)
testdb=testclient.testdb
testcollection=testdb.testcollection
testcollection.insert({'name':'test1'})
testcollection.find()