This is a set of Mongo containers for creating clusters using Docker
Check specific images used in this example at
- http://github.com/stutzlab/mongo-cluster-router
- http://github.com/stutzlab/mongo-cluster-configsrv
- http://github.com/stutzlab/mongo-cluster-shard
For a more complete example, check docker-compose.yml
-
In this example we will create a cluster with:
- 2 routers
- 3 config servers
- 2 shards, each with 2 replicas
-
We had some issues running the shards in Docker for Mac. Some shards would be freezed and Docker had to be restarted. Use a VirtualBox VM if needed in local machine.
-
Although this example starts with two-replica shards, it is recommended to use three replicas minimum to ensure automatic failover in case one of the nodes comes down. See more at https://docs.mongodb.com/manual/core/replica-set-architecture-three-members/
-
Create docker-compose.yml
version: '3.5'
services:
mongo-express:
image: mongo-express:0.54.0
ports:
- 8081:8081
environment:
- ME_CONFIG_MONGODB_SERVER=router
# - ME_CONFIG_BASICAUTH_USERNAME=
# - ME_CONFIG_BASICAUTH_PASSWORD=
restart: always
router:
image: stutzlab/mongo-cluster-router
environment:
- CONFIG_REPLICA_SET=configsrv
- CONFIG_SERVER_NODES=configsrv1
- ADD_SHARD_NAME_PREFIX=shard
- ADD_SHARD_1_NODES=shard1a
- ADD_SHARD_2_NODES=shard2a
configsrv1:
image: stutzlab/mongo-cluster-configsrv
environment:
- CONFIG_REPLICA_SET=configsrv
- INIT_CONFIG_NODES=configsrv1
shard1a:
image: stutzlab/mongo-cluster-shard
environment:
- SHARD_REPLICA_SET=shard1
- INIT_SHARD_NODES=shard1a
shard2a:
image: stutzlab/mongo-cluster-shard
environment:
- SHARD_REPLICA_SET=shard2
- INIT_SHARD_NODES=shard2a
docker-compose up
-
This may take a while. Check when logs stop going crazy!
-
Connect to mongo-express and see some internal collections
- open browser at http://localhost:8081
-
Show cluster status
docker-compose exec router mongo --port 27017
sh.status()
- Enable sharding of a collection in a database
docker-compose exec router mongo --port 27017
>
//create sampledb
use sampledb
//switch to admin db for sharding operation
use admin
//enable sharding for database
sh.enableSharding("sampledb")
//enable sharding for collection 'sample-collection'
db.adminCommand( { shardCollection: "sampledb.collection1", key: { mykey: "hashed" } } )
db.adminCommand( { shardCollection: "sampledb.collection2", key: { _id: "hashed" } } )
//if collection already has data, you may need to create the sharded index manually before calling the command above
//db['collection2'].createIndex({ "_id" : "hashed" })
//add some data
for(i=0;i<1000;i++) {
db.collection2.insert({"name": _rand(), "nbr": i})
}
//show details about qtty of records per shard
db.collection2.find().explain(true)
//inspect shard status
sh.status()
- Explore shard structure
docker-compose exec shard1a mongo --port 27017
>
//verify shard replication nodes/configuration
rs.conf()
-
mongo-cluster-configsrv volume is at "/data"
- Mount volumes as "myvolume:/data"
-
mongo-cluster-shard volume is at "/data"
- Mount volumes as "myvolume:/data"
-
The original mongo image has volumes at /data/db and /data/configdb but they are not used by this image because those paths are used differently depending if it is a shard or configsrv instance, so we simplyfied to be just "/data" on both type of instances to avoid catastrofic errors (once I mapped just /data and Swarm created individual instance volume for /data/db and I lost my data - lucky it was a test cluster!). Swarm will still create those volumes per instance (because they are declared at parent Dockerfile) but you can ignore them.
-
Like every good database, MongoDB tries to use all available memory from the host it is running on. BE AWARE of resources exhaustion when placing two container instances of mongo on the same VM, because both will compete for memory.
-
To minimize this risk, this container image configures mongo to limit its internal cache to 1/2 of available memory (according to container's cgroup limit). But this only works if you use resource limiting on the container (deploy: resources: limits: memory on docker-compose etc).
-
For example, if you have a 8GB VM and place two instances of the shard container with mem resource limit set to 3.5GB each, you will probably be safe, because the container will automatically set the WiredTiger cache limit to 1.25GB for each one (it will use more memory than this because it does other tasks, but the real hit here is regarding to cache size).
-
If you want to fine control the size o WiredTiger cache, you can set the value using WIRED_TIGER_CACHE_SIZE_GB ENV. It is available on shard and configsrv instances.
-
See more at https://docs.mongodb.com/manual/reference/program/mongod/#bin.mongod
- Add the new services to docker-compose.yml
...
router:
image: stutzlab/mongo-cluster-router
environment:
- CONFIG_SERVER_NAME=configsrv
- CONFIG_SERVER_NODES=configsrv1
- ADD_SHARD_NAME_PREFIX=shard
- ADD_SHARD_1_NODES=shard1a
- ADD_SHARD_2_NODES=shard2a
- ADD_SHARD_3_NODES=shard3a,shard3b
shard3a:
image: stutzlab/mongo-cluster-shard
environment:
- SHARD_NAME=shard3
- INIT_SHARD_NODES=shard3a,shard3b
shard3b:
image: stutzlab/mongo-cluster-shard
environment:
- SHARD_NAME=shard3
...
- Start new shard
docker-compose up shard3a shard3b
- Add shards to cluster
docker-compose up router
- Some data from shard1 and shard2 will be migrated to shard3 now. This may take a while.
- Check if all is OK with "rs.status()" on router
-
This is the case when you mount the Block Storage directly to the VM you run the node by using a "placement" to force that container to run only on that host and want to move it to another node (probably to expand the cluster size). If using NFS or another distributed volume manager, you don't need to worry about this.
-
Deactivate mongo nodes:
docker service scale mongo_shard1c=0
-
Remove Block Storage from current VM and mount it to the new VM
- If using local storage, just copy the volume contents to the new VM using SCP
- Verify the commands you are meant to perform on the VM according to your cloud provider in order to mount the volume in filesystem
-
Change the container placement in docker-compose.yml to point to the new host. Ex:
shard1c:
image: stutzlab/mongo-cluster-shard:4.4.0.8
environment:
- SHARD_REPLICA_SET=shard1
deploy:
placement:
constraints: [node.hostname == server13]
networks:
- mongo
volumes:
- /mnt/mongo1_shard1c:/data
-
Update service definitions
docker stack deploy --compose-file docker-compose.yml mongo
-
Check if scale for moved services is '1'
-
Check if node went online successfuly
- Enter newly instantiated container and execute:
mongo
rs.status()
- Verify if current node is OK
- Create the new shard replicas services in docker-compose.yml
...
shard3a:
image: stutzlab/mongo-cluster-shard
environment:
- SHARD_REPLICA_SET=shard3
- INIT_SHARD_NODES=shard3a,shard3b
volumes:
- shard3a:/data
shard3b:
image: stutzlab/mongo-cluster-shard
environment:
- SHARD_REPLICA_SET=shard3
volumes:
- shard3b:/data
...
-
Create new volumes and mount to the host that will execute the shard (if using Swarm, don't forget to add a placement constraint if needed)
-
Change router service and add the new SHARD config environment variables
router:
image: stutzlab/mongo-cluster-router:4.4.0.8
environment:
- CONFIG_REPLICA_SET=configsrv
- CONFIG_SERVER_NODES=configsrv1,configsrv2,configsrv3
- ADD_SHARD_REPLICA_SET_PREFIX=shard
- ADD_SHARD_1_NODES=shard1a,shard1b,shard1c
- ADD_SHARD_2_NODES=shard2a,shard2b,shard2c
- ADD_SHARD_3_NODES=shard3a,shard3b
- Instantiate new replica nodes and add shard to cluster
docker-compose up
#check replicaset status
docker-compose exec shard3a mongo --eval "rs.status()"
#check shard status
docker-compose exec router mongo --eval "sh.status()"
-
This should end adding the new shard and replicas automatically
-
(OPTIONAL) If you want to perform the above operations step by step do
#create replicaset instances for shard3
docker-compose up shard3a shard3b
#check new replicaset status
docker-compose exec shard3a mongo --eval "rs.status()"
#change router to add the new shard to configsrv
docker-compose up router
#check if new shard was added successfuly
docker-compose exec router mongo --eval "sh.status()"
- Add the new service do docker-compose.yml
...
shard1c:
image: stutzlab/mongo-cluster-shard
environment:
- SHARD_REPLICA_SET=shard1
volumes:
- shard1c:/data
...
```sh
docker-compose up shard1c
-
Add the new node to an existing shard (new replica node)
- Discover which node is currently the master by
docker-compose exec shard1a mongo --eval "rs.isMaster()"
#look in response for "ismaster: true"
docker-compose exec shard1b mongo --eval "rs.isMaster()"
#look in response for "ismaster: true"
- Execute the "add" command on the master node. If shard1b is the master:
docker-compose exec shard1b mongo --eval "rs.add( { host: \"shard1c\", priority: 0, votes: 0 } )"
-
When a shard has only two replicas and one goes down, no primary will be elected and the database will be freezed until you take action to force the usage of its state in despite of the other copy (no consensus takes place and you may lose data if the remaining node was behind the node that went down).
-
Create a new shard replica node service in docker-compose, and "up" it
-
Enter the mongo cli in the last shard node (probably not the primary one) and then reconfigure the entire replica set with the nodes you want to be present now, adding the new shard node and use "force:true".
docker-compose exec shard1d mongo --eval "rs.reconfig( { \"_id\": \"shard1\", members: [ { \"_id\": 0, host: \"shard1a\" }, { \"_id\": 4, host: \"shard1d\"} ]}, {force:true} )"
- Certify that you already have a "hashed" index in collection
use edidatico
//"_id" is the collection attribute key
db['testes-estudantes'].createIndex({ "_id" : "hashed" })
//now an index with name "_id_hashed" was created and will be used on the next step
db.adminCommand( { shardCollection: "edidatico.testes-estudantes", key: { _id: "hashed" } })
- Create database dump to a file
mongodump -u admin --authenticationDatabase admin --archive=/tmp/sample1-dump.db --db=sample1
- Restore database from a dump file (same target database name)
mongorestore -u admin --authenticationDatabase admin --archive=/tmp/sample1-dump.db
- Restore database from a dump file to a different database name
mongorestore -u admin --authenticationDatabase admin --archive=/tmp/sample1.db --nsFrom sample1.* --nsTo sample1copy.*
- Check if all indexes were copied too. In some cases they were not.
#create user in admin database so that it will be global and grant access to "sample1" database
use admin
db.createUser( { user: "app", pwd: "anypass", roles: [{role: "readWrite", db: "sample1"}] } )
##add admin roles to the user
use admin
db.grantRolesToUser(
"user1",
[ { role: "readAnyDatabase", db: "admin" } ]
)
- When you take a dump for moving database to another place, remember to deny access to applications while you are in movement to avoid losing data users place on the "old" database
use admin
db.revokeRolesFromUser( "user1",
[ { role: "readAnyDatabase", db: "admin" } ]
)
-
Pay attention to the level of isolation during Write and Read operations so that you achieve the most optimal performance vs data integrity for your application. Take a look at
-
Knowledge about the CAP Theorem is useful to help you decide on this.
- Login in container shell
- Execute
mongo
db.printCollectionStats()
db.printReplicationInfo()
db.printShardingStatus()
db.printSlaveReplicationInfo()
- Slow Queries are automatically logged to in-memory
- Connect to mongos and execute
use admin
db.adminCommand( { getLog: "global" } )
-
Look for "slow query" entries and view the executed query along with its time
-
In order to send some metrics from the in-memory logs to Prometheus, use http://github.com/stutzlab/mongo-log-exporter
Enter console on primary container of
- configsrv
- shard1
- shard2
On each node, configure free monitoring
mongo
> db.enableFreeMonitoring()
Get provided URL in log and load in browser
-
We had a situation where all resources exausted and containers got restarting by OOM
-
Some locks on volume where kept so that some containers wouldn't start OK because of locks (this is a expected behavior)
-
Docker daemon was strange (Swarm was not keeping the number of service instances), so we restarted the VMs
-
scale=0 the services that are not restarting
-
delete
/mongod.lock
from each mongo cluster volume -
scale=1 the services again
-
Another option is to stop all docker services and restart again (not needed but we solved this way because things were too ugly and it worked!)
docker stack rm mongo
- be SURE all volumes are mounted OUTSIDE instances- remove locks
- reboot VMs
docker stack deploy --compose-file docker-compose-mongo.yml mongo
- this is the same procedure as restoring a backup (!)