What is a cluster in MongoDB?
I am someone new to mongoDB and has absolutely no knowledge regarding databases so would like to know what is a cluster in MongoDB and what is the point of connecting to one in MongoDB? Is it a must to connect to one or can we just connect to the localhost?
A mongodb cluster is the word usually used for sharded cluster in mongodb. The main purposes of a sharded mongodb are:
- Scale reads and writes along several nodes
- Each node does not handle the whole data so you can separate data along all the nodes of the shard. Each node is a member of a shard (which is a replicaset, see below for the explanation) and the data are separated on all shards.
This is the representation of a mongodb sharded cluster from the official doc.
If you are starting with mongodb, I do not recommend you to shard your data. Shards are way more complicated to maintain and handle than replicasets are. You should have a look at a basic replicaset. It is fault tolerant and sufficient for simple needs.
The ideas of a replicaset are :
- Every data are repartited on each node
- Only one node accept writes
A replicaset representation from the official doc
For simple apps there is no problem to have your mongodb cluster on the same host than your application. You can even have them on a single member replicaset but you won't be fault tolerant anymore.
Atlas-managed MongoDB deployments, or “clusters,” can be either a replica set or a sharded cluster.
replica set: a cluster of MongoDB servers that implements replication and automated failover. (Failover is a method of protecting computer systems from failure, in which standby equipment automatically takes over when the main system fails.)
– replication: a feature allowing multiple database servers to share the same data, thereby ensuring redundancy and facilitating load balancing. (Redundancy is a system design in which a component is duplicated so if it fails there will be a backup.)
sharded cluster: the set of nodes comprising a sharded MongoDB deployment. A sharded cluster consists of config servers, shards, and one or more mongos routing processes.
– sharding: a method for distributing data across multiple machines.
A sharded cluster consists of ten servers: one server for the mongos [interface between the client applications and the sharded cluster] and three servers each for the first replica set, the second replica set, and the config server replica set
– shard: a single mongod instance or replica set that stores some portion of a sharded cluster’s total data set. In production, all shards should be replica sets.
Sources: MongoDB docs: Glossary, Create a Cluster, Convert a Replica Set to a Sharded Cluster