Solr Collection vs Cores

From the SolrCloud Documentation

Collection: A single search index.

Shard: A logical section of a single collection (also called Slice). Sometimes people will talk about "Shard" in a physical sense (a manifestation of a logical shard).

Replica: A physical manifestation of a logical Shard, implemented as a single Lucene index on a SolrCore.

Leader: One Replica of every Shard will be designated as a Leader to coordinate indexing for that Shard

SolrCore: Encapsulates a single physical index. One or more make up logical shards (or slices) which make up a collection.

Node: A single instance of Solr. A single Solr instance can have multiple SolrCores that can be part of any number of collections.

Cluster: All of the nodes you are using to host SolrCores.

So basically a Collection (Logical group) has multiple cores (physical indexes).
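
To make the hierarchy in the glossary concrete, here is a minimal sketch that models it with plain Python dataclasses. It is not Solr code; the class names follow the glossary terms, while the collection name `products`, the core names, and the node addresses are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A physical copy of a shard: one Lucene index served by one SolrCore."""
    core_name: str          # hypothetical core name
    node: str               # hypothetical node address, e.g. "host1:8983"
    is_leader: bool = False


@dataclass
class Shard:
    """A logical slice of a collection, backed by one or more replicas."""
    name: str
    replicas: list = field(default_factory=list)

    def leader(self) -> Replica:
        # One replica of every shard is designated as the leader.
        leaders = [r for r in self.replicas if r.is_leader]
        if len(leaders) != 1:
            raise ValueError(f"shard {self.name} needs exactly one leader")
        return leaders[0]


@dataclass
class Collection:
    """A single logical search index, split into shards."""
    name: str
    shards: list = field(default_factory=list)

    def cores(self) -> list:
        # Every replica of every shard is one physical core.
        return [r for s in self.shards for r in s.replicas]

    def nodes(self) -> set:
        # The Solr instances that host at least one core of this collection.
        return {r.node for r in self.cores()}


def build_example() -> Collection:
    # Hypothetical layout: 2 shards x 2 replicas, spread over 2 nodes.
    shards = []
    for s in (1, 2):
        replicas = [
            Replica(f"products_shard{s}_replica{r}", f"host{r}:8983", is_leader=(r == 1))
            for r in (1, 2)
        ]
        shards.append(Shard(f"shard{s}", replicas))
    return Collection("products", shards)


if __name__ == "__main__":
    collection = build_example()
    print(len(collection.cores()), "cores on", len(collection.nodes()), "nodes")
```

In this layout the one logical collection is backed by four cores, and each node hosts two of them, which is the "logical group of physical indexes" relationship described above.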

Also, check the discussion


Core

In Solr, a core is composed of a set of configuration files, Lucene index files, and Solr’s transaction log.

A Solr core is a uniquely named, managed, and configured index running in a Solr server; a Solr server can host one or more cores. A core is typically used to separate documents that have different schemas.

Collection

Solr also uses the term collection, which only has meaning in the context of a Solr cluster in which a single index is distributed across multiple servers.

SolrCloud introduces the concept of a collection, which extends the concept of a uniquely named, managed, and configured index to one that is split into shards and distributed across multiple servers.


As per my understanding:

In distributed search,

A collection is a logical index spread across multiple servers. A core is the piece of a collection (one replica of one shard) that a given server hosts.

In non-distributed search,

A single server running Solr can have multiple collections, and each of those collections is also a core. So collection and core are the same if search is not distributed.

Summary

  1. Each piece of a collection hosted on a server is called a core.
  2. A collection is the same as a logical index.
  3. One Solr server can host many cores.
  4. A collection is a logical index. (Example use for multiple collections: say two teams in the same group are not big enough to justify a full Solr server each, but they do not want to mix their data in a single index. They can create separate collections/indexes to keep their data separate.)
  5. It's probably better to use a separate SolrCloud cluster rather than another collection if the data for a collection is big enough (not sure, comments please?).

Single instance

On a single instance, Solr has something called a SolrCore that is essentially a single index. If you want multiple indexes, you create multiple SolrCores.
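
As a rough illustration of "one SolrCore per index", the sketch below builds CoreAdmin `CREATE` request URLs for two cores on one instance. It only constructs the URLs and does not contact a server. The base URL, the core names `articles` and `users`, and the use of the `_default` configset are assumptions for the example; adjust them to your install.

```python
from urllib.parse import urlencode


def core_create_url(base_url: str, core_name: str, config_set: str = "_default") -> str:
    """Build a CoreAdmin CREATE URL for one core (one index) on a single instance."""
    params = {
        "action": "CREATE",
        "name": core_name,
        "configSet": config_set,  # assumed configset; use whichever one exists on your server
    }
    return f"{base_url}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed local Solr address
    # Two independent indexes on one instance means two cores.
    for name in ("articles", "users"):
        print(core_create_url(base, name))
```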

SolrCloud

With SolrCloud, a single index can span multiple Solr instances. This means that a single index can be made up of multiple SolrCores on different machines. We call all of the SolrCores that make up one logical index a collection.

A collection is essentially a single index that spans many SolrCores, both for index scaling and for redundancy. If you wanted to move your 2-SolrCore Solr setup to SolrCloud, you would have 2 collections, each made up of multiple individual SolrCores.
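
The migration described above could look roughly like this on the SolrCloud side: one Collections API `CREATE` request per former core. This is a sketch that only builds the request URLs. The base URL, the collection names, and the shard and replica counts are assumptions, and a real cluster may also need a configset to be named or uploaded first.

```python
from urllib.parse import urlencode


def collection_create_url(base_url: str, name: str, num_shards: int, replication_factor: int) -> str:
    """Build a Collections API CREATE URL for one SolrCloud collection."""
    if num_shards < 1 or replication_factor < 1:
        raise ValueError("num_shards and replication_factor must be at least 1")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                  # how many slices the index is split into
        "replicationFactor": replication_factor,  # how many copies of each slice
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed address of any node in the cluster
    # Two former SolrCores become two collections (names are hypothetical).
    for name in ("articles", "users"):
        print(collection_create_url(base, name, num_shards=2, replication_factor=2))
```

Each of these collections would then be backed by several SolrCores (here 2 shards times 2 replicas), created and placed by Solr rather than by hand.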


From Solr Wiki:

Collections are made up of one or more shards. Shards have one or more replicas. Each replica is a core. A single collection represents a single logical index.
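
Since each replica is a core, the number of cores behind a collection is just shards times replicas per shard. A small sketch of that arithmetic follows; the labels it produces are illustrative only and are not Solr's actual core naming scheme.

```python
def enumerate_cores(collection: str, num_shards: int, replicas_per_shard: int) -> list:
    """List one illustrative label per core backing a collection."""
    return [
        f"{collection}/shard{s}/replica{r}"
        for s in range(1, num_shards + 1)
        for r in range(1, replicas_per_shard + 1)
    ]


if __name__ == "__main__":
    # Hypothetical collection with 3 shards and 2 replicas per shard.
    cores = enumerate_cores("products", num_shards=3, replicas_per_shard=2)
    print(len(cores))
    for label in cores:
        print(label)
```

With one shard and one replica the list has a single entry, which is the non-distributed case where a collection and a core coincide.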