Estimate cluster backup size

Use the calculations below to estimate the backup size for the Cassandra and Elasticsearch data. The overall size of a cluster backup is roughly the sum of the Cassandra and Elasticsearch backup estimates.

Estimate Cassandra cluster backup size

The required backup size is estimated from the metrics storage size. Typically, it's 20% of the sum of the metrics storage on all the nodes (the 20% ratio applies regardless of the number of nodes).
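
The 20% rule above can be sketched as a small helper. This is only an illustration: the function name and the `ratio` parameter are hypothetical, and the per-node sizes are assumed to share one unit.

```python
def cassandra_backup_estimate(metrics_storage_per_node, ratio=0.2):
    """Return the estimated Cassandra backup size.

    metrics_storage_per_node: metrics storage size of each node, all in the same unit.
    ratio: typical backup-to-storage ratio (20% per the rule of thumb above).
    """
    # Sum the storage across all nodes, then apply the ratio once.
    return ratio * sum(metrics_storage_per_node)


# Three nodes with 885GB of metrics storage each.
print(f"{cassandra_backup_estimate([885, 885, 885]):.0f}GB")  # prints "531GB"
```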

Estimate Elasticsearch cluster backup size

The required backup size is estimated from the Elasticsearch storage size. Typically, it's less than the sum of the Elasticsearch storage on all the nodes, so that sum serves as a safe upper bound.

Example estimate

For example, assume that you want to estimate the backup size for a 3-node cluster:

Calculate the Elasticsearch storage for the whole cluster.
Check the size of the Elasticsearch storage on disk on each node (the size should vary only slightly between nodes).

$ du -sh /var/opt/desk-managed/elasticsearch/
1.5TB

For the 3-node example cluster, the total Elasticsearch storage is 3 times 1.5TB:

Elasticsearch storage = 3 * 1.5TB
Elasticsearch storage = 4.5TB

Calculate the Cassandra storage for the whole cluster.
Check the size of the metrics storage on disk on each node (the size should vary only slightly between nodes).

$ du -sh /var/opt/desk-managed/cassandra/
885GB

For the 3-node example cluster, the total Cassandra storage is 3 times 885GB:

Cassandra storage = 3 * 885GB
Cassandra storage = 2.6TB (rounded)

Calculate the backup estimate according to this formula:
Backup estimate = Elasticsearch storage + 20% of Cassandra storage

 Backup estimate = 4.5TB + (2.6TB * 0.2)
 Backup estimate = 5.02TB

The minimum size you should provision for the backup in this example is 5.02TB.
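
The whole example can be reproduced with a short script. This is a minimal sketch, not part of the product: the function and constant names are hypothetical, and it assumes that the sizes reported by `du -h` use powers of 1024, so 1TB is treated as 1024GB. Under that assumption, the unrounded result matches the 5.02TB figure above.

```python
GB_PER_TB = 1024  # assumption: du -h reports sizes in powers of 1024


def backup_estimate_tb(es_per_node_tb, cassandra_per_node_gb, nodes, cassandra_ratio=0.2):
    """Backup estimate = Elasticsearch storage + 20% of Cassandra storage, in TB."""
    # Total storage across the cluster, assuming nodes hold similar amounts.
    es_total_tb = es_per_node_tb * nodes
    cassandra_total_tb = cassandra_per_node_gb * nodes / GB_PER_TB
    return es_total_tb + cassandra_ratio * cassandra_total_tb


# Values from the 3-node example: 1.5TB of Elasticsearch and 885GB of Cassandra per node.
estimate = backup_estimate_tb(1.5, 885, 3)
print(f"Backup estimate = {estimate:.2f}TB")  # prints "Backup estimate = 5.02TB"
```

If the per-node sizes differ noticeably, summing the measured value from each node is more accurate than multiplying one node's size by the node count.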

Estimate accuracy

This estimate won't be accurate if it's calculated on a new cluster that has just started storing data. As the storage on disk grows, so will the size of the backup.