Had a couple of recent inquiries from the field about estimating OneFS’ protection overhead and usable capacity, so I thought it would make an interesting article.


Let’s take, for example, a five-node S210 cluster configured with the recommended protection level of +2d:1n and a dataset comprising medium and large files. What sort of usable capacity could be expected?


The protection policy of +2d:1n on this cluster means that it can survive two simultaneous drive failures or one entire node failure without data loss or unavailability.


The chart below answers such storage overhead questions across a range of OneFS protection level options and node counts.


For each field in this chart, the storage overhead is calculated by dividing the number on the right (the protection units, m) by the sum of the two numbers (n + m):


n+m => m/(n+m)


So, for the 5 nodes @ +2d:1n example above, the chart shows an 8+2 layout (the +2d:1n field in the 5-node row):


8+2 => 2/(8+2) = 20%
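The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the n+m formula above; the function names (`parse_layout`, `protection_overhead`, `usable_fraction`) are made up for this example and are not part of OneFS.

```python
from typing import Tuple


def parse_layout(layout: str) -> Tuple[int, int]:
    """Split a layout string such as '8+2' into (data_units, parity_units)."""
    data, parity = layout.split("+")
    # int() ignores surrounding whitespace, so '8 + 2' also works.
    return int(data), int(parity)


def protection_overhead(data_units: int, parity_units: int) -> float:
    """Fraction of a stripe used for protection: m / (n + m)."""
    if data_units <= 0 or parity_units < 0:
        raise ValueError("expected n > 0 and m >= 0")
    return parity_units / (data_units + parity_units)


def usable_fraction(data_units: int, parity_units: int) -> float:
    """Fraction of a stripe left for data: n / (n + m)."""
    if data_units <= 0 or parity_units < 0:
        raise ValueError("expected n > 0 and m >= 0")
    return data_units / (data_units + parity_units)


n, m = parse_layout("8+2")
print(f"{protection_overhead(n, m):.0%} overhead, {usable_fraction(n, m):.0%} usable")
# prints: 20% overhead, 80% usable
```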


| Number of nodes | [+1n] | [+2d:1n] | [+2n] | [+3d:1n] | [+3d:1n1d] | [+3n] | [+4d:1n] | [+4d:2n] | [+4n] |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 2+1 (67%) | 4+2 (67%) | - | 6+3 (67%) | 3+3 (50%) | - | 8+4 (67%) | - | - |
| 4 | 3+1 (75%) | 6+2 (75%) | - | 9+3 (75%) | 5+3 (62%) | - | 12+4 (75%) | 4+4 (50%) | - |
| 5 | 4+1 (80%) | 8+2 (80%) | 3+2 (60%) | 12+3 (80%) | 7+3 (70%) | - | 16+4 (80%) | 6+4 (60%) | - |
| 6 | 5+1 (83%) | 10+2 (83%) | 4+2 (67%) | 15+3 (83%) | 9+3 (75%) | - | 16+4 (80%) | 8+4 (67%) | - |

 

The 8+2 layout therefore translates to 20% protection overhead and 80% usable capacity.


The n+m numbers in each field in the table also represent how files are striped across a cluster for each node count and protection level.


Storage_efficiency_1.png


For example, with +2d:1n protection on a 5-node cluster, OneFS will write a double stripe across all 5 nodes (a total of 10 stripe units), using eight of these stripe units for data (n) and two for ECC parity (m). The accompanying diagram (Storage_efficiency_1.png) illustrates this layout.
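As a rough sketch of that stripe arithmetic, the snippet below derives the n+m split from the node count, the number of stripe units written per node, and the number of parity units. The function and parameter names are hypothetical, and the calculation is deliberately naive: it reproduces the small-cluster +2d:1n entries (such as 8+2 on 5 nodes) but ignores the limits on stripe width that the table shows for larger clusters.

```python
from typing import Tuple


def stripe_layout(nodes: int, stripes_per_node: int, parity_units: int) -> Tuple[int, int]:
    """Return (data_units, parity_units) for one protection group.

    Simplifying assumption: every node holds `stripes_per_node` stripe
    units and no upper bound is applied to the stripe width.
    """
    total_units = nodes * stripes_per_node
    data_units = total_units - parity_units
    if data_units <= 0:
        raise ValueError("not enough stripe units for this protection level")
    return data_units, parity_units


# +2d:1n on 5 nodes: a double stripe (2 units per node) with 2 parity units.
print(stripe_layout(nodes=5, stripes_per_node=2, parity_units=2))  # (8, 2)
```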


The general storage efficiency will look something like the percentages in the table below.


Be aware that the estimated usable capacity (% value in brackets) is a very rough guide and will vary considerably across different datasets, depending on the quantity of small files, etc.

 


| Number of nodes | [+1n] | [+2d:1n] | [+2n] | [+3d:1n] | [+3d:1n1d] | [+3n] | [+4d:1n] | [+4d:2n] | [+4n] |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 2+1 (67%) | 4+2 (67%) | - | 6+3 (67%) | 3+3 (50%) | - | 8+4 (67%) | - | - |
| 4 | 3+1 (75%) | 6+2 (75%) | - | 9+3 (75%) | 5+3 (62%) | - | 12+4 (75%) | 4+4 (50%) | - |
| 5 | 4+1 (80%) | 8+2 (80%) | 3+2 (60%) | 12+3 (80%) | 7+3 (70%) | - | 16+4 (80%) | 6+4 (60%) | - |
| 6 | 5+1 (83%) | 10+2 (83%) | 4+2 (67%) | 15+3 (83%) | 9+3 (75%) | - | 16+4 (80%) | 8+4 (67%) | - |
| 7 | 6+1 (86%) | 12+2 (86%) | 5+2 (71%) | 15+3 (83%) | 11+3 (79%) | 4+3 (57%) | 16+4 (80%) | 10+4 (71%) | - |
| 8 | 7+1 (87%) | 14+2 (87.5%) | 6+2 (75%) | 15+3 (83%) | 13+3 (81%) | 5+3 (62%) | 16+4 (80%) | 12+4 (75%) | - |
| 9 | 8+1 (89%) | 16+2 (89%) | 7+2 (78%) | 15+3 (83%) | 15+3 (83%) | 6+3 (67%) | 16+4 (80%) | 14+4 (78%) | 5+4 (56%) |
| 10 | 9+1 (90%) | 16+2 (89%) | 8+2 (80%) | 15+3 (83%) | 15+3 (83%) | 7+3 (70%) | 16+4 (80%) | 16+4 (80%) | 6+4 (60%) |
| 12 | 11+1 (92%) | 16+2 (89%) | 10+2 (83%) | 15+3 (83%) | 15+3 (83%) | 9+3 (75%) | 16+4 (80%) | 16+4 (80%) | 6+4 (60%) |
| 14 | 13+1 (93%) | 16+2 (89%) | 12+2 (86%) | 15+3 (83%) | 15+3 (83%) | 11+3 (79%) | 16+4 (80%) | 16+4 (80%) | 10+4 (71%) |
| 16 | 15+1 (94%) | 16+2 (89%) | 14+2 (87%) | 15+3 (83%) | 15+3 (83%) | 13+3 (81%) | 16+4 (80%) | 16+4 (80%) | 12+4 (75%) |
| 18 | 16+1 (94%) | 16+2 (89%) | 16+2 (89%) | 15+3 (83%) | 15+3 (83%) | 15+3 (83%) | 16+4 (80%) | 16+4 (80%) | 14+4 (78%) |
| 20 | 16+1 (94%) | 16+2 (89%) | 16+2 (89%) | 16+3 (84%) | 16+3 (84%) | 16+3 (84%) | 16+4 (80%) | 16+4 (80%) | 14+4 (78%) |
| 30 | 16+1 (94%) | 16+2 (89%) | 16+2 (89%) | 16+3 (84%) | 16+3 (84%) | 16+3 (84%) | 16+4 (80%) | 16+4 (80%) | 14+4 (78%) |
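To turn a table entry into a capacity figure, something like the following sketch could be used. It copies a handful of layouts from the table above into a lookup and scales a raw capacity by n/(n+m). The names `LAYOUTS`, `estimate_usable_tb` and `raw_tb` are assumptions made for this example, and the result is only the protection-level estimate: it does not account for small files or anything else that affects real-world usable capacity.

```python
from typing import Dict, Tuple

# A few (node count, protection level) -> (data units, parity units)
# entries copied from the table; extend as needed.
LAYOUTS: Dict[Tuple[int, str], Tuple[int, int]] = {
    (5, "+2d:1n"): (8, 2),
    (5, "+2n"): (3, 2),
    (5, "+3d:1n1d"): (7, 3),
    (10, "+2d:1n"): (16, 2),
}


def estimate_usable_tb(raw_tb: float, nodes: int, protection: str) -> float:
    """Rough usable capacity: raw capacity scaled by n / (n + m)."""
    data_units, parity_units = LAYOUTS[(nodes, protection)]
    return raw_tb * data_units / (data_units + parity_units)


# Hypothetical 5-node cluster with 100 TB raw at +2d:1n.
print(estimate_usable_tb(100, 5, "+2d:1n"))  # 80.0
```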