How Safe Is Your Data?
By Kris Land, CTO, InoStor Corp.
In today’s business world, no issue is more critical to large storage centers than
the preservation and integrity of their data. Government applications are no
different. A small cluster of physical or electrical malfunctions can, in the blink of
an eye, cause more economic damage than a devastating warehouse fire.


Given routine use, every disk will fail eventually. Knowing this, people who
manage multiple-disk storage systems usually try to protect their data with some
form of failure recovery. In most cases, the primary defense against disk failures
dates back to two simple RAID algorithms that were developed over fifteen years
ago - one based on mirroring and the other on parity summing. Either of these
algorithms can protect against the loss of data should any single disk fail in a
disk array. However, from the moment of failure until the RAID array is fully
reconstructed, the array is unprotected, and a second disk failure can occur at
any time.


For several reasons, the likelihood of multiple disk failures is worth considering:

• An event that damages one disk is likely to damage more than one, such as
exposure to excess moisture or temperature, a heavy impact to the storage
device, or a voltage spike.

• Long delays can occur. A failure can happen where a replacement disk is not
available, in a remote location, or during an odd shift. The longer it takes to
restore a failed disk, the greater the odds that another disk will fail during the
unprotected interval. What’s more, operating with a failed disk actually
increases the load and stress level on the surviving disks.

• The cost of added protection has fallen. Compared with SCSI drives,
IDE/ATA drives have higher capacity but shorter life expectancy, and they
take longer to rebuild – placing larger amounts of data at higher risk of disk
failure. However, since disk space is getting cheaper, the cost of added
protection is also getting cheaper.

• The more disks in an array, the higher the probability that more than one disk
will be down at any time, especially if all the disks in the array are reaching
their normal life expectancy.

• People make errors. During the vulnerable period of disk replacement, a
worker can remove the wrong disk.

• The constant need for available data may be too critical for risk taking. Even
with tape backup, substantial time losses can occur if the network storage
system must be rebuilt from tape, and tape restoration usually cannot restore
changes that were made after the most recent tape backup.
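
The effect of array size and repair delay on the odds of a second failure can be sketched numerically. The following is only an illustration, not a reliability model from this article: it assumes independent drives with a constant failure rate (an exponential lifetime model), and the MTBF and window lengths are hypothetical values chosen for the example. Correlated failures, such as those described in the first bullet, would make the real risk higher.

```python
import math


def second_failure_probability(surviving_disks: int,
                               mtbf_hours: float,
                               window_hours: float) -> float:
    """Probability that at least one surviving disk fails during the
    unprotected window, assuming independent exponential lifetimes."""
    # Each disk survives the window with probability exp(-t / MTBF).
    p_one_survives = math.exp(-window_hours / mtbf_hours)
    # The array stays safe only if every surviving disk makes it through.
    return 1.0 - p_one_survives ** surviving_disks


if __name__ == "__main__":
    MTBF = 500_000.0  # hypothetical drive MTBF in hours (assumption)
    for disks in (5, 11, 23):
        for window in (8.0, 72.0):
            p = second_failure_probability(disks, MTBF, window)
            print(f"{disks:2d} surviving disks, {window:4.0f} h window: {p:.6f}")
```

Under these assumptions the risk grows with both the number of surviving disks and the length of the window, which is the point made by the delay and array-size bullets above.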

The most commonly used mirroring algorithms are RAID-1 and RAID-10. They
are often confused with one another. RAID-1 uses one disk for data and one or
more disks for mirroring, while RAID-10 (also written RAID 1+0, and often loosely
called 0+1) uses a set of mirror disks to back up an equal number of disks with
striped data (striping is used to increase speed over single-disk performance).

The downside of any type of mirroring, as described in more detail below, is
that, while it is guaranteed to survive the random loss of only a single disk, the
usable capacity is reduced to half of the total capacity of the array. Another way
of looking at this is that the disk space required to make mirrored copies will cost
some multiple of the amount of disk space required for protection with a
RAID 5 system that provides the same usable capacity and level of protection.
RAID 5 requires the equivalent of only one disk for parity to protect any number
of disks against a single disk failure. While this is more efficient than mirroring,
there is still the problem that all of the data is lost if any second disk fails before a
degraded RAID is restored.
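
Single-disk parity protection of the kind RAID 5 uses can be illustrated with byte-wise XOR. The sketch below is a simplified, in-memory illustration only: it ignores striping and parity rotation, and the function names and sample blocks are made up for this example. The parity block is the XOR of the data blocks, so any one missing block can be recomputed from the survivors, but two missing blocks cannot.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(blocks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (data or parity) from the rest."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot recover two lost blocks")
    rebuilt = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        # XOR of all surviving blocks equals the one block that was lost.
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


data = [b"disk", b"loss", b"test"]   # three hypothetical data blocks
stripe = data + [xor_blocks(data)]   # append the parity block
```

Setting any single entry of `stripe` to `None` and calling `rebuild_missing` returns the original stripe; setting two entries to `None` raises an error, which mirrors the second-failure exposure described above.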
 

The importance of protecting against multiple disk failures has generated much
experimentation. Linux software RAID can recover after a multiple-disk failure by
layering one RAID array on top of another, combining some mix of mirroring,
parity, and data striping to raise the level of protection. The two combinations
most commonly referenced are RAID 5+1 (a mirrored pair of RAID-5 arrays), and
RAID 5+5 (a RAID-5 each of whose “component disks” is a RAID 5). Several
complex encryption-type algorithms have also been developed to protect against
the loss of more than one disk.
 

The problem with both compound RAIDs and encryption algorithms is that they
lose so much storage space to redundancy and/or so much processing speed to
calculation that they are rarely used or even offered as options with NAS
products – forcing users who need protection to choose either parity or mirroring.

Instead of forcing a pattern of protection (and vulnerability) on the user based on
the capabilities of mirroring or single-disk parity protection, a better form of
data protection would adjust the amount of disk-loss insurance to the user’s
needs. A new product based on this concept already exists. Called RAIDn, this
patented advance in RAID technology allows the user to select a desired amount
of disk-loss insurance, ranging from zero (identical to conventional RAID 0 and
offering no drive-loss data protection) to protection from the loss of any number
of disks. Between these two polar options there is also single disk-loss insurance
(very similar to RAID 5 which allows for one drive to fail with no data loss) and
two disk-loss insurance (currently available only on rare systems where
combination RAIDs such as RAID 1+5 or 51 are offered).
 

While it is possible to achieve protection against the loss of two or even three
disks with conventional RAID combinations, RAIDn can recover from multiple,
concurrent failures without a substantial sacrifice of usable disk space or a
significant decline in performance.
 

In a disk array of any size, the number of random disk failures a user can recover
from with RAIDn always equals the number of disks’ worth of space reserved for
insurance data (parity). The RAIDn insurance data can be used, if needed, to
reconstruct the data from any random set of lost disks. For example, to protect
against up to two concurrent disk failures, a user reserves the equivalent space of
two disks for RAIDn insurance data, which is actually striped across all of the
disks in the array, as shown in Figure 1.

Figure 1: Striped RAIDn parity insurance
 

Figure 2 shows how RAIDn uses far less disk space for its parity insurance than
RAID 5+1 uses to combine mirroring with parity. The added redundancy required
by compound algorithms like RAID 5+1 is wasted disk space. RAIDn has made all
compound RAID algorithms obsolete.

Figure 2: Comparing RAIDn to RAID 5 + 1
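
The disk-space comparison can be worked through as simple drive counts. The sketch below takes the article's own figures at face value: RAID 5+1 is a mirrored pair of RAID-5 arrays (credited with three drive-loss insurance in Table 1), and RAIDn needs one disk's worth of space per level of insurance. The function names are illustrative only and are not part of any RAIDn product interface.

```python
def raid51_drives(data_disks: int) -> int:
    """Drives in a mirrored pair of RAID-5 arrays holding `data_disks` of data."""
    # Each of the two RAID-5 copies needs the data disks plus one parity disk.
    return 2 * (data_disks + 1)


def raidn_drives(data_disks: int, insurance: int) -> int:
    """Drives for RAIDn: one extra disk's worth of space per insurance level."""
    return data_disks + insurance


for d in (4, 8, 11):
    print(f"{d:2d} data disks: RAID 5+1 needs {raid51_drives(d)} drives, "
          f"RAIDn with insurance 3 needs {raidn_drives(d, 3)} drives")
```

For the same usable capacity and the same three drive-loss insurance, the gap between the two counts widens as the array grows, since RAID 5+1 doubles every data disk while RAIDn adds a fixed three.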
 

Roughly 80% of the RAID-5 arrays used in today’s data applications have one or
more “Hot Spares.” These drives run all day while providing no direct benefit to
the array they are attached to. The result is a potentially dangerous false sense of
security. If a second drive fails, or if the wrong drive is removed accidentally
while the array is rebuilding, all of the data on that volume is lost! RAIDn
technology allows the user to set the insurance to 1 + (# of hot spares) without
the need to purchase additional drives. As a result, the RAIDn array can tolerate
a higher total number of random simultaneous drive failures without data loss.
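
The trade described here can be written out as a small comparison. This is a sketch under the article's stated rule (insurance = 1 + number of hot spares, using the same drives); the `Layout` type and function names are hypothetical and chosen only for the example.

```python
from typing import NamedTuple


class Layout(NamedTuple):
    usable_disks: int          # disks' worth of user data
    simultaneous_losses: int   # failures survivable at the same instant


def raid5_with_spares(total_drives: int, hot_spares: int) -> Layout:
    # Spares hold no data or parity until a rebuild starts.
    return Layout(total_drives - hot_spares - 1, 1)


def raidn_using_spares(total_drives: int, hot_spares: int) -> Layout:
    # The former spare drives become extra insurance: 1 + hot_spares.
    insurance = 1 + hot_spares
    return Layout(total_drives - insurance, insurance)


print(raid5_with_spares(12, 2))    # same 12 drives, two kept idle as spares
print(raidn_using_spares(12, 2))   # same 12 drives, all used by the array
```

Both layouts offer the same usable capacity from the same drives; only the number of simultaneous failures that can be survived differs.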
 

Any number of hot spares can also be used with RAIDn, allowing an equal
number of failed disks to rebuild automatically without the need for human
intervention. However, if the purpose of using hot spares is to quickly restore
disk-loss protection to a failed RAID, it should be noted that with RAIDn insurance
of two or more, disk-loss protection never ceases, even temporarily, if a single
disk should fail. Raising the level of RAIDn insurance provides better protection
than adding hot spares.
 

Almost the entire remainder of the RAID market consists of RAID 10 (or RAID
1+0) systems. Where system performance and safety are of foremost concern,
RAID 10 systems have two serious drawbacks: the cost of mirroring (or inverse
capacity) and the overall level of guaranteed safety.
Mirroring by itself delivers even less benefit for the disk space it consumes than
compound RAIDs.
 

Figure 3: Comparing RAIDn to RAID 10 mirroring

Note in Figure 3 how RAID 10 not only requires more drives than RAIDn for the
same amount of storage capacity, but also offers less protection. In any context
where RAID-10 or a compound RAID is worth considering, RAIDn would be far more
economical. For single disk-loss protection, RAID 5 would be a better choice. To
match the capacity of RAID-5, any system that uses mirroring, such as the one
shown here, will require twice as many drives as the RAID-5 array has data
drives, that is, 2 × (N − 1) drives to match an N-drive RAID 5. For example, at
today’s price of $844.00 per 181GB drive, a two-terabyte RAID-5 system (12 drives)
would cost $10,128. Matching the storage capacity of that same two-terabyte
system using RAID 10 (22 drives) would cost $18,568, without gaining any write
performance benefits from the extra ten drives.
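
The prices in this example can be checked with a few lines of arithmetic. The drive price and size come from the article; the figure of 11 data drives is an inference (11 × 181GB = 1,991GB, roughly two terabytes) that happens to reproduce both quoted totals, and the function names are illustrative only.

```python
DRIVE_PRICE_USD = 844   # price per 181GB drive, from the article
DRIVE_GB = 181          # capacity per drive, from the article
DATA_DRIVES = 11        # assumption: 11 x 181GB = 1991GB, roughly two terabytes


def raid5_drive_count(data_drives: int) -> int:
    return data_drives + 1    # one extra drive's worth of parity


def raid10_drive_count(data_drives: int) -> int:
    return 2 * data_drives    # every data drive has a mirror


raid5_cost = raid5_drive_count(DATA_DRIVES) * DRIVE_PRICE_USD
raid10_cost = raid10_drive_count(DATA_DRIVES) * DRIVE_PRICE_USD
extra_drives = raid10_drive_count(DATA_DRIVES) - raid5_drive_count(DATA_DRIVES)

print(f"usable capacity: {DATA_DRIVES * DRIVE_GB} GB")
print(f"RAID 5:  {raid5_drive_count(DATA_DRIVES)} drives, ${raid5_cost:,}")
print(f"RAID 10: {raid10_drive_count(DATA_DRIVES)} drives, ${raid10_cost:,}")
print(f"extra drives for mirroring: {extra_drives}")
```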
 

Many people think that drive redundancy is much greater when every drive is
mirrored. But that is not the case. It is true that, if the entire top half or the entire
bottom half of the mirrored drives failed together, the data could be restored; but
the entire array is lost if both drives of any single mirrored pair should fail. It is a
very risky gamble to bet on which drive will or will not fail.
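
The size of that gamble can be estimated by brute force. The sketch below enumerates every possible set of failed drives in a mirrored array and counts the sets that take out both members of some pair. It assumes every failure set is equally likely, which is a simplification, and the drive numbering (drives 2k and 2k+1 form a mirrored pair) is hypothetical.

```python
from itertools import combinations


def fatal_fraction(mirrored_pairs: int, failures: int) -> float:
    """Fraction of equally likely failure sets that destroy a mirrored array."""
    drives = range(2 * mirrored_pairs)   # drives 2k and 2k+1 mirror each other
    total = 0
    fatal = 0
    for failed in combinations(drives, failures):
        total += 1
        lost = set(failed)
        # The array is lost when both members of any pair are in the failed set.
        if any(d % 2 == 0 and d + 1 in lost for d in lost):
            fatal += 1
    return fatal / total


for n_failed in (2, 3, 4):
    print(f"{n_failed} failed drives of 22: "
          f"{fatal_fraction(11, n_failed):.3f} of cases lose the array")
```

With 11 mirrored pairs, 11 of the 231 possible two-drive failure sets are fatal, so the array has no guaranteed protection beyond a single drive loss even though half of its drives are redundant.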
 

As can be seen in the examples above, current mirroring and parity summing
techniques, even when joined into compound RAIDs, have significant drawbacks
which force RAID users to make very hard choices. This is sometimes called the
RAID Triangle, as shown in Table 1 below.

RAID Type       Pros                                                 Cons
RAID 0          Fast Reads & Writes, Largest Capacity                No Safety if any drive fails
RAID 3,4,5      1 Drive-loss Insurance, Capacity = All Drives - 1    Much Slower than RAID 0
RAID 1+0 (10)   Fast Reads, ½ Write perf., 1 Drive-loss Insurance    Double cost, or half usable Capacity
RAID 1+5 (15)   3 Drive-loss Insurance                               Double cost, or half usable Capacity, Slowest perf.

Table 1: The RAID triangle

Table 1 shows that conventional RAID choices provide a very coarse level of
control. And since the compound versions of conventional RAIDs are not even
offered as options with most NAS products, users are rarely protected against
the loss of more than one disk. It is also important to note that any disk-loss
insurance above 3 is not possible with conventional RAID combinations. Some
Government sites and banking institutions do have RAID 5+1 for information that
absolutely must remain available, but when RAID 5+1 is compared to RAIDn, the
performance losses and dollar cost of lost capacity are very large.
 

The RAIDn package starts out matching the most popular conventional RAIDs,
but it also offers several new options. As seen in Table 2, RAIDn allows users to
increase the level of disk-loss insurance while consuming far less disk space for
protection than the combination RAIDs. In addition, RAIDn allows for a precise
choice of disk-loss insurance, requiring just one additional disk for each level of
increased protection, a feature unmatched by any conventional alternative.
 

RAIDn Level     Pros                                                             Cons
RAID0           Fast Reads & Writes, Largest Capacity                            No Safety if any drive fails
RAID1           1 Drive-loss Insurance, Capacity = All Drives - 1, Fast Reads    Slower than RAID 0 in Writes
RAID2           2 Drive-loss Insurance, Capacity = All Drives - 2, Fast Reads    Slower than RAID 0 in Writes
RAID3           3 Drive-loss Insurance, Capacity = All Drives - 3, Fast Reads    Slower than RAID 0 in Writes
RAID4           4 Drive-loss Insurance, Capacity = All Drives - 4, Fast Reads    Slower than RAID 0 in Writes
RAID5           5 Drive-loss Insurance, Capacity = All Drives - 5, Fast Reads    Slower than RAID 0 in Writes

Table 2: RAIDn insurance levels and tradeoffs

Another dramatic change that remains hidden until one starts using RAIDn is its
ability to rebuild multiple drives as quickly as RAID 5 can rebuild a single drive.
What’s more, a conversion feature will soon be available that allows the user to
adjust the level of RAIDn insurance to the amount of unused disk space currently
available. Disk space that would otherwise sit unused is converted to increased
backup protection. Should that disk space later be needed for data storage,
the level of protection can be reduced by the same dynamic process. At
any point in the conversion process, the data remains fully protected against
random disk loss.
 

Considering that, for any level of protection a user desires, RAIDn insurance
either matches or exceeds the benefits of conventional RAID, we expect that
RAIDn will soon become the standard against which all new RAID technologies
are compared.