| Article Index |
|---|
| RAID 101 - Introduction to RAID |
| RAID 0 - Striping |
| RAID 1 - Mirroring |
| RAID 3 - Striping with Dedicated Parity |
| RAID 5 - Striping with Distributed Parity |
| Wrap-Up |
Introduction to RAID 101
Welcome to this introduction to RAID technologies. Once the domain of expensive UNIX servers, mainframes and Storage Area Networks (SANs), RAID is now available in most computers either in software (Windows, Linux and BSD all support RAID in one form or another) or in hardware (such as a dedicated RAID card). A third category is "Driver-Based" RAID, where there is hardware configuration but the hard work is performed by a software driver rather than a hardware chip or the operating system.
RAID takes 2 or more hard disks and combines them using special algorithms so that to the user, they appear to be a single disk.
Why do we use RAID?
RAID generally provides one or more of the following three key benefits:
- Performance: Because RAID uses more than one hard disk at the same time, RAID systems can generally access data faster than a single hard disk;
- Redundancy: Most RAID types are designed so that if one of the disks in the RAID "set" fails, the rest of the disks are able to continue operating and the computer does not crash;
- Size: While the biggest hard disk available at the time of writing this article is a 2TB drive, combining hard disks in a RAID set can create a virtual hard disk that is 10TB, 20TB or even bigger.
Why don't we use RAID everywhere then?
Today, the primary reasons not to have RAID in almost all systems are cost and complexity. Specifically:
- You need to buy at least 2 disks, not just one;
- You need a motherboard that supports RAID, or;
- You need to know how to configure RAID in your operating system;
- RAID is more complex to troubleshoot if something goes wrong. And something will go wrong.
Note: RAID is not backup. Nothing you do with a RAID set can ever protect your data from a virus, prevent someone from deleting the only copy of a photograph, or save the world after you realise you didn't really want to delete that folder after all.
For the purposes of this article, we shall focus on the basic RAID types; the next articles in this series, RAID 201 - Advanced RAID I and RAID 202 - Advanced RAID II, discuss more complex configurations.
Let's start our journey through the world of RAID with the most primitive type of RAID, Striping.
RAID 0 - Striping
RAID 0, or Striping, requires at least 2 disks, but many RAID controllers allow the creation of a RAID 0 set from up to 8 or 16 disks.
The RAID 0 set is created using the same amount of space on each disk, and the total size is calculated as the size on each disk times the number of disks in the set. If we had a 250GB disk, a 320GB disk and a 500GB disk, the RAID 0 set would use 250GB on each disk, and the total size for the virtual disk would be 750GB.
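The size rule above can be checked with a few lines of Python. This is only a minimal sketch; the function name `raid0_capacity_gb` is our own invention and not part of any RAID tool.

```python
def raid0_capacity_gb(disk_sizes_gb):
    """Usable size of a RAID 0 set: the smallest disk's size times the number of disks."""
    if len(disk_sizes_gb) < 2:
        raise ValueError("RAID 0 needs at least 2 disks")
    return min(disk_sizes_gb) * len(disk_sizes_gb)


# The example from the text: 250GB, 320GB and 500GB disks
print(raid0_capacity_gb([250, 320, 500]))  # 750
```

Note that the extra space on the 320GB and 500GB disks is simply not used by the set.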
RAID 0 is the only form of RAID that offers performance without any form of data protection. RAID 0 takes the data on the computer and lays it out across the disks so that some parts of the data are on one disk, and some parts are on other disks.
For our examples, we will use a small portion of binary data - a stream of zeroes and ones. We'll use a simple pattern of 3 ones followed by 3 zeroes for simplicity:
111000111000111000111000111000111000111000111000
Let's store our data on a RAID 0 set of 4 disks. First, we break up the data into smaller chunks, called stripes. In practice, stripes are sized in Kilobytes (2^10, or 1024, bytes). For our example, we'll use stripes of 16 digits each instead:
1110001110001110
0011100011100011
1000111000111000
Now, we divide each stripe into equal parts, and each part is stored on a different disk, one at a time:
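| Disk 1 | Disk 2 | Disk 3 | Disk 4 |
|---|---|---|---|
| 1110 | 0011 | 1000 | 1110 |
| 0011 | 1000 | 1110 | 0011 |
| 1000 | 1110 | 0011 | 1000 |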
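The striping steps can be sketched in Python. This is an illustration under the assumptions of our example (16-digit stripes, 4 disks, data length an exact multiple of the stripe size); `stripe_raid0` is a name we made up, and a real controller works on bytes rather than strings of digits.

```python
def stripe_raid0(data, num_disks, stripe_size):
    """Split data into stripes, then deal each stripe's equal parts out to the disks."""
    part_size = stripe_size // num_disks
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), stripe_size):
        stripe = data[start:start + stripe_size]
        for d in range(num_disks):
            # Part d of every stripe always lands on disk d
            disks[d].append(stripe[d * part_size:(d + 1) * part_size])
    return disks


data = "111000" * 8  # the 48-digit example pattern
disks = stripe_raid0(data, num_disks=4, stripe_size=16)
for number, parts in enumerate(disks, start=1):
    print(f"Disk {number}: {parts}")
```

Each disk ends up holding one 4-digit part of every stripe, so a read of a whole stripe keeps all four disks busy at once.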
Now let's look at what happens if we lose one of the disks, in this case disk 2, and read our data back from the remaining good disks:
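| Disk 1 | Disk 2 | Disk 3 | Disk 4 |
|---|---|---|---|
| 1110 | (failed) | 1000 | 1110 |
| 0011 | (failed) | 1110 | 0011 |
| 1000 | (failed) | 0011 | 1000 |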
When we read our data we get "111010001110001111100011100000111000" - which is not the right data! We're missing the second chunk of each stripe, amounting to 1/4 of the data. One whole disk's worth of data is simply lost, and as a result all the data on the RAID 0 set is gone.
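The damaged read can be reproduced with a short sketch. It assumes the layout used in our example (each 16-digit stripe divided into four 4-digit parts, one per disk); `read_raid0` and its `failed` parameter are hypothetical names for illustration only.

```python
def read_raid0(disks, failed=frozenset()):
    """Reassemble the data stripe by stripe, skipping any disk index listed in `failed`."""
    pieces = []
    for stripe_number in range(len(disks[0])):
        for disk_index, parts in enumerate(disks):
            if disk_index not in failed:
                pieces.append(parts[stripe_number])
    return "".join(pieces)


disks = [
    ["1110", "0011", "1000"],  # Disk 1
    ["0011", "1000", "1110"],  # Disk 2
    ["1000", "1110", "0011"],  # Disk 3
    ["1110", "0011", "1000"],  # Disk 4
]

print(read_raid0(disks))              # all disks healthy: the original 48 digits
print(read_raid0(disks, failed={1}))  # Disk 2 (index 1) lost: only 36 digits come back
```

There is nothing on the surviving disks that describes the missing parts, so there is no way to work out what they were.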
In summary, RAID 0 offers performance and size, but cannot protect the data from the failure of a disk - hardly redundant, in spite of the name.
Where Does RAID 0 Make Sense
RAID 0 has its place, which is perhaps surprising when all things are considered.
RAID 0 is perfect for temporary files, especially the large temporary files created when editing audio and video. It also has a place in corporate systems where performance is more important than safety of the data, although such configurations are rare.
Let's move on and take a look at RAID 1, or as it's sometimes known, mirroring.
RAID 1 - Mirroring
RAID 1, or Mirroring, by definition requires exactly 2 disks, although some advanced controllers, or some fancy manual configuration, can make a multiple mirror set from 3 or more disks too.
The RAID 1 set is created using the same amount of space on each disk, and the total size is the same as the size used on each disk. If we had a 250GB disk and a 500GB disk, the RAID 1 set would use 250GB on each disk, and the total size for the virtual disk would also be 250GB.
RAID 1 offers data protection but with a significant cost - half the disk space is lost for redundancy. Furthermore, when reading data, the disk controller can read from either of the disks, which is faster than a single disk, but has to write the data to both disks at the same time, which is slower. RAID 1 can therefore be slower than a single disk for some tasks.
RAID 1 takes the data on the computer and lays it out across the disks so that each part of the data is stored in the same place on each disk.
Here’s our example data - a simple pattern of 3 ones followed by 3 zeroes for simplicity: 111000111000111000111000111000111000111000111000
Let's store that data on a RAID 1 set of 2 disks. RAID 1 sets do not break the data into stripes or chunks - it is simply written to each disk in the same place on each:
| Disk 1 | Disk 2 |
|---|---|
| 111000111000... | 111000111000... |
Now let's look at what happens if we lose one of the disks and try to read our data back from the remaining good disk:
| Disk 1 | Disk 2 |
|---|---|
| 111000111000... | (failed) |
We get 111000111000 - none of the data is lost! We can continue using the computer, although it might be prudent to arrange to replace the failed drive sooner rather than later.
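Mirroring is simple enough to sketch in a few lines. This is a toy model only: `write_mirror` and `read_mirror` are names we made up, and a failed disk is represented by `None`.

```python
def write_mirror(data, num_disks=2):
    """Write an identical copy of the data to every disk in the mirror."""
    return [data for _ in range(num_disks)]


def read_mirror(disks):
    """Return the data from the first disk that has not failed (None marks a failed disk)."""
    for copy in disks:
        if copy is not None:
            return copy
    raise RuntimeError("all disks in the mirror have failed")


data = "111000" * 8
disks = write_mirror(data)
disks[1] = None  # simulate losing one disk; either disk would give the same result
print(read_mirror(disks) == data)  # True
```

Because both disks hold the whole data set, it makes no difference which of the two fails.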
In summary, RAID 1 offers redundancy for the data, but at the cost of buying 2GB of disk space for every 1GB of data.
Where Does RAID 1 Make Sense
RAID 1 is a common RAID type, perhaps the most common of all, due to its use for the operating system drive on servers.
RAID 1 is perfect for small data sets, where the amounts of data written are small or in sequence, the amounts of data read are large, and reliability is crucial. Examples are the disks for an operating system, disks for database and email logs, web servers and most importantly, the Windows paging file or Linux swap space.
Let's move on and take a look at RAID 3, Striping with Dedicated Parity.
RAID 3 - Striping with Dedicated Parity
RAID 3, or Striping with Dedicated Parity, requires at least 2 disks for data and one more for the parity information, for a total of 3. Many controllers permit a RAID 3 set to be as many as 16 disks - 15 for data and 1 for parity, although in practice a system designer will limit the number of the disks to 3, 5 or 9. This is because the risk of a disk failure increases when you have more disks.
RAID 3 blends the redundancy of RAID 1 with the performance of RAID 0. RAID 3 starts off with a RAID 0 set, as described previously, and adds one more disk that stores a calculated form of the original data blocks, called parity.
The choice of 3, 5 or 9 disks is no coincidence - since the stripe size will be a number of Kilobytes (2^10 bytes), the number of data disks (2, 4 or 8, all powers of two) is chosen to divide the stripe evenly, and 1 more is added for the parity information. For some RAID controllers, this can provide a big performance increase.
The RAID 3 set is created using the same amount of space on each disk, and the total size is the same as the size used on each disk, times the number of data disks (N-1 disks). If we had a 250GB disk, a 320GB disk and a 500GB disk, the RAID 3 set would use 250GB on each disk, and the total size for the virtual disk would be 500GB (250GB times the 2 data disks).
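As a quick check of the N-1 rule, here is a minimal sketch; `parity_raid_capacity_gb` is a hypothetical helper name, not something provided by a RAID controller.

```python
def parity_raid_capacity_gb(disk_sizes_gb):
    """Usable size of a single-parity set: smallest disk size times (N - 1) data disks."""
    if len(disk_sizes_gb) < 3:
        raise ValueError("a single-parity set needs at least 3 disks")
    return min(disk_sizes_gb) * (len(disk_sizes_gb) - 1)


# The example from the text: 250GB, 320GB and 500GB disks
print(parity_raid_capacity_gb([250, 320, 500]))  # 500
```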
RAID 3 offers data protection but without the cost of RAID 1 - at most, RAID 3 is 50% more expensive than RAID 0 (1 parity disk for 2 data disks), but that drops to 12.5% for a 9 disk set (1 parity disk for 8 data disks). Furthermore, when reading data, the disk controller reads from all of the disks, which is faster than a single disk. Writing, however, is limited by the parity disk, so write speed is about the same as writing to a single disk.
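The cost of parity can be expressed in two ways: as the extra disks bought compared with a RAID 0 set of the same usable size, or as the share of all the disks in the set that is given over to parity. A small sketch (the function name `parity_overhead` is our own) shows both figures for the usual set sizes:

```python
def parity_overhead(total_disks):
    """Return (extra cost relative to the data disks, parity share of all disks)."""
    data_disks = total_disks - 1
    extra_over_raid0 = 1 / data_disks    # 1 parity disk per `data_disks` data disks
    share_of_all_disks = 1 / total_disks  # 1 disk out of the whole set
    return extra_over_raid0, share_of_all_disks


for n in (3, 5, 9):
    extra, share = parity_overhead(n)
    print(f"{n} disks: {extra:.1%} more than RAID 0, parity is {share:.1%} of the set")
```

For 9 disks this gives 12.5% more than RAID 0, with parity taking up about 11.1% of the whole set.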
Here’s our example data - a simple pattern of 3 ones followed by 3 zeroes for simplicity:
111000111000111000111000
Let’s store that data on our RAID 3 set. Just like RAID 0, we split the data into stripes:
11100011
10001110
00111000
Disk 1 and Disk 2 are our data disks, and Disk P is our parity disk. First we write the data to the data disks, half on each, and then we calculate parity and write it to the third disk.
In this example, each digit of P is calculated by counting the number of "1" digits in the same place on the data disks. If the number of "1" digits is odd, then we store another 1 on the parity disk to show this. If the number of "1" digits is even, then we store a 0 instead. For the first stripe:
| | Block 1 | Block 2 | Block 3 | Block 4 |
|---|---|---|---|---|
| Data Disk 1 | 1 | 1 | 1 | 0 |
| Data Disk 2 | 0 | 0 | 1 | 1 |
| Parity Disk P | 1 | 1 | 0 | 1 |
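The counting rule translates directly into code. A minimal sketch, with `parity_by_counting` as a name of our own choosing:

```python
def parity_by_counting(*blocks):
    """For each digit position, store 1 if the number of 1s is odd, otherwise 0."""
    parity = []
    for digits in zip(*blocks):  # the digits in the same place on every data disk
        ones = digits.count("1")
        parity.append("1" if ones % 2 == 1 else "0")
    return "".join(parity)


# First stripe: "1110" on Data Disk 1, "0011" on Data Disk 2
print(parity_by_counting("1110", "0011"))  # 1101
```

The same function works unchanged for sets with more data disks, since it simply counts across however many blocks it is given.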
Then we write that stripe to the first stripe on the disks, and repeat for the other two stripes of data:
| Disk 1 | Disk 2 | Disk P |
|---|---|---|
| 1110 | 0011 | 1101 |
| 1000 | 1110 | 0110 |
| 0011 | 1000 | 1011 |
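Repeating the parity step for every stripe gives the full contents of the three disks. A minimal sketch, assuming 8-digit stripes split in half across two data disks as in our example; `write_raid3` and `parity_of` are names we made up for illustration:

```python
def parity_of(a, b):
    """Parity of two blocks: 1 where exactly one of the two digits is 1, otherwise 0."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def write_raid3(data, stripe_size=8):
    """Return one (disk 1, disk 2, disk P) row per stripe for a set with 2 data disks."""
    half = stripe_size // 2
    rows = []
    for start in range(0, len(data), stripe_size):
        stripe = data[start:start + stripe_size]
        first, second = stripe[:half], stripe[half:]
        rows.append((first, second, parity_of(first, second)))
    return rows


for row in write_raid3("111000" * 4):
    print(row)
```

With only two data disks, "an odd number of 1s" means exactly one of the two digits is a 1, which is why a simple not-equal comparison is enough here.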
In practice the disk controller uses an operation called "XOR" for "Exclusive OR", which is a faster way of gathering the same data and calculating the parity – but the result is identical to the example above.
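To see that the odd/even count and XOR really do agree, here is a small check in Python. It is only an illustration, with function names of our own; real controllers do this in hardware or optimised driver code.

```python
from functools import reduce
from itertools import product
from operator import xor


def parity_by_counting(bits):
    """1 if the number of 1 bits is odd, otherwise 0."""
    return sum(bits) % 2


def parity_by_xor(bits):
    """XOR all the bits together."""
    return reduce(xor, bits)


# Compare the two methods for every possible combination of 4 bits
print(all(parity_by_counting(b) == parity_by_xor(b)
          for b in product((0, 1), repeat=4)))  # True

# XOR also handles a whole block in one operation instead of digit by digit
print(format(0b1110 ^ 0b0011, "04b"))  # 1101
```

The second line hints at why XOR is faster: a processor can XOR an entire word of data in a single step.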
Let’s see what happens if the parity disk fails. Take away Disk P and we have:
| Disk 1 | Disk 2 | Disk P |
|---|---|---|
| 1110 | 0011 | (failed) |
| 1000 | 1110 | (failed) |
| 0011 | 1000 | (failed) |
Our original data is still there on the data disks: "111000111000111000111000".
Now let's see what happens if we lose a data disk instead:
| Disk 1 | Disk 2 | Disk P |
|---|---|---|
| 1110 | (failed) | 1101 |
| 1000 | (failed) | 0110 |
| 0011 | (failed) | 1011 |
When we try to read our data, we get the first half back, "1110". Then the controller takes that data, compares it to the parity, and applies the same algorithm as before:
| | Block 1 | Block 2 | Block 3 | Block 4 |
|---|---|---|---|---|
| Data Disk 1 | 1 | 1 | 1 | 0 |
| Parity Disk P | 1 | 1 | 0 | 1 |
| Data Disk 2 | 0 | 0 | 1 | 1 |
We've recovered our lost data by calculating it from the remaining disks.
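The recovery can be sketched for all three stripes of our example. This assumes the second data disk is the one that failed; `xor_bits` is a helper name of our own.

```python
def xor_bits(a, b):
    """XOR two equal-length strings of binary digits."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


disk_1 = ["1110", "1000", "0011"]  # surviving data disk, one block per stripe
disk_p = ["1101", "0110", "1011"]  # parity disk

# Rebuild the lost disk block by block from the data and parity that remain
rebuilt_disk_2 = [xor_bits(d, p) for d, p in zip(disk_1, disk_p)]
print(rebuilt_disk_2)  # ['0011', '1110', '1000']

# Put each stripe back together: the Disk 1 half, then the rebuilt Disk 2 half
data = "".join(a + b for a, b in zip(disk_1, rebuilt_disk_2))
print(data)  # 111000111000111000111000
```

The calculation used to rebuild a block is exactly the one used to create the parity in the first place, which is why the controller needs no separate recovery algorithm.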
So we see that in a RAID 3 set, if any one of the disks fails, the data is safe.
Where Does RAID 3 Make Sense
RAID 3 is a fairly rare RAID type, primarily because the performance of the RAID set during writes is low thanks to the single parity disk. As a result it is usually overlooked in favour of its close cousin, RAID 5.
Still, RAID 3 is perfect for large read-only data sets, or where the amounts of data written are small or insignificant and reliability is crucial. Implementations of RAID 3 are usually special-purpose systems, often databases and data warehouses.
Let's move on and take a look at RAID 5, Striping with Distributed Parity.
RAID 5 - Striping with Distributed Parity
RAID 5, or Striping with Distributed Parity, requires at least 3 disks: the equivalent of 2 disks for data and 1 more for the parity information. Many controllers permit a RAID 5 set to be as many as 16 disks (the equivalent of 15 for data and 1 for parity), although in practice a system designer will limit the number of the disks to 3, 5 or 9.
RAID 5 blends the redundancy of RAID 1 with the performance of RAID 0, and also attempts to solve the write performance problem exhibited with RAID 3. RAID 5 is similar to RAID 3, except that instead of storing all the parity information on one specific disk, the parity is stored equally on each disk in the set.
The choice of 3, 5 or 9 disks is no coincidence - since the stripe size will be a number of Kilobytes (2^10 bytes), the number of data disks (2, 4 or 8, all powers of two) is chosen to divide the stripe evenly, and 1 more is added for the parity information. For some RAID controllers, this can provide a big performance increase.
The RAID 5 set is created using the same amount of space on each disk, and the total size is the same as the size used on each disk, times the number of data disks (N-1 disks). If we had a 250GB disk, a 320GB disk and a 500GB disk, the RAID 5 set would use 250GB on each disk, and the total size for the virtual disk would be 500GB (250GB times the equivalent of 2 data disks).
RAID 5 offers data protection just like RAID 3 - at most, RAID 5 is 50% more expensive than RAID 0, but that drops to 12.5% for a 9 disk set (the equivalent of 1 parity disk for 8 data disks). Furthermore, when reading data, the disk controller reads from all of the disks, which is faster than a single disk. Unlike RAID 3, RAID 5 writes about the same amount to every disk in the array, so write performance is not limited in the same way.
Here’s our example data - a simple pattern of 3 ones followed by 3 zeroes for simplicity:
111000111000111000111000
Let’s store that data on our RAID 5 set. Just like RAID 0, we split the data into stripes:
11100011
10001110
00111000
Just like RAID 3, each digit of P is calculated by counting the number of "1" digits in the same place on the data disks. If the number of "1" digits is odd, then we store another 1 on the parity disk to show this. If the number of "1" digits is even, then we store a 0 instead. For the first stripe:
| | Block 1 | Block 2 | Block 3 | Block 4 |
|---|---|---|---|---|
| Disk 1 | 1 | 1 | 1 | 0 |
| Disk 2 | 0 | 0 | 1 | 1 |
| Disk 3 - Parity | 1 | 1 | 0 | 1 |
In practice the disk controller uses an operation called "XOR" for "Exclusive OR", which is a faster way of gathering the same data and calculating the parity - and the result matches the example above.
Now we write the second stripe of data, but this time, the parity block will be located on Disk 2:
| | Block 1 | Block 2 | Block 3 | Block 4 |
|---|---|---|---|---|
| Disk 1 | 1 | 0 | 0 | 0 |
| Disk 2 - Parity | 0 | 1 | 1 | 0 |
| Disk 3 | 1 | 1 | 1 | 0 |
Finally we write the third stripe of data, but this time, the parity block will be located on Disk 1:
| | Block 1 | Block 2 | Block 3 | Block 4 |
|---|---|---|---|---|
| Disk 1 - Parity | 1 | 0 | 1 | 1 |
| Disk 2 | 0 | 0 | 1 | 1 |
| Disk 3 | 1 | 0 | 0 | 0 |
The final state of our disks is shown here, with the parity blocks marked:
| Disk 1 | Disk 2 | Disk 3 |
|---|---|---|
| 1110 | 0011 | 1101 (parity) |
| 1000 | 0110 (parity) | 1110 |
| 1011 (parity) | 0011 | 1000 |
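The rotating parity layout can be sketched as follows. This follows the pattern used in our example, where parity starts on the last disk and moves one disk to the left with each stripe; real controllers may rotate parity differently. The names `write_raid5` and `xor_bits` are our own, and the data length is assumed to be an exact multiple of the stripe size.

```python
def xor_bits(a, b):
    """XOR two equal-length strings of binary digits."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def write_raid5(data, num_disks=3, block_size=4):
    """Return one row per stripe; each row lists the block stored on each disk."""
    stripe_size = (num_disks - 1) * block_size
    rows = []
    for stripe_number, start in enumerate(range(0, len(data), stripe_size)):
        stripe = data[start:start + stripe_size]
        blocks = [stripe[i:i + block_size] for i in range(0, stripe_size, block_size)]
        parity = blocks[0]
        for block in blocks[1:]:
            parity = xor_bits(parity, block)
        # Parity sits on the last disk for stripe 0, then shifts left each stripe
        parity_disk = (num_disks - 1 - stripe_number) % num_disks
        row = list(blocks)
        row.insert(parity_disk, parity)
        rows.append(row)
    return rows


for row in write_raid5("111000" * 4):
    print(row)
```

Because the parity block lands on a different disk for each stripe, no single disk has to absorb every parity write.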
Let's examine what happens when we lose a disk and we try to read the data:
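| Disk 1 | Disk 2 | Disk 3 |
|---|---|---|
| 1110 | (failed) | 1101 (parity) |
| 1000 | (failed) | 1110 |
| 1011 (parity) | (failed) | 1000 |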
When we try to read our data, we get the first half back from disk 1, "1110". Then the controller takes that data, compares it to the parity on disk 3, and applies the same process as before:
| | Block 1 | Block 2 | Block 3 | Block 4 |
|---|---|---|---|---|
| Disk 1 - Data | 1 | 1 | 1 | 0 |
| Disk 3 - Parity | 1 | 1 | 0 | 1 |
| Disk 2 - Calculated Data | 0 | 0 | 1 | 1 |
Now we read stripe 2, which is missing the parity information - so we have the data already, "10001110". When we read stripe 3, we have the opposite problem - we need to calculate the first part of the data:
| | Block 1 | Block 2 | Block 3 | Block 4 |
|---|---|---|---|---|
| Disk 1 - Parity | 1 | 0 | 1 | 1 |
| Disk 3 - Data | 1 | 0 | 0 | 0 |
| Disk 2 - Calculated Data | 0 | 0 | 1 | 1 |
And then we append the last part from Disk 3, "1000". We've successfully read or calculated all the data on the RAID set: "111000111000111000111000"
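The whole degraded read can be sketched in Python. It assumes the layout from our example (3 disks, parity on Disk 3 for the first stripe, then Disk 2, then Disk 1); `read_raid5_degraded` and `xor_bits` are hypothetical names, and disks are numbered from 0 in the code.

```python
def xor_bits(a, b):
    """XOR two equal-length strings of binary digits."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def read_raid5_degraded(rows, failed_disk):
    """Read every stripe without touching the failed disk, rebuilding data where needed."""
    num_disks = len(rows[0])
    pieces = []
    for stripe_number, row in enumerate(rows):
        parity_disk = (num_disks - 1 - stripe_number) % num_disks
        survivors = [block for d, block in enumerate(row) if d != failed_disk]
        if failed_disk == parity_disk:
            # Only the parity block is missing, so the data blocks are all intact
            pieces.extend(survivors)
            continue
        # XOR of all surviving blocks (data and parity) gives the missing data block
        rebuilt = survivors[0]
        for block in survivors[1:]:
            rebuilt = xor_bits(rebuilt, block)
        for d in range(num_disks):
            if d != parity_disk:
                pieces.append(rebuilt if d == failed_disk else row[d])
    return "".join(pieces)


rows = [
    ["1110", "0011", "1101"],  # stripe 1: parity on Disk 3
    ["1000", "0110", "1110"],  # stripe 2: parity on Disk 2
    ["1011", "0011", "1000"],  # stripe 3: parity on Disk 1
]
print(read_raid5_degraded(rows, failed_disk=1))  # Disk 2 lost
```

The same function returns the full data whichever single disk is marked as failed, which is the whole point of the parity.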
Now we can see that in a RAID 5 set, if one of the disks fails, the data is safe (or at least, can be recalculated).
Where Does RAID 5 Make Sense
RAID 5 is a very common RAID type, primarily because it presents a very good balance between cost and performance, without the drawback of the single parity disk in a RAID 3 set. It does perform faster for reading data than writing, and there are certain specific cases where it will perform badly, but the overall balance is excellent.
RAID 5 is used for file servers, large web servers, mail servers, database servers, data warehouses, and for any data sets which do not require high numbers of small data writes.
RAID 101 - Wrap Up
We hope you've enjoyed RAID 101 - Introduction to RAID. We've covered the 4 basic RAID types:
| RAID Level | Name | Redundancy | Great For | Not Great For |
|---|---|---|---|---|
| RAID 0 | Striping | None | Temporary files | Data you want to protect |
| RAID 1 | Mirroring | Yes - 1 Disk | Operating Systems, Database Logs | Data sets bigger than one disk, Large amounts of cheap storage |
| RAID 3 | Striping with Dedicated Parity | Yes - 1 Disk | Large Data Sets that are mostly read from | Data Sets that are mostly written to |
| RAID 5 | Striping with Distributed Parity | Yes - 1 Disk | Large Data Sets | Data sets with many small random disk writes |
So what if your data doesn't fit these models? What if you need to survive the simultaneous failure of 2 disks, or handle lots of small random writes? What if you have a system that uses the disks very heavily and you need high performance? We'll look at solutions for these systems in RAID 201 - Advanced RAID Levels I and RAID 202 - Advanced RAID Levels II.
-
|192.101.136.xxx |2009-05-28 06:46:55 doublemint - homebuilt NAS owner
I'd also like to know what the recovery process is like, especially if the RAID controller fails. Can a failed RAID controller be replaced with one of a different brand and still recover? As a novice, I find the RAID BIOS a bit confusing; some pointers would be helpful. I had a motherboard with RAID fail and assumed I could take one of my RAID 1 drives, plug it into another PC and read it, but I couldn't. Why?
-
|70.17.178.xxx |2009-05-28 13:53:44 zaphod - reply
I'm pretty sure you'd need both HDs for that to work... not just one.
-
|134.211.129.xxx |2009-05-29 17:13:22 David Rawling - Your Mileage Will Vary
Basically, the answer is no - you need the same model of RAID controller to replace a failed one.
This applies from the virtually-free Intel RAID controllers on the motherboard to the thousand (and multiple-thousand) dollar add-in cards.
Each controller has its own way of marking the disks as being part of a RAID set - the disk "signature". One brand might write "RAID1-1" to the first disk and "RAID1-2" to the second. Others might use numbers with the brand (MYRAID-716825).
I guess it comes back to the same comment I made before. RAID is not backup. It only protects you from disk failure.
-
|71.198.81.xxx |2009-06-01 02:54:12 doublemint - homebuilt NAS owner
David,
While unfortunate there isn't more standardization, I understand why this might be. I can't however understand why this would apply to RAID 1 (mirroring). Why do these drives need to be treated any different than non-RAID drives? Just write the same data two places rather than just one.



