RAID 5 vs. RAID 0+1: In database environments.

Wow, here’s one from the archives; no, quite literally… the archives. Back in 2003 I was asked to explain RAID 5 and RAID 0+1 to a client, and this was my answer. Enjoy!

Please note: While the bulk of this note is still technically correct, technology has changed. You should discuss all of your options with a trusted adviser (VAR/Integrator, Consultant, or email me at Jerry [at] JerryGilreath.com) before deciding on a RAID technology to use for your specific application.

When considering databases, there are direct performance implications with RAID 5, which are negated with the implementation of RAID 0+1. Let’s talk about these:

RAID 5 is essentially striping with parity. What this means is that if I have five disks to stripe across, the equivalent of one of those disks will be used for parity. (Parity is actually spread across all the disks, but the space it consumes is equal to one disk.)
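To make the parity idea concrete, here is a minimal sketch in Python. It assumes parity is a byte-wise XOR of the data blocks in a stripe, which is the usual textbook description of RAID 5. The `parity_disk` rotation rule is purely hypothetical, since real controllers use their own layouts, and the four-byte blocks are toy data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_disk(stripe_number, disk_count):
    """Hypothetical rotation rule: parity moves one disk over on each stripe."""
    return (disk_count - 1 - stripe_number) % disk_count

def usable_disks(disk_count):
    """Parity consumes the equivalent of one disk, wherever it is stored."""
    return disk_count - 1

# Five disks: each stripe holds four data blocks plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data_blocks)

print(parity.hex())                               # the parity block for this stripe
print([parity_disk(s, 5) for s in range(5)])      # parity lands on a different disk each stripe
print(usable_disks(5))                            # four disks' worth of usable space
```

XOR-ing all four data blocks together with the parity block gives all zeros, and that property is what lets any single missing block be recomputed later.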

Under RAID 5, reads are done at the same speed as RAID 0+1, because neither type of RAID requires any computation on a read. (In certain circumstances RAID 0+1 can actually be faster; more about that in a bit.)

Performance issues crop up when you begin to consider writes. Because of the parity computation with RAID 5, partial stripe writes are slow. If my stripe size is 32k, and I want to write 18k to the stripe, that leaves 14k in the original stripe that doesn’t get changed. First I have to read the existing data (and its parity) from the stripe, alter the stripe in memory, then write the stripe back and update the parity. So, as you can see, with RAID 5, every write transaction that is not a full stripe write requires a read, a computation, and then a write: one transaction for the price of three. (Count the data block and the parity block separately, and a small write costs two reads and two writes.)
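The read-modify-write cycle can be sketched as follows. This assumes XOR parity, and it counts the data block and the parity block as separate I/Os, so the read and the write each touch two blocks. The function names and the tiny two-byte blocks are illustrative only, not any controller's actual interface.

```python
def xor(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def partial_stripe_write(stripe, parity, index, new_block):
    """Overwrite one block of a stripe; return (new_stripe, new_parity, io_count)."""
    old_block = stripe[index]        # read 1: the data being replaced
    old_parity = parity              # read 2: the current parity
    # Computation: remove the old data from the parity, then fold in the new data.
    new_parity = xor(xor(old_parity, old_block), new_block)
    new_stripe = stripe[:index] + [new_block] + stripe[index + 1:]
    io_count = {"reads": 2, "writes": 2}   # writes: new data block + new parity block
    return new_stripe, new_parity, io_count

def full_stripe_write(new_stripe):
    """Every block is replaced, so parity comes from the new data alone: no reads."""
    parity = new_stripe[0]
    for block in new_stripe[1:]:
        parity = xor(parity, block)
    return parity, {"reads": 0, "writes": len(new_stripe) + 1}

stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity, full_io = full_stripe_write(stripe)
updated, new_parity, partial_io = partial_stripe_write(stripe, parity, 1, b"\xaa\xbb")

print(full_io)      # no reads at all for a full stripe
print(partial_io)   # reads are unavoidable for a partial stripe
```

The parity produced by the shortcut matches what a full recomputation over the updated stripe would give. That is why the controller only needs the old data and old parity, not the whole stripe, but it still cannot avoid the reads.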

Many of the RAID 5 performance issues are overcome by caching the disk updates in memory until a complete stripe is available for writing, because a full stripe write requires only a parity computation and a write transaction (no read transaction is required). The problem with RAID 5 write caching is that the “trend” is for data to pile up in cache, to eventually be purged to the disks in one “fell swoop”. This can substantially slow down read and write transactions during that “purge” of data. When the cache accumulates enough data, it can actually overwhelm the disks. The underlying disks need to be able to keep up with the load imposed upon them by the caching controller, or the cache will fill up and must disable itself until it has purged enough data to the disks to free enough memory to be useful again for future I/Os. While the cache is disabled, every write goes straight through to the disks, overwhelming them even more as data arrives both from the draining cache and directly from the system. Reads are, of course, impaired in a similar manner.

RAID 5 controllers try to address this slowdown with something known as “read caching”. However, caching often-read data also has its drawbacks, because you end up using cache memory (which could be used for caching writeable data) for data that may never be asked for again. You will not get a high cache “hit rate” (successful reads from cache memory) on random I/O. What this means is that if your database reads are highly erratic (typical in transaction based databases, or in databases that have small to medium sized data records) you are going to be “thrashing” your read cache, cycling data in and out of cache memory that could better be used to cache write data.

Typically, “read caching” is only useful when it is used as “read ahead caching”, meaning that if I am reading data in a stripe, the probability that I will want the data in an adjacent stripe is high if my database records are large, or typically read in sequence. (This is a presumption, and not always true, but it works for this example. Strictly speaking, the probability that I will want data that is linked to this data is high, but my example assumes no fragmentation across the stripe or disk, which is another performance issue altogether.) Again, with random I/O, read ahead caching serves no real purpose, and it will actually impair performance, because little of the cached data is ever used when the data signature is small (again, transaction based databases, or databases that have small to medium sized data records). Read ahead caching can be useful in certain applications, such as FTP servers with large files, or streaming audio or video servers.

There are also performance issues when a disk is lost. The system can run in an impaired (degraded) mode, which means that your data is still available; however, every time data is read, the missing piece of data is re-computed from the parity and the surviving data on disk. So, for every read that touches the failed disk, we now have a computation. Also, for every write, we must now omit a significant piece of the information, or find another place to write it. Rebuilds (when the faulty disk is replaced) are also more difficult, because not only must the system handle its normal load, it must also recompute the missing data for every stripe from the surviving data and parity, and concurrently update data as it is written to the array. Reads become significantly slower, and writes become excruciatingly slow (thus causing the cache to fill up, and become useless). RAID 5 cannot handle the loss of more than one disk at a time, ever. It is not recommended that a RAID 5 stripe set be expanded across multiple RAID chassis, because the loss of a single power supply will cause the loss of the entire file system. If multiple RAID chassis are available, and RAID 5 is to be used, it is recommended that multiple file systems be configured.
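A degraded read can be sketched like this, again assuming byte-wise XOR parity. The `read_block` helper and the convention of marking the failed disk's block as `None` are assumptions made for illustration; the sample blocks are toy data.

```python
def xor_all(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytes(len(blocks[0]))          # start from all zeros
    for block in blocks:
        result = bytes(x ^ y for x, y in zip(result, block))
    return result

def read_block(stripe, parity, index):
    """Read one block from a stripe; a failed disk's block is represented as None."""
    if stripe[index] is not None:
        return stripe[index]                # healthy read: no computation needed
    survivors = [b for b in stripe if b is not None]
    if parity is None or len(survivors) != len(stripe) - 1:
        raise OSError("more than one block missing: RAID 5 cannot recover")
    # Degraded read: every surviving block plus parity must be read and XOR-ed.
    return xor_all(survivors + [parity])

healthy = [b"db01", b"db02", b"db03", b"db04"]
parity = xor_all(healthy)
degraded = [b"db01", None, b"db03", b"db04"]   # disk 1 has failed

print(read_block(degraded, parity, 1))   # rebuilt on the fly from the other four blocks
```

Note that one logical read of the missing block turns into a read of every surviving disk in the stripe, which is where the degraded-mode slowdown comes from. A rebuild does the same work for every stripe on the array.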

Data and disk de-fragmentation require an intensive amount of reads and writes. Although these are typically done when the system is not busy, on systems with unpredictable load, de-fragmentation can cause a pile-up of data in the write cache that can impair availability during those sudden “bursts” of activity.

So, now let’s talk about RAID 0+1. RAID 0+1 is called a “nested” or “multiple” RAID level, because it consists of both RAID 0 (striping across disks) and RAID 1 (mirroring of disks). People often make the mistake of interchanging the terms RAID 1+0 (RAID 1/0, RAID 10) and 0+1 (RAID 0/1, RAID 01); these are not the same, and the terms should not be used interchangeably. RAID 0+1 is one of the most popular RAID levels, because of its flexibility in both its capabilities and its availability.

RAID 0+1 gives both the speed of RAID 0’s stripe writes (because of the lack of a parity calculation) and the redundancy of RAID 1’s mirroring. RAID 0+1 consists of two stripe sets that are mirrored. This means that data is first written to one stripe set, then that data is mirrored to another stripe set (usually in another array chassis).
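As a rough illustration of that layout, the sketch below maps a logical block to the places it is written. It assumes a stripe unit of one block and simple round-robin striping; the set names "A" and "B" and the `raid01_targets` helper are hypothetical.

```python
def raid01_targets(logical_block, disks_per_set):
    """Return the (stripe_set, disk, block_on_disk) locations for one logical block."""
    disk = logical_block % disks_per_set       # RAID 0: round-robin across the set
    offset = logical_block // disks_per_set    # how far down that disk the block sits
    # RAID 1: the same position is written in both stripe sets, with no parity math.
    return [("A", disk, offset), ("B", disk, offset)]

for block in range(6):
    print(block, raid01_targets(block, disks_per_set=4))
```

Every write lands in two places, but neither copy depends on reading anything back first, which is the contrast with the RAID 5 partial stripe write.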

Writes are not typically impaired because the mirrored data is usually replicated across multiple controllers. In a single-controller environment, one half of the available data bandwidth is lost because the mirrored data is sent to two disks rather than one. Writes are more balanced, and trend charts should show a more “level” data flow than the “bursty” one represented by RAID 5 with caching.

Reads are not impaired at all, because only one set of disks is read from. (Some proprietary hardware based controllers will actually multiplex read requests across the two mirror sets to double the available data bandwidth!) In fact, reads are typically faster and better balanced on a RAID 0+1 (or 1+0) because the system is not busy computing parity, or purging data from the cache.

RAID 0+1 can also take advantage of the read-ahead and read caching, if desired and appropriate for the type of data.

If a disk fails in the RAID 0+1 set, the mirror fails and all writes and reads are delegated to the other, completely available set. Like RAID 5, RAID 0+1 is not guaranteed to survive the loss of more than one disk (a second failure is only survivable if it lands in the stripe set that is already offline); however, it IS recommended that if more than one RAID chassis is available, mirrors be spanned across them. The loss of a single power supply will not result in the loss of data, because the second array chassis will still be available, with an intact copy of the data. Rebuilds are much faster in RAID 0+1. With the failure of a single disk, the array will switch the system over to the complete disk set, and the other mirror set will be “offlined”, or made available for repair. Upon the replacement of the faulty disk, the data will be replicated from the active mirror set to the other. It is worthy of note that in the case of a single controller system, performance will actually increase (!) because of the loss of the mirror set; however, redundancy of data will be lost. Rebuilds do not significantly increase the load on the controller, because the writes require no computations; however, the bandwidth utilized will be increased. The effect of this bandwidth utilization is typically negated because mirror sets are usually split across two controllers, and reads from the “active” set will usually be done when the system has “idle” time. Priority is given to reads and writes of live data to the primary “active” mirror set.

RAID 1+0 (AKA RAID 10) is mirroring with striping. What this means is that each disk is mirrored to another disk, then a stripe is written across the resulting disk “pairs”. Performance is similar to that found in 0+1; however, due to the complexity of this method, it is not often used. The reason I find it interesting, and worthy of note, is because of the “high availability” designed into this data storage method. If a disk fails in a pair, the mirror set will fail over to the second disk in the set. Rather than losing an entire stripe set (one complete set of disks), only one actual disk is lost! When that disk is replaced, only the data on the single remaining disk in the pair must be replicated, resulting in a much quicker re-build time. The controllers that drive this method are usually more expensive because of this resiliency.

It is also worthy of note that in RAID 1+0, redundancy is king. Due to the fact that data is striped across mirror sets (not mirrored across stripe sets), the loss of a single disk is no issue, nor is the loss of two disks, nor is the loss of one half of the disks. As long as the loss does not exceed one half of the disks, and no two partners in any given “pair” are lost, the system will continue to operate. Yes, the loss of both disks in a pair will cause the loss of the entire array; however, the added redundancy due to the ability to withstand a normally catastrophic loss of disks is one reason this method has been successful.
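The difference in failure tolerance between the two nested levels can be counted directly. The sketch below models RAID 0+1 as the text describes it, with one failed disk taking its whole stripe set offline, and models RAID 1+0 as independent mirror pairs. The disk numbering and the eight-disk example are assumptions chosen only for illustration.

```python
from itertools import combinations

def raid01_survives(failed, disks_per_set):
    """RAID 0+1: disks 0..n-1 are stripe set A, disks n..2n-1 are stripe set B.
    The array survives while at least one stripe set has no failed disk."""
    set_a_ok = not any(d < disks_per_set for d in failed)
    set_b_ok = not any(d >= disks_per_set for d in failed)
    return set_a_ok or set_b_ok

def raid10_survives(failed, pair_count):
    """RAID 1+0: disks 2k and 2k+1 form mirror pair k.
    The array survives unless both members of some pair have failed."""
    failed = set(failed)
    return not any({2 * k, 2 * k + 1} <= failed for k in range(pair_count))

def surviving_combinations(survives, disk_count, failures, layout_arg):
    """Count how many ways of losing `failures` disks leave the array running."""
    combos = list(combinations(range(disk_count), failures))
    return sum(survives(c, layout_arg) for c in combos), len(combos)

# Eight disks either way: two stripe sets of four, or four mirror pairs.
for failures in (1, 2, 4):
    print(failures,
          surviving_combinations(raid01_survives, 8, failures, 4),
          surviving_combinations(raid10_survives, 8, failures, 4))
```

Both layouts always survive one failure. With two failures, 0+1 survives only when both land in the same stripe set, while 1+0 fails only when both land in the same pair, so 1+0 survives in far more of the possible cases.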

RAID 1+0 is difficult to maintain, because of the massive number of mirror sets involved. I do not normally recommend that this be implemented in software RAID.

So, to summarize:

RAID 5 = n+1 (number of data disks + 1 disk), cost savings, bad for databases because of performance issues. Works well for read only databases, or databases that are not updated very often. Also works well for streaming servers, or servers whose data does not change, and must be served up in large chunks.

RAID 0+1 = n+n (number of data disks + the same number of mirror disks), costs more, much better for databases. Not recommended for streaming servers, because of the unnecessary overhead.

RAID 1+0 = n+n, often mistaken for RAID 0+1. Typically a more expensive and complex configuration; performance should be equivalent to 0+1, but redundancy is improved. See RAID 0+1 for recommended use.
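The n+1 versus n+n arithmetic works out as follows. This is a sketch that assumes identical disks and ignores hot spares and formatting overhead; the 100 GB disk size and the function names are made up for the example.

```python
def raid5_usable(disk_count, disk_size_gb):
    """n+1: one disk's worth of space goes to parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disk_count - 1) * disk_size_gb

def mirrored_usable(disk_count, disk_size_gb):
    """n+n (RAID 0+1 or 1+0): half of the disks hold the mirror copy."""
    if disk_count < 4 or disk_count % 2:
        raise ValueError("striped mirrors need an even number of disks, four or more")
    return disk_count // 2 * disk_size_gb

def disks_needed(usable_gb, disk_size_gb):
    """Disks to buy for a target usable capacity under each scheme."""
    data_disks = max(2, -(-usable_gb // disk_size_gb))   # ceiling division, minimum two
    return {"RAID 5": data_disks + 1, "RAID 0+1 / 1+0": data_disks * 2}

print(disks_needed(400, 100))        # same usable space, very different disk counts
print(raid5_usable(5, 100))          # the five-disk example from earlier
print(mirrored_usable(8, 100))
```

For the same usable space, the mirrored layouts need nearly twice the spindles, which is the cost side of the trade-off summarized above.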

In any instance, there is no significant performance increase in breaking up data across multiple file systems. Database Administrators are known for requesting multiple file systems (/u01, /u02, /u03, etc.). This should be done more for the sake of organization than for performance reasons. Unless the arrays must physically be configured as separate file systems (as in the instance of RAID 5 with multiple chassis), there is no reason to do so.

There is also no reason to set up multiple stripes across a single set of disks, as the loss of a disk will still result in loss of data/performance. Mirroring data across two stripes across the same set of disks is also useless, as the loss of a single disk will cause a complete loss of data and redundancy.

My recommendations: For streaming servers or content servers, RAID 5. For typical databases, and transaction-based systems, RAID 0+1.
