20110530

RAID Concept


What is RAID? 
In 1987, Patterson, Gibson and Katz at the University of California, Berkeley, published a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)". This paper described various types of disk arrays, referred to by the acronym RAID. The basic idea of RAID was to combine multiple small, inexpensive disk drives into an array whose performance exceeds that of a Single Large Expensive Drive (SLED). Additionally, this array of drives appears to the computer as a single logical storage unit or drive.

RAID stands for Redundant Array of Independent Disks (the "I" originally stood for "Inexpensive"). It involves combining two or more drives to improve performance and fault tolerance. Combining drives also offers improved reliability and larger volume sizes. RAID distributes the data across several disks, and the operating system sees the array as a single disk.
In short, RAID uses multiple hard drives for performance and reliability.

Types of RAID:

RAID 0 - Striping:
RAID 0 is a striped disk array with no fault tolerance, and it requires at least 2 drives. Because it has no redundancy, RAID 0 is considered the lowest RAID level. Data is striped across the drives, which gives high performance at low cost.
I/O performance also improves because the load is spread across several drives and channels. However, RAID 0 cannot regenerate or rebuild data: if any one drive fails, the data on the entire array is lost.
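The striped data mapping described above can be sketched as a function from a logical block number to a (disk, offset) pair. This is a minimal sketch assuming a simple round-robin layout with a fixed stripe unit; the names `locate_block` and `stripe_blocks` are hypothetical, and real controllers may lay data out differently.

```python
from typing import Tuple


def locate_block(logical_block: int, num_disks: int, stripe_blocks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumed layout: stripe units of `stripe_blocks` consecutive blocks are
    dealt out to disks 0, 1, ..., n-1 in turn, then wrap around.
    """
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least 2 drives")
    unit, within = divmod(logical_block, stripe_blocks)  # which stripe unit, position inside it
    row, disk = divmod(unit, num_disks)                  # which stripe row, which disk
    return disk, row * stripe_blocks + within


if __name__ == "__main__":
    for block in range(10):
        print(block, locate_block(block, num_disks=2, stripe_blocks=4))
```

With 2 disks and 4-block stripe units, logical blocks 0-3 land on disk 0, blocks 4-7 on disk 1, and block 8 wraps back to disk 0. A large sequential read therefore keeps both drives busy, which is where the speed-up comes from.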

RAID 1: Disk mirroring is the basic function.
      1. It creates an exact copy of one physical hard disk on another.
      2. It typically uses one controller (using a separate controller for each disk is called duplexing).
      3. If one drive fails, the system can boot from the other drive.
      4. Writes are no faster than on a single drive, because every write goes to both disks.
      5. Increased cost: every mirror must be a separate physical device, so you must purchase twice the storage capacity.
      6. No protection from controller failure: if the controller fails, both mirrored drives are inaccessible.
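The mirroring behaviour in the list above (every write goes to each copy, any surviving copy can serve a read) can be modelled in a few lines. This is a toy sketch: the `Mirror` class and its methods are hypothetical, blocks are kept in memory, and there is no controller, so it says nothing about item 6.

```python
class Mirror:
    """Toy RAID 1: writes go to every working disk; reads use any working disk."""

    def __init__(self, num_disks: int = 2, num_blocks: int = 8):
        self.disks = [[b"\0"] * num_blocks for _ in range(num_disks)]
        self.failed = set()

    def write(self, block: int, data: bytes) -> None:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data  # the same data lands on each surviving mirror

    def read(self, block: int) -> bytes:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block]  # first working copy wins
        raise IOError("all mirrors have failed")

    def fail(self, disk: int) -> None:
        self.failed.add(disk)


if __name__ == "__main__":
    m = Mirror()
    m.write(3, b"hello")
    m.fail(0)
    print(m.read(3))  # still readable from disk 1
```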

RAID 0+1:
RAID 0+1 offers high data transfer performance and needs at least 4 disks.
It combines striping and mirroring (it is a mirror of two striped sets), giving fast data access together with tolerance of a single drive failure. The multiple stripe segments give high I/O rates. Note, however, that a single drive failure takes its whole striped set offline, so RAID 1+0 generally survives a second drive failure more often than RAID 0+1 does.

RAID 2 (ECC):
It is a parallel-access array that combines bit-level striping with error-correcting code (ECC) protection, which is why it is also known as ECC RAID: each bit of a data word is written to a separate data disk, and ECC bits stored on dedicated disks are checked on every read to detect and correct disk errors. Because it needs special disk features (synchronized spindles), RAID 2 is not popular for corporate data storage, despite its very high data transfer rates.

RAID 3:
RAID 3 uses parallel transfer with parity. It requires a minimum of 3 disks.
In RAID 3, data is striped (at byte level) across the data drives, and a parity stripe is computed and stored on a dedicated parity drive; the parity is later used to verify reads and to rebuild lost data. Read and write transfer rates are very high for large sequential transfers, and a single disk failure has relatively little effect on the array's overall performance.
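Parity in RAID 3, 4 and 5 is the byte-wise XOR of the data in a stripe. Because XOR is its own inverse, XOR-ing the parity with all surviving blocks gives back the one missing block. A minimal sketch, assuming equal-length blocks; the helper names are hypothetical:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block for one stripe: XOR of all data blocks."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"RAID", b"demo", b"xor!"]
    parity = make_parity(data)
    # Pretend the disk holding data[1] failed:
    print(reconstruct([data[0], data[2]], parity))
```

The same arithmetic explains why only one failure can be tolerated: with two blocks missing, one XOR equation cannot determine two unknowns.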

RAID 4:
RAID 4 requires a minimum of 3 drives. It is made of independent data disks with a shared (dedicated) parity disk protecting the data. Read transaction rates are very high, because independent reads can be served by different disks at the same time. The low ratio of parity disks to data disks also means high storage efficiency. Every write, however, must also update the single parity disk, which becomes a bottleneck.
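Why does the dedicated parity disk slow writes down? A small write does not have to re-read the whole stripe: the controller can read the old data and old parity, then compute the new parity as old parity XOR old data XOR new data. Every such write still touches the one parity disk, so parity-disk I/O limits concurrent writes. A sketch of that read-modify-write update, with hypothetical function names:

```python
def full_parity(blocks):
    """Parity of a whole stripe: byte-wise XOR of every data block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write update: P_new = P_old XOR D_old XOR D_new."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
    parity = full_parity(stripe)
    # Overwrite block 1 using only the old block and the old parity:
    new_parity = updated_parity(parity, stripe[1], b"\xff\x00")
    print(new_parity.hex())
```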

RAID 5:
RAID 5 uses independent data disks with distributed parity blocks. It needs at least 3 drives and provides the usable capacity of N-1 drives (for an array of N drives). Distributing the parity removes the write bottleneck of RAID 4's dedicated parity disk. A RAID 5 array offers a very high read transaction rate, a medium write transaction rate and a good aggregate transfer rate.
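Distributed parity simply means the parity block moves to a different disk on each stripe, so no single disk absorbs every parity update. The sketch below prints one possible layout: parity starts on the last disk and rotates one disk to the left per stripe, with data filling the remaining slots in disk order. This rotation rule is an assumption for illustration; implementations offer several layouts.

```python
def raid5_layout(num_disks: int, num_stripes: int):
    """Grid of labels: one row per stripe, one column per disk.

    Assumed rule: stripe s keeps its parity on disk (n - 1 - s) mod n.
    """
    grid = []
    block = 0
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P%d" % stripe)
            else:
                row.append("D%d" % block)
                block += 1
        grid.append(row)
    return grid


if __name__ == "__main__":
    for row in raid5_layout(num_disks=3, num_stripes=3):
        print("  ".join(row))
```

For three disks this gives D0 D1 P0, then D2 P1 D3, then P2 D4 D5: each disk holds one parity block every three stripes.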

RAID 5: Disk striping with parity. It can be implemented in software or in hardware, and it protects the data against a single disk failure.
      1. RAID 5 is inexpensive and convenient.
      2. The parity information is distributed across the different disks.
      3. If one disk fails, it can be replaced (hot-swapped, if the hardware supports it) while the array keeps running.
      4. The failed disk's contents are automatically rebuilt onto the replacement from the data and parity stored on the other disks.
      5. If more than one disk fails, the data must be restored from backup.


RAID 6:
RAID 6 uses independent data disks with two independent, distributed parity schemes. It extends RAID 5 with a second set of parity, so the array can survive two simultaneous drive failures. RAID 6 is well suited to mission-critical applications and data storage, though the controller design is complex and the write overhead is higher than RAID 5, because every write must update two parity blocks.
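A second XOR parity would not help, because it would duplicate the first. A widely used construction keeps P as plain XOR and computes Q as a weighted sum in the finite field GF(2^8), which gives two independent equations and therefore allows two lost blocks to be solved for. The sketch below assumes that P+Q scheme with the reduction polynomial 0x11D and generator 2; actual products may use other codes, and all function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # Nonzero elements form a multiplicative group of order 255, so a^254 = a^-1.
    return gf_pow(a, 254)


def compute_pq(blocks):
    """P = XOR of all data blocks; Q = XOR of g^i * D_i with generator g = 2."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild lost data blocks x and y (given as None in `blocks`) from the rest plus P and Q."""
    size = len(p)
    # P and Q of the survivors, treating the two lost blocks as zeros.
    p_xy, q_xy = compute_pq([bytes(size) if b is None else b for b in blocks])
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)  # nonzero because x != y
    dx, dy = bytearray(size), bytearray(size)
    for k in range(size):
        s = p[k] ^ p_xy[k]  # s = Dx + Dy
        t = q[k] ^ q_xy[k]  # t = g^x*Dx + g^y*Dy
        dx[k] = gf_mul(t ^ gf_mul(gy, s), inv)  # Dx = (t + g^y*s) / (g^x + g^y)
        dy[k] = s ^ dx[k]
    return bytes(dx), bytes(dy)


if __name__ == "__main__":
    data = [b"disk0", b"disk1", b"disk2", b"disk3"]
    p, q = compute_pq(data)
    print(recover_two([data[0], None, data[2], None], 1, 3, p, q))
```

The extra field multiplications for Q on every write are one concrete source of the higher RAID 6 overhead mentioned above.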

RAID 7:
RAID 7 is an "optimized asynchrony" array for high I/O and data transfer rates, and it is marketed as the most manageable RAID controller available. Its overall write performance is claimed to be 50% to 90% better than that of single-spindle array levels, with no extra data transfer required for parity handling. RAID 7 is a registered trademark of Storage Computer Corporation.

RAID 10:
RAID 10 combines very high reliability with high performance.
The minimum requirement for RAID 10 is 4 disks. RAID 10 is implemented as a striped array of RAID 1 segments, with fault tolerance similar to RAID 1. RAID 10 is suitable for systems and environments that require uncompromising availability and very high throughput.

With the significant RAID levels briefly discussed, one more important point: whichever RAID level is used, regular and consistent data backups (for example, to tape) are a must, because RAID does not protect against accidental deletion, data corruption or the loss of the whole array. Tape storage remains a good medium for recovering lost data.


RAID 1:
RAID 1 uses mirroring to write the data to the drives. It also offers fault tolerance against disk errors, and the array continues to operate as long as at least one drive is functioning properly.

The trade-off associated with the RAID 1 level is the cost required to purchase the additional disks to store data.

RAID 2:
It uses Hamming codes for error correction. In RAID 2, the disks are synchronized and the data is striped in very small stripes (single bits). It requires multiple disks to hold the error-correction bits.
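To see how a Hamming code corrects a bad disk, picture each bit of a codeword stored on its own disk, as in RAID 2: one faulty disk corrupts at most one bit per codeword. The sketch uses the small Hamming(7,4) code (4 data bits, 3 check bits) purely for illustration; the code width in a real RAID 2 array would depend on its number of data disks, and the function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode data bits [d1, d2, d3, d4] as 7 bits in positions 1..7:
    p1 p2 d1 p3 d2 d3 d4 (check bits sit at the power-of-two positions)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(code):
    """Locate and flip a single wrong bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # the "disk" holding bit position 5 returns a wrong bit
    print(hamming74_correct(word))
```

Three check disks per four data disks is a heavy overhead, which is one reason parity-based levels replaced RAID 2 in practice.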

RAID 3:
This level uses a dedicated parity disk instead of rotated parity stripes and offers improved performance and fault tolerance.
A benefit of the dedicated parity disk is that if the parity drive fails during operation, the array keeps working, just without parity protection.

RAID 4:
It is similar to RAID 3 but uses block-level striping instead of byte-level striping, so a small file can be stored in a single block on one disk. RAID 4 can serve multiple I/O requests in parallel, but the transfer rate for a single request is lower.
Block-level parity is used to perform error detection.

RAID 5:
RAID 5 uses block-level striping with distributed parity, and it requires all drives but one to be present to operate correctly.
After a drive failure, reads of the missing data are reconstructed from the distributed parity, so a single drive failure does not destroy the array.
However, if a second drive fails before the first one has been rebuilt, the data on the array is lost.

The standard RAID levels above can be combined in different ways to create nested RAID levels, which offer improved performance, fault tolerance or both.
Some of the known nested RAID levels are:

      RAID 0+1
      RAID 1+0
      RAID 3+0
      RAID 0+3
      RAID 10+0
      RAID 5+0
      RAID 6+0
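The order of nesting matters for fault tolerance. The sketch below enumerates every possible pair of failed drives in a 4-disk array and counts how often the data survives under RAID 1+0 (a stripe over mirror pairs) and RAID 0+1 (a mirror of two stripe sets). It assumes the typical RAID 0+1 behaviour where one failed drive takes its whole stripe set offline; the function names are hypothetical.

```python
from itertools import combinations


def survives_raid10(failed, mirror_pairs):
    """RAID 1+0: data survives while every mirror pair still has a working disk."""
    return all(not set(pair) <= failed for pair in mirror_pairs)


def survives_raid01(failed, stripe_sets):
    """RAID 0+1: any failure kills its stripe set, so one set must be untouched."""
    return any(not (set(s) & failed) for s in stripe_sets)


def survival_count(survives, layout, num_disks, num_failures):
    """(surviving cases, total cases) over all ways to pick the failed disks."""
    cases = [set(c) for c in combinations(range(num_disks), num_failures)]
    ok = sum(1 for c in cases if survives(c, layout))
    return ok, len(cases)


if __name__ == "__main__":
    layout = [(0, 1), (2, 3)]
    print("RAID 1+0:", survival_count(survives_raid10, layout, 4, 2))
    print("RAID 0+1:", survival_count(survives_raid01, layout, 4, 2))
```

With 4 disks, RAID 1+0 survives 4 of the 6 possible double failures, while RAID 0+1 survives only 2: in RAID 0+1 the second failure is fatal unless it happens to land in the stripe set that is already offline.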

Hardware RAID
  • A conventional Hardware RAID consists of a RAID controller that is installed into the PC or server, and the array drives are connected to it.
  • In high-end external intelligent RAID controllers, the RAID controller is moved out of the system entirely, into a separate box. Within the box, the RAID controller manages the drives in the array, typically using SCSI, and then presents the logical drives of the array over a standard interface (again, typically a variant of SCSI) to the server using the array.


Software RAID:

In software RAID, software does the work of the RAID controller instead of dedicated hardware. Rather than using hardware controllers or intelligent boxes, a system software routine (for example, a driver in the operating system) manages and implements the RAID array.

Comparing Hardware RAID & Software RAID

Portability

  • OS Portability

    Software RAID is not usable across operating systems. So you cannot, for example, use two RAID disks configured in Linux with Windows XP, and vice versa. This is a big issue for dual-boot systems, where you will either have to provide a non-RAID disk for sharing data between the two operating systems or use hardware RAID instead.
    As you know, dual booting is mostly obsolete these days, as you can run multiple operating systems on the same machine using virtualization software like VMware and Xen.
  • Hardware Portability


    Software RAID
    In Linux you can mirror two disks using RAID-1, including the boot partition. If for any reason the hardware goes bad, you can simply move the hard disk to a different machine and it will run fine on the new hardware. Also, with a RAID-1 array each hard disk holds a full copy of the operating system and data, effectively giving you two copies, each of which can be run on different hardware.
    Unfortunately, in Windows it is not so easy to move an operating system from one machine to another, but that is the story of proprietary licenses and we will keep it for another day.

    Hardware RAID
    Hardware RAID is not so portable. You cannot just move the hardware to a different machine and hope it will work. You have to find a motherboard that is compatible with your RAID controller card; otherwise you can kiss your data goodbye. There is also a bigger issue with the RAID controller itself: if it fails and you cannot get the same controller on the market (it has probably become obsolete by then), then again you can kiss your data goodbye.

Easy & Speedy Recovery

It may seem trivial, but trust me, for a busy and loaded server, an easy and speedy recovery, done inside the operating system without having to reboot, is what one dreams of. Imagine that during peak hours your RAID system fails and you are forced to reboot the machine to make changes and restore your data! Software RAID, as in Linux, not only keeps working when a disk has failed, but also starts rebuilding the array if a spare disk is available. All of this happens in the background, largely without affecting your users. This is where software RAID shines brilliantly.
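On Linux, the software RAID (md) driver reports array health and background rebuild progress in `/proc/mdstat`, so the recovery described above can be watched from a script. The sketch below parses that text; the regular expressions assume the layout commonly seen in that file (a status field such as `[2/1] [U_]`, where `_` marks a missing member, and a `recovery = 8.5%` progress line). That layout is an assumption rather than a formal API, arrays in unusual states may not parse correctly, and `parse_mdstat` is a hypothetical name.

```python
import re

# Assumed /proc/mdstat layout; adjust the patterns if your kernel prints differently.
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\w+)\s+(?:\([^)]*\)\s+)?(\w+)")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
REBUILD_RE = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%")


def parse_mdstat(text):
    """Return {array: {"state", "level", "degraded", "rebuild_pct"}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            arrays[current] = {"state": m.group(2), "level": m.group(3),
                               "degraded": False, "rebuild_pct": None}
            continue
        if current is None:
            continue
        status = STATUS_RE.search(line)
        if status:
            # An underscore in "[U_]" marks a missing or failed member.
            arrays[current]["degraded"] = "_" in status.group(3)
        rebuild = REBUILD_RE.search(line)
        if rebuild:
            arrays[current]["rebuild_pct"] = float(rebuild.group(2))
    return arrays


SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdc1[2] sda1[0]
      1048512 blocks super 1.2 [2/1] [U_]
      [=>...................]  recovery =  8.5% (89216/1048512) finish=0.5min speed=29738K/sec

unused devices: <none>
"""

if __name__ == "__main__":
    print(parse_mdstat(SAMPLE))
    # On a Linux host with md arrays: parse_mdstat(open("/proc/mdstat").read())
```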

System Performance

Software RAID uses the CPU to do the work of the RAID controller. This is why a high-end hardware RAID controller outperforms software RAID, especially for RAID-5: it has a powerful dedicated processor. For low-end hardware RAID, however, the difference may be negligible to non-existent. In fact, software RAID can perform better than a low-end hardware RAID controller, simply because today's desktops and workstations have very powerful processors for which the task is trivial.

Support For RAID Standards

High-end hardware RAID may be slightly more versatile than software RAID in its support for various RAID levels. Software RAID normally supports levels 0, 1, 5 and 10 (a combination of RAID 0 and RAID 1), whereas many hardware RAID controllers can also support esoteric RAID levels such as RAID 3. But frankly, who uses them?

Cost

This is where software RAID again scores over hardware RAID. Software RAID is free. Hardware RAID is moderately to highly priced and can put a strain on your budget if deployed widely.
Over the years, however, the cost of hardware RAID has come down steeply, so it may not be long before affordable RAID-5 controllers are built into newer motherboards.


Future Proof

Gone are the days when we associated software RAID with bugs and OS problems. Nowadays software RAID is mature and very reliable. We have been using software RAID on Linux for several years and haven't experienced any problems whatsoever. Hardware RAID, on the contrary, has a single point of failure: its controller. If it fails, your only option is to find an equivalent RAID controller on the market; by then the model may be obsolete and you may not find anything compatible. You then face the haunting prospect of losing all your data because the RAID controller failed. Software RAID will not become obsolete, and it continues to be updated with new versions of your operating system.

Why Use RAID? Benefits and Costs, Tradeoffs and Limitations
RAID offers many advantages over the use of single hard disks, but it is clearly not for everyone. The potential for increased capacity, performance and reliability is attractive, but it comes with real costs. Nothing in life is free. In this section I take an overview look at RAID, to help explain its benefits, costs, tradeoffs and limitations. This should give you a better idea of whether RAID is for you, and help you understand what RAID can do, and what it can't do.
As you read on, it's essential to keep in mind that with RAID, it's definitely the case that "the devil is in the details". Most common blanket statements about RAID, like "RAID improves availability", "RAID is for companies that need fast database service" or "RAID level 5 is better than RAID level 0", are true only part of the time, at best. In almost every case, it depends. Usually, what RAID is and what it does for you depends on which type you choose and how you implement and manage it. For example, for some applications RAID 5 is better than RAID 0; for others, RAID 0 is vastly superior to RAID 5! There are situations where a RAID design, hardware and software that would normally result in high reliability could instead result in disaster if they are not properly controlled.

RAID Benefits
Alright, let's take a look at the good stuff first. :^) RAID really does offer a wealth of significant advantages that would be attractive to almost any serious PC user. (Unfortunately, there are still those pesky costs, tradeoffs and limitations to be dealt with... :^) ) The degree to which you realize the various benefits below depends on the exact type of RAID that is set up and how you do it, but you are always going to get some combination of the following:
Higher Data Security: Through the use of redundancy, most RAID levels provide protection for the data stored on the array. This means that the data on the array can withstand even the complete failure of one hard disk (or sometimes more) without any data loss, and without requiring any data to be restored from backup. This security feature is a key benefit of RAID and probably the aspect that drives the creation of more RAID arrays than any other. All RAID levels provide some degree of data protection, depending on the exact implementation, except RAID level 0.

Fault Tolerance: RAID implementations that include redundancy provide a much more reliable overall storage subsystem than can be achieved with a single disk. This means there is a lower chance of the storage subsystem as a whole failing due to hardware failures. (At the same time, though, the added hardware used in RAID means that the chance of having a hardware problem of some sort with an individual component, even one that doesn't take down the storage subsystem, is increased.)
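Both halves of the point above can be put into numbers. Suppose each drive independently fails with probability p over some period and nothing is replaced during that period (a deliberate simplification that ignores rebuilds and correlated failures). A single disk loses data with probability p, a two-way mirror loses data only if both copies fail (p squared), yet the chance that at least one of the two drives fails and needs attention rises to 1 - (1 - p)^2. A sketch with hypothetical function names and an illustrative p:

```python
def single_disk_loss(p: float) -> float:
    """Data on one disk is lost if that disk fails."""
    return p


def mirror_loss(p: float, copies: int = 2) -> float:
    """An n-way mirror loses data only if every copy fails (independent failures, no rebuild)."""
    return p ** copies


def any_component_failure(p: float, disks: int) -> float:
    """Chance that at least one of `disks` drives fails and needs servicing."""
    return 1 - (1 - p) ** disks


if __name__ == "__main__":
    p = 0.05  # hypothetical per-drive failure probability for the period
    print("single disk loses data:", single_disk_loss(p))
    print("mirror loses data:     ", mirror_loss(p))
    print("some drive fails:      ", any_component_failure(p, 2))
```

With p = 0.05, the mirror's chance of losing data drops to about 0.0025, while the chance of some drive failure rises to about 0.0975: fewer data-loss events, more service calls.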

Improved Availability: Availability refers to access to data. Good RAID systems improve availability both by providing fault tolerance and by providing special features that allow for recovery from hardware faults without disruption. 

Increased, Integrated Capacity: By turning a number of smaller drives into a larger array, you add their capacity together (though a percentage of total capacity is lost to overhead or redundancy in most implementations). This facilitates applications that require large amounts of contiguous disk space, and also makes disk space management simpler. Let's suppose you need 300 GB of space for a large database, and no hard disk you can buy is nearly that large. You could put five 72 GB drives into the system, but then you'd have to find some way to split the database into five pieces, and you'd be stuck trying to remember what was where. Instead, you could set up a RAID 0 array containing those five 72 GB hard disks; this will appear to the operating system as a single 360 GB hard disk! All RAID implementations provide this "combining" benefit, though the ones that include redundancy of course "waste" some of the space on that redundant information.
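How much space the redundant levels "waste" can be worked out per level. This sketch assumes equal-sized drives and ignores metadata and formatting overhead; the minimum drive counts follow the usual definitions (2 for RAID 0 and 1, 3 for RAID 5, 4 for RAID 6 and RAID 10), and `usable_capacity` is a hypothetical name.

```python
def usable_capacity(level: str, disks: int, size_gb: float) -> float:
    """Usable space of an array of equal-sized drives (simplified; ignores metadata)."""
    if level == "0":
        need, usable = 2, disks * size_gb         # all space holds data
    elif level == "1":
        need, usable = 2, size_gb                 # every disk holds the same data
    elif level == "5":
        need, usable = 3, (disks - 1) * size_gb   # one disk's worth of parity
    elif level == "6":
        need, usable = 4, (disks - 2) * size_gb   # two disks' worth of parity
    elif level == "10":
        if disks % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        need, usable = 4, disks // 2 * size_gb    # half the drives are mirror copies
    else:
        raise ValueError("unsupported level: " + level)
    if disks < need:
        raise ValueError(f"RAID {level} needs at least {need} drives")
    return usable


if __name__ == "__main__":
    for level in ("0", "5", "6"):
        print("RAID", level, usable_capacity(level, 5, 72), "GB")
```

For the five 72 GB drives in the example, RAID 0 gives the full 360 GB, RAID 5 gives 288 GB and RAID 6 gives 216 GB; only RAID 0 would still meet the 300 GB requirement with five drives.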

Improved Performance: Last, but certainly not least, RAID systems improve performance by allowing the controller to exploit the capabilities of multiple hard disks to get around performance-limiting mechanical issues that plague individual hard disks. Different RAID implementations improve performance in different ways and to different degrees, but all improve it in some way.
