RAID Documentation
RAID Usage
In the Early Days we used Arena EX3 external SCSI-attached RAID arrays. These external units featured dual-redundant power-supplies, and held six IDE drives. Later, we found we could use SATA drives, with a low-profile SATA-to-PATA (IDE) interface, and a little bit of connector trimming. This boosted the capacity of these external SCSI devices, but only to a point - the limitations of the internal controller seemed to cap the capacity at 2TB.
So, we phased out these Arena EX3 RAID arrays in favour of big Chenbro rack-mounted chassis, each capable of holding 16 drives. A 16-port 3Ware controller addresses these drives, which are hot-pluggable via the Chenbro chassis and its drive-bay backplane.
Our Chenbro-chassis in-house-built RAID arrays are found on Spitfire and Hurricane. Here is how we use them now:
Spitfire
Controller /c1 contains two RAID arrays:
- /u0 for /home/users
  - /c1/u0 shows up as a single partition /dev/sda1, mounted at /home/users
  - formed from 14x 500GB drives, occupying physical slots p2-p15. RAID-5 capacity is n-1, for a nominal 6.5TB (6TiB reported)
  - uses XFS filesystem
- /u1 for the Gentoo GNU/Linux operating system
  - /c1/u1 shows up as three partitions, following our classical layout:
    - /dev/sdb1 is mountable at /boot
    - /dev/sdb2 is swap
    - /dev/sdb3 is /, type ext3
  - formed from 2x 150GB drives, occupying physical slots p0-p1. RAID-1 (mirror) capacity is a nominal 150GB (140GiB reported)
spitfire ~ # tw_cli /c1 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    OK             -       -       64K     6053.47   ON     OFF
u1    RAID-1    OK             -       -       -       139.688   ON     OFF

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u1   139.73 GB SATA  0   -            WDC WD1500ADFD-00NL
p1    OK             u1   139.73 GB SATA  1   -            WDC WD1500ADFD-00NL
p2    OK             u0   465.76 GB SATA  2   -            WDC WD5000ABYS-01TN
p3    OK             u0   465.76 GB SATA  3   -            ST3500320NS
p4    OK             u0   465.76 GB SATA  4   -            ST3500320NS
p5    OK             u0   465.76 GB SATA  5   -            ST3500320NS
p6    OK             u0   465.76 GB SATA  6   -            ST500NM0011
p7    OK             u0   465.76 GB SATA  7   -            ST3500320NS
p8    OK             u0   465.76 GB SATA  8   -            ST500NM0011
p9    OK             u0   465.76 GB SATA  9   -            ST3500320NS
p10   OK             u0   465.76 GB SATA  10  -            ST3500320NS
p11   OK             u0   465.76 GB SATA  11  -            ST3500320NS
p12   OK             u0   465.76 GB SATA  12  -            ST3500320NS
p13   OK             u0   465.76 GB SATA  13  -            ST500NM0011
p14   OK             u0   465.76 GB SATA  14  -            ST3500320NS
p15   OK             u0   465.76 GB SATA  15  -            ST3500320NS

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       229    06-Nov-2011
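A quick cross-check from the operating-system side can confirm that these units map to the expected block devices and mount points. This is only a sketch: the device names and mount points are taken from the layout above, and the exact output will of course differ. The same check works on hurricane, with /mnt/raid in place of /home/users.

spitfire ~ # df -hT /home/users /      (expect /dev/sda1 as xfs and /dev/sdb3 as ext3)
spitfire ~ # tw_cli /c1/u0 show        (per-unit detail for the array behind /home/users)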
Hurricane
Controller /c1 contains two RAID arrays:
- /u0 for projects, which includes SVN, CVS, software-deployments, Amanda-holding-disk, and projects/infrastructure (containing eBooks, docs, web_content, scripts and some backups)
  - /c1/u0 shows up as a single partition /dev/sda1, mounted at /mnt/raid
  - formed from 14x 500GB drives, occupying physical slots p2-p15. RAID-5 capacity is n-1, for a nominal 6.5TB (6TiB reported)
  - uses XFS filesystem
  - Usage and breakdown as of June 2012:

hurricane raid # cd /mnt/raid/ ; du -h --max-depth=1
15G     ./svn
4.0K    ./holding
3.5G    ./cvs
214G    ./projects
763G    ./software
995G    .

- /u1 for the Gentoo GNU/Linux operating system
  - /c1/u1 shows up as three partitions, following our classical layout:
    - /dev/sdb1 is mountable at /boot
    - /dev/sdb2 is swap
    - /dev/sdb3 is /, type ext3
  - formed from 2x 150GB drives, occupying physical slots p0-p1. RAID-1 (mirror) capacity is a nominal 150GB (140GiB reported)
Musashi
Still using the older external Arena EX3 RAID-array.
- shows up as a single partition /dev/sdc1 mounted on /mnt/raid4, type XFS
- /mnt/raid4/mirror is bind-mounted to /export/mirror, where it is NFS-exported (see the sketch below)
- holds our Gentoo GNU/Linux Mirror, for local Portage/distfile sync
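A sketch of how the Musashi layout above might be expressed in /etc/fstab and /etc/exports. The device name (/dev/sdc1) and the paths come from the bullets above; the mount options, client network and export options are illustrative assumptions, not copied from the live server:

# /etc/fstab (excerpt) - external Arena EX3 array, mounted as XFS
/dev/sdc1           /mnt/raid4       xfs     defaults    0 0
# bind-mount the mirror tree onto the NFS export point
/mnt/raid4/mirror   /export/mirror   none    bind        0 0

# /etc/exports (excerpt) - client network and options are examples only
/export/mirror      192.168.0.0/24(ro,no_subtree_check)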
Hercules
- to be added
RAID Maintenance
We have drive failures periodically - reported both through Nagios and via the daily logwatch emails. Failed drives must be replaced promptly, to avoid data loss! The Chenbro chassis supports drive hot-swap. Typically, to replace a bad hard drive, you would log onto the afflicted server and type the following:
spitfire ~ # tw_cli
//spitfire> /c1 show
That produces a list of statuses for each drive in the array and clearly identifies the bad drive. Once you know the name of the bad drive (e.g. p15), remove it from the array and replace it with a new drive; the controller will automatically commence a RAID rebuild. You can use /c1 show later on to check on the array status.
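A sketch of that normal flow, end to end. The port p15 is only an example, and the exact status strings vary with controller model and firmware:

spitfire ~ # tw_cli
//spitfire> /c1 show                   (the failed drive's port shows a non-OK status; its unit typically reports DEGRADED)
//spitfire> /c1/p15 show               (per-port detail for the suspect drive, to confirm which one to pull)
                                       ... physically swap the drive in its hot-swap bay ...
//spitfire> /c1 show                   (the rebuilding unit reports progress under %RCmpl)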
Lately, however, the replaced drive is showing up as a new Unit, u?. Not helpful :-( but here's what to do:
spitfire ~ # tw_cli
//spitfire> /c1 rescan                 (finds the replaced drive and assigns it to a new Unit, u2)
//spitfire> /c1/u2 del                 (do not use "remove"; "del" keeps the drive but un-assigns it)
//spitfire> maint rebuild c1 u0 p15    (example: adds replaced drive p15 back into Unit u0 on Controller c1)
Some tips:
- tw_cli doesn't need to be typed on a line by itself. It can be combined with parameters such as: tw_cli /c1 show
- The /c1 argument specifies the RAID controller. On some servers, such as spitfire, c1 is the correct controller identifier, but other servers might use c0, c2, etc. If /c1 show returns an error saying the controller isn't found, simply try a different identifier such as c0, or list the controllers as shown below.
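If you are not sure which controller numbers a given server has, tw_cli can list every controller it can see; for example:

spitfire ~ # tw_cli show               (summarises each controller - c0, c1, ... - with its unit and port counts)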
Change stripe size on an existing RAID-5 "live" system:
spitfire ~ # tw_cli /c1/u0 migrate type=raid5 stripe=256
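The migration itself runs in the background and can take a long while on a unit this size. A minimal way to keep an eye on it, assuming (as we read the column headings) that migrate progress is reported under %V/I/M:

spitfire ~ # watch -n 60 tw_cli /c1/u0 show    (re-runs the unit listing every minute)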
Update the firmware using the command-line - first download (and unzip if necessary) the firmware file. This goes pretty quickly (less than a minute) and will not take effect until after reboot:
spitfire ~ # tw_cli /c1 update fw=~/prom0006.img
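After the reboot, it is worth confirming the controller actually picked up the new image; a simple check (assuming the firmware attribute name used by our tw_cli builds):

spitfire ~ # tw_cli /c1 show firmware          (reports the running firmware version)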