RAID Documentation
RAID Usage
In the early days we used Arena EX3 external SCSI-attached RAID arrays. These external units featured dual-redundant power supplies and held six IDE drives. Later, we found we could use SATA drives, with a low-profile SATA-to-PATA (IDE) adapter and a little bit of connector trimming. This boosted the capacity of these external SCSI devices, but only to a point: the limitations of the internal controller seemed to cap the capacity at 2TB.
So, we phased out these Arena EX3 RAID arrays in favour of big Chenbro rack-mounted chassis, each capable of holding 16 drives. A 16-port 3Ware controller addresses these drives, which are hot-pluggable via the Chenbro chassis and drive-bay backplane.
Our in-house-built Chenbro-chassis RAID arrays are found on Spitfire and Hurricane. Here is how we use them now:
Spitfire
Controller /c1 contains two RAID arrays:
- /u0 for /home/users
  - /c1/u0 shows up as a single partition, /dev/sda1, mounted at /home/users
  - formed from 14x 500GB drives, occupying physical slots p2-p15; RAID-5 usable capacity is n-1 drives, i.e. 13 x 500GB = 6.5TB nominal (roughly 6TiB, which is what the controller reports)
  - uses the XFS filesystem
- /u1 for the Gentoo GNU/Linux operating system
  - /c1/u1 shows up as three partitions, following our classical layout (see the fstab sketch below):
    - /dev/sdb1 is mountable at /boot
    - /dev/sdb2 is swap
    - /dev/sdb3 is /, type ext3
  - formed from 2x 150GB drives, occupying physical slots p0-p1; RAID-1 (mirror) capacity is a nominal 150GB (140GiB reported)
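For reference, the classical layout above corresponds to fstab entries along these lines. This is only a sketch: the device names and mountpoints are as documented, but the filesystem type of /boot and all the mount options are assumptions and may differ from what is actually on the box.

/etc/fstab (spitfire, illustrative):
/dev/sdb1   /boot         ext3   noauto,noatime   1 2     <- /boot assumed ext3
/dev/sdb2   none          swap   sw               0 0
/dev/sdb3   /             ext3   noatime          0 1
/dev/sda1   /home/users   xfs    noatime          0 0     <- the RAID-5 unit /c1/u0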
spitfire ~ # tw_cli /c1 show

Unit  UnitType  Status  %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    OK      -       -       64K     6053.47   ON     OFF
u1    RAID-1    OK      -       -       -       139.688   ON     OFF

VPort  Status  Unit  Size       Type  Phy  Encl-Slot  Model
------------------------------------------------------------------------------
p0     OK      u1    139.73 GB  SATA  0    -          WDC WD1500ADFD-00NL
p1     OK      u1    139.73 GB  SATA  1    -          WDC WD1500ADFD-00NL
p2     OK      u0    465.76 GB  SATA  2    -          WDC WD5000ABYS-01TN
p3     OK      u0    465.76 GB  SATA  3    -          ST3500320NS
p4     OK      u0    465.76 GB  SATA  4    -          ST3500320NS
p5     OK      u0    465.76 GB  SATA  5    -          ST3500320NS
p6     OK      u0    465.76 GB  SATA  6    -          ST500NM0011
p7     OK      u0    465.76 GB  SATA  7    -          ST3500320NS
p8     OK      u0    465.76 GB  SATA  8    -          ST500NM0011
p9     OK      u0    465.76 GB  SATA  9    -          ST3500320NS
p10    OK      u0    465.76 GB  SATA  10   -          ST3500320NS
p11    OK      u0    465.76 GB  SATA  11   -          ST3500320NS
p12    OK      u0    465.76 GB  SATA  12   -          ST3500320NS
p13    OK      u0    465.76 GB  SATA  13   -          ST500NM0011
p14    OK      u0    465.76 GB  SATA  14   -          ST3500320NS
p15    OK      u0    465.76 GB  SATA  15   -          ST3500320NS

Name  OnlineState  BBUReady  Status  Volt  Temp  Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK      OK    OK    229    06-Nov-2011
Hurricane
Controller /c1 contains two RAID arrays:
- /u0 for projects, which includes SVN, CVS, software deployments, the Amanda holding disk, and projects/infrastructure (containing eBooks, docs, web_content, scripts and some backups)
  - /c1/u0 shows up as a single partition, /dev/sda1, mounted at /mnt/raid
  - formed from 14x 500GB drives, occupying physical slots p2-p15; RAID-5 usable capacity is n-1 drives, i.e. 13 x 500GB = 6.5TB nominal (roughly 6TiB, which is what the controller reports)
  - uses the XFS filesystem (a quick way to inspect it is sketched after this list)
  - usage breakdown as of June 2012:
hurricane raid # cd /mnt/raid/ ; du -h --max-depth=1
15G     ./svn
4.0K    ./holding
3.5G    ./cvs
214G    ./projects
763G    ./software
995G    .
- /u1 for the Gentoo GNU/Linux operating system
  - /c1/u1 shows up as three partitions, following our classical layout:
    - /dev/sdb1 is mountable at /boot
    - /dev/sdb2 is swap
    - /dev/sdb3 is /, type ext3
  - formed from 2x 150GB drives, occupying physical slots p0-p1; RAID-1 (mirror) capacity is a nominal 150GB (140GiB reported)
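To see how the XFS filesystem is laid out on the RAID-5 unit, and how full it is, two stock commands suffice. A sketch (the output shape varies with the xfsprogs version installed):

hurricane ~ # xfs_info /mnt/raid     (block size, allocation groups and stripe geometry - sunit/swidth)
hurricane ~ # df -h /mnt/raid        (quick check of used and available space on the array)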
Musashi
Still using the older external Arena EX3 RAID array.
- shows up as a single partition, /dev/sdc1, mounted on /mnt/raid4, type XFS
- /mnt/raid4/mirror is bind-mounted to /export/mirror, where it is NFS-exported (see the sketch below)
- holds our Gentoo GNU/Linux mirror, for local Portage/distfile syncing
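The bind mount and the NFS export can be made persistent with entries like these. This is a sketch: the paths come from the notes above, but the export options are assumptions and may differ from what musashi actually uses.

/etc/fstab (musashi, illustrative):
/mnt/raid4/mirror   /export/mirror   none   bind   0 0

/etc/exports (musashi, illustrative; ro/sync options assumed):
/export/mirror      *(ro,sync,no_subtree_check)

musashi ~ # exportfs -ra     (re-reads /etc/exports and applies the export list)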
Hercules
- to be added
RAID Maintenance
We have drive failures periodically, reported both through Nagios and via the daily logwatch emails. Failed drives must be replaced promptly, to avoid data loss! The Chenbro chassis supports hot-swapping drives. Typically, to replace a bad drive, you would log onto the afflicted server and type the following:
spitfire ~ # tw_cli
//spitfire> /c1 show
That produces a status line for each drive in the array and clearly identifies the bad one. Once you know the name of the bad drive (eg: p15), you remove it from the array and replace it with a new drive, for example as sketched below. The controller will automatically commence a RAID rebuild. You can use /c1 show later on to check the array status.
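For example, to inspect the suspect drive and detach it from the controller before pulling it, a sketch using stock tw_cli sub-commands (substitute the real port number for p15):

spitfire ~ # tw_cli
//spitfire> /c1/p15 show       (full details for the drive on port 15: status, model, serial)
//spitfire> /c1/p15 remove     (detaches the drive from the controller, so it is safe to hot-pull)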
Lately, however, the replaced drive has been showing up as a new Unit, u?. Not helpful :-( but here's what to do:
spitfire ~ # tw_cli
//spitfire> /c1 rescan                   (finds the replaced drive and assigns it to a new Unit, eg u2)
//spitfire> /c1/u2 del                   (do not use remove; del will keep the drive, but un-assign it)
//spitfire> maint rebuild c1 u0 p15      (example: add replaced drive p15 into Unit u0 on Controller c1)
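Once the rebuild has started, its progress can be watched from the same CLI. A sketch:

spitfire ~ # tw_cli /c1/u0 show     (unit status should read REBUILDING, with %RCmpl climbing toward 100%)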
Some tips:
- tw_cli doesn't need to be typed on a line by itself. It can be combined with parameters such as: tw_cli /c1 show
- The /c1 argument specifies the RAID controller. On some servers, such as spitfire, c1 is the correct controller identifier, but other servers might use c0, c2, etc. So if /c1 show returns an error saying the controller isn't found, try a different identifier, such as c0.
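Rather than guessing identifiers, tw_cli can list every controller it can see:

spitfire ~ # tw_cli show     (one line per controller - c0, c1, ... - with model, port and unit counts)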
To change the stripe size on an existing RAID-5 array while the system is "live":
spitfire ~ # tw_cli /c1/u0 migrate type=raid5 stripe=256
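The migration runs in the background on the live array; while it does, the unit listing shows the progress (a sketch - the percentage appears in the %V/I/M column of /c1 show):

spitfire ~ # tw_cli /c1/u0 show     (status reads MIGRATING while the migration percentage climbs)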
To update the firmware from the command line, first download (and unzip if necessary) the firmware file. The update itself goes pretty quickly (less than a minute) and will not take effect until after a reboot:
spitfire ~ # tw_cli /c1 update fw=~/prom0006.img
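After the reboot, the running firmware version can be confirmed from the CLI (a sketch; the attribute listing varies a little between controller models):

spitfire ~ # tw_cli /c1 show firmware     (reports the firmware version the controller is currently running)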