RAID Documentation: Difference between revisions
Revision as of 17:37, 20 June 2012
Raid Usage
In the early days, we used Arena EX3 external SCSI-attached RAID arrays. These external units featured dual-redundant power supplies and held six IDE drives. Later, we found we could use SATA drives instead, given a low-profile SATA-to-PATA (IDE) interface and a little bit of connector trimming. This boosted the capacity of these external SCSI devices, but only to a point: the limitations of the internal 32-bit controller seemed to cap the capacity at 2TB.
So, we phased out these Arena EX3 RAID arrays in favour of big Chenbro rack-mounted chassis, each capable of holding 16 drives. A 3Ware controller with 16 ports addresses these drives, which are hot-pluggable thanks to the Chenbro chassis and its drive-bay backplane.
Our Chenbro-chassis in-house-built RAID arrays are found on Spitfire and Hurricane. Here is how we use them now:
Spitfire
Controller /c1 contains two RAID arrays:
- /u0 for /home/users
  - /c1/u0 shows up as a single partition /dev/sda1, mounted at /home/users
  - formed from 14x 500GB drives, occupying physical slots p2-p15. RAID-5 usable capacity is that of n-1 drives, for a nominal 6.5TB (6TiB - the reported capacity)
  - uses XFS filesystem
- /u1 for the Gentoo GNU/Linux operating system
  - /c1/u1 shows up as three partitions, following our classical layout:
    - /dev/sdb1 is mountable at /boot
    - /dev/sdb2 is swap
    - /dev/sdb3 is /, type ext3
  - formed from 2x 150GB drives, occupying physical slots p0-p1. RAID-1 (mirror) capacity is nominal 150GB (140GiB reported)
 spitfire ~ # tw_cli /c1 show

 Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
 ------------------------------------------------------------------------------
 u0    RAID-5    OK             -       -       64K     6053.47   ON     OFF
 u1    RAID-1    OK             -       -       -       139.688   ON     OFF

 VPort Status         Unit Size      Type  Phy Encl-Slot    Model
 ------------------------------------------------------------------------------
 p0    OK             u1   139.73 GB SATA  0   -            WDC WD1500ADFD-00NL
 p1    OK             u1   139.73 GB SATA  1   -            WDC WD1500ADFD-00NL
 p2    OK             u0   465.76 GB SATA  2   -            WDC WD5000ABYS-01TN
 p3    OK             u0   465.76 GB SATA  3   -            ST3500320NS
 p4    OK             u0   465.76 GB SATA  4   -            ST3500320NS
 p5    OK             u0   465.76 GB SATA  5   -            ST3500320NS
 p6    OK             u0   465.76 GB SATA  6   -            ST500NM0011
 p7    OK             u0   465.76 GB SATA  7   -            ST3500320NS
 p8    OK             u0   465.76 GB SATA  8   -            ST500NM0011
 p9    OK             u0   465.76 GB SATA  9   -            ST3500320NS
 p10   OK             u0   465.76 GB SATA  10  -            ST3500320NS
 p11   OK             u0   465.76 GB SATA  11  -            ST3500320NS
 p12   OK             u0   465.76 GB SATA  12  -            ST3500320NS
 p13   OK             u0   465.76 GB SATA  13  -            ST500NM0011
 p14   OK             u0   465.76 GB SATA  14  -            ST3500320NS
 p15   OK             u0   465.76 GB SATA  15  -            ST3500320NS

 Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
 ---------------------------------------------------------------------------
 bbu   On           Yes       OK        OK       OK       229    06-Nov-2011
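The gap between the nominal capacities (6.5TB, 150GB) and the sizes tw_cli reports can be checked with a little arithmetic. The sketch below is illustrative only: the function names raid5_gb and gb_to_gib are made up for this example, and it assumes that drive labels use decimal gigabytes (10^9 bytes) while the Size(GB) column is really counted in binary units (2^30 bytes), which is what the numbers suggest.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, integer arithmetic throughout.

# Usable RAID-5 capacity in decimal GB: one drive's worth is spent on parity.
# Arguments: number of drives, size of each drive in GB.
raid5_gb() {
    echo $(( ($1 - 1) * $2 ))
}

# Convert decimal GB (10^9 bytes) to binary GiB (2^30 bytes), rounded down.
gb_to_gib() {
    echo $(( $1 * 1000000000 / 1073741824 ))
}

raid5_gb 14 500                   # u0: 14 drives of 500GB each
gb_to_gib "$(raid5_gb 14 500)"    # compare with the 6053.47 reported for u0
gb_to_gib 150                     # u1: mirror of two 150GB drives, compare with 139.688
```

The small remaining difference from the reported figures is presumably space the controller reserves per drive; that is a guess, not something this page documents.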
Hurricane
Controller /c1 contains two RAID arrays:
- /u0 for projects, which includes SVN, CVS, software-deployments, Amanda-holding-disk, and projects/infrastructure (containing eBooks, docs, web_content, scripts and some backups)
  - /c1/u0 shows up as a single partition /dev/sda1, mounted at /mnt/raid
  - formed from 14x 500GB drives, occupying physical slots p2-p15. RAID-5 usable capacity is that of n-1 drives, for a nominal 6.5TB (6TiB - the reported capacity)
  - uses XFS filesystem
  - Usage, and breakdown as of June 2012:
 hurricane raid # cd /mnt/raid/ ; du -h --max-depth=1
 15G     ./svn
 4.0K    ./holding
 3.5G    ./cvs
 214G    ./projects
 763G    ./software
 995G    .
- /u1 for the Gentoo GNU/Linux operating system
  - /c1/u1 shows up as three partitions, following our classical layout:
    - /dev/sdb1 is mountable at /boot
    - /dev/sdb2 is swap
    - /dev/sdb3 is /, type ext3
  - formed from 2x 150GB drives, occupying physical slots p0-p1. RAID-1 (mirror) capacity is nominal 150GB (140GiB reported)
Musashi
- /mnt/raid4
- Gentoo GNU/Linux Mirror
RAID Maintenance
We have drive failures periodically, reported both through Nagios and via daily logwatch emails. Failed drives must be replaced promptly, to avoid data loss! The Chenbro chassis supports drive hot-swap, and it used to be that the 3Ware controller would then automatically commence a RAID rebuild. Lately, however, the replaced drive is showing up as a new Unit, u?. Not helpful :-( but here's what to do:
 spitfire ~ # tw_cli
 //spitfire> /c1 rescan                (will find the replaced drive, assign it to a new Unit u2)
 //spitfire> /c1/u2 del                (do not use remove; del will keep the drive, but un-assign it)
 //spitfire> maint rebuild c1 u0 p15   (example, to add replaced drive p15 into Unit u0 on Controller c1)
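Since the unit and port numbers change from one failure to the next, it can help to write the sequence out before typing it. The following is a minimal sketch, not a tested procedure: rebuild_plan is a hypothetical helper that only prints the commands and never runs them, and it assumes the same commands are accepted as one-shot arguments to tw_cli, as the `tw_cli /c1 show` call elsewhere on this page suggests. The final `show` is there so the %RCmpl column can be watched while the rebuild runs.

```shell
#!/bin/sh
# Sketch only: prints the tw_cli commands for re-adding a replaced drive.
# Nothing here touches the controller; review the output, then run by hand.
# Arguments: controller, stray unit created by the rescan, target unit, port.
rebuild_plan() {
    ctl=$1; stray=$2; target=$3; port=$4
    echo "tw_cli /$ctl rescan"                       # detect the replaced drive
    echo "tw_cli /$ctl/$stray del"                   # un-assign it (del, not remove)
    echo "tw_cli maint rebuild $ctl $target $port"   # rebuild into the target unit
    echo "tw_cli /$ctl show"                         # check Status and %RCmpl
}

# The example from this page: drive p15, stray Unit u2, target Unit u0, Controller c1.
rebuild_plan c1 u2 u0 p15
```

The stray unit number has to be read off the rescan output before the plan is generated; the u2 here is just the value from the example.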
Redistributing users across the Raids
To move a user from one raid to another:
- Make sure the user is not logged in anywhere.
- Move the files from one raid to the appropriate directory on the other, for example:
  - mv /mnt/raid0/home/m/mdeepwel /mnt/raid1/home/m
- Update LDAP with the new root location.
  - Navigate to ou=AutoFS, then ou=home.users, then cn=username, and update the automountInformation attribute to the new location (using phpldapadmin is fine).
- Restart autofs on any computer or service the user could be using; otherwise they will be able to log in, but won't have a home directory.
  - If they use the cluster, you must restart autofs on each node, or restart the whole cluster.
- Restart teleport, so they can use ssh/ftp to get their files from off site.
- Update Amanda to make sure the new user directory is backed up.
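The first two steps can be sketched as a small script. This is an illustration under assumptions, not the procedure actually in use: the helper names are hypothetical, the home/<first letter>/<username> layout is inferred from the single mv example above, and the login check only sees sessions on the host where it runs, so it does not replace checking the cluster nodes. The LDAP, autofs, teleport and Amanda steps still have to be done by hand.

```shell
#!/bin/sh
# Sketch only: helper names and directory layout are assumptions.

# Succeeds if the user has no login session on THIS host.
user_logged_out() {
    ! who | awk '{print $1}' | grep -qxF "$1"
}

# Home directory of a user on a given raid mount,
# e.g. /mnt/raid0 + mdeepwel gives /mnt/raid0/home/m/mdeepwel.
user_home() {
    raid=$1
    user=$2
    first=$(printf '%s' "$user" | cut -c1)
    printf '%s/home/%s/%s\n' "$raid" "$first" "$user"
}

# Move a user's home directory from one raid mount to another.
# Arguments: source raid mount, destination raid mount, username.
move_user() {
    src=$(user_home "$1" "$3")
    dst_parent=$(dirname "$(user_home "$2" "$3")")
    user_logged_out "$3" || { echo "$3 is still logged in" >&2; return 1; }
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dst_parent" && mv "$src" "$dst_parent/"
}

# Usage, mirroring the example above:
#   move_user /mnt/raid0 /mnt/raid1 mdeepwel
```

Refusing to move when the user is logged in, or when the source directory is missing, keeps a mistyped username from doing anything at all.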