== Raid Usage ==
In the early days we used Arena EX3 external SCSI-attached RAID arrays. These external units featured dual-redundant power supplies and held six IDE drives. Later, we found we could use SATA drives, with a low-profile SATA-to-PATA (IDE) interface and a little connector trimming. This boosted the capacity of these external SCSI devices, but only to a point: the limitations of the internal controller seemed to cap each array at 2TB.


So we phased out the Arena EX3 RAID arrays in favour of big Chenbro rack-mounted chassis, each capable of holding 16 drives. A 16-port 3Ware controller addresses these drives, which are hot-pluggable via the Chenbro chassis and its drive-bay backplane.


Our in-house-built, Chenbro-chassis RAID arrays are found on Spitfire and Hurricane. Here is how we use them now:


==== Spitfire ====
Controller '''/c1''' contains two RAID arrays:
:*'''/u0''' for '''/home/users'''
:**'''/c1/u0''' shows up as a single partition '''/dev/sda1''', mounted at '''/home/users'''
:**formed from 14x 500GB drives, occupying physical slots p2-p15.  RAID-5 usable capacity is n-1 drives, i.e. 13 x 500GB = a nominal 6.5TB (roughly 6TiB as reported by the controller)
:**uses XFS filesystem
<br>
:*'''/u1''' for the Gentoo GNU/Linux operating system
:**'''/c1/u1''' shows up as three partitions, following our classical layout:
:***'''/dev/sdb1''' is mountable at '''/boot'''
:***'''/dev/sdb2''' is swap
:***'''/dev/sdb3''' is '''/''', type ext3
:**formed from 2x 150GB drives, occupying physical slots p0-p1.  RAID-1 (mirror) capacity is a nominal 150GB (about 140GiB reported); see the fstab sketch below
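For reference, the layout above corresponds to an /etc/fstab along these lines. This is an illustrative sketch only - the mount options and the /boot filesystem type are assumptions, not copied from spitfire:
 # /etc/fstab sketch (illustrative only - check the real file on spitfire)
 /dev/sdb1   /boot         ext2    noauto,noatime   1 2   # assumption: the /boot filesystem type is not documented above
 /dev/sdb2   none          swap    sw               0 0
 /dev/sdb3   /             ext3    noatime          0 1
 /dev/sda1   /home/users   xfs     noatime          0 2   # /c1/u0, the 14-drive RAID-5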
<font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli /c1 show '''
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    OK             -       -       64K     6053.47   ON     OFF
u1    RAID-1    OK             -       -       -       139.688   ON     OFF

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u1   139.73 GB SATA  0   -            WDC WD1500ADFD-00NL
p1    OK             u1   139.73 GB SATA  1   -            WDC WD1500ADFD-00NL
p2    OK             u0   465.76 GB SATA  2   -            WDC WD5000ABYS-01TN
p3    OK             u0   465.76 GB SATA  3   -            ST3500320NS
p4    OK             u0   465.76 GB SATA  4   -            ST3500320NS
p5    OK             u0   465.76 GB SATA  5   -            ST3500320NS
p6    OK             u0   465.76 GB SATA  6   -            ST500NM0011
p7    OK             u0   465.76 GB SATA  7   -            ST3500320NS
p8    OK             u0   465.76 GB SATA  8   -            ST500NM0011
p9    OK             u0   465.76 GB SATA  9   -            ST3500320NS
p10   OK             u0   465.76 GB SATA  10  -            ST3500320NS
p11   OK             u0   465.76 GB SATA  11  -            ST3500320NS
p12   OK             u0   465.76 GB SATA  12  -            ST3500320NS
p13   OK             u0   465.76 GB SATA  13  -            ST500NM0011
p14   OK             u0   465.76 GB SATA  14  -            ST3500320NS
p15   OK             u0   465.76 GB SATA  15  -            ST3500320NS

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       229    06-Nov-2011
<br>
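The '''bbu''' line at the bottom of that listing (the battery backup unit) can also be queried and tested on its own. A hedged sketch, assuming the standard tw_cli BBU subcommands are present on our controller firmware:
 <font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli /c1/bbu show all'''     ''detailed battery status, including the last capacity-test date''
 <font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli /c1/bbu test'''     ''starts a battery capacity test; it takes a while, so run it during a quiet period''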


==== Hurricane ====
Controller '''/c1''' contains two RAID arrays:
:*'''/u0''' for '''projects''', which includes SVN, CVS, software deployments, the Amanda holding disk, and projects/infrastructure (containing eBooks, docs, web_content, scripts, and some backups)
:**'''/c1/u0''' shows up as a single partition '''/dev/sda1''', mounted at '''/mnt/raid'''
:**formed from 14x 500GB drives, occupying physical slots p2-p15.  RAID-5 usable capacity is n-1 drives, i.e. 13 x 500GB = a nominal 6.5TB (roughly 6TiB as reported by the controller)
:**uses XFS filesystem
:**Usage breakdown as of June 2012:
<font color=red>hurricane</font> <font color=blue>raid #</font> '''cd /mnt/raid/ ; du -h --max-depth=1'''
15G ./svn
4.0K ./holding
3.5G ./cvs
214G ./projects
763G ./software
995G .
<br>
:*'''/u1''' for the Gentoo GNU/Linux operating system
:**'''/c1/u1''' shows up as three partitions, following our classical layout:
:***'''/dev/sdb1''' is mountable at '''/boot'''
:***'''/dev/sdb2''' is swap
:***'''/dev/sdb3''' is '''/''', type ext3
:**formed from 2x 150GB drives, occupying physical slots p0-p1.  RAID-1 (mirror) capacity is a nominal 150GB (about 140GiB reported)
 
 
==== Musashi ====
Musashi is still using the older external Arena EX3 RAID array.
:*shows up as a single partition '''/dev/sdc1''' mounted on '''/mnt/raid4''', type '''XFS'''
:*'''/mnt/raid4/mirror''' is bind-mounted to '''/export/mirror''', where it is NFS-exported
:*holds our Gentoo GNU/Linux mirror, for local Portage/distfile syncing; an illustrative fstab/exports sketch follows
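For reference, the mount and export described above would look roughly like the following. A hedged sketch only: the mount options, export options, and client range are assumptions, not copied from musashi:
 # /etc/fstab (illustrative)
 /dev/sdc1           /mnt/raid4       xfs    noatime   0 2
 /mnt/raid4/mirror   /export/mirror   none   bind      0 0
 
 # /etc/exports (illustrative - the real client list and options live on musashi)
 /export/mirror      192.168.1.0/24(ro,no_subtree_check)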
<br>
 
==== Hercules ====
:*to be added
 
== RAID Maintenance ==
We have periodic drive failures, reported both through Nagios and via the daily logwatch emails.  Failed drives must be replaced promptly, to avoid data loss!  The Chenbro chassis supports hot-swapping drives. Typically, to replace a bad hard drive, you would log onto the afflicted server and type the following:
 
<font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli'''
//spitfire> '''/c1 show'''
 
That will produce a list of statuses for each drive in the array and clearly identify the bad drive.  Once you know the port of the bad drive (e.g. p15), remove it from the array and replace it with a new drive.  The controller will automatically commence a RAID rebuild.  You can use '''/c1 show''' later on to check the array status.
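As a concrete sketch of that flow (p15 is just the example port; the per-port '''remove''' subcommand is how we normally release a failed drive, but double-check it against the CLI guide for your controller's firmware):
 <font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli /c1 show'''     ''identify the unit that is not OK, and the failed port (e.g. p15)''
 <font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli /c1/p15 remove'''     ''release the failed drive from the controller, then physically swap the drive in that slot''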
 
 
Lately, however, the replaced drive is showing up as a new Unit, '''u?'''. Not helpful :-( but here's what to do:
 
<font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli'''
//spitfire> '''/c1 rescan'''     ''will find the replaced drive, assign it to a new Unit u2''
//spitfire> '''/c1/u2 del'''    ''do not use '''remove'''; '''del''' will keep the drive, but un-assign it''
//spitfire> '''maint rebuild c1 u0 p15'''    ''example, to add replaced drive p15 into Unit u0 on Controller c1''
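Either way, the rebuild can be watched with the listing we already use; the '''%RCmpl''' column in the unit table is the rebuild-completion percentage:
 <font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli /c1 show'''     ''u0 shows a rebuilding status and a %RCmpl value until it returns to OK''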
 
 
Some tips:
* '''tw_cli''' doesn't need to be typed on a line by itself. It can be combined with parameters such as: '''tw_cli /c1 show'''
* The '''/c1''' argument specifies the RAID controller.  On some servers, such as spitfire, c1 is the correct controller identifier, but other servers might use c0, c2, etc.  If '''/c1 show''' complains that the controller isn't found, try a different identifier, such as c0.
 
 
Change stripe size on an existing RAID-5 "live" system:
<font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli /c1/u0 migrate type=raid5 stripe=256'''
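The migration runs in the background; its progress appears in the '''%V/I/M''' (verify/initialise/migrate) column of the usual listing:
 <font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli /c1 show'''     ''watch the %V/I/M column for u0 until the migration finishes''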
 
 
Update the firmware using the command line: first download (and unzip if necessary) the firmware file. This goes pretty quickly (less than a minute) and will '''not''' take effect until after a reboot:
<font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli /c1 update fw=~/prom0006.img'''
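After the reboot, it is worth confirming the controller picked up the new image. A hedged sketch - on our 3Ware controllers the '''show all''' summary includes the running firmware version:
 <font color=red>spitfire</font> <font color=blue>~ #</font> '''tw_cli /c1 show all'''     ''check the reported firmware version against the image you flashed''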
