====== Multipath on Debian ======
===== Installation =====
To get multipath working on Debian, you need the 'multipath-tools-initramfs' and 'multipath-tools' packages. But as noted in the 'multipath-tools-initramfs' bug list, you also need to correct '/usr/share/initramfs-tools/hooks/multipath_hook'.
The bug list [[http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=multipath-tools-initramfs;dist=unstable]] shows that some tools are missing from the hook.
The first, very important, things to add to 'multipath_hook' are:
manual_add_modules dm-multipath
manual_add_modules dm-mod
manual_add_modules dm-round-robin
Then you need to add this loop, which copies the priority helpers:
for helper in /sbin/mpath_prio_*; do
    copy_exec $helper /sbin
done
Finally, if you want to use 'alias', add:
copy_exec /etc/multipath.conf /etc/
Optionally, you can comment out this line:
copy_exec /bin/readlink /bin/
And you're done. Your file should look like this:
#!/bin/sh
# The environment contains at least:
#
# CONFDIR -- usually /etc/mkinitramfs, can be set on mkinitramfs
# command line.
#
# DESTDIR -- The staging directory where we are building the image.
#
PREREQ=""
prereqs()
{
    echo "$PREREQ"
}
case $1 in
# get pre-requisites
prereqs)
    prereqs
    exit 0
    ;;
esac
# You can do anything you need to from here on.
#
# Source the optional 'hook-functions' scriptlet, if you need the
# functions defined within it. Read it to see what is available to
# you. It contains functions for copying dynamically linked program
# binaries, and kernel modules into the DESTDIR.
#
. /usr/share/initramfs-tools/hook-functions
copy_exec /sbin/multipathd /sbin/
copy_exec /sbin/scsi_id /sbin/
copy_exec /sbin/kpartx /sbin/
copy_exec /bin/mountpoint /bin/
copy_exec /sbin/devmap_name /sbin/
copy_exec /sbin/multipath /sbin/
# Modified by tchetch
#copy_exec /bin/readlink /bin/
# Added by tchetch
copy_exec /etc/multipath.conf /etc/
for helper in /sbin/mpath_prio_*; do
    copy_exec $helper /sbin
done
manual_add_modules dm-multipath
manual_add_modules dm-mod
manual_add_modules dm-round-robin
mkdir -p $DESTDIR/lib || true
cp /lib/libgcc_s.so.1 $DESTDIR/lib/
exit 0
===== Configuration =====
//This part depends on your hardware; I have only worked with a [[wp>Storage_area_network|SAN]] from [[http://www.ibm.com|IBM]].//
Now you need to configure '/etc/multipath.conf'. First create an alias for each of your devices:
multipaths {
    multipath {
        wwid 3600a0b8000177d9400002e61463f2ed3
        alias system
    }
    multipath {
        wwid 3600a0b8000177bcc0000256645f7f166
        alias data
    }
}
* ''alias'': the name you want to give to the Logical Drive attached to your [[wp>Blade_server|Blade]].
* ''wwid'': World Wide ID, a unique ID assigned to each Logical Drive.
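With more than a couple of logical drives, the ''multipaths'' section can be generated instead of typed by hand. This is only a minimal sketch: ''multipath_stanza'' is a hypothetical helper of mine, not something shipped with multipath-tools, and the WWID/alias pairs are the ones used on this page.

```shell
#!/bin/sh
# multipath_stanza is a hypothetical helper, not part of multipath-tools.
# It prints one "multipath { ... }" entry for a WWID/alias pair.
multipath_stanza() {
    wwid=$1
    alias_name=$2
    printf '    multipath {\n        wwid %s\n        alias %s\n    }\n' \
        "$wwid" "$alias_name"
}

# Wrap the entries in the enclosing "multipaths" section.
printf 'multipaths {\n'
multipath_stanza 3600a0b8000177d9400002e61463f2ed3 system
multipath_stanza 3600a0b8000177bcc0000256645f7f166 data
printf '}\n'
```

The output can be reviewed and then pasted into ''/etc/multipath.conf''.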
Now you can configure some options, such as ''devices''. You can define options for each device connected to your system. I have only one SAN attached to my system, so it's easy:
devices {
    device {
        vendor "IBM.*"
        product "1722-600"
        path_grouping_policy group_by_serial
        path_checker tur
        path_selector "round-robin 0"
        prio_callout "/sbin/mpath_prio_tpc /dev/%n"
        failback immediate
        features "1 queue_if_no_path"
        no_path_retry 300
    }
}
* ''vendor'': the vendor name of your storage system. It is used to identify your [[wp>Storage_area_network|SAN]]. For [[http://www.ibm.com/|IBM]], ''IBM.*'' works.
* ''product'': the product name of your [[wp>Storage_area_network|SAN]]. Mine is a [[http://www-03.ibm.com/systems/storage/disk/ds4000/ds4700/index.html|DS4700]], but the Storage Manager reports ''Product ID : 1722-600''.
* ''path_grouping_policy'': depends on how you want to use your SAN. For example, ''multibus'' doesn't work on my SAN. I use ''group_by_serial'' because I saw a document about an [[http://www.ibm.com|IBM]] [[wp>Storage_area_network|SAN]] that uses it. Other options are ''failover'' and ''multibus''. To find the best one, test them (I have personally tested all of them, and ''group_by_serial'' works best for me).
* ''path_checker'': can be ''readsector0'' or ''tur''. On my [[wp>Storage_area_network|SAN]], ''readsector0'' causes path switching, which the [[wp>Storage_area_network|SAN]] is not happy with and reports as a problem.
* ''prio_callout'': this is where I spent most of my testing time. To see which callouts you have, go to ''/sbin'' and list all the ''mpath_prio_*'' files. Then test them and choose the one that returns the right value, but run the test in the initrd environment, because the behaviour there differs from the running system. I'll explain more later.
* ''failback'': defines when to return to the original path once it comes back up. Set it to ''immediate'', to a delay in seconds, or to ''manual'' if you want to disable path failback.
* ''features'': I don't know what it is, but it was used by someone working on an [[http://www.ibm.com|IBM]] [[wp>Storage_area_network|SAN]].
* ''no_path_retry'': how many retries before failing. Can be a number of tries, ''fail'' for immediate failure, or ''queue'' to keep trying forever.
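Testing the ''prio_callout'' helpers one by one gets tedious, so a small loop can help. This is a rough sketch, assuming every helper accepts a device node as its only argument, as ''mpath_prio_tpc'' does in the configuration above; some helpers may expect different arguments. ''try_prio_callouts'' is a name I made up.

```shell
#!/bin/sh
# try_prio_callouts is a hypothetical helper: it runs every mpath_prio_*
# program found in a directory against one device and prints what each
# one returns. Assumption: each helper takes the device node as argument.
try_prio_callouts() {
    dir=$1
    dev=$2
    for helper in "$dir"/mpath_prio_*; do
        # Skip the unexpanded pattern when no helper exists.
        [ -x "$helper" ] || continue
        status=0
        prio=$("$helper" "$dev" 2>/dev/null) || status=$?
        printf '%s: %s (exit %s)\n' "${helper##*/}" "${prio:-nothing}" "$status"
    done
}

try_prio_callouts /sbin /dev/sda
```

Run it once on the running system and once from the initrd shell, then compare the two outputs.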
Now you can add default values for all devices:
defaults {
    udev_dir /dev
    polling_interval 2
    default_getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    user_friendly_names yes
}
* ''udev_dir'': the directory where the device nodes are created.
* ''polling_interval'': time in seconds between two checks of a path.
* ''default_getuid_callout'': command used to get the ''WWID''.
* ''user_friendly_names'': if no alias is set, this defines whether the chosen name is user friendly (''mpathX'') or system friendly (the ''WWID'' itself).
Finally, you should add this, taken from the [[http://www.debian.org|Debian]] example file:
devnode_blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
This just lists the devices that will be ignored when building the multipath maps.
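To see which device names such an expression catches, it can be tried with ''grep''. This is a sketch under the assumption that the blacklist entries behave like POSIX extended regular expressions; ''is_blacklisted'' is a hypothetical helper and only checks the first expression.

```shell
#!/bin/sh
# is_blacklisted is a hypothetical helper: it tests a device node name
# against the first devnode expression of the blacklist, treated here
# as a POSIX extended regular expression (an assumption).
is_blacklisted() {
    printf '%s\n' "$1" | grep -Eq '^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*'
}

for name in sda loop0 dm-3 sr0; do
    if is_blacklisted "$name"; then
        echo "$name: ignored"
    else
        echo "$name: considered"
    fi
done
```

With these four names, only ''sda'' is left for multipath to consider.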
Your file should look like this:
##
## This is a template multipath-tools configuration file
## Uncomment the lines relevent to your environment
##
defaults {
    udev_dir /dev
    polling_interval 2
    default_getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    user_friendly_names yes
}
devnode_blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
devices {
    device {
        vendor "IBM.*"
        product "1722-600"
        path_grouping_policy group_by_serial
        path_checker tur
        path_selector "round-robin 0"
        prio_callout "/sbin/mpath_prio_tpc /dev/%n"
        failback immediate
        features "1 queue_if_no_path"
        no_path_retry 300
    }
}
multipaths {
    multipath {
        wwid 3600a0b8000177d9400002e61463f2ed3
        alias system
    }
    multipath {
        wwid 3600a0b8000177bcc0000256645f7f166
        alias data
    }
}
==== How to get the WWID ====
This is done with ''scsi_id''. For example, for the ''sda'' device:
/sbin/scsi_id -g -u -s /block/sda
In the current [[http://www.debian.org/News/2009/20090214|Debian Lenny]], this command has changed. See [[debian/maintenance/upgrade_to_lenny|Lenny Upgrade]] on this wiki for more information about the changes in [[http://www.debian.org/News/2009/20090214|Debian Lenny]].
//Don't ask me why it's ''/block'' and not ''/dev''!//\\
You might notice that several devices report the same ID. For example, on my system ''sda'' and ''sdc'' report the same ID. That's normal: ''sda'' is the first path and ''sdc'' is the second path, but the logical drive is the same.\\
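To see at a glance which devices are paths to the same logical drive, the IDs can be grouped. This is a minimal sketch: ''group_paths_by_wwid'' is a hypothetical helper that reads "device WWID" pairs on standard input, and the sample pairs mirror my system.

```shell
#!/bin/sh
# group_paths_by_wwid is a hypothetical helper: it reads "device wwid"
# pairs on stdin and prints one line per WWID with all of its devices.
group_paths_by_wwid() {
    awk '{
        if ($2 in paths) paths[$2] = paths[$2] " " $1
        else paths[$2] = $1
    }
    END { for (w in paths) print w ": " paths[w] }' | sort
}

# On a live system the pairs could come from a loop such as:
#   for d in sda sdb sdc sdd; do
#       echo "$d $(/sbin/scsi_id -g -u -s /block/$d)"
#   done | group_paths_by_wwid
# Here the pairs are written out by hand.
group_paths_by_wwid <<'EOF'
sda 3600a0b8000177d9400002e61463f2ed3
sdb 3600a0b8000177bcc0000256645f7f166
sdc 3600a0b8000177d9400002e61463f2ed3
sdd 3600a0b8000177bcc0000256645f7f166
EOF
```

Each output line then corresponds to one logical drive, which is also one ''multipath'' entry in the configuration.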
===== Building initrd =====
Now you're ready to build the ''initrd''. If possible, use the same tool that your distribution uses when upgrading the kernel. On Debian you can do this:
update-initramfs -u -k all
//''linux-image-2.6.18-4-686'' is the package I installed for the kernel.//
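Before rebooting, it is worth checking that the rebuilt image really contains the multipath pieces. This is a sketch under assumptions: it relies on ''lsinitramfs'' from initramfs-tools being installed and on the image being at ''/boot/initrd.img-<kernel version>''. ''missing_from_listing'' is a hypothetical helper.

```shell
#!/bin/sh
# missing_from_listing is a hypothetical helper: it reads a file listing
# on stdin and prints every required path (given without a leading
# slash) that is absent from the listing.
missing_from_listing() {
    # Normalise "./sbin/x" and "/sbin/x" to "sbin/x".
    listing=$(sed -e 's|^\./||' -e 's|^/||')
    for path in "$@"; do
        printf '%s\n' "$listing" | grep -qxF -- "$path" || echo "missing: $path"
    done
}

# Assumption: lsinitramfs is available and the image uses Debian naming.
image="/boot/initrd.img-$(uname -r)"
if command -v lsinitramfs >/dev/null 2>&1 && [ -r "$image" ]; then
    lsinitramfs "$image" |
        missing_from_listing sbin/multipath sbin/multipathd sbin/kpartx etc/multipath.conf
fi
```

No output means every listed file was found in the image.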
===== Modifying your grub/fstab =====
Now you have to change grub and fstab to point to the right device. If you used aliases, your device will likely be accessible as ''/dev/mapper/aliasX'', where ''alias'' is the name you chose and ''X'' is the partition number.
On Debian, don't forget to change the ''kopt'' value in ''/boot/grub/menu.lst'', so that the next kernel upgrade doesn't break the hard work you've done:
## ## Start Default Options ##
## default kernel options
## default kernel options for automagic boot options
## If you want special options for specific kernels use kopt_x_y_z
## where x.y.z is kernel version. Minor versions can be omitted.
## e.g. kopt=root=/dev/hda1 ro
## kopt_2_6_8=root=/dev/hdc1 ro
## kopt_2_6_8_2_686=root=/dev/hdc2 ro
# kopt=root=/dev/mapper/system1 ro
Run ''update-grub'' to regenerate ''/boot/grub/menu.lst'' with your new options.
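The ''kopt'' change can also be previewed with ''sed'' before touching the real file. This is a minimal sketch: ''set_kopt_root'' is a hypothetical helper that prints the modified content to standard output and leaves the file untouched, and the sample file is made up for the demonstration.

```shell
#!/bin/sh
# set_kopt_root is a hypothetical helper: it prints FILE with the root
# device of the "# kopt=" line replaced by DEV. The file itself is not
# modified. DEV must not contain "|" or "&" (sed special characters).
set_kopt_root() {
    file=$1
    dev=$2
    sed "s|^# kopt=root=[^ ]*|# kopt=root=$dev|" "$file"
}

# Demonstration on a throwaway sample instead of /boot/grub/menu.lst.
sample=$(mktemp)
printf '%s\n' '## e.g. kopt=root=/dev/hda1 ro' '# kopt=root=/dev/sda1 ro' > "$sample"
set_kopt_root "$sample" /dev/mapper/system1
rm -f "$sample"
```

Only the line starting with a single ''#'' is rewritten; the ''##'' comment lines are left alone.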
In the current [[http://www.debian.org/News/2009/20090214|Debian Lenny]], the naming scheme has changed. See [[debian/maintenance/upgrade_to_lenny|Lenny Upgrade]] on this wiki for more information about the changes in [[http://www.debian.org/News/2009/20090214|Debian Lenny]].
===== Modifying /boot/grub/device.map =====
When rebuilding grub, it uses a file called ''device.map''. This file is built during the installation of your system and might still contain references to ''/dev/sdX''. This means that when you run ''update-grub'' on Debian, it won't go through the multipath daemon and may cause your system to switch paths. So you should correct the entries, for example with ''(hd0) /dev/mapper/system''.
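A quick check for leftover ''/dev/sdX'' entries could look like the sketch below. It assumes the usual ''(hdN) device'' layout of ''device.map''; ''raw_disk_entries'' is a hypothetical helper.

```shell
#!/bin/sh
# raw_disk_entries is a hypothetical helper: it prints the lines of a
# device.map file that still point at a raw /dev/sdX disk.
# Assumption: entries look like "(hd0) /dev/sda".
raw_disk_entries() {
    grep -E '^\(hd[0-9]+\)[[:space:]]+/dev/sd[a-z]+' "$1"
}

map=/boot/grub/device.map
if [ -r "$map" ]; then
    raw_disk_entries "$map" || echo "no /dev/sdX entries left in $map"
fi
```

Any line it prints is a candidate for replacement by the matching ''/dev/mapper'' name.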
===== Now reboot =====
When you reboot, the boot might fail and the root file system might not be mounted. In that case, wait until the ''initrd'' shell shows up and test the behaviour of the parameters you set earlier until only good values remain.
For example, on my system ''mpath_prio_balance_units'' returns a correct value on the running system but returns nothing in the initrd. The environment is different, so you have to find a solution that works in the ''initrd'' and then adapt your configuration.
//You can always ''chroot'' into your root filesystem from the initramfs shell. Use it to reconfigure your initrd.//
If your system boots the first time, you're much luckier than I was. Now you can run:
bladeTest:~# multipath -ll
system (3600a0b8000177d9400002e61463f2ed3) dm-0 IBM,1722-600
[size=5.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
\_ 0:0:0:0 sda 8:0 [active][ready]
\_ round-robin 0 [prio=6][active]
\_ 0:0:1:0 sdc 8:32 [active][ready]
data (3600a0b8000177bcc0000256645f7f166) dm-1 IBM,1722-600
[size=9.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=6][active]
\_ 0:0:0:1 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 0:0:1:1 sdd 8:48 [active][ready]
This shows all your paths. For me, a configuration with ''data'' set to ''multibus'' would give this kind of output:
bladeTest:~# multipath -ll
system (3600a0b8000177d9400002e61463f2ed3) dm-0 IBM,1722-600
[size=5.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
\_ 0:0:0:0 sda 8:0 [active][ready]
\_ round-robin 0 [prio=6][active]
\_ 0:0:1:0 sdc 8:32 [active][ready]
data (3600a0b8000177bcc0000256645f7f166) dm-1 IBM,1722-600
[size=9.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=7][enabled]
\_ 0:0:0:1 sdb 8:16 [active][ready]
\_ 0:0:1:1 sdd 8:48 [active][ready]
===== Testing configuration =====
Once your system boots properly, you may want to try other configurations without rebooting every time. To do so, just attach another partition to your system and work on it.
When a partition is mounted you cannot change its multipath table, but if it's not mounted you can clear the table with ''multipath -f alias'' and then build a new one with ''multipath alias''.
For example, my test system has ''system'' and ''data''. While the system is running I cannot change ''system'' because it holds the root file system, but ''data'' can be used for testing.
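Before flushing a map, it helps to check whether it is still mounted. This is a sketch assuming partitions are named ''/dev/mapper/aliasX'' as on my system; ''is_map_mounted'' is a hypothetical helper, and the optional second argument only exists so that a file other than ''/proc/mounts'' can be checked.

```shell
#!/bin/sh
# is_map_mounted is a hypothetical helper: it succeeds if
# /dev/mapper/<alias>, or a partition named /dev/mapper/<alias>N,
# appears in the mounts table (default: /proc/mounts).
is_map_mounted() {
    alias_name=$1
    mounts=${2:-/proc/mounts}
    awk -v dev="/dev/mapper/$alias_name" '
        $1 == dev { found = 1 }
        index($1, dev) == 1 && substr($1, length(dev) + 1) ~ /^[0-9]+$/ { found = 1 }
        END { exit !found }
    ' "$mounts"
}

# Only print advice; nothing is flushed here.
if is_map_mounted data; then
    echo "data is mounted: unmount it before running 'multipath -f data'"
else
    echo "data is not mounted: 'multipath -f data' then 'multipath data' can be tried"
fi
```

The check is deliberately conservative: a mounted partition of the map counts as "mounted" too.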
===== Hot adding host to the system (qlogic) =====
Be careful with this: I have broken a bunch of servers with that command (really).
QLogic provides a small script that scans for newly available hosts: [[http://download.qlogic.com/ms/56615/readme_dynamic_lun_22.html]].
You can do this instead, which I find much better:
echo 1 > /sys/class/fc_host/host/issue_lip
echo "- - -" > /sys/class/scsi_host/host/scan
This was found here: [[http://lists.us.dell.com/pipermail/linux-poweredge/2006-May/025729.html]]
These commands scan for new hosts. Then just run:
bladeTest:~# multipath
sdb: checker msg is "tur checker reports path is down"
sdd: checker msg is "tur checker reports path is down"
sdf: checker msg is "tur checker reports path is down"
sdg: checker msg is "tur checker reports path is down"
sdb: checker msg is "tur checker reports path is down"
sdf: checker msg is "tur checker reports path is down"
sdg: checker msg is "tur checker reports path is down"
and then:
bladeTest:~# multipath -ll
mpath2 (3600a0b8000177bcc0000256545f7aa8a) dm-4 IBM,1722-600
[size=5.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=6][enabled]
\_ 0:0:0:2 sdf 8:80 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 0:0:1:2 sdg 8:96 [active][ready]
system (3600a0b8000177d9400002e61463f2ed3) dm-0 IBM,1722-600
[size=5.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
\_ 0:0:0:0 sda 8:0 [active][ready]
\_ round-robin 0 [prio=6][active]
\_ 0:0:1:0 sdc 8:32 [active][ready]
data (3600a0b8000177bcc0000256645f7f166) dm-1 IBM,1722-600
[size=9.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=6][enabled]
\_ 0:0:0:1 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 0:0:1:1 sdd 8:48 [active][ready]
As seen above, I have a new host. I can configure it or, for a one-time use, just use it as ''/dev/mapper/mpath2''. Nice!
====== Using XFS ======
===== Resize partition =====
There's a more in-depth document covering online resizing on this wiki: [[debian:maintenance:online_resize|Online resizing]]. It should be used instead of what's below.
Resizing a partition on the SAN is fairly simple. We have ''data'' and ''system'' as multipath partitions and we added 1G to ''data'', so we need to rescan its devices. First check which SCSI addresses the partition is on:
bladeTest:/# multipath -ll
sdc: checker msg is "readsector0 checker reports path is down"
sdd: checker msg is "readsector0 checker reports path is down"
system (3600a0b8000177d9400002e61463f2ed3) dm-0 IBM,1722-600
[size=5.0G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 0:0:0:0 sda 8:0 [active][ready]
\_ 0:0:1:0 sdc 8:32 [failed][faulty]
data (3600a0b8000177bcc0000256645f7f166) dm-1 IBM,1722-600
[size=8.0G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 0:0:0:1 sdb 8:16 [active][ready]
\_ 0:0:1:1 sdd 8:48 [failed][faulty]
So we see that ''data'' uses SCSI addresses 0:0:0:1 and 0:0:1:1. On the SAN side, the resizing process **must** be completed. We assume ''data'' is mounted on ''/srv''.
We need to rescan the devices like this:
bladeTest:/# echo 1 > /sys/bus/scsi/devices/0\:0\:0\:1/rescan
bladeTest:/# echo 1 > /sys/bus/scsi/devices/0\:0\:1\:1/rescan
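The addresses can also be pulled out of the ''multipath -ll'' output instead of being read by eye. This is a sketch assuming the output format shown on this page, where each map starts with a ''name (wwid)'' line; ''scsi_addresses_for'' and ''rescan_commands'' are hypothetical helpers, and the commands are only printed, not run.

```shell
#!/bin/sh
# scsi_addresses_for is a hypothetical helper: it reads "multipath -ll"
# output on stdin and prints the H:C:T:L address of every path that
# belongs to the named map.
scsi_addresses_for() {
    awk -v map="$1" '
        # A map header looks like: name (wwid) dm-N vendor,product
        $2 ~ /^\([0-9a-f]+\)$/ { current = $1; next }
        current == map {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) print $i
        }
    '
}

# Print (do not run) the rescan command for each path of the map.
rescan_commands() {
    scsi_addresses_for "$1" | while read -r hctl; do
        printf 'echo 1 > /sys/bus/scsi/devices/%s/rescan\n' "$hctl"
    done
}

rescan_commands data <<'EOF'
system (3600a0b8000177d9400002e61463f2ed3) dm-0 IBM,1722-600
[size=5.0G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 0:0:0:0 sda 8:0 [active][ready]
\_ 0:0:1:0 sdc 8:32 [failed][faulty]
data (3600a0b8000177bcc0000256645f7f166) dm-1 IBM,1722-600
[size=8.0G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 0:0:0:1 sdb 8:16 [active][ready]
\_ 0:0:1:1 sdd 8:48 [failed][faulty]
EOF
```

On a live system, ''multipath -ll | rescan_commands data'' would print the commands for review before they are run as root.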
Then we unmount the partition and rebuild the multipath map:
bladeTest:/# umount /srv/
bladeTest:/# multipath -f data
bladeTest:/# multipath data
sdc: checker msg is "readsector0 checker reports path is down"
sdd: checker msg is "readsector0 checker reports path is down"
sdd: checker msg is "readsector0 checker reports path is down"
create: data (3600a0b8000177bcc0000256645f7f166) IBM,1722-600
[size=9.0G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
\_ 0:0:0:1 sdb 8:16 [undef][ready]
\_ 0:0:1:1 sdd 8:48 [undef][faulty]
bladeTest:/# mount srv/
This process is quite short, and we can see that the size went to 9G, which is what we wanted. But if we look at the disk usage we see:
bladeTest:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/system1 4.7G 618M 3.9G 14% /
tmpfs 1015M 0 1015M 0% /lib/init/rw
udev 10M 84K 10M 1% /dev
tmpfs 1015M 0 1015M 0% /dev/shm
/dev/mapper/data 8.0G 384K 8.0G 1% /srv
The partition has not been resized... Why? Resizing the underlying disk doesn't mean that the filesystem on it has been resized. If you use a filesystem like XFS, you can grow it while it is mounted:
bladeTest:/# xfs_growfs /srv/
meta-data=/dev/mapper/data isize=256 agcount=11, agsize=196608 blks
= sectsz=512 attr=0
data = bsize=4096 blocks=2097152, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=2560, version=1
= sectsz=512 sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0
data blocks changed from 2097152 to 2359296
And that's done. For other filesystems, see their documentation.
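The block counts reported by ''xfs_growfs'' can be checked against the expected sizes with a little arithmetic. This is a minimal sketch using the numbers from the output above (block size 4096 bytes); ''blocks_to_gib'' is a hypothetical helper and assumes the shell does 64-bit arithmetic.

```shell
#!/bin/sh
# blocks_to_gib is a hypothetical helper: it converts a block count and
# a block size in bytes to whole GiB (integer division, so it truncates).
# Assumption: the shell uses 64-bit arithmetic.
blocks_to_gib() {
    echo $(( $1 * $2 / 1024 / 1024 / 1024 ))
}

echo "before: $(blocks_to_gib 2097152 4096) GiB"
echo "after:  $(blocks_to_gib 2359296 4096) GiB"
echo "added:  $(blocks_to_gib $((2359296 - 2097152)) 4096) GiB"
```

This gives 8 GiB before, 9 GiB after and 1 GiB added, which matches the 1G added on the SAN.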
===== Filesystem freeze =====
A filesystem freeze is designed to be used with systems such as snapshot/flashcopy. It makes the filesystem hold all IO while something is working on the backup. The data on the filesystem is not lost, and once the filesystem is unfrozen the system runs normally.
To freeze the filesystem, we just do:
xfs_freeze -f /srv
and when we are finished with it, we unfreeze with:
xfs_freeze -u /srv
And everything is fine!
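Forgetting to unfreeze leaves the filesystem blocked, so it can be safer to wrap both steps around the backup command. This is a sketch, not a finished tool: ''with_frozen_fs'' and the ''FREEZE_CMD'' variable are hypothetical names, and the example runs in dry-run mode, so nothing is actually frozen.

```shell
#!/bin/sh
# with_frozen_fs is a hypothetical wrapper: freeze the mount point, run
# the given command, then always unfreeze, and return the command's
# exit status. FREEZE_CMD (an assumption of this sketch) defaults to
# xfs_freeze and can be replaced for a dry run.
with_frozen_fs() {
    mountpoint=$1
    shift
    ${FREEZE_CMD:-xfs_freeze} -f "$mountpoint" || return 1
    status=0
    "$@" || status=$?
    ${FREEZE_CMD:-xfs_freeze} -u "$mountpoint"
    return "$status"
}

# Dry run: print the freeze/unfreeze calls instead of executing them.
dry_run_freeze() { printf 'xfs_freeze %s %s\n' "$1" "$2"; }
FREEZE_CMD=dry_run_freeze
with_frozen_fs /srv echo "snapshot would run here"
```

Unset ''FREEZE_CMD'' to use the real ''xfs_freeze''. Note that this sketch does not handle signals: if the script is killed in the middle, the filesystem stays frozen until ''xfs_freeze -u'' is run by hand.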
====== See also ======
//That's a lot of links, but I had over 50 bookmarks just for multipath configuration. I kept only those I have really used.//
===== IBM configuration =====
* [[https://www.redhat.com/archives/dm-devel/2007-January/msg00065.html]]
===== Multipath configuration =====
* [[http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=UsageFile]]
* [[http://www.kernel.org/pub/scm/linux/storage/multipath-tools/multipath.conf.annotated]]
* [[http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=DebianInstall]]
* [[http://support.novell.com/techcenter/sdb/en/2005/04/sles_multipathing.html]]
* [[http://www.calivia.com/bk/multipath-tools/multipath-tools-configuration]]
* [[http://christophe.varoqui.free.fr/multipath.html]]
* [[http://storagefoo.blogspot.com/2006/08/linux-native-multipathing-device.html]]
* [[http://mail.digicola.com/wiki/index.php?title=User:Martin:DM_MP]]
* [[http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=FAQ]]
* [[https://www.redhat.com/archives/dm-devel/2007-January/msg00065.html]]
* [[http://www.redhat.com/archives/dm-devel/2006-November/msg00173.html]]