It is almost impossible not to have heard of or used LVM: it has been one of the pillars of every Linux distribution for decades. Almost everyone using Linux has used it to create or modify the basic storage structures of their system. The trouble is that people are very often focused on the specific task at hand and do not take the time to investigate its amazing features. The goal of LVM Tutorial - A thorough howto on the Logical Volume Manager is to provide an easy yet comprehensive explanation of the most interesting features of LVM that you are very likely to need sooner or later.
What is the Logical Volume Manager
LVM is a Logical Volume Manager for the Linux operating system developed many years ago by Sistina Software.
A Logical Volume Manager provides a layer of abstraction over the disk storage that is more flexible than the traditional view of disks and partitions: from the storage consumer's perspective, volumes under the control of the Logical Volume Manager can be extended but also shrunk in size.
In addition to that, data blocks can be moved across the storage devices at will: from the hardware perspective, this makes it possible to flush all the data from a storage device onto the others, for example to replace it with a new one without service outages. Besides these basic features, LVM also supports thin provisioning, snapshots, the use of golden images, and it can be used to support highly available clusters.
Physical Volumes and Volume Groups
The Volume Group (VG) is the highest abstraction level within LVM: it is made of all the available storage coming from a set of storage devices, called Physical Volumes (PV), which can be either whole disks or disk partitions.
Having disks with a single partition dedicated to a PV is only cumbersome to maintain: for example, when it comes to increasing the size of the PV, you need to extend the partition along with the disk device itself, since the PV is the partition itself. But extending a partition very often requires deleting and recreating it, a task that requires a lot of attention to avoid errors. In this post I nonetheless show both methods, since you may need to operate on partition-based PVs, although I strongly advise against working like this.
Logical Volumes
The VG can then be split into partitions called Logical Volumes (LV). Of course a system may have more than one VG, with multiple LVs each. What is not possible is to share the same PV between VGs: a PV belongs to one and only one VG.
You can think of LVs as partitions of a huge logical storage pool (the VG) that spans multiple devices (the PVs), which can be either whole disks or disk partitions. This of course implies that an LV can span more than one disk, and that its size can grow up to the sum of the sizes of all the PVs.
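To make the layering concrete, here is a minimal sketch of the whole chain, using the same device and volume names used later in this post; each command is explained in detail in the following sections:
pvcreate /dev/sdb /dev/sdc # label the disks as Physical Volumes
vgcreate data /dev/sdb /dev/sdc # pool them into the "data" Volume Group
lvcreate -L1GiB -n foo data # carve the 1GiB "foo" Logical Volume out of the pool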
Technical Details
At the time of writing this post, the upper bound for the LV size with LVM2 format volumes (assuming a 64-bit architecture and a 2.6 or later kernel) is 8 exabytes.
Physical Extent (PE)
The Physical Extent (PE) is the smallest unit of allocatable storage when dealing with LVM: you must carefully choose its size depending on the workload of the Volume Group when creating a VG; if the PE size parameter is omitted, it defaults to 4MiB. As you can easily guess, you cannot change the PE size once you have created the VG. PEs are allocated or released whenever an LV is created, extended or even shrunk. As soon as a PV is added to a VG, it gets split into a number of chunks of data (the PEs) equal to the size of the PV divided by the PE size. These PEs are then assigned to the pool of free extents.
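For example, a minimal sketch of how you could set a non-default PE size at creation time and verify it afterwards (the 8MiB value is purely illustrative):
vgcreate -s 8M data /dev/sdb /dev/sdc # -s sets the PE size, here 8MiB instead of the default 4MiB
vgdisplay data | grep "PE Size" # verify the PE size of the VG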
Logical Extent (LE)
This is the minimum block you can assign to a Logical Volume (LV): Logical Extents (LE) are actually PEs taken from the pool of free PEs and assigned to a Logical Volume. So you can think of PE and LE as two different ways to refer to the same thing, depending on the logical (LE) or physical (PE) perspective.
Logical Extent (LE) Allocation And Release
Whenever an LV is created or extended, LEs are assigned to it; whenever it is shrunk or removed, they are returned to the pool of free extents of the VG.
Physical Extent (PE) Allocation
The PV used to physically allocate the underlying PEs of an LV is determined by the so-called allocation policy: for example, the linear mapping policy tries to assign PEs of the same PV that are next to one another; striped mapping instead interleaves PEs across all the available PVs.
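A quick way to check which kind of mapping an existing LV ended up with (here "data/pgsql_data" is just an illustrative name; the commands to create it come later in this post) is asking lvs for the segment type and the underlying devices:
lvs -o +segtype,devices data/pgsql_data # shows "linear" or "striped", plus the PVs backing each segment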
Discovering existing LVM
The available LVM configuration is automatically detected at boot time: the system scans for PVs and, if at least one is found, it looks for the LVM metadata to work out the VG the PV belongs to and the LVs defined inside it.
Despite this being done automatically, it is better to know how to manually launch these scans: you will need them when troubleshooting boot problems.
The scan for LVM metadata can be accomplished by typing the following three commands, in this same sequence:
pvscan
vgscan
lvscan
after running these commands, all the available LVM devices get discovered.
When running pvscan, you should get something like the following message:
PV /dev/sdb lvm2 [10.00 GiB]
PV /dev/sdc lvm2 [10.00 GiB]
Total: 2 [20.00 GiB] / in use: 0 [0 ] / in no VG: 2 [20.00 GiB]
In this example, pvscan found two devices ("/dev/sdb" and "/dev/sdc") of 10GiB each.
If instead you get the following message:
No matching physical volumes found
it means that your system is not using LVM at all. A typical scenario where this often happens is when using prepackaged Vagrant boxes, since most of the time they only use basic partitions.
Listing existing LVM
Listing Physical Volumes
Type the following command:
pvdisplay
on my system, the output is as follows:
"/dev/sdb" is a new physical volume of "10.00 GiB"
--- NEW Physical volume ---
PV Name /dev/sdb
VG Name
PV Size 10.00 GiB
Allocatable NO
PE Size 0
Total PE 0
Free PE 0
Allocated PE 0
PV UUID H7VllH-l0Lk-cT0B-PCg5-X4w9-LkUp-CeFsdc
"/dev/sdc" is a new physical volume of "10.00 GiB"
--- NEW Physical volume ---
PV Name /dev/sdc
VG Name
PV Size 10.00 GiB
Allocatable NO
PE Size 0
Total PE 0
Free PE 0
Allocated PE 0
PV UUID 0hGOEg-BjSJ-0xPw-UWlc-9O8j-mdob-tDpZgg
it shows that both "/dev/sdb" and "/dev/sdc" are brand new PVs; indeed:
- "VG Name" is empty
- every field concerning PEs is 0.
you may prefer to get only summary information:
pvs
on my system, the output is as follows:
PV VG Fmt Attr PSize PFree
/dev/sdb lvm2 --- 10.00g 10.00g
/dev/sdc lvm2 --- 10.00g 10.00g
When it comes to scripting, the most convenient command to use is pvs, since it provides handy command line options to output only specific fields (and there are lots of available fields). For example, you can see alignment information by specifying "-o +pe_start" as follows:
pvs -o +pe_start --units k
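As a further scripting-oriented example (the field selection here is just an illustration), you can suppress the headings and pick exactly the fields you need:
pvs --noheadings --separator , --units g -o pv_name,vg_name,pv_size,pv_free # compact, comma-separated output that is easy to parse in scripts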
Listing Volume Groups
Type the following command:
vgdisplay
on my system, the output is as follows:
--- Volume group ---
VG Name data
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 1
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 2
Act PV 2
VG Size 19.99 GiB
PE Size 4.00 MiB
Total PE 5118
Alloc PE / Size 0 / 0
Free PE / Size 5118 / 19.99 GiB
VG UUID QndPIg-Sn6h-lNgL-wNDt-1OPx-4X6c-5OORrV
It shows that "data" VG has been set with a PE size of 4MiB and it has never been used, since Allocated PE is 0 out of 5118 available.
if you want to get summary information just type:
vgs
the output on my system is:
VG #PV #LV #SN Attr VSize VFree
data 2 0 0 wz--n- 19.99g 19.99g
Since you can have more than just one VG on your system, you can of course focus both commands on a specific VG as follows:
vgdisplay data
vgs data
Listing Logical Volumes
Type the following command:
lvdisplay
on my system, the output is as follows:
--- Logical volume ---
LV Path /dev/data/pgsql_data
LV Name pgsql_data
VG Name data
LV UUID 3OPj5y-Uio5-pxQt-cY9F-zjfK-QymG-iCdiM7
LV Write Access read/write
LV Creation host, time localhost.localdomain, 2022-11-08 22:30:54 +0100
LV Status available
# open 0
LV Size 16.00 GiB
Current LE 4096
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:0
that is, we have only one LV called "pgsql_data" that belongs to the "data" VG.
To get summary information, just type:
lvs
the output on my system is as follows:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
pgsql_data data -wi-a----- <16.00g
Since you can have more than just one LV on your system, you can of course focus both commands on a specific LV as follows:
lvdisplay data/pgsql_data
lvs data/pgsql_data
When it comes to scripting, the most convenient command to use is lvs, since it provides handy command line options to output only specific fields (and there are lots of available fields). For example, you can add the devices field to the output of lvs by adding "-o +devices" as follows:
lvs -o +devices
on my system, the output is as follows:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
pgsql_data data -wi-a----- 16.00g /dev/sdb(0)
pgsql_data data -wi-a----- 16.00g /dev/sdc(0)
Operating with PV
Creating a PV
The easiest, most straightforward and most scalable way of creating a Physical Volume is to simply use a whole device as a PV, without partitioning it first.
This of course requires an empty disk, so if your VM does not have a spare one yet, attach it and rescan the SCSI bus as follows:
for host in $(ls -d /sys/class/scsi_host/host*); do echo "- - -" > $host/scan; done
if instead your hypervisor does not support hot-plugging disks, shut the virtual machine down, attach a new virtual disk and boot it again.
Once the new disk has been added, we can list the available block devices by typing:
lsblk
the output on my system is as follows:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 78.1G 0 disk
├─sda1 8:1 0 200M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
├─sda3 8:3 0 16G 0 part [SWAP]
└─sda4 8:4 0 60.9G 0 part /
sdb 8:16 0 10G 0 disk
sdc 8:32 0 10G 0 disk
sr0 11:0 1 1024M 0 rom
As you see "sdb" and "sdc" are the only unpartitioned disk devices, so the new disks have been added as "sdb" and "sdc".
To configure "/dev/sdb" and "/dev/sdc" as a Physical Volumes just type:
pvcreate /dev/sdb /dev/sdc
on my system the output is as follows:
Physical volume "/dev/sdb" successfully created.
Physical volume "/dev/sdc" successfully created.
Creating devices file /etc/lvm/devices/system.devices
As previously mentioned, there's a school of thought (with which I do not agree, by the way) stating that even if the storage device is fully dedicated to the Physical Volume, you should anyway create a partition on the storage device and use that as the PV.
So for example, following this theory, if "/dev/sdb" were the empty disk you want to use as a PV, you would have to partition it first as follows.
create /dev/sdb1 partition:
parted /dev/sdb --script mklabel gpt
parted -a optimal /dev/sdb --script mkpart volume 1M 100%
parted -a optimal /dev/sdb --script set 1 lvm on
and only then create the PV label into "/dev/sdb1" partition you have just created:
pvcreate /dev/sdb1
As previously said, IMHO there are more cons than pros to this approach.
Dealing With Alignment On Physical Disks
As long as you are working with virtual disks or LUNs from a SAN you don't have to worry about alignment, but when it comes to using physical devices you must take care of it.
When dealing with simple disks, most of the time you can blindly align at 1MiB.
The alignment of the PV to create is specified using the "--dataalignment" command option: for example, to create a PV aligned at 1MiB, just type:
pvcreate --dataalignment 1MiB /dev/sdb
Dealing With Alignment On RAID Devices
If you are on top of a RAID array things become a little bit more complex, since you must work out the chunk size that was set when configuring the array. The general rule to calculate the PV data alignment is to multiply the chunk size by the number of data disks in the RAID array, that is without counting the parity disks (RAID 5 has single parity, RAID 6 has double parity).
Of course there are exceptions, for example:
- RAID1 does not have parity: with RAID 1 the number of stripes is equal to the number of disks
- with RAID 1+0 the number of stripes is equal to half of the disks
and so on.
Guessing the chunk size deeply depends on the RAID implementation.
For example, when dealing with hardware RAID, if you have an LSI MegaRAID controller you can work out the chunk size using the storcli64 command line utility as follows:
./storcli64 /c0/v1 show all | grep Strip
the output would be similar to the following one:
Stripe Size : 128K
Conversely, when dealing with software RAID, you can just rely on the mdadm command line utility inspecting any disk that is part of the array as follows:
mdadm -E /dev/sdb | grep "Chunk Size"
the output would be similar to the following one:
Chunk Size : 128K
As an example, if we have a hypothetical "/dev/md127" configured as RAID 5 with 4 disks and a chunk size of 128K, the PV data alignment would be 128K * (4-1) = 128K * 3 = 384K.
So the command to create a properly aligned PV would be:
pvcreate -M2 --dataalignment 384K /dev/md127
Resizing a PV
The very first thing to do is of course to increase the size of the disk - the way of doing this deeply depends on the virtualization software / RAID implementation or the SAN you are using.
Once the storage device has been increased, rescan it so that the kernel gets informed that its size has changed. For example, if the resized device is "/dev/sdb", type:
echo 1 > /sys/block/sdb/device/rescan
then things differ a little depending on whether the PV is a whole storage device or a partition of a storage device.
PV Created On A Whole Disk
Let's see the partitions defined on sdb:
lsblk /dev/sdb
the output must be similar to the following one:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sdb 8:16 0 10G 0 disk
as you can see there are no partitions defined on "/dev/sdb": we can directly resize the Physical Volume as follows:
pvresize /dev/sdb
the output on my system is as follows:
Physical volume "/dev/sdb" changed
1 physical volume(s) resized or updated / 0 physical volume(s) not resized
PV Created On A Partition
Using "/dev/sdb" as example, if the output of the command:
lsblk /dev/sdb
is as the following:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sdb 8:16 0 10G 0 disk
└─sdb1 8:17 0 8G 0 part
and the PV is "/dev/sdb1", it means that the PV has been created on a partition.
In such a scenario, in order to resize the PV we have to resize the partition as well - since parted does not provide a resize command anymore, we have to remove and recreate the partition:
parted -a optimal /dev/sdb --script rm 1
parted -a optimal /dev/sdb --script mkpart volume 1M 100%
parted -a optimal /dev/sdb --script set 1 lvm on
since we specified 100%, the new partition takes up the whole disk. Now that we have enlarged the partition, we must notify LVM that the PV size has changed, as follows:
pvresize /dev/sdb1
the output must be as follows:
Physical volume "/dev/sdb1" changed
1 physical volume(s) resized or updated / 0 physical volume(s) not resized
Deleting a PV
Sometimes you may need to delete a PV. For example, you may want to replace a slow disk holding a PV with a high-performance one, and re-use the old disk for something else.
This requires you to set up the high-performance disk as a new PV, remove the PV on the slow disk from the VG and finally delete the PV label from the slow disk.
Deleting a PV is simple: for example, if the PV is "/dev/sdb" disk:
pvremove /dev/sdb
lvmdevices --deldev /dev/sdb
if instead we are dealing with a partition as PV, such as "/dev/sdb1":
pvremove /dev/sdb1
lvmdevices --deldev /dev/sdb1
pvremove wipes the LVM label from the device, whereas lvmdevices --deldev removes the related entry from the "/etc/lvm/devices/system.devices" file.
If you forget to run the lvmdevices statement, when typing any LVM command you will get messages as the following one:
Devices file sys_wwid t10.ATA_____rocky_default_1667937593571_37531-0_SSD_9DZXE76HHYJ412N6YMGD PVID none last seen on /dev/sdb1 not found.
Replacing a PV
This is accomplished by removing the PV you want to replace from the VG - see "Reducing a VG" for the details. Anyway, mind that the VG must have enough free space to hold the PEs that will be migrated from the PV you are removing to the rest of the PVs. If there is not enough free space on the remaining PVs, you must extend the VG first - see "Extending a VG" for the details.
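As a sketch of the whole procedure, assuming "/dev/sdd" is the new high-performance disk and "/dev/sdc" is the PV of the "data" VG to be replaced (device names are purely illustrative):
pvcreate /dev/sdd # label the new disk as a PV
vgextend data /dev/sdd # add the new PV to the VG
pvmove /dev/sdc /dev/sdd # migrate every allocated PE off the old PV
vgreduce data /dev/sdc # remove the old PV from the VG
pvremove /dev/sdc # wipe the LVM label from the old disk
lvmdevices --deldev /dev/sdc # drop its entry from the devices file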
Working with VG
Creating a VG
To create a VG, you must use the vgcreate command followed by the name you want to give to the VG along with the list of PV you want to assign to it.
For example, to create a VG called "data" that spans across the "/dev/sdb" and "/dev/sdc" PVs, type:
vgcreate data /dev/sdb /dev/sdc
Of course, both "/dev/sdb" and "/dev/sdc" must have already been configured as PVs.
When the underlying storage device of the PV is a RAID that performs striping, it is mandatory to set the PE size: getting back to the previous example with a stripe size of 384K, we must set the PE size to a multiple of it, for example 3840K. This can be achieved by adding the -s command option as follows:
vgcreate -s 3840K data /dev/md127
Renaming a VG
You may of course need to rename an existing VG: this can be accomplished using the vgrename command. For example, to rename the "foovg" VG to "data", just type:
vgrename foovg data
Extending a VG
It is very likely that sooner or later you will run out of space (free PEs) on a VG.
Extending a VG can be accomplished in either of the following two ways:
- extending the PV that are already members of the VG (for more information, see Resizing a PV)
- adding new PVs (see below).
Resize VG by adding a new disk
Extending a VG this way is really simple. You just need a new disk to be used as PV, so create a virtual disk or LUN and then rescan the SCSI bus on every SCSI host as follows:
for host in $(ls -d /sys/class/scsi_host/host*); do echo "- - -" > $host/scan; done
then make sure the new disk (or disks) have become available to the system:
lsblk
if everything worked out properly, the new disk (or disks) will be listed in the output:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 78.1G 0 disk
├─sda1 8:1 0 200M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
├─sda3 8:3 0 16G 0 part [SWAP]
└─sda4 8:4 0 60.9G 0 part /
sdb 8:16 0 10G 0 disk
sdc 8:32 0 10G 0 disk
sdd 8:48 0 10G 0 disk
sde 8:64 0 10G 0 disk
sr0 11:0 1 1024M 0 rom
In this example I added two new disks, that are seen as "/dev/sdd" and "/dev/sde" - we can add them to the "data" VG as follows:
vgextend data /dev/sdd /dev/sde
if instead we were dealing with partitions as the new PVs, such as "/dev/sdd1" and "/dev/sde1", the command would be:
vgextend data /dev/sdd1 /dev/sde1
Reducing a VG
Although this is not very likely to happen, you may need to reduce the size of a VG: it may happen that you want to remove a PV from a VG, either temporarily or permanently. For example:
- you may need to replace an old disk holding a PV that is quite likely to break
- you may need to replace a low performance disk holding the PV with a high performance one
- ...
Reducing a VG can be accomplished in either of the following two ways:
- removing PVs
- shrinking the PVs that are already members of the VG
In order for a PV to be removed from a VG, it must not have any allocated PEs: this means that you must migrate all the allocated PEs of the PV you want to remove to the remaining PVs of the VG. If the free PEs available on the remaining PVs are not enough, you must add an additional PV or extend the existing ones before proceeding.
For example, if we want to remove "/dev/sdc" PV from the "data" VG, we can get its PE consumption as follows:
pvdisplay /dev/sdc | grep PE
the output on my system is as follows:
PE Size 32.00 MiB
Total PE 156
Free PE 88
Allocated PE 68
Now let's see if the "data" VG has enough free PEs to store the PEs consumed by "/dev/sdc":
vgs -o vg_free_count data
the output on my system is:
Free
3514
Since removing "/dev/sdc" requires at least 68 free PEs in the "data" VG, and the VG has 3514 free PEs, we can safely remove the "/dev/sdc" PV from the VG.
Let's empty "/dev/sdc" as follows:
pvmove /dev/sdc
the output on my system is:
/dev/sdc: Moved: 0.00%
/dev/sdc: Moved: 10.79%
…
/dev/sdc: Moved: 91.40%
/dev/sdc: Moved: 100.00%
now that "/dev/sdc" has no allocated PEs, we can safely reduce the VG by removing the "/dev/sdc" PV from it:
vgreduce data /dev/sdc
it is now safe to remove LVM labels from "/dev/sdc" using pvremove as previously shown:
pvremove /dev/sdc
lvmdevices --deldev /dev/sdc
Manually relocating PEs across the PVs
The pvmove command is much more powerful than what we have seen so far: it is so granular that it even lets you move ranges of PEs from one PV to another, or even within the same PV.
As an example, let's have a look at the current allocation of the PEs on "/dev/sdb":
pvs -o +pvseg_all,lv_name /dev/sdb | awk 'NR == 1; NR > 1 {print $0 | "sort -k 9 -k 7"}'
the output on my system is as follows:
PV VG Fmt Attr PSize PFree Start SSize LV
/dev/sdb data lvm2 a-- <9.79g <7.79g 100 21
/dev/sdb data lvm2 a-- <9.79g <7.79g 512 38
/dev/sdb data lvm2 a-- <9.79g <7.79g 571 1934
/dev/sdb data lvm2 a-- <9.79g <7.79g 0 100 pgsql_data
/dev/sdb data lvm2 a-- <9.79g <7.79g 121 391 pgsql_data
/dev/sdb data lvm2 a-- <9.79g <7.79g 550 21 pgsql_data
this shows that we have only three ranges of PEs already allocated to the "pgsql_data" LV:
- 0-99 (100 PE)
- 121-511 (391 PE)
- 550-570 (21 PE)
Since we have 21 free PEs starting at 100, we can move the 21 PEs starting at 550 to fill the hole - the command is as follows:
pvmove --alloc anywhere /dev/sdb:550-570 /dev/sdb:100-120
the output on my system is:
/dev/sdb: Moved: 4.76%
/dev/sdb: Moved: 100.00%
Let's look at the outcome:
pvs -o +pvseg_all,lv_name /dev/sdb | awk 'NR == 1; NR > 1 {print $0 | "sort -k 9 -k 7"}'
now the output on my system is:
PV VG Fmt Attr PSize PFree Start SSize LV
/dev/sdb data lvm2 a-- <9.79g <7.79g 512 1993
/dev/sdb data lvm2 a-- <9.79g <7.79g 0 512 pgsql_data
we reached our goal: the hole has disappeared, and now all the PEs from 0 to 511 are allocated to "pgsql_data" without holes.
Working with LV
A Logical Volume is the storage device that is finally provided to the system. You can think of it as a partition that can be resized at will, without having to worry about where the data is actually stored on the underlying hardware.
Creating LV
You can create an LV using the lvcreate command line utility. For example, to create the "foo" LV of 1GiB in size in the "data" VG, type:
lvcreate -L1GiB -n foo data
if you fancy, you can even create an LV specifying its size as a number of extents:
lvcreate -l10 -n bar data
or even as a percentage of the remaining free space.
For example, to create the "baz" LV in the "data" VG using 80% of the remaining available space of the VG, type:
lvcreate -l80%FREE -n baz data
If you need higher performance, you can even configure LVM to stripe writes across the PVs of the VG, providing with the "-i" parameter the number of PVs you want to stripe onto.
For example, if you want to have writes striped across two different PVs:
lvcreate -L512M -i2 -n qux data
the output is as follows:
Using default stripesize 64.00 KiB.
Logical volume "qux" created.
Note how this time it prints information about the default stripe size.
As you can easily guess, the striping settings are also visible through the device mapper:
dmsetup deps /dev/mapper/data-qux
the output on my system is:
2 dependencies : (8, 48) (8, 16)
we can of course get information on the striping from the LVM itself as follows:
lvdisplay data/qux -m
the output on my system is:
--- Logical volume ---
LV Path /dev/data/qux
LV Name qux
VG Name data
LV UUID ZacQJN-GFJy-xsTv-4teq-0lM3-EUBm-l8xXgw
LV Write Access read/write
LV Creation host, time localhost.localdomain, 2022-11-08 23:34:17 +0100
LV Status available
# open 0
LV Size 512.00 MiB
Current LE 128
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:3
--- Segments ---
Logical extents 0 to 127:
Type striped
Stripes 2
Stripe size 64.00 KiB
Stripe 0:
Physical volume /dev/sdb
Physical extents 636 to 699
Stripe 1:
Physical volume /dev/sdd
Physical extents 0 to 63
As you can see, it confirms that the stripe size is 64K – that is the default value.
You can obviously set the stripe size to a custom value more appropriate to your workload by providing it with the "-I" option.
For example, to set a stripe size of 128K on 3 PVs, type:
lvcreate -L512M -i3 -I 128 -n waldo data
let's check the outcome:
lvdisplay data/waldo -m
the output on my system is:
--- Logical volume ---
LV Path /dev/data/waldo
LV Name waldo
VG Name data
LV UUID 2pHhoL-3142-f3jf-nNCL-s3DP-SdKH-udZI3I
LV Write Access read/write
LV Creation host, time localhost.localdomain, 2022-11-08 23:40:16 +0100
LV Status available
# open 0
LV Size 516.00 MiB
Current LE 129
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:4
--- Segments ---
Logical extents 0 to 128:
Type striped
Stripes 3
Stripe size 128.00 KiB
Stripe 0:
Physical volume /dev/sdb
Physical extents 700 to 742
Stripe 1:
Physical volume /dev/sdd
Physical extents 64 to 106
Stripe 2:
Physical volume /dev/sde
Physical extents 0 to 42
Renaming a LV
You may of course need to rename an existing LV - this can be accomplished using the lvrename command line utility. For example, to rename the "waldo" LV of the "data" VG to "pgsql_data", type:
lvrename data waldo pgsql_data
Resizing LV
Once an LV has been created, sooner or later you may need to resize it. This can be accomplished with the lvresize command. For example, to resize the "pgsql_data" LV to fill all the remaining available space of the "data" VG:
lvresize -l100%FREE data/pgsql_data
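Mind that lvresize operates on the block device only: if the LV holds a filesystem, that filesystem must be grown as well. As a hedged variant, assuming the filesystem is one supported by fsadm (such as ext4 or XFS), you can let LVM take care of both in one shot with the "-r" (--resizefs) option:
lvresize -r -l100%FREE data/pgsql_data # grow the LV and its filesystem together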
LV shrinking is supported, but it requires you to confirm that you know what you are doing.
For example:
lvresize -L512MiB data/foo
produces the following warning:
THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce data/foo? [y/n]:
This is just to remind you that what you are trying to do, although supported, can be dangerous.
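If you really need to shrink an LV that hosts a filesystem, remember that the filesystem must be shrunk before the LV. A minimal sketch, assuming an ext4 filesystem on the "foo" LV mounted on the hypothetical "/mnt/foo" directory (XFS cannot be shrunk at all):
umount /mnt/foo # the filesystem must be offline
e2fsck -f /dev/mapper/data-foo # mandatory check before resizing
resize2fs /dev/mapper/data-foo 512M # shrink the filesystem first
lvresize -L512MiB data/foo # then shrink the LV to the same size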
Removing LV
You can get rid of LVs using the lvremove command line utility.
For example, to remove "baz" LV from the "data" VG just type:
lvremove data/pgsql_data
Since this is a potentially harmful operation, it asks you to confirm what you are doing.
Thin-Pool Backed LVs
Modern storage systems enable you to over-provision storage, speculating that users will not really need the whole provisioned space any time soon. LVM enables you to do this by means of thin pools: a thin pool lets you provision to LVs more space than is actually available to the VG - with thin provisioning, LEs are only reserved for the LV, and are allocated only when they are really needed to store some data. This way you can reserve for an LV more LEs than the PEs that have actually been assigned to the VG.
This feature is really useful when dealing with bare metal used to provide shared storage (NFS, CIFS, iSCSI) using low-cost commodity storage that does not natively support storage overcommitting.
An example scenario is GlusterFS: it is a shared, distributed, replicated file system designed to be installed on commodity hardware with directly attached storage: since the hardware must be as simple as possible (and so does not support storage over-provisioning), Gluster relies on LVM thin pools to implement it.
The thin pool is itself an LV specifically created for this purpose: it is used as the source of the LEs to be assigned to Thin-LVs. As an example, consider the following scenario:
VG #PV #LV #SN Attr VSize VFree
data 4 0 0 wz--n- 39.98g 39.98g
Creating a Thin-Pool LV
Let's now create the "fs_pool" LV of kind thin-pool, using 60% of the free space of the already existing "data" VG:
lvcreate -l60%FREE --thinpool fs_pool data
on my system the output is as follows:
Thin pool volume with chunk size 64.00 KiB can address at most <15.88 TiB of data.
Logical volume "fs_pool" created.
Let's check the outcome:
lvs
on my system the output is:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
bar data -wi-a----- 40.00m
foo data -wi-a----- 1.00g
qux data -wi-a----- 512.00m
fs_pool data twi-a-tz-- 23.94g 0.00 10.53
an LV of kind thin-pool is quite special, since it leverages two hidden LVs: we can see them by providing the "-a" option to the lvs command line utility, so as to list all the available LVs, including the hidden ones:
lvs -a
on my system the output is as follows:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
bar data -wi-a----- 40.00m
foo data -wi-a----- 1.00g
[lvol0_pmspare] data ewi------- 24.00m
qux data -wi-a----- 512.00m
fs_pool data twi-a-tz-- 23.94g 0.00 10.53
[fs_pool_tdata] data Twi-ao---- 23.94g
[fs_pool_tmeta] data ewi-ao---- 24.00m
- "[fs_pool_tdata]" is the LV containing the actual data
- "[fs_pool_tmeta]" holds the metadata
The metadata LV can be from 2MiB to 16GiB in size - you can manually specify its size using the "--poolmetadatasize" command option.
As an example, let's create another thin-pool specifying the size of the metadata LV:
lvcreate -l50%FREE --poolmetadatasize 3MiB --thinpool goldens_pool data
let's check the outcome:
lvs -a
on my system the output is as follows:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
bar data -wi-a----- 40.00m
foo data -wi-a----- 1.00g
[lvol0_pmspare] data ewi------- 24.00m
qux data -wi-a----- 512.00m
fs_pool data twi-a-tz-- 23.94g 0.00 10.53
[fs_pool_tdata] data Twi-ao---- 23.94g
[fs_pool_tmeta] data ewi-ao---- 24.00m
goldens_pool data twi-a-tz-- <7.22g 0.00 11.13
[goldens_pool_tdata] data Twi-ao---- <7.22g
[goldens_pool_tmeta] data ewi-ao---- 4.00m
As you see "goldens_pool_tmeta" has be rounded to 4MiB: this is because the PE size of the VG is 4MiB.
Creating a Thin-Pool backed LV
Now, to test the "fs_pool" thin pool, let's create a thin-provisioned LV: mind that the size we request is only virtual, so it is not constrained by the space actually available in the pool:
lvcreate -V 15G --thin -n shared_fs data/fs_pool
Please note the usage of the "--thin" option and that this time we specify an LV as a container rather than a VG.
Let's see the outcome:
lvs
the output is as follows:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
bar data -wi-a----- 40.00m
foo data -wi-a----- 1.00g
fs_pool data twi-aotz-- 23.94g 0.00 10.55
qux data -wi-a----- 512.00m
shared_fs data Vwi-a-tz-- 15.00g fs_pool 0.00
goldens_pool data twi-a-tz-- <7.22g 0.00 11.13
As you can see, we have provisioned 15GiB to the "shared_fs" LV, yet no space has actually been consumed from the pool (Data% is 0.00): with thin provisioning, extents are allocated only when data is actually written.
From time to time, you can check the usage percentage of the pool's data and metadata simply by typing "lvs", as follows:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
bar data -wi-a----- 40.00m
foo data -wi-a----- 1.00g
fs_pool data twi-aotz-- 23.94g 39.05 26.30
qux data -wi-a----- 512.00m
shared_fs data Vwi-a-tz-- 15.00g fs_pool 5.14
goldens_pool data twi-a-tz-- <7.22g 0.00 11.13
Resizing a Thin-Pool LV
Since the thinpool is actually an LV, if you run out of space you can extend it the same way you would do with a regular LV:
lvresize -L +512MiB data/fs_pool
Of course it may also happen that you run out of space in the metadata LV: you can extend it as well.
For example, to add another 40MiB to the metadata LV of the "fs_pool" thin pool of the "data" VG, type:
lvresize -L +40MiB data/fs_pool_tmeta
Resizing a Thin-Pool backed LV
You can of course also resize any thin-pool backed LV. As an example, let's resize the "shared_fs" LV we've just created, adding another 40GiB:
lvresize -L+40GiB data/shared_fs
this is the output on my system:
Size of logical volume data/shared_fs changed from 15.00 GiB (3840 extents) to 55.00 GiB (14080 extents).
WARNING: Sum of all thin volume sizes (55.00 GiB) exceeds the size of thin pools and the size of whole volume group (39.98 GiB).
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
Logical volume data/shared_fs successfully resized.
Since I'm over-provisioning the storage, LVM emitted the above warnings.
Auto-extending Thin-Pools LV
As suggested by the above warnings, if you fancy you can enable the automatic extension of thin pools: this is a handy feature that automatically extends the backing thin-pool LVs when needed.
To configure it, look into the "/etc/lvm/lvm.conf" file and uncomment and modify the following settings as needed:
thin_pool_autoextend_threshold = 100
thin_pool_autoextend_percent = 20
since the default threshold is 100%, it never gets hit.
To enable the feature you must lower it: for example, if you set the threshold to 80, as soon as any thin pool gets filled to 80% it gets automatically extended by 20%.
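For example, to get the behavior just described, the relevant lines in "/etc/lvm/lvm.conf" would look like this:
thin_pool_autoextend_threshold = 80
thin_pool_autoextend_percent = 20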
Snapshots
LVM snapshots leverage Copy-on-Write (CoW) to provide a consistent view of the data at a given moment. They are really useful, for example, when dealing with backups, to ensure data consistency during the backup. Another typical use case is having a small time window to perform actions that modify the file system, with the opportunity to roll back if necessary.
Mind that LVM snapshots cannot be used for long-term storage: each time something gets modified, its original data is copied to the designated LV, which eventually runs out of space. In addition to that, take into account that CoW does not come for free - it has a performance cost, since you are basically reading the old data from the original LV, writing it to the snapshot LV, and finally writing the new data to the original LV. For these reasons the lifetime of a snapshot should be as short as possible, and not last longer than the purpose it has been taken for: if we took it for running a backup, we should remove it as soon as the backup ends.
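To make this concrete, here is a minimal sketch of such a backup workflow, using commands that are explained in the next paragraphs; the mount point, the snapshot name and size and the backup command are purely illustrative, and the "nouuid" mount option assumes an XFS filesystem on the origin LV:
lvcreate -L 1G -s -n shared_fs_bck data/shared_fs # take the snapshot
mkdir -p /mnt/shared_fs_bck
mount -o ro,nouuid /dev/mapper/data-shared_fs_bck /mnt/shared_fs_bck # mount it read-only
tar -czf /tmp/shared_fs-backup.tar.gz -C /mnt/shared_fs_bck . # run the backup against the snapshot
umount /mnt/shared_fs_bck
lvremove -y data/shared_fs_bck # drop the snapshot as soon as the backup ends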
Taking a snapshot
A snapshot is taken using lvcreate command providing the "-s" option and the path to the LV that is subject of the snapshot.
For example, to take a snapshot called "qux_snap" of the "qux" LV belonging to the "data" VG:
lvcreate -L 512M -s -n qux_snap data/qux
when creating a snapshot LV it is mandatory to specify the size using the -L parameter. The outcome is a new LV used to store the CoW data. We can take a snapshot also of a thin-pool backed LV. For example, to take a snapshot of the "shared_fs" LV we previously created:
lvcreate -L 512M -s -n shared_fs_snap data/shared_fs
Anyway, when dealing with a thin-pool backed LV, you can also omit the size, thus creating the snapshot LV itself as a thin-pool backed LV:
lvcreate -s --name thin_snap data/shared_fs
since snapshots are actually LVs, you can list them using lvs as usual:
lvs
on my system the output is as follows:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
bar data -wi-a----- 40.00m
foo data -wi-a----- 1.00g
fs_pool data twi-aotz-- <24.19g 0.00 10.12
qux data -wi-a----- 512.00m
qux_snap data swi-a-s--- 512.00m qux 0.00
shared_fs data owi-a-tz-- 15.00g fs_pool 0.00
shared_fs_snap data swi-a-s--- 512.00m shared_fs 0.00
thin_snap data Vwi---tz-k 15.00g fs_pool shared_fs
goldens_pool data twi-a-tz-- <7.22g 0.00 11.13
As you can see, it is easy to tell which LVs are snapshots: they have an Origin.
In the above list:
- the "qux_snap" LV is a snapshot taken from the "qux" LV.
- the "shared_fs_snap" LV is a snapshot taken from the "shared_fs" thin-pool backed LV ("fs_pool").
- the "thin_snap" thin-pool backed LV ("fs_pool") is a snapshot taken from the "shared_fs" thin-pool backed LV.
Now remove the "thin_snap" LV as follows:
lvremove data/thin_snap
we can verify the consumption of the snapshot using the lvs command as usual:
lvs
As soon as data in the "shared_fs" LV is modified, the original blocks are copied to "shared_fs_snap", increasing the value of its "Data%":
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
bar data -wi-a----- 40.00m
foo data -wi-a----- 1.00g
fs_pool data twi-aotz-- <24.19g 0.00 10.12
qux data -wi-a----- 512.00m
qux_snap data swi-a-s--- 512.00m qux 0.00
shared_fs data owi-a-tz-- 15.00g fs_pool 0.00
shared_fs_snap data swi-a-s--- 512.00m shared_fs 3.42
goldens_pool data twi-a-tz-- <7.22g 0.00 11.13
As you can easily guess, when Data% reaches 100% there is no more space left to copy data from the Origin to the snapshot, so the snapshot gets invalidated and becomes unusable. This means that you won't be able to use it anymore - if you try to roll it back you get the following message:
Unable to merge invalidated snapshot LV data/foo_snap.
Auto-extending Snapshots
In the same way as with thin pools, a very handy feature you can enable if necessary is auto-extending the backing LV used by snapshots: look into "/etc/lvm/lvm.conf" and locate the following settings:
snapshot_autoextend_threshold = 100
snapshot_autoextend_percent = 20
Here too, since the default threshold is 100%, it never gets hit.
To enable the auto-extend feature you need to lower the threshold: for example, if you set "snapshot_autoextend_threshold" to 80, as soon as a snapshot gets filled to 80% it gets extended by 20%.
Rollback a snapshot
Rolling back a snapshot requires, as a first step, unmounting the filesystem of the LV we want to restore: for example, if the "shared_fs" LV is mounted on "/srv/nfs/shared_fs", type:
umount /srv/nfs/shared_fs
once unmounted we can safely proceed with the rollback statement:
lvconvert --merge data/shared_fs_snap
the output is as follows:
Merging of volume data/shared_fs_snap started.
data/shared_fs: Merged: 95.39%
data/shared_fs: Merged: 100.00%
Mind that when the rollback completes - that is, after merging a backing LV into its Origin - the backing LV ("shared_fs_snap" in this case) gets automatically removed.
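Once the merge has completed, you can mount the filesystem back on its mount point:
mount /dev/mapper/data-shared_fs /srv/nfs/shared_fs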
Removing a snapshot
Since the backing LV of the snapshot is actually an LV, you can remove it as any regular LV as follows:
lvremove data/shared_fs_snap
External-Origin snapshots
External-origin snapshots are snapshots that work the opposite way to regular ones: when using regular snapshots you mount the snapshotted LV (the Origin), so when writing you are actually modifying its data, and the original data is copied on write (CoW) to the backing LV.
External-origin snapshots instead are set up by making the Origin LV read-only, so modified data is directly written to the backing LV. The most straightforward consequence of this is that you can take multiple snapshots of the same origin, and since these snapshots are taken from a read-only LV they always start with the state the Origin was in when it was set read-only.
Unlike regular snapshots, external-origin snapshots are long-lasting snapshots, meaning that you can always roll back to the original state: with this kind of snapshot you have the opposite trouble - you must provide enough space to the backing LV to prevent it from running out of space.
A typical use of this kind of snapshot is creating a golden image used as the base to spawn several instances: the golden image must be immutable - this is why the origin LV is set read-only - whereas each instance gets a read-write backing LV it can consume as an LVM overlay. It is straightforward that you cannot merge a backing LV into the Origin LV, otherwise you would invalidate every other snapshot relying on the same origin.
If you try to create an external-origin snapshot without having previously set the LV inactive and read-only, you'll get the following error:
Cannot use writable LV as the external origin.
For example, if the golden image is the "base" LV of the "data" VG, set it inactive and read-only as follows:
lvchange -a n -p r data/base
You can now take as many snapshots as you want:
lvcreate -s --thinpool data/goldens_pool base --name base_clone_1
lvcreate -s --thinpool data/goldens_pool base --name base_clone_2
Mind that when dealing with thin-provisioned LVs the commands are exactly the same.
Let's check the outcome:
lvs
the output on my system is as follows:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
bar data -wi-a----- 40.00m
foo data -wi-a----- 1.00g
fs_pool data twi-aotz-- <24.19g 0.00 10.12
base data ori------- 512.00m
base_clone_1 data Vwi-a-tz-- 512.00m goldens_pool base 0.00
base_clone_2 data Vwi-a-tz-- 512.00m goldens_pool base 0.00
shared_fs data Vwi-a-tz-- 55.00g fs_pool 0.00
goldens_pool data twi-aotz-- <6.76g 0.00 10.79
The "base_clone_1" and "base_clone2" can now be independently used at wish.
Using The Logical Volume
Once created, LVs are provisioned by the Device Mapper beneath "/dev" as special device files named "dm-<number>". The Device Mapper also creates a convenient symlink beneath "/dev/mapper" whose name contains both the VG and the LV names.
For example the "bar" LV of the "foo" VG generates the "/dev/mapper/foo-bar" symlink.
You can then rely on that symlink to format the LV.
For example:
mkfs.xfs -L sharedfs /dev/mapper/data-shared_fs
the same symlink can of course be used as the source device to mount the filesystem.
For example, type the following statements:
mkdir /srv/nfs/shared_fs
mount /dev/mapper/data-shared_fs /srv/nfs/shared_fs
If instead you need to permanently mount it, so that it gets automatically remounted when booting, add a line like the following one into the "/etc/fstab" file:
/dev/mapper/data-shared_fs /srv/nfs/shared_fs xfs defaults 0 0
Footnotes
This post about using LVM ends here: we have thoroughly seen how to professionally operate it in almost every scenario you are likely to come across. In addition to that, I'm pretty sure you have now also got the necessary skills to autonomously investigate more complex setups.