Definition
A sparse file is a file in which runs of zero bytes are not stored on disk; instead, the file system keeps information about these runs (a list of holes).
A hole is a sequence of zero bytes inside a file that is not written to disk. Information about holes (the offset from the beginning of the file in bytes and the length in bytes) is stored in the file system (FS) metadata. When the file is read, holes are returned as ordinary zero bytes.
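To make the definition concrete, here is a minimal sketch that creates a 1 MiB file with only 4 bytes of real data in the middle and then checks that the hole in front of the data reads back as zeros. The file name demo_hole, the offsets, and the sizes are arbitrary choices for illustration, and the script assumes GNU coreutils on Linux and a file system that supports holes.

```shell
#!/bin/sh
# Illustrative sketch: file name, offsets and sizes are arbitrary choices.
set -eu
workdir=$(mktemp -d)
f="$workdir/demo_hole"

# Create a 1 MiB file that consists of a single hole.
truncate -s 1M "$f"

# Write 4 bytes of real data at offset 512 KiB without truncating the file.
printf 'DATA' | dd of="$f" bs=1 seek=524288 conv=notrunc status=none

# Apparent size in bytes (what ls -l reports).
apparent=$(stat -c %s "$f")

# Bytes actually allocated: number of blocks times the size of one block unit.
allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))

# Reading from a hole returns zeros: the first 512 KiB compare equal to /dev/zero.
if cmp -s -n 524288 "$f" /dev/zero; then
    hole_reads_as_zeros=yes
else
    hole_reads_as_zeros=no
fi

# The data written after the hole is still found at its offset.
payload=$(dd if="$f" bs=1 skip=524288 count=4 status=none)

echo "apparent=$apparent allocated=$allocated zeros=$hole_reads_as_zeros payload=$payload"
rm -rf "$workdir"
```

On a typical Linux file system the allocated size should come out far smaller than the apparent size, although the exact number depends on the block size and on the file system itself.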
One of the most common use cases is raw virtual machine images: unlike qcow2, the raw format has no metadata of its own for saving disk space, so it relies on the file system's support for holes.
Note that the virtual size of such a file can exceed the total capacity of the FS on which it is located.
Practice
Let’s try to create such a file in different ways:
qemu-img create -f raw gigantic_image.img 10T
truncate -s 100G big_file
dd of=sparse-file bs=10G seek=10 count=0
If we look at the size of these files in the usual way using the ls utility, we will see only the virtual size of the files:
ls -lh
total 4.0K
-rw-r--r-- 1 alexander alexander 100G Sep 18 21:10 big_file
-rw-r--r-- 1 alexander alexander 10T Sep 18 21:10 gigantic_image.img
-rw-r--r-- 1 alexander alexander 100G Sep 18 21:10 sparse-file
To see the actual size that these files occupy on the disk, you can use any of the following commands:
- ls -s: the -s, --size flag displays the real (allocated) size of each file on the left, before the permission bits
- du -h <file>: the -h flag gives human-readable sizes; du reports the allocated size of sparse files without any additional flags
- qemu-img info <image>: intended for images, but also works with regular files
- stat <file>: although it shows the virtual size, the number of blocks corresponds to the real one
The list of commands does not end there, but there is no point in listing every possible method.
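The numbers reported by stat can be combined into a quick check. The sketch below is only an illustration: the function name sparse_report and the test file holes_only are hypothetical, and "allocated size smaller than apparent size" is a heuristic (a compressing file system can produce the same picture for non-sparse files). It assumes GNU stat, truncate and du.

```shell
#!/bin/sh
# Illustrative helper; the function name sparse_report is hypothetical.
set -eu

# Print the apparent size, the allocated size, and a sparse/dense verdict.
sparse_report() {
    apparent=$(stat -c %s "$1")
    # %b is the number of allocated blocks, %B the size of one such block.
    allocated=$(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
    if [ "$allocated" -lt "$apparent" ]; then
        verdict=sparse
    else
        verdict=dense
    fi
    printf '%s apparent=%s allocated=%s %s\n' "$1" "$apparent" "$allocated" "$verdict"
}

workdir=$(mktemp -d)

# A 10 MiB file made of a single hole, used as test input.
truncate -s 10M "$workdir/holes_only"

sparse_report "$workdir/holes_only"

# du shows the allocated size by default and the virtual size on request.
du -h "$workdir/holes_only"
du -h --apparent-size "$workdir/holes_only"

rm -rf "$workdir"
```

The two du calls show the same distinction from the other side: the default output is the space taken on disk, while --apparent-size matches what ls -l prints.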
You can also convert a regular file into a sparse one in place with fallocate -d (--dig-holes), which deallocates the blocks that contain only zeros:
dd if=/dev/zero of=sparse_file bs=50M count=5
ls -lhs
total 250M
250M -rw-r--r-- 1 alexander alexander 250M Sep 18 21:26 sparse_file
fallocate -d sparse_file
ls -lhs
total 0
0 -rw-r--r-- 1 alexander alexander 250M Sep 18 21:27 sparse_file
When copying via cp, you can pass the --sparse=always flag, which creates a sparse copy of a non-sparse file:
dd if=/dev/zero of=sparse_file bs=50M count=5
cp --sparse=always sparse_file sparse_copy
ls -lsh
total 251M
0 -rw-r--r-- 1 alexander alexander 250M Sep 18 21:32 sparse_copy
251M -rw-r--r-- 1 alexander alexander 250M Sep 18 21:32 sparse_file
However, besides the obvious advantage of saving disk space, which is especially useful for virtual machines, sparse files have a number of significant disadvantages:
- the file becomes fragmented when data is frequently written into holes
- copying a sparse file with a program that does not support holes may produce a fully allocated copy that occupies the entire virtual size
- overcommitting a file system with such files may lead to unexpected errors, for example running out of space when holes are later filled with data
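The copying pitfall is easy to reproduce. In the sketch below, plain dd (without conv=sparse) stands in for a program that is unaware of holes, and cp --sparse=always is the hole-aware counterpart. The file names and the 8 MiB size are arbitrary, and the script assumes GNU coreutils; on file systems with transparent compression the difference in allocated size may be smaller than expected.

```shell
#!/bin/sh
# Illustrative sketch: file names and sizes are arbitrary choices.
set -eu
workdir=$(mktemp -d)
src="$workdir/src"

# Bytes actually allocated on disk for a file (blocks times block unit size).
alloc() {
    echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

# An 8 MiB source file that consists of a single hole.
truncate -s 8M "$src"

# Naive copy: dd reads the zeros from the hole and writes them out as real data.
dd if="$src" of="$workdir/dense_copy" bs=1M status=none

# Hole-aware copy: cp detects runs of zeros and leaves holes in the target.
cp --sparse=always "$src" "$workdir/sparse_copy"

dense_size=$(stat -c %s "$workdir/dense_copy")
dense_alloc=$(alloc "$workdir/dense_copy")
sparse_alloc=$(alloc "$workdir/sparse_copy")

# Both copies have identical content; only the on-disk allocation differs.
if cmp -s "$workdir/dense_copy" "$workdir/sparse_copy"; then
    same_content=yes
else
    same_content=no
fi

echo "dense_alloc=$dense_alloc sparse_alloc=$sparse_alloc same_content=$same_content"
rm -rf "$workdir"
```

Other tools have comparable switches, for example the --sparse option of rsync and of GNU tar, so it is worth checking for one before moving large images around.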