Definition

A sparse file is a file in which sequences of zero bytes are not stored on disk. Instead, the filesystem keeps only information about these sequences (a list of holes).

A hole is a sequence of zero bytes inside a file that is not written to disk. Information about holes (offset from the beginning of the file in bytes and the number of bytes) is stored in the FS metadata, and reading from a hole simply returns zeros.

One of the most common use cases is raw virtual machine images: unlike qcow2, the raw format has no internal metadata for tracking unused regions, so it relies on holes in the underlying file to save disk space.

However, the virtual (apparent) size of such a file can exceed the free space, or even the total capacity, of the FS where it is located.
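A rough way to see this for yourself is to compare the apparent size of a sparse file with the free space reported by df. This is only a sketch: it assumes GNU coreutils on Linux and a filesystem that supports holes and files of this size, and the 1T size and the temporary directory are arbitrary choices for illustration.

```shell
# Sketch: a sparse file whose apparent size may exceed the free space.
# Assumes GNU coreutils (truncate, stat, df --output) on Linux.
workdir=$(mktemp -d)
truncate -s 1T "$workdir/huge"

apparent=$(stat -c %s "$workdir/huge")    # apparent size in bytes
# bytes actually allocated: number of blocks times the size of one block
allocated=$(( $(stat -c %b "$workdir/huge") * $(stat -c %B "$workdir/huge") ))
# free space on the filesystem that holds the file, in bytes
avail=$(df --output=avail -B1 "$workdir" | tail -n 1 | tr -d ' ')

echo "apparent:   $apparent bytes"
echo "allocated:  $allocated bytes"
echo "free space: $avail bytes"
if [ "$apparent" -gt "$avail" ]; then
    echo "the file claims more space than the filesystem can provide"
fi
rm -rf "$workdir"
```

Whether the last message is printed depends on how much free space the machine has; the file itself is created either way.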

Practice

Let’s try to create such a file in different ways:

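# create a raw image with a virtual size of 10T
qemu-img create -f raw gigantic_image.img 10T
# set the file size to 100G without writing any data
truncate -s 100G big_file
# count=0 copies nothing; seek=10 moves the output offset by 10 blocks of 10G, giving a 100G file
dd of=sparse-file bs=10G seek=10 count=0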

If we look at the size of these files in the usual way using the ls utility, we will see only the virtual size of the files:

ls -lh
total 4.0K
-rw-r--r-- 1 alexander alexander 100G Sep 18 21:10 big_file
-rw-r--r-- 1 alexander alexander  10T Sep 18 21:10 gigantic_image.img
-rw-r--r-- 1 alexander alexander 100G Sep 18 21:10 sparse-file

To see the actual size that these files occupy on the disk, you can use any of the following commands:

  • ls -s: the -s (--size) flag prints the space actually allocated to each file in the first column (before the permission bits)
  • du -h <file>: du reports allocated disk usage by default, so sparse files need no extra flags; -h only makes the sizes human-readable, and --apparent-size switches to the virtual size
  • qemu-img info <image>: shows both the virtual size and the disk size; intended for images, but also works with regular files
  • stat <file>: the Size field shows the virtual size, while the Blocks field reflects the space actually allocated
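The same comparison can be scripted. Below is a minimal sketch that reads both sizes with GNU stat and calls a file sparse when less space is allocated than its apparent size. The function name sparse_report is made up for this example, and the check is only a heuristic: filesystems with compression or delayed allocation may report fewer blocks for files that have no holes.

```shell
# Print apparent size, allocated size, and whether a file looks sparse.
# sparse_report is a hypothetical helper name; assumes GNU stat (coreutils).
sparse_report() {
    file=$1
    apparent=$(stat -c %s "$file")      # size in bytes, as shown by ls -l
    blocks=$(stat -c %b "$file")        # number of allocated blocks
    block_size=$(stat -c %B "$file")    # size in bytes of each of those blocks
    allocated=$((blocks * block_size))
    if [ "$allocated" -lt "$apparent" ]; then
        verdict="sparse"
    else
        verdict="not sparse"
    fi
    echo "$file: apparent=$apparent allocated=$allocated ($verdict)"
}

workdir=$(mktemp -d)
# a file that consists of a single hole
truncate -s 1G "$workdir/holes"
# a file that consists of real data only
dd if=/dev/urandom of="$workdir/data" bs=1M count=1 iflag=fullblock status=none

report_holes=$(sparse_report "$workdir/holes")
report_data=$(sparse_report "$workdir/data")
echo "$report_holes"
echo "$report_data"
rm -rf "$workdir"
```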

This list is not exhaustive; other tools can report the same information.

You can also convert a regular file into a sparse one with fallocate -d (--dig-holes), which deallocates the blocks that contain only zeros:

dd if=/dev/zero of=sparse_file bs=50M count=5

ls -lhs
total 250M
250M -rw-r--r-- 1 alexander alexander 250M Sep 18 21:26 sparse_file

fallocate -d sparse_file

ls -lhs
total 0
   0 -rw-r--r-- 1 alexander alexander 250M Sep 18 21:27 sparse_file
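Digging holes changes only how the file is stored, not what it contains. A quick sketch to confirm this is to compare checksums before and after. It assumes fallocate from util-linux and GNU coreutils; the file name and the 8M size are arbitrary, and hole punching has to be supported by the filesystem.

```shell
# Sketch: check that digging holes does not change the file content.
workdir=$(mktemp -d)
f="$workdir/zeros"
dd if=/dev/zero of="$f" bs=1M count=8 status=none   # 8 MiB of zeros written to disk

sum_before=$(sha256sum < "$f")
blocks_before=$(stat -c %b "$f")

# -d (--dig-holes) deallocates blocks that contain only zeros
fallocate -d "$f" || echo "hole punching is not supported here"

sum_after=$(sha256sum < "$f")
blocks_after=$(stat -c %b "$f")
size_after=$(stat -c %s "$f")

echo "blocks before: $blocks_before, after: $blocks_after"
if [ "$sum_before" = "$sum_after" ]; then
    echo "content is unchanged"
fi
rm -rf "$workdir"
```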

When copying with cp, you can pass the --sparse=always flag, which creates a sparse copy even if the source file is not sparse:

dd if=/dev/zero of=sparse_file bs=50M count=5

cp --sparse=always sparse_file sparse_copy

ls -lsh
total 251M
   0 -rw-r--r-- 1 alexander alexander 250M Sep 18 21:32 sparse_copy
251M -rw-r--r-- 1 alexander alexander 250M Sep 18 21:32 sparse_file
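The --sparse option of GNU cp also accepts auto (the default) and never. The sketch below makes a sparse copy of a fully written file, expands it again with --sparse=never, and uses cmp to confirm that all three files hold the same bytes. File names and sizes are arbitrary and chosen only for the example.

```shell
# Sketch: round trip between non-sparse and sparse copies with GNU cp.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/dense" bs=1M count=8 status=none

cp --sparse=always "$workdir/dense" "$workdir/sparse_copy"       # zeros become holes
cp --sparse=never "$workdir/sparse_copy" "$workdir/dense_again"  # holes are written out as zeros

# cmp -s exits with status 0 only if the files are byte-for-byte identical
if cmp -s "$workdir/dense" "$workdir/sparse_copy" &&
   cmp -s "$workdir/dense" "$workdir/dense_again"; then
    identical=yes
else
    identical=no
fi
size_src=$(stat -c %s "$workdir/dense")
size_copy=$(stat -c %s "$workdir/sparse_copy")

echo "identical content: $identical"
# first column: allocated size, then the regular ls -l fields
ls -lsh "$workdir"
rm -rf "$workdir"
```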

However, alongside the obvious advantage of saving disk space, which is especially useful for virtual machines, sparse files also have a number of significant disadvantages:

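  • file fragmentation when data is frequently written into holes
  • copying a sparse file with a program that does not support holes may produce a fully allocated file that takes up its entire virtual size
  • overcommitting a filesystem with such files may lead to unexpected errors: once the free space runs out, a write into a hole fails because no blocks can be allocated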
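The copying problem is easy to reproduce. In the sketch below, cat stands in for a program that knows nothing about holes, while GNU tar with -S (--sparse) is one example of a tool that handles them. The file names and the 8M size are arbitrary, and the exact block counts depend on the filesystem.

```shell
# Sketch: a hole-unaware copy versus a sparse-aware archive.
# Assumes GNU coreutils and GNU tar on Linux.
workdir=$(mktemp -d)
truncate -s 8M "$workdir/sparse"

# A plain byte stream has no notion of holes: every zero is written out
cat "$workdir/sparse" > "$workdir/via_cat"

# With -S, GNU tar records the holes instead of storing the zeros
tar -C "$workdir" -S -cf "$workdir/archive.tar" sparse
mkdir "$workdir/extracted"
tar -C "$workdir/extracted" -xf "$workdir/archive.tar"

for f in sparse via_cat extracted/sparse; do
    printf '%-18s apparent=%s blocks=%s\n' "$f" \
        "$(stat -c %s "$workdir/$f")" "$(stat -c %b "$workdir/$f")"
done

size_cat=$(stat -c %s "$workdir/via_cat")
size_extracted=$(stat -c %s "$workdir/extracted/sparse")
archive_size=$(stat -c %s "$workdir/archive.tar")
echo "archive size: $archive_size bytes"
rm -rf "$workdir"
```

On a typical Linux filesystem the via_cat copy is expected to occupy far more blocks than the original, while the archive stays much smaller than 8M.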