Docker storage drivers

Docker storage drivers are components responsible for managing the way Docker containers and images are stored on disk. Different storage drivers offer varying performance characteristics, capabilities, and compatibility with different storage technologies.

To use storage drivers effectively, it’s important to know how Docker builds and stores images, and how these images are used by containers. You can use this information to make informed choices about the best way to persist data from your applications and avoid performance problems along the way.

👉 Storage drivers vs. Docker volumes

Storage drivers and Docker volumes both determine where container data lives, but they serve different purposes and operate at different levels within the Docker architecture.

Storage Drivers:

  1. Storage drivers in Docker are responsible for managing the interface between the container runtime and the underlying storage system. They handle tasks such as reading and writing data to containerized file systems, managing image layers, and optimizing storage operations.

  2. The choice of storage driver affects performance and compatibility with the host's backing filesystem, because the driver determines how image layers and container writes are stored, retrieved, and managed.

Docker Volumes:

  1. Docker volumes provide a way to persist data generated by containers or share data between containers and the host system. Volumes are separate entities from container file systems and can exist independently, allowing for data persistence even if containers are stopped or removed.

  2. Volumes enable data sharing and persistence across container lifecycles. They can be managed and manipulated independently of containers, making them suitable for scenarios like database storage, configuration files, log files, and other persistent data needs.

👉 Images and layers

A Docker image is built up from a series of layers. Each layer represents an instruction in the image’s Dockerfile. All image layers are read-only; the only writable layer is the one added on top when a container is created. Consider the following Dockerfile:

# syntax=docker/dockerfile:1

FROM ubuntu:22.04
LABEL org.opencontainers.image.authors="org@example.com"
COPY . /app
RUN make /app
RUN rm -r $HOME/.cache
CMD python /app/app.py

This Dockerfile contains six instructions, four of which modify the filesystem and therefore create a layer. The FROM statement starts out by creating a layer from the ubuntu:22.04 image. The LABEL instruction only modifies the image's metadata and doesn't produce a new layer. The COPY instruction adds some files from your Docker client's current directory. The first RUN instruction builds your application using the make command and writes the result to a new layer. The second RUN instruction removes a cache directory and writes the result to a new layer. Finally, the CMD instruction specifies what command to run within the container; it only modifies the image's metadata and doesn't produce an image layer.

Each layer is only a set of differences from the layer before it. Both adding and removing files result in a new layer. In the example above, the $HOME/.cache directory is removed, but its contents remain in the previous layer and still count toward the image's total size.
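To make the size effect concrete, here is a minimal sketch that models each layer as a diff. It is a toy in-memory model, not Docker's on-disk format: the file names, the byte counts, and the idea that make leaves a 500-byte cache behind are all assumptions chosen for illustration.

```python
# Toy model of image layers as diffs. All paths and sizes are made up.

def image_size(layers):
    """Bytes stored across all layers, which is what the image costs on disk."""
    return sum(sum(layer["add"].values()) for layer in layers)


def visible_files(layers):
    """Files visible in the merged view, applying layers from bottom to top."""
    view = {}
    for layer in layers:
        for path in layer["delete"]:
            view.pop(path, None)      # a later layer hides the file
        view.update(layer["add"])     # a later layer adds or replaces files
    return view


layers = [
    {"add": {"/bin/sh": 100}, "delete": set()},                    # FROM
    {"add": {"/app/app.py": 10}, "delete": set()},                 # COPY
    {"add": {"/app/build.out": 50, "/root/.cache/tmp": 500},
     "delete": set()},                                             # RUN make
    {"add": {}, "delete": {"/root/.cache/tmp"}},                   # RUN rm -r
]

print(sorted(visible_files(layers)))
print("visible bytes:", sum(visible_files(layers).values()))
print("stored bytes:", image_size(layers))
```

In this model the cache file disappears from the merged view, yet the stored total still includes its 500 bytes, because the layer that created it is unchanged.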

The layers are stacked on top of each other. When you create a new container, you add a new writable layer on top of the underlying layers. This layer is often called the “container layer”. All changes made to the running container, such as writing new files, modifying existing files, and deleting files, are written to this thin writable container layer. The diagram below shows a container based on an ubuntu:15.04 image.

A storage driver handles the details about the way these layers interact with each other. Different storage drivers are available, which have advantages and disadvantages in different situations.

👉 Container and layers

The major difference between a container and an image is the top writable layer. All writes to the container that add new or modify existing data are stored in this writable layer. When the container is deleted, the writable layer is also deleted. The underlying image remains unchanged.

Because each container has its own writable container layer, and all changes are stored in this container layer, multiple containers can share access to the same underlying image and yet have their own data state. The diagram below shows multiple containers sharing the same Ubuntu 15.04 image.

Docker uses storage drivers to manage the contents of the image layers and the writable container layer. Each storage driver handles the implementation differently, but all drivers use stackable image layers and the copy-on-write (CoW) strategy.
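The sharing behaviour described above can be sketched with a toy copy-on-write model. The Container class and its methods are hypothetical and exist only for this illustration; real storage drivers work on filesystem layers, not Python dictionaries.

```python
class Container:
    """Toy copy-on-write view: a shared read-only image plus a private writable layer."""

    def __init__(self, image_files):
        self.image = image_files   # shared between containers, never mutated here
        self.writable = {}         # this container's own changes
        self.deleted = set()       # image paths this container has removed

    def read(self, path):
        # The writable layer wins; otherwise fall through to the image.
        if path in self.writable:
            return self.writable[path]
        if path in self.image and path not in self.deleted:
            return self.image[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Writes never touch the image; they land in the writable layer.
        self.deleted.discard(path)
        self.writable[path] = data

    def delete(self, path):
        # Deleting only hides the image's copy from this container.
        self.writable.pop(path, None)
        if path in self.image:
            self.deleted.add(path)


image = {"/etc/motd": "hello"}
a = Container(image)
b = Container(image)

a.write("/etc/motd", "changed in a")
print(a.read("/etc/motd"))
print(b.read("/etc/motd"))
print(image)
```

Container a sees its own modified copy, while container b and the image itself still hold the original content. Discarding a's writable layer would discard the change, which mirrors what happens when a container is deleted.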

👉 Container size on disk

To view the approximate size of a running container, you can use the docker ps -s command. Two different columns relate to size.

  • size: the amount of data (on disk) that's used for the writable layer of each container.

  • virtual size: the amount of data used for the read-only image data used by the container plus the container's writable layer size. Multiple containers may share some or all read-only image data. Two containers started from the same image share 100% of the read-only data, while two containers with different images which have layers in common share those common layers. Therefore, you can't just total the virtual sizes: doing so over-estimates the total disk usage by a potentially non-trivial amount.

The total disk space used by all of the running containers is the sum of every container’s size plus the read-only image data, with each shared layer counted only once. If multiple containers are started from the exact same image, the total size on disk for these containers is SUM (size of containers) plus one image size (virtual size - size).

These size values also don’t count the following additional ways a container can take up disk space:

  • Disk space used for log files stored by the logging-driver. This can be non-trivial if your container generates a large amount of logging data and log rotation isn’t configured.

  • Volumes and bind mounts used by the container.

  • Disk space used for the container’s configuration files, which are typically small.

  • Memory written to disk (if swapping is enabled).

  • Checkpoints, if you’re using the experimental checkpoint/restore feature.
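The same-image calculation described earlier in this section, SUM (size of containers) plus one image size, can be sketched as follows. The function name, the dictionary keys, and the numbers are assumptions for illustration; the sketch also assumes that containers from different images share no layers, and it ignores the extra disk consumers listed above.

```python
def total_disk_usage(containers):
    """Estimate disk usage from per-container 'size' and 'virtual_size' values.

    Each container is a dict with the keys 'image', 'size', and 'virtual_size'
    (all sizes in the same unit). Image data is counted once per distinct image.
    """
    writable_total = sum(c["size"] for c in containers)
    image_sizes = {}
    for c in containers:
        # Image size is virtual size minus the writable layer.
        image_sizes[c["image"]] = c["virtual_size"] - c["size"]
    return writable_total + sum(image_sizes.values())


# Three containers started from the same hypothetical 100 MB image.
containers = [
    {"image": "app:latest", "size": 2, "virtual_size": 102},
    {"image": "app:latest", "size": 5, "virtual_size": 105},
    {"image": "app:latest", "size": 0, "virtual_size": 100},
]

naive = sum(c["virtual_size"] for c in containers)
print("sum of virtual sizes:", naive)
print("estimated real usage:", total_disk_usage(containers))
```

With these numbers, adding up the virtual sizes gives 307 MB, while counting the shared image once gives 107 MB.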

The Docker Engine provides the following storage drivers on Linux:

  • overlay2: the preferred storage driver for all currently supported Linux distributions. It requires no extra configuration.

  • fuse-overlayfs: preferred only for running Rootless Docker on a host that does not provide support for rootless overlay2. On Ubuntu and Debian 10, the fuse-overlayfs driver does not need to be used, because overlay2 works even in rootless mode.

  • btrfs and zfs: these storage drivers allow for advanced options, such as creating "snapshots", but require more maintenance and setup. Each of them relies on the backing filesystem being configured correctly.

  • vfs: intended for testing purposes and for situations where no copy-on-write filesystem can be used. This storage driver performs poorly and is not generally recommended for production use.

If no storage driver is explicitly configured, the Docker Engine consults a prioritized list of storage drivers and automatically selects one whose prerequisites are met on the host.
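The selection logic can be pictured as a simple first-match search. The sketch below is only an illustration: the order in PRIORITY is an assumption built from the drivers listed above, not the list Docker Engine actually ships, and select_driver is a hypothetical function, not a Docker API. Taking the first match is also an assumption about how the prioritized list is applied.

```python
# Illustrative order only; the real priority list is defined by Docker Engine.
PRIORITY = ["overlay2", "fuse-overlayfs", "btrfs", "zfs", "vfs"]


def select_driver(supported, configured=None):
    """Pick a storage driver.

    supported:  set of driver names whose prerequisites the host meets.
    configured: driver named explicitly by the user, or None.
    """
    if configured is not None:
        # An explicit choice is honoured only if the host can run it.
        if configured not in supported:
            raise RuntimeError(f"configured driver {configured!r} is not usable")
        return configured
    for name in PRIORITY:
        if name in supported:
            return name
    raise RuntimeError("no compatible storage driver found")


print(select_driver({"vfs", "overlay2"}))
print(select_driver({"vfs", "zfs"}))
print(select_driver({"vfs", "btrfs"}, configured="btrfs"))
```

In this model, a host that supports both overlay2 and vfs ends up with overlay2, and a host without overlay support falls through to the next usable entry.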

👉 Check your current storage driver

To see what storage driver Docker is currently using, use docker info and look for the Storage Driver line:

$ docker info

Containers: 0
Images: 0
Storage Driver: overlay2
 Backing Filesystem: xfs
<...>

To change the storage driver, see the specific instructions for the new storage driver. Some drivers require additional configuration, including configuration of physical or logical disks on the Docker host.
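If you want to read the driver name from a script, one option is to scan the docker info output for the Storage Driver line. The sketch below parses a pasted sample rather than calling Docker, so it runs anywhere; the storage_driver helper is a hypothetical name, and the sample text is copied from the output shown above.

```python
def storage_driver(info_text):
    """Return the value of the 'Storage Driver' line, or None if it is absent."""
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Storage Driver":
            return value.strip()
    return None


sample = """Containers: 0
Images: 0
Storage Driver: overlay2
 Backing Filesystem: xfs
"""

print(storage_driver(sample))  # prints "overlay2"
```

On a real host you could feed this function the text captured from running docker info, for example via Python's subprocess module, assuming the docker CLI is installed and the daemon is reachable.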
