Docker adoption keeps rising and many people are familiar with it, but not everyone uses Docker according to best practices.
Before moving on, if you don't know what Docker is, you can learn everything you need to get started in this free Docker Crash Course.
Why use Best Practices?
So, in my new video 'Top 8 Docker Production Best Practices' I want to show you 8 ways you can use Docker the right way in your projects to:
- improve security,
- optimize the image size,
- take advantage of some of the useful Docker features,
- and also write cleaner and more maintainable Dockerfiles.
1️⃣ Best Practice
Use an official and verified Docker image as a base image, whenever available.
Let's say you are developing a Node.js application and want to build and run it as a Docker image.
Instead of taking a base operating system image and installing Node.js, npm and whatever other tools you need for your application, use the official node image.
Improvements:
- Cleaner Dockerfile
- Official and verified image, which is already built with the best practices
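As a rough sketch of the difference (the commented-out lines are only an illustration of the manual approach, not a recommended setup), the two options look something like this. The tag is left off here only because version pinning is the subject of the next practice:

```dockerfile
# Avoid: start from a generic OS image and install Node.js by hand
# FROM ubuntu
# RUN apt-get update && apt-get install -y nodejs npm

# Prefer: start from the official Node.js image
FROM node
```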
2️⃣ Best Practice
Use specific Docker image versions
Okay, so we have selected the base image, but as long as the Dockerfile doesn't name a tag, every build of our application image will use the `latest` tag of the node image.
Now why is this a problem?
- you might get a different image version than in the previous build
- the new image version may break stuff
- the `latest` tag is unpredictable, causing unexpected behavior
So instead of a random `latest` image tag, you want to pin the version: just as you deploy your own application with a specific version, you want to use the official image with a specific version.
And the rule here is: the more specific, the better.
Improvements:
- Transparency to know exactly what version of the base image you're using
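A minimal sketch of a pinned base image could look like this. The version number is only an example I picked for illustration; use the one your application is actually tested against:

```dockerfile
# Pin the base image to an exact version instead of the implicit "latest" tag.
# 20.11.1 is an example version, not a recommendation.
FROM node:20.11.1
```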
3️⃣ Best Practice
Use Small-Sized Official Images
When choosing a Node.js image, you will see there are actually multiple official images, not only with different version numbers, but also with different operating system distributions.
So the question is: Which one do you choose and why is it important?
1) Image Size
Well, if the image is based on a full-blown OS distribution like Ubuntu or CentOS, you will have a bunch of tools already packaged in the image. So the image size will be larger, but you don't need most of these tools in your application images.
In contrast, smaller images need less storage space in the image repository as well as on the deployment server, and of course you can transfer them faster when pulling from or pushing to the repository.
2) Security Issue
In addition to that, with lots of tools installed inside, you need to consider the security aspect, because such base images usually contain hundreds of known vulnerabilities and basically create a larger attack surface for your application image.
This way you basically end up introducing unnecessary security issues into your image from the beginning!
In comparison, by using smaller images with leaner OS distributions, which only bundle the necessary system tools and libraries, you're also minimizing the attack surface and making sure that you build more secure images.
So the best practice here would be to select an image with a specific version based on a leaner OS distribution, like Alpine for example.
Alpine has everything you need to start your application in a container, but is much more lightweight. And for most of the images that you look at on Docker Hub, you will see a version tag based on the Alpine distribution.
It is one of the most common and popular base images for Docker containers.
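For the Node.js example, a pinned Alpine-based image might look like the sketch below. The version number is again just an example chosen for illustration:

```dockerfile
# Same pinned Node.js version, but built on the much smaller Alpine distribution
FROM node:20.11.1-alpine
```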
4️⃣ Best Practice
Optimize caching for image layers when building an image
So what are image layers, and what does caching an image layer mean?
1) What are Image Layers?
A Docker image is built based on a Dockerfile.
And in a Dockerfile each command or instruction creates an image layer.
So when we use a base image like the Alpine-based node image from the example above, it already has layers, because it was built using its own Dockerfile. Plus, in our Dockerfile, on top of that, we have a couple of other commands that will each add a new layer to this image.
2) Now what about caching?
Each layer will get cached by Docker.
So when you rebuild your image, if neither your Dockerfile nor the files it copies have changed, Docker will just use the cached layers to build the image.
Advantages of cached image layers:
- Faster image building
- Faster pulling and pushing of new image versions:
If I pull a new image version of the same application and, let's say, 2 new layers have been added in the new version, only the newly added layers will be downloaded. The rest are already locally cached by Docker.
3) Optimize the Caching
So to optimize the caching, you need to know that:
Once a layer changes, all following or downstream layers have to be re-created as well. In other words: when you change the contents of one line in the Dockerfile, the caches of all the following lines or layers will be busted and invalidated.
So the rule here and the best practice is:
Order your commands in the Dockerfile from the least to the most frequently changing ones, to take advantage of caching and this way optimize how fast the image gets built.
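Here is a minimal sketch of that ordering for a Node.js project. It assumes a `package-lock.json` exists and that the application starts from a hypothetical `server.js`; both are assumptions for illustration only:

```dockerfile
FROM node:20.11.1-alpine
WORKDIR /app

# Dependency manifests change rarely, so copy them first
COPY package.json package-lock.json ./

# This layer stays cached as long as the two files above are unchanged
RUN npm ci --omit=dev

# Source code changes most often, so copy it last
COPY . .

# "server.js" is a hypothetical entry point
CMD ["node", "server.js"]
```

With this order, editing a source file only invalidates the last `COPY` layer, so the dependency installation is not repeated on every build.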
5️⃣ Best Practice
Use a `.dockerignore` file
Now usually when we build the image, we don't need everything we have in the project to run the application inside. We don't need auto-generated folders like `target` or `build`, we don't need the README file, etc.
So how do we exclude such content from ending up in our application image?
By using a `.dockerignore` file.
It's pretty straightforward. We basically just create this `.dockerignore` file and list all the files and folders that we want to be ignored. When building the image, Docker will look at the contents and ignore anything specified inside.
Improvements:
- Reduced image size
6️⃣ Best Practice
Make use of Multi-Stage Builds
But now let's say there are some contents (like development and testing tools and libraries) in your project that you NEED for building the image - so during the build process - but you DON'T NEED them in the final image itself to run the application.
If you keep these artifacts in your final image even though they're absolutely unnecessary for running the application, it will again result in an increased image size and increased attack surface.
So how do we separate the build stage from the runtime stage?
In other words, how do we exclude the build dependencies from the image, while still having them available while building the image?
Well, for that you can use what's called multi-stage builds.
The multi-stage builds feature allows you to use multiple temporary images during the build process, but keep only the last image as the final artifact.
So the previous stages (marked "1st" in the above picture) will be discarded.
Improvements:
- Separation of Build Tools and Dependencies from what's needed for runtime
- Fewer dependencies and reduced image size
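To make this concrete, here is a rough two-stage sketch for a Node.js project. It assumes a hypothetical `build` script in `package.json` that writes its output to `dist/`, and a hypothetical `dist/server.js` entry point, so adapt the names to your own project:

```dockerfile
# Stage 1: build. Dev dependencies and build tools exist only in this stage.
FROM node:20.11.1-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Assumes a hypothetical "build" script that writes its output to dist/
RUN npm run build

# Stage 2: runtime. Only this stage ends up in the final image.
FROM node:20.11.1-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# Copy just the build output from the first stage
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
```

Everything installed in the `build` stage that is not explicitly copied over with `COPY --from=build` is left out of the final image.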
7️⃣ Best Practice
Use the Least Privileged User
Now, when we create this image and eventually run it as a container, which operating system user will be used to start the application inside?
By default, when a Dockerfile does not specify a user, the container runs as the root user. But in reality there is mostly no reason to run containers with root privileges.
This basically introduces a security issue, because when the container starts on the host, it will potentially have root access on the Docker host.
So running an application inside the container as the root user makes it easier for an attacker to escalate privileges on the host and basically get hold of the underlying host and its processes, not only the container itself, especially if the application inside the container is vulnerable to exploitation.
To avoid this, the best practice is to simply create a dedicated user and a dedicated group in the Docker image and run the application inside the container as that user.
You can use a directive called `USER` with the username and then start the application as usual.
Tip: Some images already have a generic user bundled in, which you can use, so you don't have to create a new one. For example, the Node.js image already bundles a generic user called `node`, which you can simply use to run the application inside the container.
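A minimal sketch of this on an Alpine-based image could look as follows. The user and group name `app` and the `server.js` entry point are arbitrary examples, and the `addgroup`/`adduser` flags are the BusyBox variants that Alpine ships:

```dockerfile
FROM node:20.11.1-alpine

# Create a dedicated system group and user (BusyBox syntax, as used on Alpine);
# the name "app" is an arbitrary example
RUN addgroup -S app && adduser -S -G app app

WORKDIR /app
# Give the new user ownership of the application files
COPY --chown=app:app . .

# RUN, CMD and ENTRYPOINT instructions after this line run as "app"
USER app

# "server.js" is a hypothetical entry point
CMD ["node", "server.js"]
```

With the official node image you could skip the `addgroup`/`adduser` line and write `USER node` instead, using `--chown=node:node` for the copied files.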
8️⃣ Best Practice
Scan your Images for Security Vulnerabilities
Finally, how do you make sure and validate that the image you build has few or no security vulnerabilities?
So my final best practice is: once you build the image, scan it for security vulnerabilities using the `docker scan` command.
In the background Docker actually uses a service called Snyk to do the vulnerability scanning of the images. The scan uses a database of vulnerabilities, which gets constantly updated.
Example output of the `docker scan` command:
You see:
1) the type of vulnerability,
2) a URL for more information
3) and, what's very useful and interesting, which version of the relevant library actually fixes that vulnerability. So you can update your libraries to get rid of these issues.
Automate the scanning
In addition to scanning your images manually with the `docker scan` command on the CLI, you can also configure Docker Hub to scan the images automatically when they get pushed to the repository. And of course you can integrate this check into your CI/CD pipeline when building your Docker images.
So these are 8 production best practices that you can apply today to make your Docker images leaner and more secure! Hope it is helpful for some of you! Of course there are many more best practices related to Docker, but I think applying these will already give you great results when using Docker in production.
Do you know some other best practices, which you think are super important and have to be mentioned?
Please share them in the comments for others.
The full video is available here: