Before diving into what a Dockerfile is, let's conduct a small experiment.

Remember the Python image we pulled in the previous section? Let's first enter a container using the docker run -it command. Recall what the -it parameter does: -i keeps standard input open and -t allocates a pseudo-TTY, which together let the user interact with the container.

Once inside the container, let's list all the files in the current directory:

[root@novita ~]# docker run -it python:3.7 /bin/bash
root@692f87774bf7:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@692f87774bf7:/#

Next, we'll create a file named hello.py and then list all the files again:

root@692f87774bf7:/# touch hello.py
root@692f87774bf7:/# ls
bin  boot  dev  etc  hello.py  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

You can see that hello.py has been created. Now, let's exit the container, start an interactive session again with the same docker run -it python:3.7 /bin/bash command, and list all the files:

root@692f87774bf7:/# exit
exit
[root@novita ~]# docker run -it python:3.7 /bin/bash
root@65c767655e8a:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@65c767655e8a:/#

You've probably noticed the issue by now. Curiously, the file we created earlier has vanished.

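This is because actions performed inside a container do not modify the image. Each docker run creates a new container (notice that the ID in the prompt changed from 692f87774bf7 to 65c767655e8a), and each container merely adds its own writable layer on top of the read-only image, much like tracing paper over a calligraphy template: no matter what you write on the tracing paper, it doesn't affect the sheet beneath. In that respect, the image behaves somewhat like a VM snapshot that every new container starts from.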
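The tracing-paper picture can be checked directly: hello.py is still in the writable layer of the first container, which has exited but not been removed. A minimal sketch, assuming that container still exists; the helper name show_container_changes is hypothetical, and the ID in the usage line is the one from the session above:

```shell
# Hypothetical helper: inspect the writable layer of a (possibly stopped) container.
show_container_changes() {
    container_id="$1"
    # -a includes containers that have already exited.
    docker ps -a --filter "id=${container_id}"
    # List files added (A), changed (C) or deleted (D) relative to the image.
    docker diff "${container_id}"
}

# Usage: show_container_changes 692f87774bf7
```

The docker diff output should include a line such as A /hello.py, and docker start -ai 692f87774bf7 would re-enter that same container. None of this touches the python:3.7 image itself.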

But here's the problem: if actions within a container don't modify the image, every new container needs the same setup steps repeated before the code can be deployed. That would be quite inefficient, so how can we achieve rapid deployment?

Suppose you're the tech lead of a social media platform, and suddenly there's a viral event that requires scaling up to hundreds of cloud servers quickly to handle massive user traffic. Your team has chosen Docker for application deployment. If you were to deploy using the above method, users might be disappointed. By the time your service recovers, they might have already moved on.

Enter the Dockerfile. As per Docker's official introduction:

Docker can build images automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image. Using docker build, users can create an automated build that executes several command-line instructions in succession.

In simpler terms, a Dockerfile is a text document containing all the commands needed to build an image from the command line. It's like an instruction manual for assembling a set of blocks into the desired image.

Let's explore how to construct a Dockerfile from a practical scenario, using a Python image as an example. On my local machine (outside the container), there's a Python script named Optimal_Hotel_Matching.py. It's a web scraping program. Let's try running it inside the container:

As expected, the run fails with No module named 'requests': the requests HTTP library is a third-party module and isn't installed in the Python image. This is a typical scenario where we need to set up the dependencies a program relies on. Let's attempt to solve this using a Dockerfile.
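The exact command used for that run is not shown here. One way to reproduce it, assuming the script sits in the current directory on the host, is to bind-mount that directory into a throwaway container. The mount point /app and the helper name are arbitrary choices for this sketch:

```shell
# Hypothetical helper: run a host script inside the unmodified python:3.7 image.
run_in_stock_image() {
    script="$1"
    # --rm removes the container on exit; -v mounts the current directory
    # at /app (an arbitrary path); -w makes /app the working directory.
    docker run --rm -v "$PWD":/app -w /app python:3.7 python "${script}"
}

# Usage: run_in_stock_image Optimal_Hotel_Matching.py
```

With the stock image, the script's import of requests is what triggers the error.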

Before writing our first Dockerfile, let's familiarize ourselves with some common Dockerfile instructions:

FROM: Specifies the base image for the build.
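MAINTAINER: Records the image author's name and email. It is deprecated in current Docker releases in favor of a LABEL instruction.
RUN: One of the most crucial instructions in a Dockerfile. It executes a command at build time in a new layer on top of the current image and commits the result.
CMD: Specifies the default command to be executed when a container starts from the image. Arguments passed to docker run after the image name override it.
COPY: Copies files from the build context on the host into the image's filesystem.
WORKDIR: Sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, or ADD instructions that follow in the Dockerfile.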

Now, let's write our first Dockerfile to address our needs: installing the requests library and running the script.

Here's a possible solution:

# Base image (comments must sit on their own line; a trailing "#" is not a comment in a Dockerfile)
FROM python:3.7
# Image creator's name and email (MAINTAINER is deprecated; LABEL is the current equivalent)
MAINTAINER ultra "tech@novita.ai"
# Install the requests library
RUN pip install requests
# Set the working directory
WORKDIR /dockerfileTest
# Copy the build context (the current directory) into the working directory
COPY . .
# Default command: run the script
CMD ["python", "Optimal_Hotel_Matching.py"]

Save the above content in a file named Dockerfile in your project directory.

Now, let's build the image using the docker build command, tagging it for easy identification:

[root@novita dockerfileTest]# docker build --tag python:requests .
Sending build context to Docker daemon  5.12 kB
...

After the build completes, use the docker images command to verify:

[root@novita dockerfileTest]# docker images
REPOSITORY              TAG                 IMAGE ID            CREATED             SIZE
python                  requests            04f20acdc288        2 hours ago         926 MB
...
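To see how the Dockerfile instructions ended up in the image, the build result can be inspected with two standard Docker CLI commands. This is only a sketch; the helper name describe_image is hypothetical:

```shell
# Hypothetical helper: show how an image was assembled and what it runs by default.
describe_image() {
    image="$1"
    # One row per layer or instruction, newest first.
    docker history "${image}"
    # Print the default command and working directory recorded in the image.
    docker image inspect --format '{{.Config.Cmd}} {{.Config.WorkingDir}}' "${image}"
}

# Usage: describe_image python:requests
```

The second command should echo back the CMD and WORKDIR values set in the Dockerfile.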

You can see that a new Python image tagged as requests has been generated locally. Run this image to check that the No module named 'requests' error is gone and that the program runs as expected:

[root@novita dockerfileTest]# docker run python:requests
Computing distance between 116.368816,39.866464  and  116.438946,39.921624
Computing distance between 116.370910,39.869603  and  116.438946,39.921624
...

Success! The container ran the code and exited.
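Because the dependency is now part of the image rather than of one container's writable layer, every fresh container gets it. A quick check, sketched with a hypothetical helper, overrides the default CMD and imports the library directly:

```shell
# Hypothetical helper: confirm that requests is importable in a given image.
check_requests_installed() {
    image="$1"
    # Arguments after the image name replace the image's CMD for this run.
    docker run --rm "${image}" python -c 'import requests; print(requests.__version__)'
}

# Usage: check_requests_installed python:requests
```

Running the same check against python:3.7 should fail with the original import error, which makes the difference between the two images easy to see.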

Is this the end?

In this piece, I've shared a simple application of Dockerfiles. In actual enterprise production environments, Dockerfiles tend to be more complex.
