This is part two in a series on taking a simple Python project from local script to production. In part one I talked about a gotcha I ran into when converting an old project from Python 2 to Python 3.
This part will go over how I put my Python process, its inputs, and its outputs into a Docker container and made an image publicly available on Dockerhub.
Requirements that I won't go over here (go to Docker.com and follow the instructions there):
- Download Docker
- Create a Docker ID
- Log in with your Docker ID on Dockerhub
What is Docker?
Docker is a containerization platform. Containerization is a way to package units of code with their dependencies so that they have everything they need to run in isolation.
Using Docker can help fix the "it works on my machine" problem, and writing dockerized code is a great way to encourage thoughtful code practices. Docker containers should be simple, responsible for as little as possible, and dependent on as few externals as possible.
Docker image vs. Docker container
Throughout this post, and online, you'll see the terms container and image. An image is basically a snapshot of your dockerized code that is created when you use the docker build command (more on that below). Using docker run on an image starts a container from it. So a container is a running instance of an image.
Anatomy of a Dockerfile
I decided to dockerize my CSV writer from the previous post in this series so that I could move it between environments easily.
For this I needed a Dockerfile. A Dockerfile is a plain text file, conventionally named Dockerfile, with no file extension.
Here's what the Dockerfile for my Python code looks like:
FROM python:3.7
ARG export_file=goodreads.csv
COPY $export_file goodreads_export.csv
COPY converter.py /
CMD ["python", "./converter.py"]
FROM
The FROM instruction sets the base image that the rest of the Dockerfile builds on. Docker images don't come with languages preloaded, so to have Python available to run the code, we start from the official python:3.7 image.
A note on Docker registries:
The default Docker registry is Dockerhub. If a Docker image is available on Dockerhub, you don't need to specify a URL when pulling from or pushing to a Docker repo. You just need the author's username and the repo name. For example, you can pull the Docker image from this post with the command docker pull thejessleigh/goodreads-libib-converter. If you're using a different registry, you'll need to tell Docker where to go. For example, if you're using Quay you'd run docker pull quay.io/example-username/test-docker-repo. The python dependency in my Dockerfile doesn't have a username because it's an official repo hosted on Dockerhub.
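To make those naming rules concrete, here is a minimal Python sketch that splits an image reference into registry, namespace, repository, and tag, filling in the defaults Docker applies when parts are left off: docker.io for the registry, library for official images like python, and latest for the tag. The names parse_image_ref and ImageRef are hypothetical, and this is a simplified illustration rather than Docker's own parser (it ignores digest references such as name@sha256:...).

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    namespace: str
    repository: str
    tag: str


DEFAULT_REGISTRY = "docker.io"  # Dockerhub
OFFICIAL_NAMESPACE = "library"  # namespace Dockerhub uses for official images
DEFAULT_TAG = "latest"


def parse_image_ref(ref: str) -> ImageRef:
    """Split a reference like 'quay.io/user/repo:tag' into its parts.

    Simplified sketch: digest references (name@sha256:...) are not handled.
    """
    parts = ref.split("/")
    first = parts[0]
    # The first component is a registry host only if it looks like one
    # (contains a dot or a port, or is 'localhost'); otherwise it's Dockerhub.
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, parts[1:]
    else:
        registry, path = DEFAULT_REGISTRY, parts

    # An optional ':tag' can only appear on the last path component.
    name, sep, tag = path[-1].partition(":")
    if not sep:
        tag = DEFAULT_TAG
    path = path[:-1] + [name]

    if len(path) == 1:
        # No username, e.g. 'python:3.7': treated as an official Dockerhub image.
        namespace = OFFICIAL_NAMESPACE if registry == DEFAULT_REGISTRY else ""
    else:
        namespace = "/".join(path[:-1])
    return ImageRef(registry, namespace, path[-1], tag)
```

For example, parse_image_ref("python:3.7") gives ImageRef(registry='docker.io', namespace='library', repository='python', tag='3.7'), while parse_image_ref("thejessleigh/goodreads-libib-converter") picks up the default latest tag.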
ARG
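ARG declares a build-time argument. It is the only instruction in a Dockerfile that can precede FROM, although I prefer to have FROM come first for the sake of consistency. (An ARG declared before FROM is only usable in the FROM line itself unless it is declared again afterwards, so placing it after FROM is also what lets the later COPY instruction read $export_file.)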
In the above example, I declare an ARG called export_file and give it a default. It expects a file called goodreads.csv in the build context (here, the same directory as the Dockerfile). If I want to pass in something different, I tell it to use a different filename with --build-arg=export_file=my_goodreads_export.csv when building the image.
COPY
COPY and ADD copy files from the build context into the Docker image. This is where I'm importing the input file and also the actual Python code that the Docker image executes.
COPY takes two arguments:
- the location of the file you're putting into the image
- the location of the file inside the Docker image
So whatever file I include as the CSV to convert will be referred to as goodreads_export.csv inside the Docker container. This is nifty, because it means that no matter what I build the Docker image with, the filename will always be consistent. I don't have to worry about making the Python code handle different filenames or paths. It can always look for ./goodreads_export.csv.
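The real converter.py from part one isn't reproduced in this post, so here is a minimal, hypothetical stand-in to show what the fixed filename buys you: the script hard-codes both paths and never needs to know what the original export was called. The convert function and the COLUMN_MAP entries (Goodreads-style Title, Author, ISBN13 mapped to Libib-style title, creators, ean_isbn13) are illustrative assumptions, not the project's actual column mapping.

```python
import csv
from pathlib import Path

# Fixed paths: COPY always names the input goodreads_export.csv, and the
# output lands next to it, relative to the directory the script runs in.
INPUT_PATH = Path("goodreads_export.csv")
OUTPUT_PATH = Path("libib_export.csv")

# Hypothetical column mapping for illustration only.
COLUMN_MAP = {"Title": "title", "Author": "creators", "ISBN13": "ean_isbn13"}


def convert(input_path: Path = INPUT_PATH, output_path: Path = OUTPUT_PATH) -> int:
    """Copy the mapped columns from input_path to output_path; return the row count."""
    rows = 0
    with input_path.open(newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        with output_path.open("w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=list(COLUMN_MAP.values()))
            writer.writeheader()
            for record in reader:
                # Missing source columns become empty strings.
                writer.writerow({new: record.get(old, "") for old, new in COLUMN_MAP.items()})
                rows += 1
    # Print debugging, so the container logs show how many rows were converted.
    print(f"Converted {rows} rows")
    return rows
```

The real script would then call convert() at the bottom (for example under an if __name__ == "__main__": guard) so that CMD ["python", "./converter.py"] actually runs it.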
There are some subtle differences between COPY and ADD that @ryanwhocodes has already written about, so I'll leave his post here.
Update September 2019: It appears that this post is no longer available on dev.to so I have replaced the embedded post with an archive.org link.
RUN
RUN executes a command at build time and commits the result as part of the image. If I were dockerizing a Python project that needed to install external packages, I could use RUN to pip install those dependencies. However, converter.py is a very simple process that doesn't need external packages, so I don't need to run anything as part of my build process.
CMD
Only one CMD instruction takes effect per Dockerfile: if the Dockerfile contains multiple CMDs, only the last one is used.
CMD is the command you intend the container to run when you start an instance of the image. It is not executed as part of the build process for an image, which is how CMD differs from RUN.
Building a Docker image
Now we have everything necessary to build a Docker image for our Python code from the Dockerfile.
As stated above, a Docker image is an inert snapshot of an environment that is ready to execute a command or program, but has not yet executed that command.
To build using the above Dockerfile, we run
docker build --build-arg=export_file=goodreads_export.csv -t goodreads-libib-converter .
- The --build-arg flag tells Docker to build the image with a file called goodreads_export.csv, overriding the default expectation of goodreads.csv.
- The -t goodreads-libib-converter option "tags" the image as goodreads-libib-converter. This is how you give your image a human-readable REPOSITORY name.
- The final . sets the build context to the current directory, which is also where Docker looks for the Dockerfile by default.
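If you end up scripting builds, the same command can be assembled as an argument list. This is a hypothetical Python helper (docker_build_cmd, as_shell, and build_image are names I made up, not part of the project) that only shows how each flag maps onto the pieces described above.

```python
import shlex
import subprocess
from typing import List, Optional


def docker_build_cmd(tag: str, export_file: Optional[str] = None, context: str = ".") -> List[str]:
    """Return the argv for `docker build`, mirroring the command above."""
    cmd = ["docker", "build"]
    if export_file is not None:
        # Overrides the Dockerfile's `ARG export_file=goodreads.csv` default.
        cmd.append(f"--build-arg=export_file={export_file}")
    # -t names the image; the last argument is the build context.
    cmd += ["-t", tag, context]
    return cmd


def as_shell(cmd: List[str]) -> str:
    """Render an argv list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in cmd)


def build_image(tag: str, export_file: Optional[str] = None) -> None:
    """Actually run the build; requires Docker to be installed and running."""
    subprocess.run(docker_build_cmd(tag, export_file), check=True)
```

Keeping the list separate from the call to subprocess.run makes it easy to print or check the exact command before running it: as_shell(docker_build_cmd("goodreads-libib-converter", "goodreads_export.csv")) reproduces the command shown above.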
After I do this, I can see that the image was successfully created by checking my image list.
> docker image list
REPOSITORY                  TAG      IMAGE ID     CREATED          SIZE
goodreads-libib-converter   latest   1234567890   12 seconds ago   924MB
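A script can make the same check without eyeballing the table. This sketch assumes Docker's --format '{{json .}}' option for docker image ls, which prints one JSON object per image with fields such as Repository and Tag; image_exists and list_images are hypothetical helper names.

```python
import json
import subprocess
from typing import Iterable, List


def image_exists(listing_lines: Iterable[str], repository: str, tag: str = "latest") -> bool:
    """True if a line of `docker image ls --format '{{json .}}'` output matches repository:tag."""
    for line in listing_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("Repository") == repository and entry.get("Tag") == tag:
            return True
    return False


def list_images() -> List[str]:
    """Ask Docker for one JSON object per local image (requires Docker to be running)."""
    result = subprocess.run(
        ["docker", "image", "ls", "--format", "{{json .}}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

After the build above, image_exists(list_images(), "goodreads-libib-converter") should return True.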
Running a Docker container
Now that I have an image, I have a standalone environment capable of running my program, but it hasn't actually executed the core procedure specified with CMD yet. Here's how I do that:
docker run goodreads-libib-converter
I see the print debugging statements I have in my converter.py file execute, so I know how many CSV rows are being converted. When I ran the program locally, it created an output file called libib_export.csv. However, when I check the contents of my directory now, it's not there. How is that useful!?
Accessing Files Written Out
I'm no longer running the Python code in the directory I was before. I'm running it inside the Docker container. Therefore, any files that are written out will also be stored inside the Docker container. The output file doesn't do me much good in there!
I'm running the Docker container locally, and the container sticks around after it finishes running, so all I have to do is find it and copy the output file from its dockerized location to the place I actually want it.
docker cp container_id:/libib_export.csv ~/outputs/libib_export.csv
This extracts the resultant CSV output from converter.py and puts it somewhere I can access it.
I can figure out the container_id (or the human-readable name) with
> docker ps -a
CONTAINER ID   IMAGE                       COMMAND                  CREATED          NAMES
e00000000000   goodreads-libib-converter   "python ./converter.…"   24 seconds ago   naughty_mcclintock
Yes, naughty_mcclintock is actually the procedurally generated name for the container I've been working with locally.
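Finding the container and copying the file out can also be scripted. This is a hypothetical sketch (latest_container, docker_cp_cmd, and fetch_output are made-up names) that asks docker ps -a for a tab-separated ID, image, and name via its --format option, and relies on docker ps listing the most recently created containers first.

```python
import os
import subprocess
from typing import Iterable, List, Optional

# Tab-separated ID, image, and name for every container, running or stopped.
PS_CMD = ["docker", "ps", "-a", "--format", "{{.ID}}\t{{.Image}}\t{{.Names}}"]


def latest_container(ps_lines: Iterable[str], image: str) -> Optional[str]:
    """Return the name of the first listed container created from `image`."""
    for line in ps_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3 and fields[1] == image:
            return fields[2]
    return None


def docker_cp_cmd(container: str, src: str, dest: str) -> List[str]:
    """argv that copies `src` out of `container` to `dest` on the host."""
    # subprocess does not go through a shell, so expand '~' ourselves.
    return ["docker", "cp", f"{container}:{src}", os.path.expanduser(dest)]


def fetch_output(image: str, src: str, dest: str) -> None:
    """Find the newest container for `image` and copy `src` out of it (needs Docker)."""
    out = subprocess.run(PS_CMD, check=True, capture_output=True, text=True).stdout
    container = latest_container(out.splitlines(), image)
    if container is None:
        raise RuntimeError(f"no container found for image {image!r}")
    subprocess.run(docker_cp_cmd(container, src, dest), check=True)
```

With the container above, fetch_output("goodreads-libib-converter", "/libib_export.csv", "~/outputs/libib_export.csv") would run the same docker cp command by hand.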
Copying a file from a container to my desired location is fine for a local environment, but has limited uses if I ever want to take this project to production. There are other, better options for dealing with output files from Docker containers, but we'll get into that ✨ in another installment in this series ✨
Committing a Docker image
After we've run the container to confirm that it works, we probably want to create a new image based on the changes it made when it executed. We're preparing the image that we want to push up to an external Docker registry, like Dockerhub.
When committing a Docker image, we need to specify the registry (if it's something other than Dockerhub), the author name, the repository name, and the tag name.
docker commit -m "Working Python 3 image" naughty_mcclintock thejessleigh/goodreads-libib-converter:python3
My docker commit was successful, so I see a sha256 hash output in my terminal. Creating a commit message is, of course, optional, but I like to do it to keep organized.
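The pieces listed above (optional registry, author, repository, tag) combine into a single reference string. Here is a hypothetical Python sketch (image_ref and docker_commit_cmd are illustrative names, not part of any library) that assembles that reference and the commit command from this post.

```python
from typing import List, Optional


def image_ref(author: str, repository: str, tag: str = "latest",
              registry: Optional[str] = None) -> str:
    """Build '[registry/]author/repository:tag' for docker commit and push."""
    ref = f"{author}/{repository}:{tag}"
    # Dockerhub is the default registry, so its hostname can be omitted.
    return f"{registry}/{ref}" if registry else ref


def docker_commit_cmd(container: str, ref: str, message: Optional[str] = None) -> List[str]:
    """argv for `docker commit`, with an optional -m commit message."""
    cmd = ["docker", "commit"]
    if message:
        cmd += ["-m", message]
    return cmd + [container, ref]
```

The same reference string is what you'd later hand to docker push.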
A note on Docker image tags:
When you pull a Docker image and don't specify a tag, Docker uses the latest tag. Tags are the way you can keep track of changes in your project without overwriting previous versions. For example, if you (for some reason) are still using Python 2, you can access the Python 2 image by running docker pull thejessleigh/goodreads-libib-converter:python2. Right now the python3 and latest tags on my Docker repo point to the same image, but you can pull either one.
Pushing a Docker image to Dockerhub
Now that I have an image I want to put out into the world, I can push it up to Dockerhub.
First, I need to log into Dockerhub and create a repository. Repositories require a name, and should have a short description which details the purpose of the project, and a long description that explains dependencies, requirements, build arguments, etc. You can also make a Docker repository private.
Once I've done that, I run docker push, which sends the latest commit of the project and tag I've specified up to the external registry. If you don't specify a tag, this push will overwrite the latest tag in your repository.
docker push thejessleigh/goodreads-libib-converter:python3
If you go to my Dockerhub profile, you can see the goodreads-libib-converter project and pull both the Python 2 and Python 3 incarnations.
Next Steps
Now that I have a working Docker image, I want to put it into production so that anyone can convert their Goodreads library CSV into a Libib library CSV. I'm going to go about this using AWS, which requires a bit of setup.
The next installment in this series will go over setting up an AWS IAM account, setting up awscli and configuring your local profiles, and creating an S3 bucket that your IAM account can access.
EDIT: Never did get around to that next post in the series. I should do that someday.