We continue our "Kubernetes in a Nutshell" journey and this part will cover Kubernetes Volumes! You will learn about:
- Overview of `Volume`s and why they are needed
- How to use a `Volume`
- Hands-on example to help explore `Volume`s practically
The code is available on GitHub
Happy to get your feedback via Twitter or just drop a comment!
Pre-requisites:
You are going to need `minikube` and `kubectl`.
Install `minikube` as a single-node Kubernetes cluster in a virtual machine on your computer. On a Mac, you can simply:
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64 \
&& chmod +x minikube
sudo mv minikube /usr/local/bin
Install `kubectl` to interact with your Kubernetes cluster. On a Mac, you can simply:
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
Overview
Data stored in Docker containers is ephemeral, i.e. it only exists as long as the container is alive. Kubernetes can restart a failed or crashed container (in the same `Pod`), but you will still end up losing any data which you might have stored in the container filesystem. Kubernetes solves this problem with the help of `Volume`s. It supports many types of `Volume`s, including external cloud storage (e.g. Azure Disk, Amazon EBS, GCE Persistent Disk etc.), networked file systems such as Ceph, GlusterFS etc., and other options like `emptyDir`, `hostPath`, `local`, `downwardAPI`, `secret`, `configMap` etc.
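To give a flavor of one of these types: a `hostPath` volume mounts a file or directory from the Node's filesystem into the `Pod`. A minimal sketch (the path and volume name here are just illustrative):

```yaml
volumes:
- name: host-logs
  hostPath:
    path: /var/log
    type: Directory
```

Bear in mind that `hostPath` ties your data to a specific Node, so it's mostly useful for Node-level agents rather than general application storage.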
How are Volumes used?
Using a `Volume` is relatively straightforward - look at this partial `Pod` spec as an example:
```yaml
spec:
  containers:
  - name: kvstore
    image: abhirockzz/kvstore:latest
    volumeMounts:
    - mountPath: /data
      name: data-volume
    ports:
    - containerPort: 8080
  volumes:
  - name: data-volume
    emptyDir: {}
```
Notice the following:
- `spec.volumes` - declares the available volume(s), its `name` (e.g. `data-volume`) and other volume-specific characteristics - in this case, an `emptyDir` volume.
- `spec.containers.volumeMounts` - points to a volume declared in `spec.volumes` (e.g. `data-volume`) and specifies exactly where it wants to mount that volume within the container file system (e.g. `/data`).
A `Pod` can have more than one `Volume` declared in `spec.volumes`. Each of these `Volume`s is accessible to all containers in the `Pod`, but it's not mandatory for all the containers to mount or make use of all the volumes. If needed, a container within the `Pod` can mount more than one volume into different paths in its file system. Also, different containers can mount a single volume at the same time.
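For instance, two containers in the same `Pod` can share one `emptyDir` volume, each mounting it at a different path. A minimal sketch (the container names, commands and mount paths are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-pod
spec:
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: writer
    image: busybox
    # writes a file into the shared volume
    command: ["sh", "-c", "echo hello > /producer/out.txt && sleep 3600"]
    volumeMounts:
    - mountPath: /producer
      name: shared-data
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - mountPath: /consumer
      name: shared-data
```

Here `writer` sees the volume at `/producer`, while `reader` sees the very same files at `/consumer`.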
Another way of categorizing Volumes
I like to divide them as:
- Ephemeral - `Volume`s which are tightly coupled with the `Pod` lifetime (e.g. `emptyDir` volume), i.e. they are deleted if the `Pod` is removed (for any reason).
- Persistent - `Volume`s which are meant for long-term storage and are independent of the `Pod` or the `Node` lifecycle. This could be NFS or cloud-based storage in the case of managed Kubernetes offerings such as Azure Kubernetes Service, Google Kubernetes Engine etc.
Let's look at `emptyDir` as an example.
emptyDir volume in action
An `emptyDir` volume starts out empty (hence the name!) and is ephemeral in nature, i.e. it exists only as long as the `Pod` is alive. Once the `Pod` is deleted, so is the `emptyDir` data. It is quite useful in some scenarios/requirements, such as a temporary cache or shared storage for multiple containers in a `Pod`.
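As an aside, an `emptyDir` can also be backed by memory (`tmpfs`) instead of disk by setting the `medium` field - handy for fast scratch space, though anything written there counts against the container's memory limit. A small sketch:

```yaml
volumes:
- name: cache-volume
  emptyDir:
    medium: Memory
```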
To run this example, we will use a naive, over-simplified key-value store that exposes REST APIs for
- adding key value pairs
- reading the value for a key
Here is the code if you're interested
Initial deployment
Start `minikube` if it's not already running
minikube start
Deploy the `kvstore` application. This will simply create a `Deployment` with one instance (`Pod`) of the application, along with a `NodePort` service
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kubernetes-in-a-nutshell/master/volumes-1/kvstore.yaml
To keep things simple, the YAML file is being referenced directly from the GitHub repo, but you can also download the file to your local machine and use it in the same way.
Confirm they have been created
kubectl get deployments kvstore
NAME READY UP-TO-DATE AVAILABLE AGE
kvstore 1/1 1 1 28s
kubectl get pods -l app=kvstore
NAME READY STATUS RESTARTS AGE
kvstore-6c94877886-gzq25 1/1 Running 0 40s
It's ok if you do not know what a `NodePort` service is - it will be covered in a subsequent blog post. For the time being, just understand that it is a way to access our app (REST endpoint in this case).
Check the value of the random port generated by the `NodePort` service - you might see a result similar to this (with different IPs, ports)
kubectl get service kvstore-service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kvstore-service NodePort 10.106.144.48 <none> 8080:32598/TCP 5m
Check the `PORT(S)` column to find out the random port, e.g. it is `32598` in this case (`8080` is the internal port within the container exposed by our app - ignore it).
Now, you just need the IP of your `minikube` node, using `minikube ip`
This might return something like `192.168.99.100` if you're using a VirtualBox VM.
In the commands that follow, replace `[host]` with the minikube VM IP and `[port]` with the random port value.
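If you'd rather not read these values off by hand, you can capture them in shell variables (this assumes the `kvstore-service` name used in the YAML above):

```shell
HOST=$(minikube ip)
PORT=$(kubectl get service kvstore-service -o jsonpath='{.spec.ports[0].nodePort}')
echo $HOST:$PORT
```

You can then use `$HOST` and `$PORT` directly in the `curl` commands below.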
Create a couple of new key-value pair entries
curl http://[host]:[port]/save -d 'foo=bar'
curl http://[host]:[port]/save -d 'mac=cheese'
e.g.
curl http://192.168.99.100:32598/save -d 'foo=bar'
curl http://192.168.99.100:32598/save -d 'mac=cheese'
Access the value for key foo
curl http://[host]:[port]/read/foo
You should get the value you had saved for `foo`, i.e. `bar`. Same applies for `mac`, i.e. you'll get `cheese` as its value. The program saves the key-value data in `/data` - let's confirm that by peeking directly into the Docker container inside the `Pod`
kubectl exec <pod name> -- ls /data/
foo
mac
`foo` and `mac` are individual files named after the keys. If we dig in further, we should be able to confirm their respective values as well.
To confirm the value for the key `mac`
kubectl exec <pod name> -- cat /data/mac
cheese
As expected, you got `cheese` as the answer since that's what you had stored earlier. If you try to look for a key which you haven't stored yet, you'll get an error
cat: can't open '/data/moo': No such file or directory
command terminated with exit code 1
Kill the container ;-)
Alright, so far so good! Using a `Volume` ensures that the data will be preserved across container restarts/crashes. Let's 'cheat' a bit and manually kill the Docker container.
kubectl exec [pod name] -- ps
PID USER TIME COMMAND
1 root 0:00 /kvstore
31 root 0:00 ps
Notice the process ID for the `kvstore` application (should be `1`)
In a different terminal, set a watch on the Pods
kubectl get pods -l app=kvstore --watch
We kill our app process
kubectl exec [pod name] -- kill 1
You will notice that the Pod transitions through a few phases (like `Error`) before going back to the `Running` state (restarted by Kubernetes).
NAME READY STATUS RESTARTS AGE
kvstore-6c94877886-gzq25 1/1 Running 0 15m
kvstore-6c94877886-gzq25 0/1 Error 0 15m
kvstore-6c94877886-gzq25 1/1 Running 1 15m
Execute `kubectl exec <pod name> -- ls /data` to confirm that the data in fact survived in spite of the container restart.
Delete the Pod!
But the data will not survive beyond the Pod's lifetime. To confirm this, let's delete the `Pod` manually
kubectl delete pod -l app=kvstore
You should see a confirmation such as below
pod "kvstore-6c94877886-gzq25" deleted
Kubernetes will restart the `Pod` again. You can confirm this after a few seconds
kubectl get pods -l app=kvstore
you should see a new `Pod` in `Running` state
Get the pod name and peek into the directory again
kubectl get pods -l app=kvstore
kubectl exec [pod name] -- ls /data/
As expected, the `/data/` directory will be empty!
The need for persistent storage
Simple (ephemeral) `Volume`s live and die with the `Pod` - but this is not going to suffice for a majority of applications. In order to be resilient, reliable, available and scalable, Kubernetes applications need to be able to run as multiple instances across Pods, and these Pods themselves might be scheduled or placed across different Nodes in your Kubernetes cluster. What we need is a stable, persistent store that outlasts the `Pod` or even the `Node` on which the `Pod` is running.
As mentioned in the beginning of this blog, it's simple to use a `Volume` - not just temporary ones like the one we just saw, but even long-term persistent stores.
Here is a (contrived) example of how to use Azure Disk as a storage medium for your apps deployed to Azure Kubernetes Service.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  volumes:
  - name: logs-volume
    azureDisk:
      kind: Managed
      diskName: myAKSDiskName
      diskURI: myAKSDiskURI
  containers:
  - image: myapp-docker-image
    name: myapp
    volumeMounts:
    - mountPath: /app/logs
      name: logs-volume
```
So that's it? Not quite! 😉 There are limitations to this approach. This and much more will be discussed in the next part of the series - so stay tuned!
I really hope you enjoyed and learned something from this article 😃😃 Please like and follow if you did!