Introduction

Docker is a set of software and services that enable virtualization at the level of the operating system; this is also known as containerization.

Docker allows developers to package applications, together with their dependencies and configuration settings, into (virtual) containers that can run on any Linux, Windows or macOS machine, be it on a desktop, in the cloud, or on a node of an IoT network.

When a Docker container is run on Linux, Docker leverages the Linux kernel and the Overlay File System to ensure the process is isolated and its computing resources are limited. Since no hardware virtualization is involved, the overhead is very low. On macOS and Windows, a lightweight virtual machine is provisioned, within which Docker runs and containers are executed.

What is Docker

The ability to run containers transparently on any OS gives developers a uniform experience regardless of where they develop. Once an application is developed and packaged into a Docker image, the developer can be sure the application will run anywhere Docker runs.

The best way to see how one can use Docker to streamline and unify development and deployment is to use it.

To follow along with the examples, you'll need to have Docker installed. You can install all required tools by installing Docker Desktop.

Packaging an application into a Docker image

To test drive Docker, let's implement a simple web application and package it as a Docker image, that is, containerize it.

The web application

First, let's implement the app and run it natively.

We'll create a simple HTTP endpoint that takes a query parameter named path and returns the list of files under the location specified by that parameter. (Security-wise, an app that serves filesystem contents without authorization is generally a bad idea; we're using it here merely to demonstrate the isolation that Docker offers.)

We'll implement this in Python using the Falcon web framework and the Gunicorn application server. Here's the code.

import json
import os

import falcon


class FileBrowser:
    def on_get(self, req, resp):
        if "path" not in req.params:
            resp.status = falcon.HTTP_400
            resp.text = "Missing path parameter!"
            return

        path = req.params["path"]

        if not os.path.isdir(path):
            resp.status = falcon.HTTP_404
            resp.text = "Path '%s' does not exist" % path
            return

        try:
            files = json.dumps(os.listdir(path))
            resp.status = falcon.HTTP_200
            resp.text = files
        except Exception as e:
            resp.status = falcon.HTTP_500
            resp.text = "Unexpected error: '%s'" % e


app = falcon.App()
app.add_route('/', FileBrowser())

This is the entire code which we save into fileapi.py. The web application accepts GET requests to the root endpoint /, where the following logic takes place.

  1. If the query parameter is missing, a 400 Bad Request response is returned;
  2. If the query parameter is present but does not point to an existing directory, a 404 Not Found response is returned;
  3. If the query parameter points to a valid directory path, the list of files is obtained and returned as a JSON array (the default content-type in Falcon is application/json);
  4. However, if an error occurs during the file listing, a 500 Internal Server Error response is returned.
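
The four cases above can be condensed into a framework-free sketch. The function name list_directory and its (status, body) return value are assumptions made for illustration; they are not part of Falcon or of fileapi.py.

```python
import json
import os


def list_directory(params):
    """Mirror the handler's decision logic without Falcon.

    Hypothetical helper: returns (status_code, body) instead of
    filling in a Falcon response object.
    """
    # Case 1: the query parameter is missing.
    if "path" not in params:
        return 400, "Missing path parameter!"

    path = params["path"]

    # Case 2: the parameter does not point to an existing directory.
    if not os.path.isdir(path):
        return 404, "Path '%s' does not exist" % path

    try:
        # Case 3: a valid directory, so return its entries as a JSON array.
        return 200, json.dumps(os.listdir(path))
    except OSError as e:
        # Case 4: listing failed, for example because of missing permissions.
        return 500, "Unexpected error: '%s'" % e
```

Keeping the logic in a plain function like this makes it possible to exercise each branch without starting a web server.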

To run this application, we need a system that has Python, the Falcon framework and the gunicorn web application server installed.

On a typical Debian-based system, one would install them with sudo apt install python3 python3-pip and then use pip to install the Python dependencies, for instance pip3 install falcon gunicorn. Alternatively, we could use the venv module to create a Python virtual environment. Needless to say, this process is different on macOS or Windows.

Finally, we can run the application by issuing gunicorn fileapi:app --bind 127.0.0.1:8000. This will use the gunicorn application server to start the application in file fileapi.py and listen on the loopback interface; make sure the command is run from the same directory as the said file.

Next, let's test the server with cURL.

$ curl -i "localhost:8000/"
HTTP/1.1 400 Bad Request
Server: gunicorn
Date: Wed, 21 Sep 2022 13:27:42 GMT
Connection: close
content-length: 23
content-type: application/json

Missing path parameter!

$ curl -i "localhost:8000/?path=/not-a-dir"
HTTP/1.1 404 Not Found
Server: gunicorn
Date: Wed, 21 Sep 2022 13:28:40 GMT
Connection: close
content-length: 32
content-type: application/json

Path '/not-a-dir' does not exist

$ curl -i "localhost:8000/?path=/usr"
HTTP/1.1 200 OK
Server: gunicorn
Date: Wed, 21 Sep 2022 13:29:35 GMT
Connection: close
content-length: 78
content-type: application/json

["lib", "include", "libexec", "local", "sbin", "src", "share", "games", "bin"]

$ curl -i "localhost:8000/?path=/root"
HTTP/1.1 500 Internal Server Error
Server: gunicorn
Date: Wed, 21 Sep 2022 13:30:53 GMT
Connection: close
content-length: 57
content-type: application/json

Unexpected error: '[Errno 13] Permission denied: '/root''

Looks like the server is working. Now let's package all of this into a Docker image.

The Dockerfile

To create a Docker image, we have to provide a set of instructions that will build it. These instructions are provided in a Dockerfile.

Create a new file named Dockerfile in the same directory as fileapi.py and populate it with the following.

FROM python:3.10.7-alpine3.16

WORKDIR /app
RUN pip install gunicorn==20.1.0 falcon==3.1.0
COPY . .
EXPOSE 8000
CMD ["gunicorn", "fileapi:app", "--bind", "0.0.0.0:8000"]

These six lines define the entire image. Let's unpack them line-by-line.

  1. The command FROM sets the base image to use. While we could start with an empty image, we are going to leverage one of the many pre-configured Python images from Docker Hub.

    In our case, we are picking Python version 3.10.7 and the supporting libraries that are part of the Alpine Linux distribution.

    (This does not mean that we'll be running Alpine Linux in a virtual machine, only that the libraries packaged in the image come from that Linux distribution.)

    We are selecting Alpine Linux because of its small disk footprint.

  2. The WORKDIR command sets the working directory inside the image. If the directory does not exist, it will be created; this will be the location of our application.

  3. We install required Python dependencies with the RUN command.

    Here we are pinning the libraries to specific versions. This is good practice, since we know that our application works with Python 3.10.7, gunicorn 20.1.0 and falcon 3.1.0.

    (If we had many such dependencies, it would be better to use the requirements.txt file, but let's keep things simple for now.)

  4. Next we use COPY . . to copy all resources from the current directory on the host computer to the working directory (/app) in the image.

    As it currently stands, the command will copy all files from the host which is often undesirable; we show how to list exclusions a bit later.

  5. The EXPOSE 8000 command documents that the process in the container listens on port 8000. It does not publish the port by itself.

    The instruction is not strictly required, but it tells whoever runs the image which port the service uses. To make the service reachable from the host computer, we will have to publish the port when we start the container.

  6. And finally, the CMD command specifies the command that runs when the container is started.

    In our case the command is gunicorn fileapi:app --bind 0.0.0.0:8000; we changed the IP from loopback device to all interfaces. The reason is that the container will have to listen on all interfaces if we want to access it from the host computer.

    If the application listened only on the container's loopback device, we would be unable to reach it from the host computer, since the loopback device inside the container is different from the loopback device on the host computer.
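
The difference between the two bind addresses in step 6 can be tried out with plain sockets. This is a minimal sketch using only the Python standard library; the helper name bind_ephemeral is made up for illustration, and gunicorn does considerably more than this when it binds to an address.

```python
import socket


def bind_ephemeral(host):
    """Bind a TCP socket to `host` on a free port and report the bound address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 asks the operating system to pick any free port.
        sock.bind((host, 0))
        return sock.getsockname()


# Loopback only: reachable solely from within the same network namespace.
loopback_host, loopback_port = bind_ephemeral("127.0.0.1")

# All interfaces: also reachable through the other network interfaces,
# which inside a container includes the one Docker forwards traffic to.
any_host, any_port = bind_ephemeral("0.0.0.0")
```

A server bound like the first socket would only accept connections that originate inside the container, which is why the CMD line binds to 0.0.0.0 instead.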

To exclude certain files from being copied from the host into the image (command COPY . . in step 4 above), create a file called .dockerignore and populate it with the following.

.*
__pycache__/

These two lines instruct the Docker COPY command to ignore all hidden files (files starting with dot .), and the __pycache__ directory.
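
To get a feel for what the two patterns exclude, here is a rough approximation in Python. It is only a sketch: it uses the standard fnmatch module rather than Docker's own matcher, it looks at top-level names only, and the directory entries listed at the bottom are hypothetical.

```python
from fnmatch import fnmatch

# The two patterns from the .dockerignore file above.
IGNORE_PATTERNS = [".*", "__pycache__/"]


def is_ignored(name, patterns=IGNORE_PATTERNS):
    """Return True if a top-level file or directory name matches a pattern.

    Simplified on purpose: Docker's real matcher also handles nested paths,
    "**" wildcards and "!" exceptions, none of which are modelled here.
    """
    # The sketch drops a trailing slash before matching.
    return any(fnmatch(name, pattern.rstrip("/")) for pattern in patterns)


# Hypothetical contents of the project directory.
entries = [".git", ".env", "__pycache__", "fileapi.py", "Dockerfile"]
copied = [name for name in entries if not is_ignored(name)]
```

With these entries, only fileapi.py and Dockerfile remain in the list of files that COPY . . would pick up.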

Building the Docker image

Now we are ready to build the image. Inside the directory that contains the Dockerfile, issue the following command.

$ docker build -t file-api .
...
Successfully tagged file-api:latest

The command builds the image and tags it file-api. During the build, all required dependencies are also installed. We can get the list of images that are available on our system as follows.

$ docker images
REPOSITORY                TAG                               IMAGE ID            CREATED             SIZE
file-api                  latest                            fbbf93095fb6        3 minutes ago       63.2MB
python                    3.10.7-alpine3.16                 4da4c1dc8c72        13 days ago         48.7MB

Running the container

Now that the image has been built, we can run it and create a container.

$ docker run -p 127.0.0.1:5000:8000 file-api
[2022-09-21 15:07:50 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2022-09-21 15:07:50 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
[2022-09-21 15:07:50 +0000] [1] [INFO] Using worker: sync
[2022-09-21 15:07:50 +0000] [6] [INFO] Booting worker with pid: 6

We have now run the image file-api and started the container. Docker forwards address 127.0.0.1:5000 on the host to port 8000 in the container; this was achieved with the -p 127.0.0.1:5000:8000 switch.
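
The value passed to -p reads as host address, host port, container port. The following sketch merely takes that string apart to make the three parts explicit; PortMapping and parse_port_mapping are hypothetical names, and only this one three-part form is handled.

```python
from typing import NamedTuple


class PortMapping(NamedTuple):
    host_ip: str
    host_port: int
    container_port: int


def parse_port_mapping(spec):
    """Split "HOST_IP:HOST_PORT:CONTAINER_PORT" into its three parts.

    Only this one three-part IPv4 form is handled; other forms
    accepted by docker run are out of scope for the sketch.
    """
    host_ip, host_port, container_port = spec.split(":")
    return PortMapping(host_ip, int(host_port), int(container_port))


mapping = parse_port_mapping("127.0.0.1:5000:8000")
```

Here mapping.host_port is 5000, the port we will use with cURL on the host, while mapping.container_port is 8000, the port gunicorn listens on inside the container.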

If we open a new terminal—the container is running in the current one—and issue a few GET requests, we should get familiar responses.

$ curl -i "localhost:5000/?path=/usr"
HTTP/1.1 200 OK
Server: gunicorn
Date: Wed, 21 Sep 2022 15:12:34 GMT
Connection: close
content-length: 47
content-type: application/json

["lib", "local", "sbin", "share", "bin", "src"]

$ curl -i "localhost:5000/?path=/root"
HTTP/1.1 200 OK
Server: gunicorn
Date: Wed, 21 Sep 2022 15:12:54 GMT
Connection: close
content-length: 29
content-type: application/json

[".cache", ".python_history"]

However, notice how the contents of /usr and /root are now different. This is because the app is now running inside the container, which has its own filesystem and directory structure and is isolated from the host computer.

Moreover, applications inside the container run as root by default; this is why accessing /root is now allowed. In certain situations there are good security reasons to avoid this, but that is material for another topic.

We can now query Docker to see which containers are running.

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                                     NAMES
ee841890bee4        file-api            "gunicorn fileapi:ap…"   13 minutes ago      Up 13 minutes       127.0.0.1:5000->8000/tcp                                  cool_johnson

Since we did not name the container explicitly, Docker came up with a random name, cool_johnson. If we press CTRL+C in the terminal that is running the container, the container should stop. A stopped container no longer appears in the output of docker ps; to see the list of all containers, stopped and running, we have to run docker ps -a.

To delete the container, run docker rm cool_johnson. You might have to change the name, since it is unlikely that yours is also called cool_johnson.

Managing more complex setups with docker compose

Web applications often consist of multiple services: an application server, a database, a cache layer, a background task system and so on. To make our example more realistic, let's add another service to the web application.

Suppose we benchmarked our system and found that the operation that lists the contents of a directory is rather slow. Since our filesystem rarely changes, if ever, we decide to implement a simple cache mechanism using the Redis database.

Redis is a fast in-memory key-value store. As keys, we'll store the query path values, and as the corresponding values, we'll store the list of files under the given paths.

Next, we'll change the application so that when a request to a valid path is received, it first checks whether Redis contains a key for the given path, and if so, serves the cached contents.

If the key does not exist, the application will list the files from the filesystem, serve them to the client, and save the result to the cache. Consequently, all subsequent requests to the same path are served from the cache and not from the slow filesystem.
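
Before involving Redis, the lookup order described above can be sketched with an ordinary dictionary standing in for the cache. DictCache and cached_listing are hypothetical names used only for this illustration; the stand-in imitates just the get and set calls and nothing else of Redis.

```python
import json
import os


class DictCache:
    """Hypothetical in-memory stand-in exposing Redis-like get() and set()."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Like Redis, return None when the key is absent.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def cached_listing(path, cache, lister=os.listdir):
    """Return (json_text, source) where source is "cache" or "filesystem"."""
    cached = cache.get(path)
    if cached is not None:
        # Cache hit: skip the filesystem entirely.
        return cached, "cache"

    # Cache miss: do the slow work once, then remember the result.
    files = json.dumps(lister(path))
    cache.set(path, files)
    return files, "filesystem"
```

Calling cached_listing twice with the same path and the same cache object reports "filesystem" the first time and "cache" the second time. Note that nothing in this sketch ever invalidates an entry, which matches the assumption that the filesystem rarely changes.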

This modification will introduce a new service to our application set-up and add complexity: we have to modify the application to use the Redis database, we have to create another container that will run it, and we have to connect both containers.

Application modification

The modifications to the web application are rather straightforward; below we list the entire Falcon application that uses Redis for caching. Let's modify fileapi.py to contain the following code.

import json
import os

import falcon
import redis


class FileBrowser:
    def __init__(self, cache):
        self.cache = cache

    def on_get(self, req, resp):
        if "path" not in req.params:
            resp.status = falcon.HTTP_400
            resp.text = "Missing path query parameter!"
            return

        path = req.params["path"]

        cached = self.cache.get(path)
        if cached:
            resp.status = falcon.HTTP_200
            resp.text = cached
            return

        if not os.path.isdir(path):
            resp.status = falcon.HTTP_404
            resp.text = "Path '%s' does not exist" % path
            return

        try:
            files = json.dumps(os.listdir(path))
            resp.status = falcon.HTTP_200
            resp.text = files
            self.cache.set(path, files)
        except Exception as e:
            resp.status = falcon.HTTP_500
            resp.text = "Unexpected error: '%s'" % e


app = falcon.App()
redis_cache = redis.Redis(host='redis-cache', port=6379, db=0, decode_responses=True)
app.add_route('/', FileBrowser(redis_cache))

Notice how we set the address of the Redis database to redis-cache; this is an actual hostname that will be assigned to the container that will run the Redis database.

Because our modifications also add a new Python dependency, namely the Python library that connects to Redis, we have to update the Dockerfile.

FROM python:3.10.7-alpine3.16

WORKDIR /app
RUN pip install gunicorn==20.1.0 falcon==3.1.0 redis==4.3.4
COPY . .
EXPOSE 8000
CMD ["gunicorn", "fileapi:app", "--bind", "0.0.0.0:8000"]

The only change is in the RUN command that now additionally installs Python redis bindings.

Setting up additional docker containers

Next, we have to spin up another container that will run the Redis database, and link it with the web application container.

While we could do all these things manually with multiple but separate commands, we can package everything into a docker-compose.yml that specifies all required services, their dependencies, configuration, and start-up sequence. And then we can start our application with a single command.

Here is the docker-compose.yml that we'll need.

version: "3"

services:
  redis-cache:
    image: redis:7.0.4-alpine3.16
    restart: always
    expose:
      - 6379
  falcon-webapp:
    restart: always
    build: .
    image: file-api
    ports:
      - 127.0.0.1:5000:8000
    depends_on:
      - redis-cache

Let's parse the contents line-by-line.

  1. First we specify the schema version; it needs to be provided as a string. (Recent releases of Docker Compose treat this field as optional.)

  2. Next, we specify the list of services, or containers, that will run in this setup; this is defined with the services keyword.

  3. We are naming the first service redis-cache. The container will be assigned an internal IP and redis-cache will be its hostname; recall the Python code.

    As with the Python image, we browse Docker Hub for Redis images and pin the image to a specific version (7.0.4) and environment (Alpine Linux 3.16). We expose port 6379, which Redis uses by default. If the container unexpectedly stops, Docker will attempt to restart it.

  4. Finally, we define the web application service and name it falcon-webapp.

    This service container gets created from the image built from the Dockerfile shown earlier. The Dockerfile needs to be in the same directory as the docker-compose.yml, hence the build context is set to the current directory (.).

    Next, we set the name of the image to be built to file-api, we set the port forwarding to allow the host computer to access the container on localhost:5000, and we require the redis-cache service to be started before falcon-webapp.
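
The effect of depends_on on the start-up sequence can be pictured as a dependency ordering. The sketch below uses Python's graphlib module (available from Python 3.9) to derive such an ordering; it illustrates the idea only and is not how Docker Compose is implemented. It also says nothing about readiness: a service that has been started is not necessarily ready to accept connections.

```python
from graphlib import TopologicalSorter

# The depends_on relationships from the docker-compose.yml above,
# written as {service: [services it depends on]}.
depends_on = {
    "redis-cache": [],
    "falcon-webapp": ["redis-cache"],
}


def start_order(dependencies):
    """Return service names so that every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


order = start_order(depends_on)
```

For our two services, order places redis-cache before falcon-webapp.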

Running and inspecting services

Once the docker-compose.yml is ready, we build the required images. In our case that is only the image file-api; the image for Redis will be downloaded when we start the application.

$ docker compose build
[+] Building 3.2s (9/9) FINISHED
 => [internal] load build definition from Dockerfile                                                 0.8s
 => => transferring dockerfile: 32B                                                                  0.0s
 => [internal] load .dockerignore                                                                    1.2s
 => => transferring context: 34B                                                                     0.0s
 => [internal] load metadata for docker.io/library/python:3.10.7-alpine3.16                          0.0s
 => [1/4] FROM docker.io/library/python:3.10.7-alpine3.16                                            0.0s
 => [internal] load build context                                                                    0.6s
 => => transferring context: 100B                                                                    0.0s
 => CACHED [2/4] WORKDIR /app                                                                        0.0s
 => CACHED [3/4] RUN pip install gunicorn==20.1.0 falcon==3.1.0 redis==4.3.4                         0.0s
 => CACHED [4/4] COPY . .                                                                            0.0s
 => exporting to image                                                                               0.6s
 => => exporting layers                                                                              0.0s
 => => writing image sha256:eb8a14d65dcfe9c29ce1ae5020a3f15ea01ac307941aaba5c45101c11cf47bc7         0.0s
 => => naming to docker.io/library/file-api                                                          0.0s

To run the application, we issue docker compose up -d. The up command starts the containers, and the -d switch runs them in detached mode, that is, in the background. Once the application is running, we can issue requests as before.

$ curl -i "localhost:5000/?path=/root"
HTTP/1.1 200 OK
Server: gunicorn
Date: Thu, 22 Sep 2022 11:51:26 GMT
Connection: close
content-length: 29
content-type: application/json

[".cache", ".python_history"]

$ curl -i "localhost:5000/?path=/home"
HTTP/1.1 200 OK
Server: gunicorn
Date: Thu, 22 Sep 2022 11:51:36 GMT
Connection: close
content-length: 2
content-type: application/json

[]

The following commands are often useful:

  • docker compose down – stops and removes all containers,
  • docker compose logs – shows logs from all containers,
  • docker compose exec <service> <command> – executes the command in the container that is running the given service.

To demonstrate how we can attach to a container and run commands in it, let's examine the contents of the Redis database. First, we attach to the redis container as follows.

$ docker compose exec redis-cache sh
/data # ps ax
PID   USER     TIME  COMMAND
    1 redis     0:00 redis-server *:6379
   22 root      0:00 sh
   28 root      0:00 ps ax
/data #

Running docker compose exec redis-cache sh will execute the sh (or shell) binary within the container running the redis-cache service which effectively gives us shell access.

If we execute ps ax, we see that besides the processes that we are running—namely sh and ps ax—the only other process is redis-server listening on port 6379 on all interfaces. Let's exit by pressing CTRL+D.

To inspect the contents of Redis, issue the following command on the host computer.

$ docker compose exec redis-cache redis-cli
127.0.0.1:6379> keys *
1) "/home"
2) "/root"
127.0.0.1:6379> get /root
"[\".cache\", \".python_history\"]"
127.0.0.1:6379> get /home
"[]"
127.0.0.1:6379>

Here we run redis-cli directly to jump into the Redis command-line prompt inside the container. We list all keys using keys * and then inspect the contents under the keys /root and /home. We close the prompt with CTRL+D.
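
The values stored in Redis are the JSON texts exactly as the application served them. A short sketch, using the value shown for /root above, confirms that the text decodes back into a list and that its length agrees with the content-length header of the earlier response.

```python
import json

# The value stored under the key /root, as shown by redis-cli above.
cached_value = '[".cache", ".python_history"]'

# Decoding the stored text yields the original list of file names.
files = json.loads(cached_value)

# Re-encoding gives back the same text, whose length equals the
# content-length header (29) of the earlier response for /root.
round_trip = json.dumps(files)
body_length = len(round_trip.encode("utf-8"))
```

The backslashes in the redis-cli output are only the prompt's way of escaping the quotes; they are not part of the stored value.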

Conclusion

While this was anything but short, it only scratched the surface of what Docker is. For further reference, consider visiting the Docker documentation.
