Review a Docker image

Problem

Suppose we have a Docker image from Docker Hub that we know nothing about. How can we ensure that no obvious malware exists in that image that could put at risk whatever use we make of it?

This article only reflects my own views and methods. Any remarks or objections are welcome via Twitter.

Two steps (from hell)

There are two steps in this process of "secure-reviewing" the Docker image:

Pre-review

The first step is to ensure that the image is trustworthy enough to be used: we want to be sure that no malware exists in the image, and that the image is "sane".
So the goal of this step is to make sure we're not installing malware.

Post-review

The second step is to ensure that the image has no exploitable vulnerability. Indeed, even if the image is "sane" (from the first step), we must also ensure that no external threat actor may exploit and abuse the Docker image we'll use. This second step is not "frozen" in time: it must be done all along the image's life cycle, because a new CVE may be released at any moment, impacting any image that used to be considered "non-vulnerable".
So the goal of this second step is to ensure the image will not be abused along its lifecycle.

In this article, I'll focus on gathering methods for the first step. My goal is to gather here techniques for reviewing a Docker image and ensuring that no backdoor, malware, or similar threat exists in it, before using it.

Downloading content

In order to review the image, the first step is to download it. But I don't want to use a docker pull repo/name:tag command, because I don't want to risk exposing my local Docker daemon to a potentially malicious image. So I want to download the Docker layers instead.

Roughly speaking, Docker layers (as detailed in the doc) are similar to "diffs" of the files in the "virtual Docker OS", one for each construction step.

We'll use the download-frozen-image script as described here.

I've also tried the skopeo package from there, but it didn't work as expected. The same goes for the docker_pull.py script, which was not very usable. The download-frozen-image script works better.

Once the script has run, we get the complete list of layers, each stored in a directory with a hashed name.

Each layer directory contains a "layer.tar" file that you must extract in order to access the actual file contents. A command to do so may be for f in ./*/layer.tar; do mkdir -p "${f%/layer.tar}/fs" && tar -xf "$f" -C "${f%/layer.tar}/fs"; done.

Review

Now that we have the layers and have extracted their files, we must review their content. We can use any classic forensic technique.

Review the commands

First, you should review the construction commands. These are in the JSON file at the root of the download directory, so a command like jq . scan/b38d20dc45df4b90927e70e299840c351a0abb759a9bf6788e683498a45b5dd6.json should work.
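To illustrate, the construction commands live under the `history` key of the image config JSON, so jq can list them one per line. Here is a minimal sketch on a hand-made config file (a real config has many more fields, and your file name will be a different hash):

```shell
# Hand-made stand-in for the image config JSON produced by the download script
cat > config-demo.json <<'EOF'
{"history":[{"created_by":"/bin/sh -c apt-get update"},
            {"created_by":"/bin/sh -c wget http://example.com/apt.key"}]}
EOF

# Print one construction command per line
jq -r '.history[].created_by' config-demo.json
```

This is the same information Docker Hub shows in the tag details, but having it locally makes it easy to grep.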

You can also see the same construction commands on Docker Hub directly, in the Tags tab, by clicking the digest. But among these commands, you may find some copied file that you know nothing about, or some wget that is then run without you knowing the downloaded-and-executed content.

In this example, the construction command shows an HTTP request to get an APT key, so we have no idea whether the APT key is the correct one: a man-in-the-middle could have intercepted the request and may inject arbitrary responses into the APT packages installed afterward.
Another example is this pip install from an arbitrary Git repository, which could have been a rogue one.
Or, it may be an arbitrary Python wheel downloaded from a suspicious-looking domain (not box.nvidia.com but nvidia.box.com).
After some research, it seems that this is actually a valid Nvidia wheel source. Not just because of this homepage…
…but also because of the other references to this domain we may find on the Nvidia support forums.
Another type of command that is too arbitrary for you to review from Docker Hub alone is the Python/Bash run script. These require accessing the layer to review what the script actually did.
In this case, investigating the layers' content and comparing it with online sources showed that the script was the legitimate script from the mmdeploy GitHub repository.

Review the content

Since we now have the actual Docker layer contents, we can start reviewing them for IOCs, like searching for URLs/domains and spotting some surprising China-based ones. A simple suggested command for that is grep -horaisE 'https?://[a-zA-Z0-9_.-]+' . | sort -u.
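To make unusual domains stand out, the same extraction can be pushed a bit further by stripping the scheme and counting occurrences. A sketch on a made-up fixture (file contents and domains below are invented for the demo):

```shell
# Made-up fixture standing in for the extracted layer files
mkdir -p layers-demo/layer1
printf 'wheel at https://nvidia.box.com/w.whl\n' > layers-demo/layer1/a.txt
printf 'calls https://update.example.cn/x and https://update.example.cn/y\n' > layers-demo/layer1/b.txt

# Extract domains and rank them by number of occurrences
grep -horaE 'https?://[a-zA-Z0-9_.-]+' layers-demo \
  | sed -E 's#https?://##' \
  | sort | uniq -c | sort -rn
```

Domains that appear often, or that you cannot tie to a known dependency, are good candidates for a closer look.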

That command outlines some Huawei domains, as shown in this file.
But this is actually part of the OpenCV backend, so it's no surprise to see it there. Still, we don't know whether that entire component could be backdoored: we only ensured this was not something injected into the image by the image maker.
We may also find some China-based domains because several contributors to the packages involved in this image example are from China. So, no surprise so far.
The licenses used by the installed packages should also be reviewed. This could be done by smarter automated tools, but a simple find . -name 'LICENSE' (or a grep) may quickly return some commercially-incompatible ones.
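As a sketch of that license check, grepping the license files for a few copyleft or non-commercial markers is often enough for a first pass (fixture paths and contents below are invented; a real audit should use a dedicated license scanner):

```shell
# Made-up fixture standing in for two extracted packages
mkdir -p pkgs-demo/libfoo pkgs-demo/libbar
printf 'License: AGPL-3.0\n' > pkgs-demo/libfoo/LICENSE
printf 'MIT License\n' > pkgs-demo/libbar/LICENSE

# Print only the license files matching potentially problematic markers
find pkgs-demo -iname 'LICENSE*' -exec grep -liE 'GPL|non-commercial' {} +
```

Here only the AGPL package is flagged; the MIT one passes silently.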
Installed Python packages should also be reviewed, to check that each of them has a business need to be installed, and that none is a backdoored version.
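Since the layers are already extracted, the installed Python packages (and their versions) can be enumerated straight from the files, without ever running the image. A sketch on a made-up directory tree (package names and versions are invented):

```shell
# Made-up fixture standing in for an extracted layer's dist-packages
mkdir -p site-demo/usr/lib/python3/dist-packages/numpy-1.24.4.dist-info
mkdir -p site-demo/usr/lib/python3/dist-packages/requests-2.31.0.dist-info

# List package-version pairs from the *.dist-info directory names
find site-demo -type d -name '*.dist-info' \
  | sed 's|.*/||; s|\.dist-info$||' | sort
```

The resulting name-version list can then be checked against known-compromised releases of each package.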

I may add other IOCs to look for in this article, later on. Stay tuned!

Review the CVEs

Now that we know this image seems "sane" and not malware, we can check whether it has any CVEs. Sadly, Docker Hub does not let you see the vulnerability report for an uploaded image unless you are its uploader. So the only way I found was to re-upload the image.
Dumb, but working.

Docker Hub says you may sign up, then create a private repository, then activate Docker Scout on it to review CVEs for the uploaded images. Let's do so.
# We first install docker if not already installed
sudo apt show docker.io
sudo apt install docker.io
# Then pull the image
sudo docker pull mraheld/mmdeploy:v0.0.1
# Then we tag it with our username/repository_name:whatever
sudo docker image tag mraheld/mmdeploy:v0.0.1 yaunxenos/scanner:mraheldmmdeploy
# We need to log into the Dockerhub (--password is optional and will be prompted)
sudo docker login --username=yaunxenos --password=...
# And push the image
sudo docker push yaunxenos/scanner:mraheldmmdeploy
# For next images, it will be simpler as we will only need a pull, image tag, push
sudo docker pull iiiii/something
sudo docker image tag iiiii/something yaunxenos/scanner:iiiii
sudo docker push yaunxenos/scanner:iiiii
Commands for installing Docker, pulling the image, logging into Docker Hub, and pushing the image. Don't forget to create the repository and enable Docker Scout on it first.
A few minutes after the push, the scan result is provided, and we have the CVEs of the image.
Details are given for every construction command and CVE.

This is a one-shot solution. It's only meant to ensure that no critical CVE exists in the image that may put its users at risk. A lifecycle solution is recommended for actual Docker image security, but that is out of scope for this short article.

I think it's a shame Docker Hub does not allow seeing the CVEs of existing packages. Making this more open would be far more efficient than re-uploading images as I did.