Multiple RUN vs. single chained RUN in a Dockerfile: which is better?

Problem description

Dockerfile.1 executes multiple RUN instructions:

FROM busybox
RUN echo This is the A > a
RUN echo This is the B > b
RUN echo This is the C > c

Dockerfile.2 joins them:

FROM busybox
RUN echo This is the A > a &&\
    echo This is the B > b &&\
    echo This is the C > c

Each RUN creates a layer, so I always assumed that fewer layers are better and thus Dockerfile.2 is better.

This is obviously true when a RUN removes something added by a previous RUN (e.g. yum install nano && yum clean all), but in cases where every RUN adds something, there are a few points we need to consider:


  1. Layers are supposed to just add a diff on top of the previous one, so if a later layer does not remove something added in an earlier one, there should not be much of a disk space advantage between the two methods.

  2. Layers are pulled in parallel from Docker Hub, so Dockerfile.1, although probably slightly bigger, would theoretically be downloaded faster.

  3. If I add a 4th command (e.g. echo This is the D > d) and rebuild locally, Dockerfile.1 would build faster thanks to the cache, but Dockerfile.2 would have to run all 4 commands again.
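
To make the caching point concrete, this is roughly what Dockerfile.1 looks like after the change. It is only an illustration, and it assumes the build cache from the earlier local build is still present:

```dockerfile
FROM busybox
# Unchanged instructions: on a local rebuild these can be served from the build cache.
RUN echo This is the A > a
RUN echo This is the B > b
RUN echo This is the C > c
# New instruction: the only step that actually has to execute.
RUN echo This is the D > d
```

In Dockerfile.2 the fourth echo would be appended to the single chained RUN, which changes that instruction's text, so the whole chain would be executed again.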

So, the question is: which is the better way to write a Dockerfile?

Recommended answer

When possible, I always merge commands that create files with the commands that delete those same files into a single RUN line. This is because each RUN line adds a layer to the image, and its output is quite literally the filesystem changes that you could view with docker diff on the temporary container it creates. If you delete a file that was created in a different layer, all the union filesystem does is register the filesystem change in a new layer; the file still exists in the previous layer and is shipped over the network and stored on disk. So if you download source code, extract it, compile it into a binary, and then delete the tgz and source files at the end, you really want all of this done in a single layer to reduce image size.
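
A small way to see this for yourself, sketched with the same busybox base as the question. The file names and the 50 MB size are arbitrary choices for illustration, not part of the original answer:

```dockerfile
FROM busybox

# Create and delete in separate layers: the first layer still carries the
# 50 MB file, and the second layer only records that it was removed.
RUN dd if=/dev/zero of=/big-a.bin bs=1024k count=50
RUN rm /big-a.bin

# Create and delete in one layer: the file never appears in the committed
# filesystem diff, so this layer should stay close to empty.
RUN dd if=/dev/zero of=/big-b.bin bs=1024k count=50 && rm /big-b.bin
```

After building with something like docker build -t layer-demo . (the tag is a placeholder), docker history layer-demo lists the size of each layer, and the first dd layer should account for roughly the full 50 MB even though the file is gone from the final image.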

Next, I personally split up layers based on their potential for reuse in other images and expected caching usage. If I have 4 images, all with the same base image (e.g. debian), I may pull a collection of utilities common to most of those images into the first RUN command so the other images benefit from caching.
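
As a sketch of that idea, each of the related Dockerfiles could start with the same two instructions. The debian tag and the package names here are assumptions chosen for illustration:

```dockerfile
FROM debian:bookworm-slim

# Shared prefix: keep this instruction character-for-character identical in
# every related Dockerfile so that builds on the same host can reuse the layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl \
 && rm -rf /var/lib/apt/lists/*

# Everything below this point is specific to one image (placeholder step).
RUN mkdir -p /opt/service-a
```

The cache is matched on the parent layer plus the instruction text, so even a reordering of the package names in one of the files would be enough to lose the shared layer.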

Order in the Dockerfile is important when looking at image cache reuse. I look for any components that will update very rarely, possibly only when the base image updates, and put those high up in the Dockerfile. Towards the end of the Dockerfile, I include any commands that run quickly and may change frequently, e.g. adding a user with a host-specific UID or creating folders and changing permissions. If the container includes interpreted code (e.g. JavaScript) that is being actively developed, that gets added as late as possible so that a rebuild only runs that single change.
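
One possible layout following that ordering is sketched below. The user name, the paths, the HOST_UID build argument, and the src/index.js entry file are all hypothetical:

```dockerfile
FROM debian:bookworm-slim

# Rarely changes: runtime packages, rebuilt mostly when the base image updates.
RUN apt-get update \
 && apt-get install -y --no-install-recommends nodejs \
 && rm -rf /var/lib/apt/lists/*

# Quick and likely to vary per host: user with a host-specific UID, folders,
# and permissions. HOST_UID is a hypothetical build argument.
ARG HOST_UID=1000
RUN useradd --uid "$HOST_UID" --create-home appuser \
 && mkdir -p /app/data \
 && chown appuser /app/data

# Changes most often: the actively developed code goes in as late as possible.
COPY src/ /app/src/

USER appuser
CMD ["node", "/app/src/index.js"]
```

With this layout, editing a file under src/ should only invalidate the COPY step and what follows it, while passing a different value with --build-arg HOST_UID=... reruns the useradd step but leaves the package layer cached.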

Within each of these groups of changes, I consolidate as best I can to minimize layers. So if there are 4 different source code folders, those get placed inside a single folder so they can be added with a single command. Any package installs from something like apt-get are merged into a single RUN when possible to minimize the amount of package manager overhead (updating and cleaning up).
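
A minimal sketch of that consolidation, assuming a hypothetical build context in which the four source folders all live under a single src/ directory:

```dockerfile
FROM debian:bookworm-slim

# One RUN for all packages: a single index update and a single cleanup.
RUN apt-get update \
 && apt-get install -y --no-install-recommends git make \
 && rm -rf /var/lib/apt/lists/*

# Hypothetical layout: src/api, src/web, src/worker and src/shared sit under
# one parent folder, so a single COPY replaces four.
COPY src/ /opt/app/src/
```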

Update for multi-stage builds:

I worry much less about reducing image size in the non-final stages of a multi-stage build. When these stages aren't tagged and shipped to other nodes, you can maximize the likelihood of cache reuse by splitting each command into a separate RUN line.

However, this isn't a perfect solution for squashing layers, since all you copy between stages are the files, and not the rest of the image metadata such as environment variable settings, entrypoint, and command. And when you install packages in a Linux distribution, the libraries and other dependencies may be scattered throughout the filesystem, making it difficult to copy all of the dependencies.

Because of this, I use multi-stage builds as a replacement for building binaries on a CI/CD server, so that my CI/CD server only needs the tooling to run docker build, and does not need a JDK, Node.js, Go, or any other compile tools installed.
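
A rough sketch of such a build for a Go program follows. The image tags, the paths, the APP_ENV variable, and the presence of go.mod and go.sum in the build context are assumptions made for illustration:

```dockerfile
# Build stage: never shipped, so each command gets its own RUN line to
# maximize the chance of cache reuse.
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go vet ./...
RUN CGO_ENABLED=0 go build -o /out/app .

# Final stage: only the files are copied across, so image metadata such as
# ENV and ENTRYPOINT has to be declared again here.
FROM debian:bookworm-slim
COPY --from=build /out/app /usr/local/bin/app
ENV APP_ENV=production
ENTRYPOINT ["/usr/local/bin/app"]
```

The host running docker build needs no Go toolchain for this, since the compiler only exists inside the build stage. A statically linked binary is used here precisely because it avoids the scattered-dependencies problem mentioned above.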


09-26 12:25