Different CUDA versions shown by nvcc and nvidia-smi

Question

I am very confused by the different CUDA versions shown by running which nvcc and nvidia-smi. I have both CUDA 9.2 and CUDA 10 installed on my Ubuntu 16.04 machine, and my PATH currently points to CUDA 9.2. So when I run

$ which nvcc
/usr/local/cuda-9.2/bin/nvcc

However, when I run

$ nvidia-smi
Wed Nov 21 19:41:32 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P0    26W /  N/A |    379MiB /  6078MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1324      G   /usr/lib/xorg/Xorg                           225MiB |
|    0      2844      G   compiz                                       146MiB |
|    0     15550      G   /usr/lib/firefox/firefox                       1MiB |
|    0     19992      G   /usr/lib/firefox/firefox                       1MiB |
|    0     23605      G   /usr/lib/firefox/firefox                       1MiB |

So am I using CUDA 9.2, as which nvcc suggests, or CUDA 10, as nvidia-smi suggests? I saw this answer, but it does not address the confusion directly; it only tells us to reinstall the CUDA Toolkit, which I have already done.

Recommended answer

CUDA has two primary APIs: the runtime API and the driver API. Each has its own version (e.g. 8.0, 9.0).

The necessary support for the driver API (e.g. libcuda.so on Linux) is installed by the GPU driver installer.

The necessary support for the runtime API (e.g. libcudart.so on Linux, and also nvcc) is installed by the CUDA toolkit installer, which may also have a GPU driver installer bundled with it.

In any event, the installed driver API version may not always match the installed runtime API version, especially if you install a GPU driver independently of the CUDA toolkit.
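To make the idea of two separate version numbers concrete, here is a minimal Python sketch. It relies on the documented convention that the version-query calls of both APIs (cuDriverGetVersion and cudaRuntimeGetVersion) return the version as a single integer, 1000 * major + 10 * minor. The helper name decode_cuda_version and the two sample values are assumptions chosen to mirror the machine in the question; they are not read from a real system.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split an encoded CUDA version (1000 * major + 10 * minor) into (major, minor)."""
    major, remainder = divmod(encoded, 1000)
    return major, remainder // 10


# Hypothetical values, as the two APIs might report them on the machine in the question.
driver_api = decode_cuda_version(10000)   # what cuDriverGetVersion might return
runtime_api = decode_cuda_version(9020)   # what cudaRuntimeGetVersion might return

print("driver API :", driver_api)
print("runtime API:", runtime_api)
```

The two tuples differ, and nothing in either API requires them to be equal.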

The nvidia-smi tool is installed by the GPU driver installer, and it generally reports on the GPU driver, not on anything installed by the CUDA toolkit installer.

Recently (somewhere between driver versions 410.48 and 410.73 on Linux), NVIDIA decided to add the CUDA driver API version installed by the driver to the output of nvidia-smi.

This has no connection to the installed CUDA runtime version.
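As a rough illustration of where that number comes from, the following Python sketch pulls the driver version and the CUDA driver API version out of the nvidia-smi header line. The function name parse_smi_header is hypothetical, and the regular expressions assume the header layout shown in the question; drivers older than the change described above do not print a CUDA Version field, so the second value may be missing.

```python
import re


def parse_smi_header(text):
    """Return (driver_version, cuda_driver_api_version) as strings, or None where absent."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (
        driver.group(1) if driver else None,
        cuda.group(1) if cuda else None,
    )


# Header line copied from the nvidia-smi output in the question.
sample = "| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |"
print(parse_smi_header(sample))
```

On a real machine the text would come from running nvidia-smi, for example through subprocess.run; both values describe the driver only.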

nvcc, the CUDA compiler-driver tool that is installed with the CUDA toolkit, will always report the CUDA runtime version that it was built to recognize. It knows nothing about which driver version is installed, or even whether a GPU driver is installed at all.
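The toolkit side can be read in a similar way. This sketch assumes that nvcc -V prints a line of the form "Cuda compilation tools, release X.Y, V..." and extracts X.Y from it. The sample line is illustrative of what a CUDA 9.2 toolkit would print rather than output captured from the asker's machine, and parse_nvcc_release is a hypothetical helper name.

```python
import re


def parse_nvcc_release(text):
    """Return the toolkit release as (major, minor), or None if no release line is found."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Illustrative release line (assumed format) for a CUDA 9.2 toolkit.
sample = "Cuda compilation tools, release 9.2, V9.2.148"
print(parse_nvcc_release(sample))
```

This value depends only on which nvcc binary was run, so it changes when PATH points at a different toolkit and stays the same when the driver is upgraded.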

Therefore, by design, these two numbers do not necessarily match, because they reflect two different things.

If you are wondering why nvcc -V displays a CUDA version you weren't expecting (e.g. a version other than the one you think you installed), or displays no version at all, it may be because you haven't followed the mandatory instructions in the CUDA Linux install guide (step 7 in guides prior to CUDA 11, step 6 in the CUDA 11 guide).

Note that although this question mostly has Linux in view, the same concepts apply to Windows CUDA installs. The driver has a CUDA driver version associated with it (which can be queried with nvidia-smi, for example). The CUDA runtime also has a CUDA runtime version associated with it. The two will not necessarily match in all cases.

In most cases, if nvidia-smi reports a CUDA version that is numerically equal to or higher than the one reported by nvcc -V, this is not a cause for concern. That is a defined compatibility path in CUDA: newer drivers (and their driver API) support "older" CUDA toolkits (and their runtime API). For example, if nvidia-smi reports CUDA 10.2 and nvcc -V reports CUDA 10.1, that is generally not a cause for concern. It should just work, and it does not necessarily mean that you "actually installed CUDA 10.2 when you meant to install CUDA 10.1".
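The rule of thumb above can be sketched as a small check. The function names are hypothetical, and the comparison covers only the "driver version equal to or higher than toolkit version" case described here, not every compatibility mode CUDA offers. Versions are compared as integer tuples because comparing the strings directly would wrongly rank "10.0" below "9.2".

```python
def parse_version(text):
    """Turn a dotted version string such as '10.1' into a tuple of ints, e.g. (10, 1)."""
    return tuple(int(part) for part in text.split("."))


def driver_supports_toolkit(driver_cuda, toolkit_cuda):
    """True when the CUDA version reported by the driver is >= the toolkit's version."""
    return parse_version(driver_cuda) >= parse_version(toolkit_cuda)


# The example from the text: nvidia-smi says 10.2, nvcc -V says 10.1.
print(driver_supports_toolkit("10.2", "10.1"))
# The situation in the question: nvidia-smi says 10.0, nvcc says 9.2.
print(driver_supports_toolkit("10.0", "9.2"))
```

A False result is the case worth looking into, since it suggests the toolkit is newer than what the installed driver reports.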

If the nvcc command doesn't report anything at all (e.g. Command 'nvcc' not found...), or if it reports an unexpected CUDA version, this may also be due to an incorrect CUDA install, i.e. the mandatory steps mentioned above were not performed correctly. You can start to figure this out by using a Linux utility like find or locate (use the man pages to learn how, please) to find your nvcc executable. Assuming there is only one, the path to it can then be used to fix your PATH environment variable. The CUDA Linux install guide also explains how to set this. You may need to adjust the CUDA version in the PATH variable to match the CUDA version you actually installed or want to use.
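For readers who prefer a script to find or locate, the search can be approximated in Python. This is a sketch under stated assumptions: the default pattern /usr/local/cuda*/bin/nvcc reflects the usual Linux install location (as in the question's /usr/local/cuda-9.2/bin/nvcc), but toolkits installed elsewhere will not be found, and both function names are hypothetical. On many systems /usr/local/cuda is a symlink to one of the versioned directories, so the same toolkit may appear twice.

```python
import glob
import os


def find_nvcc(pattern="/usr/local/cuda*/bin/nvcc"):
    """Return the sorted paths of executable files matching the glob pattern."""
    return sorted(
        path
        for path in glob.glob(pattern)
        if os.path.isfile(path) and os.access(path, os.X_OK)
    )


def prepend_to_path(bin_dir, path):
    """Return a PATH string with bin_dir first and any later duplicate of it removed."""
    parts = [p for p in path.split(os.pathsep) if p and p != bin_dir]
    return os.pathsep.join([bin_dir] + parts)


# List candidate nvcc binaries; the result depends on the machine.
print(find_nvcc())

# Build the PATH value that would select the CUDA 9.2 toolkit from the question.
print(prepend_to_path("/usr/local/cuda-9.2/bin", os.environ.get("PATH", "")))
```

The script only prints the new PATH value; to make it take effect it still has to be exported in the shell, as the install guide describes.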

Similarly, when using Docker, the nvidia-smi command will generally report the driver version installed on the host machine, whereas other version methods such as nvcc --version will report the CUDA version installed inside the Docker container.

Similarly, if you have used another installation method for the CUDA "toolkit", such as Anaconda, you may discover that the version indicated by Anaconda does not "match" the version indicated by nvidia-smi. However, the above comments still apply. Older CUDA toolkits installed by Anaconda can be used with the newer versions reported by nvidia-smi, and the fact that nvidia-smi reports a newer/higher CUDA version than the one installed by Anaconda does not mean you have an installation problem.

Here is another question that covers similar ground. The above treatment does not in any way indicate that this answer applies only if you have installed multiple CUDA versions, intentionally or unintentionally. The situation presents itself any time you install CUDA. The versions reported by nvcc and nvidia-smi may not match; that is expected behavior and, in most cases, quite normal.
