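This article explains how to deal with the different CUDA versions reported by nvcc and nvidia-smi. It should be a useful reference for anyone who runs into the same question.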

Problem description

I am very confused by the different CUDA versions shown by running which nvcc and nvidia-smi.

I have both cuda9.2 and cuda10 installed on my Ubuntu 16.04 machine. I have set the PATH to point to cuda9.2, so when I run:

 $ which nvcc
 /usr/local/cuda-9.2/bin/nvcc

However, when I run

$ nvidia-smi
Wed Nov 21 19:41:32 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P0    26W /  N/A |    379MiB /  6078MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1324      G   /usr/lib/xorg/Xorg                           225MiB |
|    0      2844      G   compiz                                       146MiB |
|    0     15550      G   /usr/lib/firefox/firefox                       1MiB |
|    0     19992      G   /usr/lib/firefox/firefox                       1MiB |
|    0     23605      G   /usr/lib/firefox/firefox                       1MiB |

So am I using cuda9.2 as which nvcc suggests, or am I using cuda10 as nvidia-smi suggests?

I saw this answer, but it does not provide a direct answer to the confusion. It just asks us to reinstall the cudatoolkit, which I already did.

Solution

CUDA has two primary APIs: the runtime API and the driver API. Each has a corresponding version (e.g. 8.0, 9.0, etc.).

The necessary support for the driver API (e.g. libcuda.so on linux) is installed by the GPU driver installer.

The necessary support for the runtime API (e.g. libcudart.so on linux, and also nvcc) is installed by the CUDA toolkit installer (which may also have a GPU driver installer bundled in it).

In any event, the (installed) driver API version may not always match the (installed) runtime API version, especially if you install a GPU driver independently from installing CUDA (i.e. the CUDA toolkit).
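As a rough illustration of the two version numbers, the sketch below asks each library directly through Python's ctypes. It assumes the shared libraries can be loaded under the names libcuda.so.1 and libcudart.so (those names are assumptions and differ between installs), and it relies on the CUDA convention that a version is encoded as 1000*major + 10*minor. The helper names are hypothetical. Where a library or symbol is missing, the helper returns None instead of failing.

```python
import ctypes


def decode_cuda_version(encoded):
    """Split CUDA's integer encoding (1000*major + 10*minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def query_version(library_name, function_name):
    """Call a version function of the form f(int*) and decode the result.

    Returns None if the library or symbol cannot be loaded, or if the call
    reports an error.
    """
    try:
        lib = ctypes.CDLL(library_name)
        func = getattr(lib, function_name)
    except (OSError, AttributeError):
        return None  # library or symbol not present on this machine
    value = ctypes.c_int(0)
    status = func(ctypes.byref(value))
    if status != 0:  # both APIs return 0 on success
        return None
    return decode_cuda_version(value.value)


if __name__ == "__main__":
    # Library names are assumptions; adjust them for your system.
    print("driver API :", query_version("libcuda.so.1", "cuDriverGetVersion"))
    print("runtime API:", query_version("libcudart.so", "cudaRuntimeGetVersion"))
```

On a machine like the one in the question, the first line would be expected to reflect the driver (10.0) and the second whichever libcudart.so the loader finds, which need not be the toolkit that nvcc belongs to.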

The nvidia-smi tool gets installed by the GPU driver installer, and generally has the GPU driver in view, not anything installed by the CUDA toolkit installer.

Recently (somewhere between 410.48 and 410.73 driver version on linux) the powers-that-be at NVIDIA decided to add reporting of the CUDA Driver API version installed by the driver, in the output from nvidia-smi.

This has no connection to the installed CUDA runtime version.

nvcc, the CUDA compiler-driver tool that is installed with the CUDA toolkit, will always report the CUDA runtime version that it was built to recognize. It doesn't know anything about what driver version is installed, or even if a GPU driver is installed.

Therefore, by design, these two numbers don't necessarily match, as they are reflective of two different things.
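To make the "two different things" concrete, here is a minimal sketch that pulls each number out of the text the two tools print. The nvidia-smi header line is copied from the question; the nvcc -V line is an assumed sample of what a 9.2 toolkit prints, and the function names are hypothetical.

```python
import re


def cuda_version_from_nvidia_smi(text):
    """Return (major, minor) from the 'CUDA Version: X.Y' header field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_version_from_nvcc(text):
    """Return (major, minor) from the 'release X.Y' field of `nvcc -V`, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


# Header line taken from the nvidia-smi output in the question.
smi_header = "| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |"
# Assumed sample of the `nvcc -V` release line for a 9.2 toolkit.
nvcc_line = "Cuda compilation tools, release 9.2, V9.2.148"

print(cuda_version_from_nvidia_smi(smi_header))  # (10, 0): driver API version
print(cuda_version_from_nvcc(nvcc_line))         # (9, 2): toolkit/runtime version
```

Older drivers that predate the CUDA Version field simply produce None from the first function.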

If you are wondering why nvcc -V displays a version of CUDA you weren't expecting (e.g. a version other than the one you think you installed), or doesn't display any version at all, it may be because you haven't followed the mandatory instructions in the CUDA Linux install guide: step 7 in guides prior to CUDA 11, or step 6 in the CUDA 11 Linux install guide.
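When several toolkits are installed side by side, which nvcc runs depends on the order of the directories in PATH. The sketch below mimics that lookup. The directory names follow the /usr/local/cuda-9.2 path from the question (the cuda-10.0 one is assumed by analogy), and the is_executable parameter is a hypothetical hook so the example can run against a pretend filesystem.

```python
import os


def first_nvcc_on_path(path_value, is_executable=None):
    """Return the first '<dir>/nvcc' along a PATH-style string, or None."""
    if is_executable is None:
        # Default: check the real filesystem.
        is_executable = lambda p: os.path.isfile(p) and os.access(p, os.X_OK)
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, "nvcc")
        if is_executable(candidate):
            return candidate
    return None


# Pretend both toolkits are installed (hypothetical paths for illustration).
installed = {
    os.path.join("/usr/local/cuda-9.2/bin", "nvcc"),
    os.path.join("/usr/local/cuda-10.0/bin", "nvcc"),
}
search_path = os.pathsep.join(
    ["/usr/bin", "/usr/local/cuda-9.2/bin", "/usr/local/cuda-10.0/bin"]
)

chosen = first_nvcc_on_path(search_path, is_executable=installed.__contains__)
print(chosen)  # the 9.2 nvcc, because its directory comes first
```

Calling first_nvcc_on_path(os.environ.get("PATH", "")) applies the same lookup to the real environment.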

Note that although this question mostly has linux in view, the same concepts apply to windows CUDA installs. The driver has a CUDA driver version associated with it (which can be queried with nvidia-smi, for example). The CUDA runtime also has a CUDA runtime version associated with it. The two will not necessarily match in all cases.

In most cases, if nvidia-smi reports a CUDA version that is numerically equal to or higher than the one reported by nvcc -V, this is not a cause for concern. That is a defined compatibility path in CUDA (newer drivers/driver API support "older" CUDA toolkits/runtime API). For example, if nvidia-smi reports CUDA 10.2 and nvcc -V reports CUDA 10.1, that is generally not a cause for concern. It should just work, and it does not necessarily mean that you "actually installed CUDA 10.2 when you meant to install CUDA 10.1".
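The rule of thumb above comes down to a tuple comparison. This sketch encodes only that rule (driver version equal to or higher than the toolkit version) and nothing about any other compatibility mechanism; the function name is hypothetical.

```python
def driver_supports_runtime(driver_version, runtime_version):
    """True if the driver's CUDA version is >= the toolkit/runtime version.

    Both arguments are (major, minor) pairs, e.g. (10, 2).
    """
    return tuple(driver_version) >= tuple(runtime_version)


print(driver_supports_runtime((10, 2), (10, 1)))  # True: the example from the text
print(driver_supports_runtime((10, 0), (9, 2)))   # True: the setup in the question
print(driver_supports_runtime((9, 2), (10, 0)))   # False: driver older than toolkit
```

Comparing tuples rather than floats avoids mistakes such as treating 10.10 as smaller than 10.2.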

If the nvcc command doesn't report anything at all (e.g. Command 'nvcc' not found...), this may also be due to an incorrect CUDA install, i.e. the mandatory steps mentioned above were not performed correctly. You can start to figure this out by using a Linux utility like find or locate (use the man pages to learn how, please) to find your nvcc executable. Assuming there is only one, the path to it can then be used to fix your PATH environment variable.
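For readers who would rather script the search, the following is a minimal sketch of the same idea: walk a directory tree looking for files named nvcc, then build a PATH string with the chosen directory first. Both function names are hypothetical, and searching under /usr/local is an assumption based on the path shown in the question.

```python
import os


def find_nvcc(root):
    """Walk `root` and return the sorted paths of files named exactly 'nvcc'."""
    found = []
    for directory, _subdirs, files in os.walk(root):
        if "nvcc" in files:
            found.append(os.path.join(directory, "nvcc"))
    return sorted(found)


def path_with_toolkit(bin_dir, current_path):
    """Put `bin_dir` first on a PATH-style string, dropping any later duplicate."""
    entries = [e for e in current_path.split(os.pathsep) if e and e != bin_dir]
    return os.pathsep.join([bin_dir] + entries)


# Example usage (not executed here):
#   candidates = find_nvcc("/usr/local")
#   new_path = path_with_toolkit(os.path.dirname(candidates[0]), os.environ["PATH"])
```

The new PATH value still has to be exported in the shell (or a shell startup file) to take effect; the sketch only computes the string.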

