TensorFlow in nvidia-docker: failed call to cuInit: CUDA_ERROR_UNKNOWN

Problem description

I have been working on getting an application that relies on TensorFlow to work as a docker container with nvidia-docker. I have compiled my application on top of the tensorflow/tensorflow:latest-gpu-py3 image. I run my docker container with the following command:

sudo nvidia-docker run -d -p 9090:9090 -v /src/weights:/weights myname/myrepo:mylabel

When looking at the logs through portainer I see the following:

2017-05-16 03:41:47.715682: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715896: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715948: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715978: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.716002: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.718076: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_UNKNOWN
2017-05-16 03:41:47.718177: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: 1e22bdaf82f1
2017-05-16 03:41:47.718216: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 1e22bdaf82f1
2017-05-16 03:41:47.718298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 367.57.0
2017-05-16 03:41:47.718398: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 20:37:01 PDT 2016
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) 
"""
2017-05-16 03:41:47.718455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
2017-05-16 03:41:47.718484: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 367.57.0
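
The key facts in the log above are the cuInit error code and the two driver versions, which match. As a rough sketch (the function name `summarize_cuda_log` and the regular expressions are illustrative assumptions, not part of TensorFlow), they can be pulled out of a saved log like this:

```python
import re

# Patterns written against the log lines quoted above (assumed stable wording).
CUINIT_FAILURE = re.compile(r"failed call to cuInit: (\w+)")
LIBCUDA_VERSION = re.compile(r"libcuda reported version is: ([\d.]+)")
KERNEL_VERSION = re.compile(r"kernel reported version is: ([\d.]+)")


def summarize_cuda_log(log_text):
    """Return the cuInit error code and driver versions found in a log."""
    def first_match(pattern):
        match = pattern.search(log_text)
        return match.group(1) if match else None

    libcuda = first_match(LIBCUDA_VERSION)
    kernel = first_match(KERNEL_VERSION)
    return {
        "cuinit_error": first_match(CUINIT_FAILURE),
        "libcuda_version": libcuda,
        "kernel_version": kernel,
        # True only when both versions were found and are identical.
        "versions_match": libcuda is not None and libcuda == kernel,
    }


# Three lines copied from the log above.
sample_log = (
    "2017-05-16 03:41:47.718076: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] "
    "failed call to cuInit: CUDA_ERROR_UNKNOWN\n"
    "2017-05-16 03:41:47.718298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] "
    "libcuda reported version is: 367.57.0\n"
    "2017-05-16 03:41:47.718455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] "
    "kernel reported version is: 367.57.0\n"
)
summary = summarize_cuda_log(sample_log)
print(summary)
```

Matching versions suggest the failure is probably not a mismatch between the user-space library and the kernel module.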

The container does seem to start properly, and my application does appear to be running. When I send requests to it for predictions the predictions are returned correctly - however at the slow speed I would expect when running inference on the CPU, so I think it's pretty clear that the GPU is not being used for some reason. I've also tried running nvidia-smi from within that same container to make sure it is seeing my GPU and these are the results for that:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K1             Off  | 0000:00:07.0     Off |                  N/A |
| N/A   28C    P8     7W /  31W |     25MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

I'm certainly no expert in this - but it does appear that the GPU is visible from inside the container. Any ideas on how to get this working with TensorFlow?

Recommended answer

I tried installing nvidia-modprobe, but the same error persisted. A simple system reboot then fixed it for me.
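
The answer does not say why the reboot helped. One possible explanation, stated here only as an assumption, is that some NVIDIA device nodes (which nvidia-modprobe is meant to create) were missing beforehand. A minimal sketch for checking this on the host follows; the list of paths and the helper name `missing_device_nodes` are illustrative, and `/dev/nvidia0` assumes a single GPU at index 0:

```python
import os

# Assumed device nodes; /dev/nvidia0 corresponds to GPU index 0.
EXPECTED_DEVICE_NODES = ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia0"]


def missing_device_nodes(paths=EXPECTED_DEVICE_NODES, exists=os.path.exists):
    """Return the entries of `paths` for which `exists` reports False."""
    return [path for path in paths if not exists(path)]


if __name__ == "__main__":
    missing = missing_device_nodes()
    if missing:
        print("Missing device nodes:", ", ".join(missing))
    else:
        print("All expected NVIDIA device nodes are present.")
```

If a node such as /dev/nvidia-uvm is reported missing, that would be consistent with cuInit failing even though nvidia-smi can see the GPU, but it is not the only possible cause.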
