如何在AWS p2.xlarge实例，AMI ami-edb11e8d和nvidia驱动程序最新版本的Tensorflow(1.0)的最新版本中安装CUDA 8.0(375.39)

本文介绍了如何在AWS p2.xlarge实例，AMI ami-edb11e8d和nvidia驱动程序最新版本的Tensorflow(1.0)的最新版本中安装CUDA 8.0(375.39)的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我已经升级到Tensorflow版本1.0，并安装了具有cudnn 5.1版本和最新375.39的nvidia驱动程序的CUDA 8.0.我的NVIDIA硬件是使用p2.xlarge实例(特斯拉K-80)在Amazon Web Services上使用的硬件.我的操作系统是Linux 64位.

I have upgraded to Tensorflow version 1.0 and installed CUDA 8.0 with the cudnn 5.1 version and the nvidia drivers up to date 375.39. My NVIDIA hardware is the one that is on Amazon Web Services using the p2.xlarge instance, a Tesla K-80. My OS is Linux 64-bit.

每次使用以下命令，我都会收到下一条错误消息:tf.Session()

I get the next error message every time I use the command: tf.Session()

[ec2-user@ip-172-31-7-96 CUDA]$ python
Python 2.7.12 (default, Sep  1 2016, 22:14:00)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
>>> sess = tf.Session()
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: ip-172-31-7-96
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: ip-172-31-7-96
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Invalid argument: expected %d.%d or %d.%d.%d form for driver version; got "1"
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.39  Tue Jan 31 20:47:00 PST 2017
GCC version:  gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 375.39.0

我对如何解决这个问题一无所知.我尝试了Nvidia驱动程序和CUDA的不同版本，但仍然无法正常工作.

I'm completely clueless about how to fix this. I have tried different versions of Nvidia drivers and CUDA but still it does not work.

任何提示将不胜感激.

推荐答案

您需要安装NVIDIA驱动程序并运行CUDA 8.0安装程序.

You need to install a NVIDIA Driver and run the CUDA 8.0 installer.

# Requirements
# - NVIDIA Driver - NVIDIA-Linux-x86_64-375.39.run - http://www.nvidia.fr/Download/index.aspx
# - CUDA runfile (local) - cuda_8.0.61_375.26_linux.run - https://developer.nvidia.com/cuda-downloads
# - cudnn-8.0-linux-x64-v5.0-ga.tgz

sudo apt update -y && sudo apt upgrade -y
sudo apt install build-essential linux-image-extra-`uname -r` -y

chmod +x NVIDIA-Linux-x86_64-375.39.run
sudo ./NVIDIA-Linux-x86_64-375.39.run

chmod +x cuda_8.0.61_375.26_linux.run
./cuda_8.0.61_375.26_linux.run --extract=`pwd`/extracts
sudo ./extracts/cuda-linux64-rel-8.0.61-21551265.run

echo -e "export CUDA_HOME=/usr/local/cuda\nexport PATH=\$PATH:\$CUDA_HOME/bin\nexport LD_LIBRARY_PATH=\$LD_LINKER_PATH:\$CUDA_HOME/lib64" >> ~/.bashrc
source .bashrc

tar xf cudnn-8.0-linux-x64-v5.0-ga.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/

这篇关于如何在AWS p2.xlarge实例，AMI ami-edb11e8d和nvidia驱动程序最新版本的Tensorflow(1.0)的最新版本中安装CUDA 8.0(375.39)的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！