Question
This question is similar to the ones asked here and here. Unfortunately, the suggested solutions did not fix the problem in my case.
I need to work with the MNIST dataset, but I can't fetch it, even when I specify the path to the scikit_learn_data/mldata/ folder (see below). How can I fix this?
In case it helps, I'm using Anaconda.
Code:
from sklearn.datasets.mldata import fetch_mldata
dataset = fetch_mldata('mnist-original', data_home='/Users/michelangelo/scikit_learn_data/mldata/')
mnist = fetch_mldata('MNIST original')
Error:
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
<ipython-input-5-dc4d45bc928e> in <module>()
----> 1 mnist = fetch_mldata('MNIST original')
/Users/michelangelo/anaconda2/lib/python2.7/site-packages/sklearn/datasets/mldata.pyc in fetch_mldata(dataname, target_name, data_name, transpose_data, data_home)
168 # load dataset matlab file
169 with open(filename, 'rb') as matlab_file:
--> 170 matlab_dict = io.loadmat(matlab_file, struct_as_record=True)
171
172 # -- extract data from matlab_dict
/Users/michelangelo/anaconda2/lib/python2.7/site-packages/scipy/io/matlab/mio.pyc in loadmat(file_name, mdict, appendmat, **kwargs)
134 variable_names = kwargs.pop('variable_names', None)
135 MR = mat_reader_factory(file_name, appendmat, **kwargs)
--> 136 matfile_dict = MR.get_variables(variable_names)
137 if mdict is not None:
138 mdict.update(matfile_dict)
/Users/michelangelo/anaconda2/lib/python2.7/site-packages/scipy/io/matlab/mio5.pyc in get_variables(self, variable_names)
290 continue
291 try:
--> 292 res = self.read_var_array(hdr, process)
293 except MatReadError as err:
294 warnings.warn(
/Users/michelangelo/anaconda2/lib/python2.7/site-packages/scipy/io/matlab/mio5.pyc in read_var_array(self, header, process)
250 `process`.
251 '''
--> 252 return self._matrix_reader.array_from_header(header, process)
253
254 def get_variables(self, variable_names=None):
mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.array_from_header()
mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.array_from_header()
mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex()
mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.read_numeric()
mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.read_element()
streams.pyx in scipy.io.matlab.streams.FileStream.read_string()
IOError: could not read bytes
Recommended answer
I just faced the same issue, and it took me some time to track down the cause. One possible reason is that the data was corrupted during the first download, so the cached copy cannot be read. Remove the cached data. You can find the scikit-learn data home directory as follows:
from sklearn.datasets import get_data_home
print(get_data_home())
Clean the directory and re-download the dataset. This solution worked for me. For reference: https://github.com/ageron/handson-ml/issues/143
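The cleanup step could be scripted roughly as follows. This is only a minimal sketch, not part of the original answer: the helper name `remove_cached_mldata` is hypothetical, and it assumes that the cached file lives at `<data_home>/mldata/mnist-original.mat`, which matches the folder mentioned in the question but may differ on your setup.

```python
import os


def remove_cached_mldata(data_home, dataset_file="mnist-original.mat"):
    """Delete a cached mldata file so the next fetch downloads it again.

    Returns True if a file was removed, False if nothing was cached.
    """
    # Assumption: the dataset is cached under <data_home>/mldata/.
    cached_path = os.path.join(
        os.path.expanduser(data_home), "mldata", dataset_file
    )
    if os.path.isfile(cached_path):
        os.remove(cached_path)
        return True
    return False
```

For example, calling `remove_cached_mldata(get_data_home())` and then fetching the dataset again should trigger a fresh download. Only the single cached file is deleted, so other cached datasets in the same directory are left alone.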
This is also related to the following question: How to use datasets.fetch_mldata() in sklearn?