This post covers how to deal with being unable to load the 'mnist-original' dataset with sklearn.

Question

This question is similar to the ones asked here and here. Unfortunately, in my case the suggested solutions didn't fix the problem.

I need to work with the MNIST dataset, but I can't fetch it, even when I specify the path of the scikit_learn_data/mldata/ folder (see below). How can I fix this?

In case it helps, I'm using Anaconda.

Code:

from sklearn.datasets.mldata import fetch_mldata

dataset = fetch_mldata('mnist-original', data_home='/Users/michelangelo/scikit_learn_data/mldata/')
mnist = fetch_mldata('MNIST original')

Error:

---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-5-dc4d45bc928e> in <module>()
----> 1 mnist = fetch_mldata('MNIST original')

/Users/michelangelo/anaconda2/lib/python2.7/site-packages/sklearn/datasets/mldata.pyc in fetch_mldata(dataname, target_name, data_name, transpose_data, data_home)
    168     # load dataset matlab file
    169     with open(filename, 'rb') as matlab_file:
--> 170         matlab_dict = io.loadmat(matlab_file, struct_as_record=True)
    171
    172     # -- extract data from matlab_dict

/Users/michelangelo/anaconda2/lib/python2.7/site-packages/scipy/io/matlab/mio.pyc in loadmat(file_name, mdict, appendmat, **kwargs)
    134     variable_names = kwargs.pop('variable_names', None)
    135     MR = mat_reader_factory(file_name, appendmat, **kwargs)
--> 136     matfile_dict = MR.get_variables(variable_names)
    137     if mdict is not None:
    138         mdict.update(matfile_dict)

/Users/michelangelo/anaconda2/lib/python2.7/site-packages/scipy/io/matlab/mio5.pyc in get_variables(self, variable_names)
    290                 continue
    291             try:
--> 292                 res = self.read_var_array(hdr, process)
    293             except MatReadError as err:
    294                 warnings.warn(

/Users/michelangelo/anaconda2/lib/python2.7/site-packages/scipy/io/matlab/mio5.pyc in read_var_array(self, header, process)
    250            `process`.
    251         '''
--> 252         return self._matrix_reader.array_from_header(header, process)
    253
    254     def get_variables(self, variable_names=None):

mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.array_from_header()

mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.array_from_header()

mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex()

mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.read_numeric()

mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.read_element()

streams.pyx in scipy.io.matlab.streams.FileStream.read_string()

IOError: could not read bytes

Recommended answer

I just ran into the same issue, and it took me some time to track down the cause. One possible reason is that the data was corrupted during the first download, in which case the fix is to remove the cached data. You can find the scikit-learn data home directory as follows:

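# get_data_home is exported by sklearn.datasets; sklearn.datasets.base is not importable in newer releases
from sklearn.datasets import get_data_home
print(get_data_home())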

Clean out that directory and re-download the dataset. This solution worked for me. For reference: https://github.com/ageron/handson-ml/issues/143
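The clean-up step can be sketched in Python roughly as follows. This is only an illustration, not part of scikit-learn: the helper name clear_mldata_cache is made up for this example, and it assumes the cached download lives in an "mldata" subfolder of the data home, as the path in the question suggests.

```python
import shutil
from pathlib import Path


def clear_mldata_cache(data_home=None):
    """Delete the cached mldata folder so the next fetch downloads a fresh copy.

    Hypothetical helper for illustration only.
    Returns the removed path, or None if there was nothing to remove.
    """
    if data_home is None:
        # Lazy import: only needed when no explicit path is passed in.
        from sklearn.datasets import get_data_home
        data_home = get_data_home()

    # Assumption: the downloads sit in an "mldata" subfolder of the data home.
    cache_dir = Path(data_home) / "mldata"
    if not cache_dir.is_dir():
        return None

    # Remove the folder and everything in it, including a possibly
    # corrupted mnist-original.mat file.
    shutil.rmtree(cache_dir)
    return cache_dir
```

Calling clear_mldata_cache() with no argument uses whatever get_data_home() reports; passing a path explicitly is useful if the data home was overridden. After the folder is gone, running the fetch again should trigger a fresh download.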

This is also related to the following question: How to use datasets.fetch_mldata() in sklearn?
