TypeError: a float is required in sklearn.feature_extraction.FeatureHasher

Problem description

I'm using sklearn version 0.16.1. It seems that FeatureHasher doesn't support string values in its dict input, whereas DictVectorizer does. For example:

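from sklearn.feature_extraction import FeatureHasher

# One dict per sample; 'city' holds a string value, 'temperature' a float.
values = [
    {'city': 'Dubai', 'temperature': 33.},
    {'city': 'London', 'temperature': 12.},
    {'city': 'San Francisco', 'temperature': 18.}
]

print("Starting FeatureHasher ...")
hasher = FeatureHasher(n_features=2)
# On sklearn 0.16.1 this call raises the TypeError shown below.
X = hasher.transform(values).toarray()
print(X)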

But the following error is received:

    _hashing.transform(raw_X, self.n_features, self.dtype)
  File "_hashing.pyx", line 46, in sklearn.feature_extraction._hashing.transform (sklearn\feature_extraction\_hashing.c:1762)
TypeError: a float is required

I can't use DictVectorizer because my dataset is very large and its features have high cardinality, so I get a MemoryError. Any suggestions?

Update (October 2016):

As NirIzr commented, this is now supported: the sklearn dev team addressed the issue in https://github.com/scikit-learn/scikit-learn/pull/6173

FeatureHasher should properly handle string dictionary values as of version 0.18.
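
Since the behavior depends on the installed version, code that has to run on both old and new installs may want to check before passing string values. Below is a minimal sketch of such a check; the helper name `supports_string_values` is made up for illustration, and the only fact it relies on is the 0.18 threshold mentioned above.

```python
import re


def supports_string_values(version):
    """Return True if `version` (e.g. "0.18.1") is 0.18 or newer.

    Hypothetical helper: only the leading "major.minor" part is compared,
    so suffixes such as ".dev0" or "rc1" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 18)


print(supports_string_values("0.16.1"))  # False
print(supports_string_values("0.18"))    # True
```

In practice the argument would be `sklearn.__version__`.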

Solution

This is a known sklearn issue: in versions before 0.18, FeatureHasher does not support string values in its dict input format.

https://github.com/scikit-learn/scikit-learn/issues/4878
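
For anyone stuck on an older version, one possible workaround (not taken from the linked issue, just a sketch) is to fold each string value into the feature name and use 1 as the value, so the hasher only ever sees numbers. As far as I can tell this mirrors how newer releases treat string values. The helper names `stringify_categoricals` and `hash_records` are made up for illustration, and the code assumes Python 3 (`str`).

```python
def stringify_categoricals(records):
    """Yield copies of `records` with string values folded into the key.

    {'city': 'Dubai', 'temperature': 33.0} becomes
    {'city=Dubai': 1.0, 'temperature': 33.0}.
    """
    for record in records:
        converted = {}
        for name, value in record.items():
            if isinstance(value, str):
                # Categorical value: encode it as a one-hot style feature.
                converted["%s=%s" % (name, value)] = 1.0
            else:
                converted[name] = float(value)
        yield converted


def hash_records(records, n_features=2 ** 20):
    """Hash `records` into a sparse matrix with `n_features` columns."""
    # Imported lazily so the helper above can be used without scikit-learn.
    from sklearn.feature_extraction import FeatureHasher

    hasher = FeatureHasher(n_features=n_features)
    return hasher.transform(stringify_categoricals(records))
```

Calling `hash_records(values)` returns a scipy sparse matrix. Because the records are converted one at a time by a generator, no second copy of the dataset is built in memory, which matters given the MemoryError mentioned in the question.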
