Problem description
I am new to Python and have been trying to build a bag of words. I used the vectorizer.fit_transform function as follows:
vectorizer = CountVectorizer(vocabulary=set_of_words, tokenizer=nltk.word_tokenize)
bag_of_words = vectorizer.fit_transform(doc).toarray().astype(np.float64)
where doc contains the text whose bag of words is to be extracted.
and I get the following warning:
/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
Displaying the vectorizer gives something like this:
CountVectorizer(analyzer=u'word', binary=False, charset=None,
charset_error=None, decode_error=u'strict',
dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
lowercase=True, max_df=1.0, max_features=None, min_df=1,
ngram_range=(1, 1), preprocessor=None, stop_words=None,
strip_accents=None, token_pattern=u'(?u)\\b\\w\\w+\\b',
tokenizer=<function word_tokenize at 0xafbc6f4>,
vocabulary=[u'dissolution', u'comparatively', u'desirable', u'four', u'obstruction', u'nursery', u'perverted', u'appetite', u'repress', u'consider'])
Recommended answer
Are you using SciPy / scikit-learn and hitting this bug https://github.com/scikit-learn/scikit-learn/issues/3866 ?
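If the warning does come from library internals rather than from your own code, one option is to filter it out around the call that triggers it. The following is only a minimal sketch using the standard-library warnings module. The names emit_rank_warning and call_quietly are hypothetical helpers invented for this illustration, and the stand-in function simply raises a UserWarning with the same message text (NumPy's VisibleDeprecationWarning is a subclass of UserWarning), so the snippet runs without NumPy or scikit-learn installed.

```python
import warnings


def emit_rank_warning():
    # Hypothetical stand-in for the library call that triggers the warning.
    # It emits a UserWarning carrying the same message text as in the question.
    warnings.warn(
        "`rank` is deprecated; use the `ndim` attribute or function instead.",
        UserWarning,
    )


def call_quietly(func):
    """Call func with the `rank` deprecation message filtered out.

    Returns (result, caught), where caught lists any other warnings
    that were raised and not matched by the filter.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning by default...
        warnings.simplefilter("always")
        # ...but ignore the one whose message matches the `rank` deprecation.
        # filterwarnings inserts at the front, so this rule takes precedence.
        warnings.filterwarnings(
            "ignore",
            message=r".*rank.*is deprecated",
            category=UserWarning,
        )
        result = func()
    return result, caught


result, caught = call_quietly(emit_rank_warning)
print(len(caught))
```

In the question's code, the wrapped call would be something like call_quietly(lambda: vectorizer.fit_transform(doc)). The filter matches on the message text only, so unrelated warnings still surface in caught.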