Using libsvm for text classification in C#

Question

I am using libsvm to predict sentiment. I want to know what format the input has to be in, assuming I am using word counts as features.

     [label] [index]:[value] [index]:[value]

That is the format libsvm requires. Does that mean I have just two labels (one for positive and one for negative), that the index identifies each word under that label, and that the value is the frequency of each word?

Does this also mean I need to store the word-to-index mapping so that I can use it on my test set?

Recommended answer

LIBSVM uses the so-called "sparse" format, in which zero values do not need to be stored. Hence a data point with the attributes

     5 0 2 0

is represented as

     1:5 3:2

Therefore, you only need to specify the index and the value of the nonzero attributes.
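
As a rough illustration of the conversion described above, here is a minimal sketch in Python (chosen for brevity; the file format is the same whichever language writes it, including C#). The function name `to_sparse` is hypothetical and not part of LIBSVM.

```python
def to_sparse(values):
    """Format a dense feature vector as LIBSVM index:value pairs.

    Indices are 1-based and zero-valued attributes are skipped.
    """
    return " ".join(
        f"{index}:{value}"
        for index, value in enumerate(values, start=1)
        if value != 0
    )


print(to_sparse([5, 0, 2, 0]))  # prints "1:5 3:2"
```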

The label goes in the first column. For the binary case you may use +1 for positive and -1 for negative samples. By the way, you are not limited to two labels; you can use other numbers (e.g. 1, 2, 3, 4, 5, ...).
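
To connect this to the word-count case in the question, the sketch below shows one possible way to produce such lines. It assumes whitespace tokenization and a single vocabulary shared by all labels, so each distinct word gets one index no matter which class it appears in. The same word-to-index mapping built from the training data is reused for test documents, and words that were never seen in training are simply dropped. The helper names `build_vocab` and `to_libsvm_line` are hypothetical, and Python is used only for brevity.

```python
from collections import Counter


def build_vocab(docs):
    """Assign each distinct word a 1-based index, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for word in doc.split():  # assumption: whitespace tokenization
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def to_libsvm_line(label, doc, vocab):
    """Build one '[label] [index]:[value] ...' line from word counts."""
    # Words missing from the training vocabulary are ignored.
    counts = Counter(word for word in doc.split() if word in vocab)
    # Emit the pairs in ascending index order.
    pairs = sorted((vocab[word], count) for word, count in counts.items())
    return " ".join([f"{label:+d}"] + [f"{i}:{c}" for i, c in pairs])


train = [(+1, "good good movie"), (-1, "bad movie")]
vocab = build_vocab(doc for _, doc in train)

for label, doc in train:
    print(to_libsvm_line(label, doc, vocab))

# A test document reuses the stored mapping; "film" is unseen and dropped.
print(to_libsvm_line(+1, "good film", vocab))
```

Under these assumptions, the vocabulary has to be saved alongside the trained model, because the index of a word at prediction time must match the index it had during training.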

