This article explains the difference between the penalty and loss parameters of scikit-learn's LinearSVC class; it may be a useful reference if you are facing the same question.

Problem Description

I'm not very familiar with SVM theory, and I'm using this LinearSVC class in Python:

http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC

I was wondering: what is the difference between the penalty and loss parameters?
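For reference, here is a minimal sketch of how the two parameters are passed to the class (the data is made up purely for illustration; the values shown are the library defaults):

```python
from sklearn.svm import LinearSVC

# Toy data, made up purely for illustration
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [0, 0, 1, 1]

# penalty selects the regularization term, loss selects the loss function
clf = LinearSVC(penalty="l2", loss="squared_hinge", C=1.0)
clf.fit(X, y)
print(clf.coef_, clf.intercept_)
```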

Recommended Answer

In machine learning, the loss function measures the quality of your solution, while the penalty function imposes some constraints on your solution.

Specifically, let X be your data and y be the labels of your data. Then the loss function V(f(X), y) measures how well your model f maps your data to the labels. Here, f(X) is a vector of predicted labels.

L1 and L2 norms are commonly used and intuitively understood loss functions (see *). L1 norm: V(f(X), y) = |f(x1) - y1| + ... + |f(xn) - yn|, where f(xi) is the predicted label of the i-th object and yi is the actual label. L2 norm: V(f(X), y) = sqrt(|f(x1) - y1|^2 + ... + |f(xn) - yn|^2), where sqrt is the square root.
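As a quick illustration of the two formulas (all numbers are made up):

```python
import numpy as np

f_X = np.array([0.9, 0.2, 0.7])  # hypothetical predictions f(x1), f(x2), f(x3)
y = np.array([1.0, 0.0, 1.0])    # hypothetical actual labels y1, y2, y3

l1_loss = np.sum(np.abs(f_X - y))          # |f(x1)-y1| + ... + |f(xn)-yn|
l2_loss = np.sqrt(np.sum((f_X - y) ** 2))  # sqrt of the summed squared errors
print(l1_loss, l2_loss)  # 0.6  0.3741...
```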

As for the penalty function, it is used to impose some constraints R(f) on your solution f. The L1 norm could be R(f) = |f1| + ... + |fm|, and similarly you can define the L2 norm. Here, f1, ..., fm are the coefficients of the model. You don't know them initially; these are the values that the machine learning algorithm learns from your data.
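The same idea applied directly to the model's coefficients, again with made-up values:

```python
import numpy as np

coef = np.array([0.5, -1.2, 0.0, 2.0])  # hypothetical coefficients f1, ..., fm

r_l1 = np.sum(np.abs(coef))        # R(f) = |f1| + ... + |fm|  ->  3.7
r_l2 = np.sqrt(np.sum(coef ** 2))  # L2 norm of the same coefficients
print(r_l1, r_l2)
```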

Eventually, the overall cost function is V(f(X), y) + lambda*R(f), and the goal is to find the f that minimizes this cost function. That f is then used to make predictions for new, unseen objects. Why do we need a penalty function? It turns out that the penalty function can add some nice properties to your solution. For example, when you have too many features, the L1 norm helps prevent overfitting by generating sparse solutions.
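To see this sparsity effect in practice, here is a rough sketch comparing the two penalties in LinearSVC. Note that scikit-learn exposes C rather than lambda (roughly, C plays the role of 1/lambda), and that penalty="l1" is only supported together with loss="squared_hinge" and dual=False:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data with many uninformative features
X, y = make_classification(n_samples=200, n_features=30,
                           n_informative=5, random_state=0)

l2_model = LinearSVC(penalty="l2", loss="squared_hinge", C=1.0).fit(X, y)
# penalty="l1" requires loss="squared_hinge" and dual=False
l1_model = LinearSVC(penalty="l1", loss="squared_hinge",
                     dual=False, C=1.0).fit(X, y)

print(np.sum(l2_model.coef_ == 0), "zero coefficients with l2 penalty")
print(np.sum(l1_model.coef_ == 0), "zero coefficients with l1 penalty")
```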

* This is not exactly how support vector machines work, but it might give you some idea of what these terms mean. For example, in SVM, the L1-hinge and L2-hinge loss functions are used. L1-hinge: V(f(X), y) = max(0, 1 - y1*f(x1)) + ... + max(0, 1 - yn*f(xn)); L2 is similar but with squared terms. You can find a good introduction to ML in the Machine Learning class by Andrew Ng on Coursera.
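Finally, a direct computation of the two hinge formulas with made-up decision values; in LinearSVC these correspond to loss="hinge" (L1) and loss="squared_hinge" (L2):

```python
import numpy as np

y = np.array([1.0, -1.0, 1.0])    # hypothetical labels, encoded as -1/+1
f_X = np.array([0.8, 0.3, -0.5])  # hypothetical decision values f(xi)

margins = np.maximum(0.0, 1.0 - y * f_X)
l1_hinge = np.sum(margins)       # max(0, 1 - y1*f(x1)) + ... + max(0, 1 - yn*f(xn))
l2_hinge = np.sum(margins ** 2)  # same terms, squared
print(l1_hinge, l2_hinge)
```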
