This article looks at the question "ScikitLearn learning curve strongly dependent on batch size of MLPClassifier? Or: how to diagnose bias/variance of an NN?" and its solution, which should be a useful reference for anyone running into the same problem.

Problem Description

I am currently working on a two-class classification problem in scikit-learn, using an MLPClassifier with the adam solver and relu activation. To explore whether my classifier suffers from high bias or high variance, I plotted the learning curve with scikit-learn's built-in function:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html

I am using GroupKFold cross-validation with 8 splits. However, I found that my learning curve depends strongly on the batch size of my classifier:

https://imgur.com/a/FOaWKN1

Is it supposed to be like this? I thought learning curves show the accuracy score as a function of the portion of training data used, independent of any batches/epochs. Can I actually use this built-in function for batch methods? If yes, which batch size should I choose (full batch or batch size = number of training examples or something in between), and what diagnosis do I get from it? Or how do you usually diagnose bias/variance problems of a neural network classifier?
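A minimal sketch of the setup described above, for reference (a synthetic dataset and placeholder group labels stand in for the real data, and the batch_size value is arbitrary):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GroupKFold, learning_curve
from sklearn.neural_network import MLPClassifier

# Placeholder two-class data and group labels; the real dataset differs.
X, y = make_classification(n_samples=800, random_state=0)
groups = np.arange(len(y)) % 8          # 8 groups, so GroupKFold can make 8 splits

clf = MLPClassifier(solver="adam", activation="relu",
                    batch_size=32,      # the parameter whose effect is in question
                    max_iter=500, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    clf, X, y, groups=groups, cv=GroupKFold(n_splits=8),
    train_sizes=np.linspace(0.1, 1.0, 5), scoring="accuracy")

print(train_scores.mean(axis=1))        # training accuracy per training-set size
print(val_scores.mean(axis=1))          # cross-validated accuracy per training-set size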

Help would be really appreciated!

Solution

Yes, the learning curve depends on the batch size.

The optimal batch size depends on the type of data and the total volume of the data.
In the ideal case a batch size of 1 would be best, but in practice, with large volumes of data, this approach is not feasible.
I think you have to determine it through experimentation, because you can't easily calculate the optimal value.

Moreover, when you change the batch size you might also want to change the learning rate, so that you keep control over the process.
That said, having a tool to find the optimal (memory- and time-wise) batch size would indeed be quite interesting.
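For example, one rough way to run that experiment in scikit-learn is to score a few batch-size/learning-rate combinations under the same GroupKFold scheme (the values below are arbitrary placeholders, not recommendations):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=800, random_state=0)   # placeholder data
groups = np.arange(len(y)) % 8

for batch_size, lr in [(16, 0.0005), (64, 0.001), (256, 0.002)]:
    clf = MLPClassifier(solver="adam", activation="relu",
                        batch_size=batch_size, learning_rate_init=lr,
                        max_iter=500, random_state=0)
    scores = cross_val_score(clf, X, y, groups=groups,
                             cv=GroupKFold(n_splits=8), scoring="accuracy")
    print(f"batch_size={batch_size:4d}  lr={lr}  mean accuracy={scores.mean():.3f}")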


What is Stochastic Gradient Descent?

Stochastic gradient descent, often abbreviated SGD, is a variation of the gradient descent algorithm that calculates the error and updates the model for each example in the training dataset.

Because the model is updated for each training example, stochastic gradient descent is often called an online machine learning algorithm.
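As a sketch of the idea, here is a per-example update loop for a toy least-squares model in NumPy (not scikit-learn's actual implementation):

import numpy as np

def sgd(X, y, lr=0.01, epochs=10):
    # Stochastic gradient descent: the weights are updated after every single example.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):            # one training example at a time
            grad = (xi @ w - yi) * xi       # gradient of 0.5 * (xi.w - yi)^2
            w -= lr * grad                  # immediate, "online" update
    return w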

What is Batch Gradient Descent?

Batch gradient descent is a variation of the gradient descent algorithm that calculates the error for each example in the training dataset, but only updates the model after all training examples have been evaluated.

One cycle through the entire training dataset is called a training epoch. Therefore, it is often said that batch gradient descent performs a model update at the end of each training epoch.
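The same toy least-squares model with a single update per epoch would look roughly like this (again only a sketch):

import numpy as np

def batch_gd(X, y, lr=0.01, epochs=100):
    # Batch gradient descent: one update per epoch, after evaluating every example.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # mean gradient over the full dataset
        w -= lr * grad                      # single update at the end of the epoch
    return w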

What is Mini-Batch Gradient Descent?

Mini-batch gradient descent is a variation of the gradient descent algorithm that splits the training dataset into small batches that are used to calculate the model error and update the model coefficients.

Implementations may choose to sum the gradient over the mini-batch or to take the average of the gradient, which further reduces the variance of the gradient.

Mini-batch gradient descent seeks a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent. It is the most common implementation of gradient descent used in the field of deep learning.
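A corresponding mini-batch sketch for the same toy least-squares model, with batch_size as the knob the question is about and the gradient averaged over each mini-batch:

import numpy as np

def minibatch_gd(X, y, lr=0.01, epochs=100, batch_size=32):
    # Mini-batch gradient descent: one update per mini-batch.
    w = np.zeros(X.shape[1])
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(len(y))     # reshuffle the examples every epoch
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)   # average over the batch
            w -= lr * grad
    return w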


Source: https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/


That concludes this article on "ScikitLearn learning curve strongly dependent on batch size of MLPClassifier? Or: how to diagnose bias/variance of an NN?". We hope the answer above is helpful.
