This article covers how to handle the warning "ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT".

Problem description

I have a dataset consisting of both numeric and categorical data, and I want to predict adverse outcomes for patients based on their medical characteristics. I defined a prediction pipeline for my dataset like so:

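from sklearn.compose import ColumnTransformer, make_column_selector as selector
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# `dataset` is a pandas DataFrame that contains the features and a 'target' column
X = dataset.drop(columns=['target'])
y = dataset['target']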

# define categorical and numeric transformers
numeric_transformer = Pipeline(steps=[
    ('knnImputer', KNNImputer(n_neighbors=2, weights="uniform")),
    ('scaler', StandardScaler())])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# Dispatch object columns to categorical_transformer and the remaining columns to numeric_transformer
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude="object")),
    ('cat', categorical_transformer, selector(dtype_include="object"))
])

# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression())])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf.fit(X_train, y_train)
print("model score: %.3f" % clf.score(X_test, y_test))

However, when running this code, I get the following warning message:

ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)

    model score: 0.988

Can someone explain to me what this warning means? I am new to machine learning, so I am a little lost as to what I can do to improve the prediction model. As you can see from the numeric_transformer, I scaled the data through standardisation. I am also confused about why the model score is so high and whether this is a good or a bad thing.

Recommended answer

The warning means essentially what it says: it offers suggestions for getting the solver (the optimization algorithm) to converge.

lbfgs stands for "Limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm". It is one of the solver algorithms provided by the Scikit-Learn library.

The term limited-memory simply means that it stores only a few vectors, which implicitly represent an approximation to the inverse Hessian (the second-derivative information), instead of storing the full matrix.

It tends to converge better on relatively small datasets.

But what is algorithm convergence?

In simple terms: if the error stays within a very small range from one iteration to the next (i.e., it is hardly changing), the algorithm has reached a solution. This is not necessarily the best solution, as the algorithm may be stuck at a so-called "local optimum".

On the other hand, if the error is still changing noticeably, we say the algorithm did not converge. This can happen even when the error itself is relatively small (as in your case, where the score was good): what matters is that the difference in error between successive iterations is still greater than some tolerance.
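The stopping rule described above can be illustrated with a small toy sketch. Note the assumptions: this is plain gradient descent on a one-dimensional quadratic, not L-BFGS and not what Scikit-Learn runs internally, and the names `minimize`, `step`, `tol` and `max_iter` are chosen here only for illustration. It shows the two possible outcomes: the change in error drops below the tolerance (converged), or the iteration limit is reached first (not converged).

```python
def f(x):
    """Toy 'error' function with its minimum at x = 3."""
    return (x - 3.0) ** 2


def grad_f(x):
    """Derivative of f."""
    return 2.0 * (x - 3.0)


def minimize(x0, step=0.1, tol=1e-6, max_iter=100):
    """Plain gradient descent, used only to illustrate the stopping rule.

    Returns (x, iterations_used, converged).
    """
    x = x0
    for n_iter in range(1, max_iter + 1):
        x_new = x - step * grad_f(x)
        # Converged: the error changed by less than the tolerance.
        if abs(f(x_new) - f(x)) < tol:
            return x_new, n_iter, True
        x = x_new
    # The iteration limit was reached while the error was still changing.
    return x, max_iter, False


x_few, n_few, converged_few = minimize(x0=0.0, max_iter=10)
x_many, n_many, converged_many = minimize(x0=0.0, max_iter=1000)

print("max_iter=10:  ", x_few, n_few, converged_few)
print("max_iter=1000:", x_many, n_many, converged_many)
```

With the small iteration budget, the loop stops while the error is still changing, which is the situation the warning reports. With the larger budget, the same procedure stops early on its own once the change falls below the tolerance.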

Now, you need to know that the Scikit-Learn API sometimes lets the user specify the maximum number of iterations the algorithm may take while iteratively searching for the solution:

LogisticRegression(... solver='lbfgs', max_iter=100 ...)

As you can see, the default solver in LogisticRegression is 'lbfgs' and the default maximum number of iterations is 100.

Finally, note that increasing the maximum number of iterations does not guarantee convergence, but it certainly helps!
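As a rough sketch of the effect of max_iter, the snippet below fits LogisticRegression twice on synthetic data and records how many iterations were used and whether a ConvergenceWarning was raised. The dataset from `make_classification`, the helper name `fit_and_report`, and the values 2 and 1000 are assumptions made for illustration; they are not from the original question.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; the original dataset is not available).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)


def fit_and_report(max_iter):
    """Fit with the given iteration limit; return (iterations used, warned?)."""
    model = LogisticRegression(solver="lbfgs", max_iter=max_iter)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    # n_iter_ holds the number of iterations the solver actually ran.
    return int(model.n_iter_[0]), warned


iters_low, warned_low = fit_and_report(max_iter=2)
iters_high, warned_high = fit_and_report(max_iter=1000)

print("max_iter=2:    iterations =", iters_low, "warning raised =", warned_low)
print("max_iter=1000: iterations =", iters_high, "warning raised =", warned_high)
```

Comparing `n_iter_` with `max_iter` after fitting is a quick way to see whether the solver stopped on its own or simply ran out of iterations.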

Based on your comment below, some tips (out of many) that might help the algorithm converge are:

  • Increase the number of iterations: As in this answer;
  • Try a different optimizer: Look here;
  • Scale your data: Look here;
  • Add engineered features: Look here;
  • Data pre-processing: Look here - use case and here;
  • Add more data: Look here.
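The first two tips can be tried without rebuilding a pipeline: parameters of a step are reachable through `set_params` using the `<step name>__<parameter>` convention. The sketch below is a simplified stand-in, not the pipeline from the question: it assumes synthetic numeric data and a single scaler step, and only keeps the step name 'classifier'. The candidate settings (1000 iterations, the 'liblinear' and 'saga' solvers) are illustrative choices, not recommendations from the original answer.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; the original dataset is not available).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

base_clf = Pipeline(steps=[("scaler", StandardScaler()),
                           ("classifier", LogisticRegression())])

# Each entry overrides parameters of the 'classifier' step only.
candidates = {
    "lbfgs, more iterations": {"classifier__max_iter": 1000},
    "liblinear": {"classifier__solver": "liblinear"},
    "saga, more iterations": {"classifier__solver": "saga",
                              "classifier__max_iter": 5000},
}

scores = {}
for name, params in candidates.items():
    # clone() gives a fresh, unfitted copy so settings do not accumulate.
    clf = clone(base_clf).set_params(**params)
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)

for name, score in scores.items():
    print("%s: model score %.3f" % (name, score))
```

For the pipeline in the question, the same idea would be `clf.set_params(classifier__max_iter=1000)` before calling `clf.fit`, since the logistic regression step there is also named 'classifier'.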

