Plotting is the best way to see how your algorithm is performing. To check whether you have achieved convergence, plot the evolution of the cost function after each iteration; once it stops improving much over a given number of iterations, you can assume convergence. Take a look at the following code:

```python
# create_hypothesis, data, m, variance, learning_rate and the initial
# theta0_guess/theta1_guess/theta0_last/theta1_last values are assumed
# to be defined as in the question's code.
cost_f = []
while (abs(theta1_guess - theta1_last) > variance or
       abs(theta0_guess - theta0_last) > variance):
    theta1_last = theta1_guess
    theta0_last = theta0_guess
    hypothesis = create_hypothesis(theta1_guess, theta0_guess)
    # record the current value of the cost function
    cost_f.append((1. / (2 * m)) * sum([pow(hypothesis(point[0]) - point[1], 2) for point in data]))
    # simultaneous update of theta0 and theta1
    theta0_guess = theta0_guess - learning_rate * (1. / m) * sum([hypothesis(point[0]) - point[1] for point in data])
    theta1_guess = theta1_guess - learning_rate * (1. / m) * sum([(hypothesis(point[0]) - point[1]) * point[0] for point in data])

import pylab
pylab.plot(range(len(cost_f)), cost_f)
pylab.show()
```

Which will plot the following graphic (execution with learning_rate=0.01, variance=0.00001):

[Plot: cost function vs. iteration, learning_rate=0.01, variance=0.00001]

As you can see, after a thousand iterations you don't get much improvement. I normally declare convergence if the cost function decreases by less than 0.001 in one iteration, but this is just based on my own experience.

For choosing the learning rate, the best thing you can do is also plot the cost function and see how it performs, always remembering these two things:

- if the learning rate is too small, you will get slow convergence
- if the learning rate is too large, your cost function may not decrease in every iteration and therefore it will not converge

If you run your code choosing learning_rate > 0.029 and variance=0.001, you will be in the second case: gradient descent doesn't converge. If instead you choose values learning_rate < 0.0001 and variance=0.001, you will see that your algorithm takes a lot of iterations to converge.

[Plot: non-convergence example with learning_rate=0.03]

[Plot: slow convergence example with learning_rate=0.0001]
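To make the "decreases by less than 0.001" rule concrete, here is a minimal helper sketch; the name `has_converged` and the `tolerance` parameter are my own, not part of the original code. It inspects the `cost_f` list recorded by the loop above:

```python
def has_converged(cost_f, tolerance=0.001):
    """Return True once the last iteration improved the cost by less
    than `tolerance` -- the rule of thumb described above.
    `cost_f` is the list of cost values recorded at each iteration."""
    if len(cost_f) < 2:
        return False
    return abs(cost_f[-2] - cost_f[-1]) < tolerance
```

Inside the loop you would call it right after appending the new cost, e.g. `if has_converged(cost_f): break`, instead of (or in addition to) the parameter-change test on `variance`.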
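To see the effect of the learning rate end to end, here is a self-contained sketch in the same spirit as the code above. It is illustrative only: it generates its own synthetic data, the name `run_gradient_descent` is mine, and the three learning-rate values are picked for this synthetic set (the thresholds 0.029 and 0.0001 quoted above apply to the asker's dataset). It fits h(x) = theta0 + theta1*x with batch gradient descent at each rate and plots one cost curve per rate:

```python
import random
import pylab

# Synthetic data for y ~ 2x + 1 with a little noise (illustrative only).
random.seed(0)
data = [(i / 10., 2 * (i / 10.) + 1 + random.uniform(-0.5, 0.5)) for i in range(50)]
m = len(data)

def run_gradient_descent(learning_rate, tolerance=0.00001, max_iters=2000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x.
    Returns the cost recorded at every iteration."""
    theta0, theta1 = 0.0, 0.0
    cost_f = []
    for _ in range(max_iters):
        errors = [(theta0 + theta1 * x) - y for x, y in data]
        cost = (1. / (2 * m)) * sum(e * e for e in errors)
        cost_f.append(cost)
        if cost > 1e6:
            break  # clearly diverging: stop so the plot stays readable
        if len(cost_f) > 1 and abs(cost_f[-2] - cost_f[-1]) < tolerance:
            break  # cost barely moved this iteration: declare convergence
        # simultaneous update of theta0 and theta1
        theta0 -= learning_rate * (1. / m) * sum(errors)
        theta1 -= learning_rate * (1. / m) * sum(e * x for e, (x, _) in zip(errors, data))
    return cost_f

# One curve per learning rate: too small (slow), reasonable, too large (diverges).
for lr in (0.001, 0.05, 0.25):
    costs = run_gradient_descent(lr)
    pylab.plot(range(len(costs)), costs, label="learning_rate=%s" % lr)

pylab.yscale("log")  # log scale so the diverging curve doesn't flatten the others
pylab.xlabel("iteration")
pylab.ylabel("cost")
pylab.legend()
pylab.show()
```

On the resulting plot, the too-small rate creeps downward without converging within the iteration budget, the moderate rate flattens out quickly, and the too-large rate blows up within a few dozen iterations, which are exactly the two failure modes listed above.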