This article explains how to implement multistep learning-rate decay in TensorFlow; it may be a useful reference for anyone facing the same problem.

Problem Description

There is multistep decay in Caffe. It is calculated as base_lr * gamma ^ (floor(step)), where step is incremented after each of your decay steps. For example, with decay steps [100, 200], at global step 101 I want to get base_lr * gamma ^ 1, and from global step 201 onward I want to get base_lr * gamma ^ 2, and so on.
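
To make the target schedule concrete, here is a minimal pure-Python sketch of that policy (the base_lr, gamma, and boundary values are hypothetical, chosen to match the example above):

# Minimal sketch of Caffe-style multistep decay (hypothetical values).
def multistep_lr(base_lr, gamma, boundaries, global_step):
    # step = number of decay boundaries already passed
    step = sum(1 for b in boundaries if global_step > b)
    return base_lr * gamma ** step

print(multistep_lr(0.01, 0.1, [100, 200], 101))  # ~0.001  (base_lr * gamma ** 1)
print(multistep_lr(0.01, 0.1, [100, 200], 201))  # ~0.0001 (base_lr * gamma ** 2)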

I tried to implement it based on the exponential decay source, but I couldn't get anywhere. Here is the code of exponential decay (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/learning_rate_decay.py#L27):

from tensorflow.python.framework import ops
from tensorflow.python.ops import math_ops


def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False, name=None):
  with ops.name_scope(name, "ExponentialDecay",
                      [learning_rate, global_step,
                       decay_steps, decay_rate]) as name:
    learning_rate = ops.convert_to_tensor(learning_rate, name="learning_rate")
    dtype = learning_rate.dtype
    global_step = math_ops.cast(global_step, dtype)
    decay_steps = math_ops.cast(decay_steps, dtype)
    decay_rate = math_ops.cast(decay_rate, dtype)
    p = global_step / decay_steps
    if staircase:
      p = math_ops.floor(p)
    return math_ops.mul(learning_rate, math_ops.pow(decay_rate, p), name=name)
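
For context: the public wrapper tf.train.exponential_decay can only decay at one fixed interval, which is exactly why it cannot express uneven boundaries such as [100, 200]. A minimal sketch (assuming a global_step variable like the one created further below):

# Decays uniformly every 100 steps; there is no way to say
# "decay at step 100, then again at step 200, and never after".
lr = tf.train.exponential_decay(0.01, global_step,
                                decay_steps=100, decay_rate=0.5,
                                staircase=True)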

I would have to pass decay_steps as some sort of array, either a Python list or a Tensor. I would probably also have to pass current_decay_step (the step in the formula above).

First option: in pure Python, without tensors, it is very simple:

decay_steps.append(global_step)
p = sorted(decay_steps).index(global_step)  # may need a `+1` or `-1`; I hope the main idea is clear
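
As a quick check of this idea, with hypothetical values:

# Boundaries [100, 200, 300], current global step 201.
decay_steps = [100, 200, 300]
global_step = 201
decay_steps.append(global_step)
p = sorted(decay_steps).index(global_step)  # sorted -> [100, 200, 201, 300], so p == 2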

I can't do this, because there is no sort in TF, and I don't know how long it would take to implement one.

Second option: something like the code below. It doesn't work, for several reasons. First, I don't know how to pass arguments to a function in tf.cond. Second, even if I could pass the arguments, it may still not work: can cond support TF ops with side effects?

def new_decay_step(decay_steps):
    decay_steps = decay_steps[1:]
    current_decay_step.assign(current_decay_step + 1)
    return tf.no_op()

tf.cond(tf.greater(tf.shape(decay_steps)[0], 0),
        tf.cond(tf.greater(global_step, decay_steps[0]), new_decay_step, tf.no_op()),
        tf.no_op())

p = current_decay_step

Third option: it will not work, because I can't get an element with tensor[another_tensor].

# if len(decay_steps) > (current_step + 1):
#     if global_step > decay_steps[current_step + 1]:
#         current_step += 1

current_decay_step = tf.cond(tf.greater(tf.shape(decay_steps)[0], tf.add(current_decay_step, 1)),
                             tf.cond(tf.greater(global_step, decay_steps[tf.add(current_decay_step, 1)]),
                                     tf.add(current_decay_step, 1), tf.add(current_decay_step, 0)),
                             tf.add(current_decay_step, 0))
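
(As an aside, indexing with a tensor-valued index can be expressed with tf.gather; a minimal sketch, assuming current_decay_step is an integer scalar tensor:)

# tf.gather looks up decay_steps at a tensor-valued index.
element = tf.gather(decay_steps, tf.cast(current_decay_step, tf.int32))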

What can I do?

UPD: I can almost make it work with the second option.

I can write:

import functools

def nothing(): return tf.no_op()

tf.cond(tf.greater(global_step, decay_steps[0]),
        functools.partial(new_decay_step, decay_steps),
        nothing)

But for some reason the inner tf.cond doesn't work.

For this code I get the error fn1 must be callable:

def nothing(): return tf.no_op()

tf.cond(tf.greater(tf.shape(decay_steps)[0], 0),
        tf.cond(tf.greater(global_step, decay_steps[0]),  # inner cond returns a tensor, hence "fn1 must be callable"
                functools.partial(new_decay_step, decay_steps),
                nothing),
        nothing)

UPD2: the inner tf.cond will not work, because it returns a tensor while the arguments to tf.cond must be functions.
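
In other words, both branches passed to tf.cond must be Python callables. A minimal sketch of the rule, reusing the tensors defined above:

pred = tf.greater(global_step, decay_steps[0])
result = tf.cond(pred,
                 lambda: tf.add(current_decay_step, 1),    # true branch: a callable
                 lambda: tf.identity(current_decay_step))  # false branch: a callable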

I didn't check it, but it seems to work (at least it doesn't crash with errors):

tf.cond(tf.logical_and(tf.greater(tf.shape(decay_steps)[0], 0),
                       tf.greater(global_step, decay_steps[0])),
        functools.partial(new_decay_step, decay_steps),
        nothing)

UPD3: I realized that the code in UPD2 will not work, because I can't change the list inside the function.

I also don't know which parts of tf.logical_and are actually executed.

I wrote the following code:

import tensorflow as tf


class ohmy:
    def __init__(self, decay_steps):
        self.decay_steps = decay_steps

    def multistep_decay(self, learning_rate, global_step, current_decay_step, decay_steps, decay_rate,
                        staircase=False, name=None):

        learning_rate = tf.convert_to_tensor(learning_rate, name="learning_rate")
        dtype = learning_rate.dtype
        global_step = tf.cast(global_step, dtype)

        decay_rate = tf.cast(decay_rate, dtype)

        def new_step():
            self.decay_steps = self.decay_steps[1:]
            current_decay_step.assign(current_decay_step + 1)
            return current_decay_step

        def curr_step():
            return current_decay_step

        current_decay_step = tf.cond(tf.logical_and(tf.greater(tf.shape(self.decay_steps)[0], 0),
                                                    tf.greater(global_step, self.decay_steps[0])),
                                     new_step,
                                     curr_step)

        a = tf.Print(global_step, [global_step], "global")
        b = tf.Print(self.decay_steps, [self.decay_steps], "decay_steps")
        c = tf.Print(current_decay_step, [current_decay_step], "step")

        with tf.control_dependencies([a, b, c, current_decay_step]):
            p = current_decay_step

            if staircase:
                p = tf.floor(p)

            return tf.mul(learning_rate, tf.pow(decay_rate, p), name=name)


decay_steps = [3, 4, 5, 6, 7]
decay_steps = tf.convert_to_tensor(decay_steps, dtype=tf.float32)
current_decay_step = tf.Variable(0.0, trainable=False)
global_step = tf.Variable(0, trainable=False)
decay_rate = 0.5

c = ohmy(decay_steps)
lr = c.multistep_decay(0.010, global_step, current_decay_step, decay_steps, decay_rate)
#lr = tf.train.exponential_decay(0.001, global_step=global_step, decay_steps=2, decay_rate=0.5, staircase=True)
tf.scalar_summary('learning_rate', lr)

opt = tf.train.AdamOptimizer(lr)
#...train loop and so on

It doesn't work at all. Here is the output:

I tensorflow/core/kernels/logging_ops.cc:79] step[0]
I tensorflow/core/kernels/logging_ops.cc:79] global[0]
E tensorflow/core/client/tensor_c_api.cc:485] The tensor returned for MergeSummary/MergeSummary:0 was not valid.
Traceback (most recent call last):
  File "flownet_new.py", line 528, in <module>
    summary_str = sess.run(summary_op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 382, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 655, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 723, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 743, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: The tensor returned for MergeSummary/MergeSummary:0 was not valid.

As you can see, there is no output of the decay steps. I can't even debug it!

Now I definitely don't know how to make it with one function. By the way, either I'm doing something wrong, or tf.contrib.slim doesn't work with learning rate decay.

For now, the simplest solution is to do what you want in the train loop, as cleros said.

Solution

I was looking for this feature in TensorFlow, and I found that it can be easily implemented using tf.train.piecewise_constant. Here is an example from TensorFlow's API docs (https://www.tensorflow.org/api_docs/python/tf/train/piecewise_constant):

Example: use a learning rate that is 1.0 for the first 100000 steps, 0.5 for steps 100001 to 110000, and 0.1 for any additional steps.

global_step = tf.Variable(0, trainable=False)
boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)

Afterwards, whenever we perform an optimization step, we increment global_step.
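
A minimal sketch of that wiring (here `loss` is a hypothetical scalar tensor from your model; passing global_step to minimize() makes the optimizer increment it for us):

opt = tf.train.GradientDescentOptimizer(learning_rate)
train_op = opt.minimize(loss, global_step=global_step)  # bumps global_step on each run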

