This article covers how to compute the norm of the gradients for each part of a composite loss function in TensorFlow.

Problem Description

Assume I have the following loss function:

loss_a = tf.reduce_mean(my_loss_fn(model_output, targets))
loss_b = tf.reduce_mean(my_other_loss_fn(model_output, targets))
loss_final = loss_a + tf.multiply(alpha, loss_b)

To visualize the norm of the gradients w.r.t. loss_final, one could do this:

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
grads_and_vars = optimizer.compute_gradients(loss_final)
grads, _ = list(zip(*grads_and_vars))
norms = tf.global_norm(grads)
gradnorm_s = tf.summary.scalar('gradient norm', norms)
train_op = optimizer.apply_gradients(grads_and_vars, name='train_op')

However, I would like to plot the norm of the gradients w.r.t. loss_a and loss_b separately. How can I do this in the most efficient way? Do I have to call compute_gradients(..) on both loss_a and loss_b separately and then add those two gradients together before passing them to optimizer.apply_gradients(..)? I know that this would mathematically be correct due to the summation rule, but it just seems a bit cumbersome and I also don't know how you would implement the summation of the gradients correctly. Also, loss_final is rather simple, because it's just a summation. What if loss_final were more complicated, e.g. a division?

I'm using TensorFlow 0.12.

Solution

You are right that combining gradients could get messy. Instead, just compute the gradients of each of the losses as well as the final loss. Because TensorFlow optimizes the directed acyclic graph (DAG) before compilation, this doesn't result in duplicated work.
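For context, the manual combination the question asks about could look roughly like the sketch below, calling compute_gradients once per loss and applying the sum rule for loss_a + alpha * loss_b. This is only a sketch reusing the names from the question's snippet (loss_a, loss_b, alpha); the handling of None gradients and the pairing of variables are assumptions, and the simpler approach recommended in this answer follows.

# Sketch of manually combining per-loss gradients (not the recommended route).
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
grads_and_vars_a = optimizer.compute_gradients(loss_a)
grads_and_vars_b = optimizer.compute_gradients(loss_b)

combined = []
for (g_a, v), (g_b, _) in zip(grads_and_vars_a, grads_and_vars_b):
    # d(loss_a + alpha * loss_b)/dv = d(loss_a)/dv + alpha * d(loss_b)/dv
    if g_a is None:
        g = None if g_b is None else tf.multiply(alpha, g_b)
    elif g_b is None:
        g = g_a
    else:
        g = g_a + tf.multiply(alpha, g_b)
    combined.append((g, v))

norm_a = tf.global_norm([g for g, _ in grads_and_vars_a if g is not None])
norm_b = tf.global_norm([g for g, _ in grads_and_vars_b if g is not None])
train_op = optimizer.apply_gradients(combined, name='train_op')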

Here is an example of the recommended approach:

import tensorflow as tf

with tf.name_scope('inputs'):
    W = tf.Variable(dtype=tf.float32, initial_value=tf.random_normal((4, 1), dtype=tf.float32), name='W')
    x = tf.random_uniform((6, 4), dtype=tf.float32, name='x')

with tf.name_scope('outputs'):
    y = tf.matmul(x, W, name='y')

def my_loss_fn(output, targets, name):
    return tf.reduce_mean(tf.abs(output - targets), name=name)

def my_other_loss_fn(output, targets, name):
    return tf.sqrt(tf.reduce_mean((output - targets) ** 2), name=name)

def get_tensors(loss_fn):
    # Build the loss, its gradient with respect to W, and the norm of that gradient.
    loss = loss_fn(y, targets, 'loss')
    grads = tf.gradients(loss, W, name='gradients')
    norm = tf.norm(grads, name='norm')

    return loss, grads, norm

targets = tf.random_uniform((6, 1))

with tf.name_scope('a'):
    loss_a, grads_a, norm_a = get_tensors(my_loss_fn)

with tf.name_scope('b'):
    loss_b, grads_b, norm_b = get_tensors(my_other_loss_fn)

with tf.name_scope('combined'):
    # The gradient of the combined loss is taken directly from the same graph.
    loss = tf.add(loss_a, loss_b, name='loss')
    grad = tf.gradients(loss, W, name='gradients')

with tf.Session() as sess:
    tf.global_variables_initializer().run(session=sess)

    writer = tf.summary.FileWriter('./tensorboard_results', sess.graph)
    res = sess.run([norm_a, norm_b, grad])

    print(*res, sep='\n')
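To connect this back to the original goal of logging each norm to TensorBoard while training on the combined loss, a minimal sketch building on the graph above could look like the following. The summary names, optimizer setup, and training loop here are illustrative additions, not part of the original answer.

# Sketch: per-loss gradient-norm summaries plus a training op on the combined loss.
gradnorm_a_s = tf.summary.scalar('gradient_norm_a', norm_a)
gradnorm_b_s = tf.summary.scalar('gradient_norm_b', norm_b)
merged = tf.summary.merge_all()

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
# grad is the list returned by tf.gradients(loss, W) above; pair it with W.
train_op = optimizer.apply_gradients(list(zip(grad, [W])), name='train_op')

with tf.Session() as sess:
    tf.global_variables_initializer().run(session=sess)
    writer = tf.summary.FileWriter('./tensorboard_results', sess.graph)
    for step in range(10):
        _, summaries = sess.run([train_op, merged])
        writer.add_summary(summaries, global_step=step)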

Edit: In response to your comment... You can check the DAG of a TensorFlow model using TensorBoard. I've updated the code above to store the graph.

Run tensorboard --logdir $PWD/tensorboard_results in a terminal and navigate to the URL printed on the command line (typically http://localhost:6006/). Then click on the GRAPHS tab to view the DAG. You can recursively expand the tensors, ops, and namespaces to reveal subgraphs and inspect individual operations and their inputs.

This concludes the article on computing the gradient norms of the individual parts of a composite loss function. We hope the answer above proves helpful.
