torch.nn.CrossEntropyLoss returns NaN
During training, the loss becomes NaN. The loss function is torch.nn.CrossEntropyLoss.
Solution:
The NaNs appear because applying softmax and log as separate operations is numerically unstable: softmax can underflow to exactly 0, and log(0) is -inf, which turns into NaN once gradients flow through it. nn.CrossEntropyLoss already fuses these two steps internally, so the model should output raw logits with no softmax at the end. Equivalently, you can apply F.log_softmax as the last operation of the model and train with nn.NLLLoss. The loss will be the same, but much more stable, because log_softmax is computed via the log-sum-exp trick.
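A minimal sketch of the difference (the tensor shapes and the scaling factor are made up purely to expose the instability):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch of logits and targets for illustration.
logits = torch.randn(4, 10) * 100  # large magnitudes trigger underflow
targets = torch.randint(0, 10, (4,))

# Unstable: softmax underflows to 0 for non-dominant classes,
# so log(0) = -inf; the forward loss becomes inf and the
# backward pass produces NaN gradients.
unstable = F.nll_loss(torch.log(torch.softmax(logits, dim=1)), targets)

# Stable: F.log_softmax uses the log-sum-exp trick internally.
stable = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# nn.CrossEntropyLoss on raw logits is equivalent to the stable version.
ce = nn.CrossEntropyLoss()(logits, targets)

print(unstable)    # may be inf
print(stable, ce)  # finite and equal
```

In short: either feed raw logits to nn.CrossEntropyLoss, or end the model with F.log_softmax and use nn.NLLLoss; never compute log(softmax(x)) yourself.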