torch CrossEntropyLoss NaN

During training, the loss becomes NaN. The loss function is torch.nn.CrossEntropyLoss.


Solution:

The NaNs appear because applying softmax and then log as two separate operations is numerically unstable: softmax can underflow small probabilities to exactly zero, and log(0) returns -inf, which then turns into NaN in the loss or during the backward pass.
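
A minimal sketch of the failure mode (the logit values are just illustrative, chosen to force underflow in float32):

```python
import torch
import torch.nn.functional as F

# Extreme logits are enough to break the two-step version in float32.
logits = torch.tensor([[100.0, 0.0, -100.0]])

# softmax underflows the smallest probability to exactly 0.0,
# and log(0) is -inf, which becomes inf/NaN once it reaches the
# loss and the backward pass.
unstable = torch.log(torch.softmax(logits, dim=1))
print(unstable)  # tensor([[0., ~-100., -inf]])

# log_softmax fuses the two steps using the log-sum-exp trick
# and stays finite for the same input.
stable = F.log_softmax(logits, dim=1)
print(stable)    # tensor([[0., -100., -200.]])
```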

If you're training with CrossEntropyLoss, you can instead apply F.log_softmax at the end of your model and train with NLLLoss. The loss is mathematically equivalent, but much more stable, because softmax and log are never computed as separate steps.
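
Concretely, that means either feeding raw logits straight to nn.CrossEntropyLoss (which applies log_softmax internally) or ending the model with F.log_softmax and using nn.NLLLoss. A minimal sketch, with a placeholder linear model and made-up shapes standing in for yours:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 3)            # placeholder: your model should output raw logits
x = torch.randn(4, 10)
target = torch.tensor([0, 2, 1, 0])

logits = model(x)

# Option 1: raw logits -> CrossEntropyLoss (log_softmax is applied internally).
loss_ce = nn.CrossEntropyLoss()(logits, target)

# Option 2: log_softmax as the model's last step -> NLLLoss.
log_probs = F.log_softmax(logits, dim=1)
loss_nll = nn.NLLLoss()(log_probs, target)

# The two losses are numerically identical; neither computes softmax + log separately.
print(torch.allclose(loss_ce, loss_nll))  # True
```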

